PHPost Grabber is a PHP script for automatic "reposting" of posts/articles/content (news gathering). Automatically filling the site with news is a priority for those who want to make their website interesting to the user.
Regularly filling a site with up-to-date content by hand can become an unbearably time-consuming task. Even a single post means copying the text from another site, stripping its CSS styling, formatting it for publication on your own site, and then copying, optimizing, uploading, and inserting images and video clips into the article.
Automatically parsing news from other sites with a PHP grabber (PHP parser) removes the need to fill the site with content manually and keeps the site supplied with fresh news in a timely manner.
We can sell you a PHP article grabber, which we have named PHPost Grabber. Those with PHP programming skills can, of course, try to write a post parser themselves using third-party open-source PHP packages.
PHPost Grabber itself uses the following third-party open-source PHP packages:
- adodb/adodb-php
- ashtokalo/php-translit
- bower-asset/jquery
- ezyang/htmlpurifier
- fxp/composer-asset-plugin
- jbroadway/urlify
- masterminds/html5
- npm-asset/bootstrap
- phpmailer/phpmailer
These third-party PHP packages are managed and updated centrally with the Composer package manager.
What can PHPost Grabber do?
Here are the basic features:
- The grabber does not depend on the files of any CMS engine; it performs all content manipulations (processing/insertion/updating/deletion) on its own.
- Support for HTML5 content.
- Ability to use CURL + HTTP Proxy or SOCKS Proxy.
- Spoofing of the HTTP_USER_AGENT string.
- As a source of links to target posts, the grabber can use RSS and Atom news feeds, or an XML sitemap ('Sitemap').
- The ability to limit the number of posts processed in a single run.
- Locating a post by the attributes of its containing block.
- A separate search for the introductory text, in case it is placed in a separate block from the rest of the post.
- Configurable cleaning rules for HTMLPurifier (to remove CSS styles and unwanted HTML elements).
- Configurable cleanup rules for DOMDocument removeChild() (to remove blocks containing deeply nested elements that are difficult to cut out with preg_replace()).
- Configurable rules for cutting out a piece of a post located with stripos(), substr(), and str_replace().
- Configurable rules for removing a substring from a post using str_replace().
- Configurable rules for processing a post with preg_replace() regular expressions.
- Configurable rules for deleting/replacing unwanted words.
- Splitting the post into the introductory text and the rest of the text, with a configurable number of paragraphs in the introductory text.
- Several positions for placing a thumbnail in the introductory text.
- The ability to specify your own thumbnail HTML code in case the post has no image.
- Images are grabbed even when they are not served by direct links, for example "http://example.com/148594535414/resize/300x-1/quality/80/format/jpg" or "http://example.com/getimg.php?imgid=xxx".
- Optimization of the grabbed images using "jpegoptim" and "optipng".
- Processing (changing the width/height, color depth, etc.) of the grabbed images using "convert" or "mogrify" from the ImageMagick package. Not to be confused with the PHP extension "imagick"!
- MD5-hashing of image names or saving their original name.
- Two modes of operation, 'test' and 'prod'; in 'test' mode, test HTML post files are created instead of inserting posts into the database.
PHPost Grabber can be used to parse content from several sites simultaneously. Grabbing runs in fully automatic mode without the need for user intervention.
Of course, in some cases the user may still need to edit an already processed post, for example to change an image caption, split a paragraph that seems too long into several, or adjust/supplement the layout of tables.
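To make the cleaning and splitting rules in the list above more concrete, here is a minimal sketch of how such a pipeline could be assembled from the PHP functions the list names. It is not PHPost Grabber's actual code: the function names (removeBlocksByClass, cutFromMarker, splitIntro, cleanPost) and the rule keys are hypothetical, and the sketch assumes class names contain no quote characters.

```php
<?php
// Illustrative sketch only: function names and rule keys are hypothetical.

/** Remove every element carrying the given CSS class, nested children included. */
function removeBlocksByClass($html, $class)
{
    $previous = libxml_use_internal_errors(true); // tolerate sloppy real-world markup
    $doc = new DOMDocument();
    // The XML prolog makes libxml read the fragment as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="wrap">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $query = sprintf('//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]', $class);
    foreach (iterator_to_array($xpath->query($query)) as $node) {
        $node->parentNode->removeChild($node);
    }

    // Serialize only what is left inside the temporary wrapper.
    $out = '';
    foreach ($xpath->query('//div[@id="wrap"]')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

/** Cut everything from the first case-insensitive occurrence of $marker onward. */
function cutFromMarker($html, $marker)
{
    $pos = stripos($html, $marker);
    return $pos === false ? $html : substr($html, 0, $pos);
}

/** Split a post into intro (first N paragraphs) and the rest of the text. */
function splitIntro($html, $introParagraphs)
{
    $offset = 0;
    for ($i = 0; $i < $introParagraphs; $i++) {
        $pos = stripos($html, '</p>', $offset);
        if ($pos === false) {
            break; // fewer paragraphs than requested
        }
        $offset = $pos + strlen('</p>');
    }
    return array((string) substr($html, 0, $offset), (string) substr($html, $offset));
}

/** Apply the rule groups in a fixed order: DOM removal, cutting, replacing, regex. */
function cleanPost($html, array $rules)
{
    foreach ($rules['remove_classes'] as $class) {
        $html = removeBlocksByClass($html, $class);
    }
    foreach ($rules['cut_from'] as $marker) {
        $html = cutFromMarker($html, $marker);
    }
    $html = str_replace(array_keys($rules['replace']), array_values($rules['replace']), $html);
    foreach ($rules['regex'] as $pattern => $replacement) {
        $html = preg_replace($pattern, $replacement, $html);
    }
    return trim($html);
}
```

In this sketch the DOM-based removal runs first, so that deeply nested blocks are gone before the cheaper string and regular-expression rules are applied; the real grabber may order or configure its rules differently.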
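As an illustration of how links to target posts might be collected from an RSS feed, an Atom feed, or an XML sitemap with a per-run limit, here is a minimal sketch built on PHP's SimpleXML extension. The function name extractPostLinks and its parameters are hypothetical and do not describe PHPost Grabber's internal API.

```php
<?php
// Illustrative sketch only: extractPostLinks() is a hypothetical helper.

/**
 * Extract up to $limit unique post URLs from an RSS 2.0 feed,
 * an Atom feed, or an XML sitemap passed as a string.
 */
function extractPostLinks($xml, $limit)
{
    $previous = libxml_use_internal_errors(true); // collect parse errors silently
    $doc = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($doc === false) {
        return array(); // not well-formed XML
    }

    $links = array();
    switch ($doc->getName()) {
        case 'rss': // RSS 2.0: <rss><channel><item><link>
            foreach ($doc->channel->item as $item) {
                $links[] = trim((string) $item->link);
            }
            break;
        case 'feed': // Atom: <feed><entry><link href="...">
            foreach ($doc->entry as $entry) {
                foreach ($entry->link as $link) {
                    $rel = (string) $link['rel'];
                    if ($rel === '' || $rel === 'alternate') {
                        $links[] = trim((string) $link['href']);
                        break; // one link per entry is enough
                    }
                }
            }
            break;
        case 'urlset': // XML sitemap: <urlset><url><loc>
            foreach ($doc->url as $url) {
                $links[] = trim((string) $url->loc);
            }
            break;
    }

    // Drop empty strings and duplicates, then apply the per-run limit.
    $links = array_values(array_unique(array_filter($links, 'strlen')));
    return array_slice($links, 0, $limit);
}
```

Calling the function once per configured source and merging the results would be one simple way to cover several sites in a single run.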
What are the system requirements for PHPost Grabber?
To run PHPost Grabber with full functionality, you will need:
- PHP: >= 5.6.0, plus permission to execute the exec() and system() functions if image optimization is activated; 128 MB of RAM.
- CMS: Joomla! >= 3.5.6
- Optional: ImageMagick, if "post_img_convert" and/or "post_img_mogrify" is activated. Not to be confused with the PHP extension "imagick"!
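The PHP-side requirements above can be verified on a target server before installation. The following is a minimal preflight sketch, not a script shipped with PHPost Grabber: the function names are hypothetical, and it assumes that the "128 MB of RAM" requirement corresponds to PHP's memory_limit setting.

```php
<?php
// Illustrative preflight check; all function names here are hypothetical.

/** True if the function exists and is not listed in disable_functions. */
function isFunctionUsable($name)
{
    $disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));
    return function_exists($name) && !in_array($name, $disabled, true);
}

/** Convert a php.ini size value such as "128M" or "1G" to bytes. */
function memoryLimitBytes($value)
{
    $value = trim($value);
    if ($value === '-1') {
        return PHP_INT_MAX; // -1 means "no limit"
    }
    $number = (int) $value;
    switch (strtolower(substr($value, -1))) {
        case 'g':
            $number *= 1024; // intentional fall-through
        case 'm':
            $number *= 1024; // intentional fall-through
        case 'k':
            $number *= 1024;
    }
    return $number;
}

/** Return a list of unmet requirements (empty if everything is fine). */
function checkRequirements($optimizeImages)
{
    $problems = array();
    if (version_compare(PHP_VERSION, '5.6.0', '<')) {
        $problems[] = 'PHP >= 5.6.0 is required, found ' . PHP_VERSION;
    }
    // Assumption: the 128 MB requirement is checked against memory_limit.
    if (memoryLimitBytes((string) ini_get('memory_limit')) < 128 * 1024 * 1024) {
        $problems[] = 'memory_limit is below 128 MB';
    }
    if ($optimizeImages) {
        foreach (array('exec', 'system') as $function) {
            if (!isFunctionUsable($function)) {
                $problems[] = $function . '() is not available';
            }
        }
    }
    return $problems;
}

foreach (checkRequirements(true) as $problem) {
    echo $problem, PHP_EOL;
}
```

The check does not cover Joomla! or ImageMagick; those would have to be verified separately.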
At present, the article grabber can automatically fill sites built on Joomla! >= 3.5.6 with content, but on request it can also be adapted to earlier versions of Joomla!. As noted above, the grabber works independently of the files of any engine and performs all content manipulations (processing/insertion/updating/deletion) on its own; it depends only on the MySQL table structure of the particular Joomla! version.
In the future, support for other CMS engines such as WordPress, MODX, etc. may be implemented.
PHPost Grabber and Security
PHPost Grabber is written in accordance with PSR-1, the basic PHP coding standard. All file operations performed by the grabber have been thoroughly tested, and the application logic aims for maximum efficiency with a minimum of PHP code. The parser can work in an "open_basedir" environment.
Although we used some open-source PHP packages when creating PHPost Grabber, we implemented all of the grabber's main functionality ourselves; the third-party packages only slightly complement it.
PHPost Grabber is commercial software and is distributed as closed source because we want to protect our work and our right to earn a living.
However, ill-intentioned people with the skills to crack closed software may try to break the protection of the PHP code, which of course violates the terms of the license agreement. Even then they will not get the full picture, because all functions, constants, and variables have been renamed and the code contains no comments at all, but that is another story. In short, whoever needs the grabber's PHP source code can go ahead and "extract" it :)
Note that according to a scan of our grabber on VirusTotal (https://www.virustotal.com/), some antivirus products may erroneously identify the closed sections of PHPost Grabber's code as malicious software, for example (detection ratio: 2/55):
Antivirus | Result | Update date |
---|---|---|
ESET-NOD32 | PHP/Kryptik.AE | 20170410 |
Fortinet | PHP/Kryptik.AE!tr | 20170410 |
However, we declare with full confidence that our grabber contains no deliberate vulnerabilities or malicious sections of code. The false positives from some antivirus products can only be explained by their bias against the combination of the PHP functions "urldecode()" and "eval()", or "eval(base64_decode(...))".
How much does PHPost Grabber cost?
The base price of the grabber, with no restriction on usage period, domain name, or IP address of the site on which the parser will run, is $100.
As a bonus, the price includes free installation and configuration of the grabber, as well as technical support for one year from the date of purchase.