Plain Google datacenters list, ban and Tor
Only those that return a 200 status:
Ever thought of promoting a page with the keywords that have already brought it some traffic, instead of the keywords it was originally written for? I did. So what I needed was a way to collect the search engine terms that brought traffic to a WP-based site. Of course, my first thought was to pull them from an external statistics source, AWStats for instance. However, even though I found a way to make AWStats separate the terms by search engine, I never managed to get the configuration file changed (stupid HostGator support)… So I just wrote a WP plugin that stores the terms in the DB, separated by search engine, and adds them to the beginning of the post in a user-defined format. You install it and pick the format you need (or disable adding those lines altogether, if you only want an insight into the keywords or plan to extend the plugin with an XML-RPC method for downloading the keywords from an external script). The download is free, of course.
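To give an idea of the core trick (pulling the search phrase out of the referrer and keeping it per search engine), here is a minimal sketch. It is in Python rather than the plugin's PHP and it is not the plugin's code: the `ENGINE_PARAMS` map, the function names and the `{engine}`/`{terms}` template placeholders are all made up for illustration.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical map: host fragment -> query parameter that carries the phrase.
# A real list would be longer and matched more strictly than by substring.
ENGINE_PARAMS = {
    "google.": "q",
    "bing.": "q",
    "yahoo.": "p",
    "yandex.": "text",
}


def extract_search_term(referrer):
    """Return (engine, term) if the referrer is a known SE results page, else None."""
    parsed = urlparse(referrer)
    host = parsed.netloc.lower()
    for fragment, param in ENGINE_PARAMS.items():
        if fragment in host:
            values = parse_qs(parsed.query).get(param)
            if values and values[0].strip():
                return fragment.rstrip("."), values[0].strip()
            return None
    return None


def collect_terms(referrers):
    """Group unique terms by engine, keeping first-seen order."""
    by_engine = {}
    for referrer in referrers:
        hit = extract_search_term(referrer)
        if hit is None:
            continue
        engine, term = hit
        terms = by_engine.setdefault(engine, [])
        if term not in terms:
            terms.append(term)
    return by_engine


def format_line(engine, terms, template="Found via {engine}: {terms}"):
    """Render one line in a user-defined format (placeholders are assumptions)."""
    return template.format(engine=engine, terms=", ".join(terms))
```

In a plugin the referrer would come from the request and the grouped terms would go into a DB table; here they just live in a dict so the idea stays visible.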
I’ve recently made a WordPress plugin for generating really big sites, like 30k pages in a few minutes. It works by first adding all the keyword post titles to the database with an empty post_content, and then generating the content when a page is displayed. It actually uses Google to do that, so I was worried about getting banned if Googlebot requested a lot of pages at once and thereby caused excessive scraping of the SE. So here is a little PHP crawler that requests all the pages within a domain, just for lulz.
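The crawling idea itself is simple enough to sketch: start from one URL, fetch it, queue every link that stays on the same host, repeat. The sketch below is in Python, not the PHP from the post, and it is only an illustration under my own assumptions: the names `crawl`, `same_domain_links` and `fetch_page`, the `delay` and `limit` parameters, and the breadth-first order are all hypothetical.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def same_domain_links(base_url, html):
    """Return absolute, fragment-free links that stay on base_url's host."""
    parser = LinkParser()
    parser.feed(html)
    host = urlparse(base_url).netloc
    found = []
    for href in parser.links:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == host:
            found.append(absolute)
    return found


def fetch_page(url):
    """Download a page body as text (default fetcher)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetcher=fetch_page, delay=0.0, limit=1000):
    """Breadth-first walk over one host; returns the URLs fetched, in order."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < limit:
        url = queue.popleft()
        try:
            html = fetcher(url)
        except OSError:  # urllib's URLError and HTTPError are OSError subclasses
            continue
        visited.append(url)
        for link in same_domain_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
        time.sleep(delay)  # spacing requests keeps the on-demand generation gentle
    return visited
```

Calling something like `crawl("http://example.com/", delay=2.0)` would touch each page once at a controlled pace, which is the point here: the content gets generated before a bot hammers the site. The `fetcher` argument is there so the walk can be tried against canned pages without touching the network.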