I apologize for the long outage of the Patent Baristas web site. For the past two weeks, the site has been unavailable, to the point where I could not even log in to figure out why I kept receiving the message “Error establishing a database connection.”

The web site was being bombarded with requests from a Chinese search engine/malware site called sogou.com. My logs showed a flood of requests and were filling up rapidly. This robot is nasty: it somehow sniffs Internet traffic and tries to access whatever URLs it finds. Not only that, it deliberately ignores my robots.txt file (which tells spiders which pages not to crawl) and crawls the very pages that file explicitly disallows.
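
For readers unfamiliar with robots.txt: it is purely advisory. A well-behaved crawler downloads the file and checks each URL against its rules before fetching, but nothing on the server enforces those rules. Python's standard-library urllib.robotparser performs the same check a polite spider would. The rules and URLs below are hypothetical examples, not the actual Patent Baristas robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules (illustration only, not the site's real file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
"""


def polite_can_fetch(user_agent: str, url: str) -> bool:
    """Return True if a crawler that honours robots.txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse rules from text, no network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/2010/08/some-post/",
                "https://example.com/wp-admin/options.php"):
        print(polite_can_fetch("Sogou web spider", url), url)
```

A compliant spider would skip the second URL. A crawler that requests it anyway can only be stopped on the server side.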

These requests held database connections open and eventually used up every connection the database server would accept, which produced the WordPress “Error establishing a database connection” message.
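
To see why slow bot requests can take the whole site down, here is a toy model, not WordPress or MySQL code: the database accepts at most a fixed number of simultaneous connections (MySQL's max_connections setting plays this role), every page view needs one, and requests that keep their connection open eventually leave none for anyone else. The class names and the limit of 5 are arbitrary assumptions for illustration.

```python
class ConnectionLimitReached(Exception):
    """Raised when no database connection slot is free."""


class ToyDatabase:
    """Hypothetical stand-in for a database server with a connection cap."""

    def __init__(self, max_connections: int) -> None:
        self.max_connections = max_connections
        self.open_connections = 0

    def connect(self) -> None:
        if self.open_connections >= self.max_connections:
            raise ConnectionLimitReached("Error establishing a database connection")
        self.open_connections += 1

    def disconnect(self) -> None:
        self.open_connections -= 1


def serve_visitor(db: ToyDatabase) -> str:
    """A normal page view: open a connection, render, then release it."""
    try:
        db.connect()
    except ConnectionLimitReached as err:
        return str(err)
    try:
        return "page rendered"
    finally:
        db.disconnect()  # a well-behaved request gives its slot back


if __name__ == "__main__":
    db = ToyDatabase(max_connections=5)
    print(serve_visitor(db))  # a slot is free, so the page renders
    # Bot requests arrive faster than they finish, so each one keeps its slot.
    for _ in range(5):
        db.connect()
    print(serve_visitor(db))  # every slot is held, so the visitor gets the error
```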

In case you want all the gory details: for spiders that won’t abide by the robots.txt protocol, it’s safer to block them on the server with mod_rewrite rules in an .htaccess file. Add the following code to your .htaccess file:

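RewriteEngine on
# mod_rewrite in .htaccess needs FollowSymLinks (or SymLinksIfOwnerMatch) enabled
Options +FollowSymlinks
RewriteBase /

# Refuse (403 Forbidden) any request whose User-Agent starts with
# "Baiduspider" (any letter case) or "Sogou"
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Sogou
RewriteRule ^.*$ - [F]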
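
The RewriteCond lines are regular expressions tested against the request’s User-Agent header. [NC] makes a test case-insensitive and [OR] joins it to the next condition, so a match on either one triggers the [F] rule, which answers 403 Forbidden. The Python sketch below imitates that decision so you can try sample User-Agent strings before editing .htaccess. It is an approximation, not Apache’s implementation, and the sample strings are illustrative.

```python
import re

# (pattern, case_insensitive) pairs mirroring the two RewriteCond lines:
#   RewriteCond %{HTTP_USER_AGENT} ^Baiduspider [NC,OR]
#   RewriteCond %{HTTP_USER_AGENT} ^Sogou
BLOCK_RULES = [
    (r"^Baiduspider", True),
    (r"^Sogou", False),
]


def is_blocked(user_agent: str) -> bool:
    """Return True if any rule matches, i.e. the [F] rule would send 403."""
    for pattern, case_insensitive in BLOCK_RULES:
        flags = re.IGNORECASE if case_insensitive else 0
        if re.search(pattern, user_agent, flags):
            return True  # [OR]: one matching condition is enough
    return False


if __name__ == "__main__":
    samples = [
        "Sogou web spider/4.0",
        "baiduspider+(+http://www.baidu.com/search/spider.htm)",
        "Mozilla/5.0 (compatible; Baiduspider/2.0)",
        "Mozilla/5.0 (Windows NT 6.1) Firefox/3.6",
    ]
    for ua in samples:
        print(is_blocked(ua), ua)
```

The third sample shows a limit of the ^ anchor: a crawler that puts its name later in the header is not matched, so check the exact User-Agent strings in your own logs.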

Here’s hoping this blockade helps.

[With thanks to WP Dude Neil Matthews in Newcastle, England, for his assistance in squashing this bug!]


One Comment

  1. There are a lot of nasty bugs out there. See:

    http://searchenginewatch.com/3641174

    http://dhrubraaj.wordpress.com/2010/08/05/prevent-specific-spiders-from-crawling-our-pages/

    I had these added to my blocked list while I was at it.

    Ed.