All day I’ve been watching the online visitor count suddenly jump way up, but without the referrals that would indicate a link from a high-volume site. It’s happened four or five times, and the cause in each case has been a web crawler trying to load every page at ...
A robot from Digg.com has been rapidly running through everything at LGF, including images, with multiple hits per second. It’s doing this despite the following lines in our robots.txt file:
User-agent: *
Crawl-delay: 600
This rule is supposed to limit the number of hits from all robots to no more than one every ...
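For reference, here is a minimal sketch of how a well-behaved crawler might read that rule before fetching anything. The function name `crawlDelayFor` and the matching logic are illustrative assumptions, not the code of any particular robot, and `Crawl-delay` itself is a non-standard robots.txt extension that crawlers are free to ignore, which may be exactly what happened here.

```javascript
// Hypothetical helper: return the Crawl-delay (in seconds) that applies to a
// given user agent, or null if robots.txt declares none for it.
function crawlDelayFor(robotsTxt, userAgent) {
  const ua = userAgent.toLowerCase();
  let groupAgents = [];      // User-agent names of the group being read
  let readingAgents = false; // true while consecutive User-agent lines continue
  let specific = null;       // delay from a group naming this agent
  let wildcard = null;       // delay from the "*" group

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === "user-agent") {
      if (!readingAgents) groupAgents = []; // a new group starts here
      groupAgents.push(value.toLowerCase());
      readingAgents = true;
      continue;
    }
    readingAgents = false;
    if (field !== "crawl-delay" || value === "") continue;

    const seconds = Number(value);
    if (!Number.isFinite(seconds)) continue;
    for (const agent of groupAgents) {
      if (agent === "*") {
        if (wildcard === null) wildcard = seconds;
      } else if (agent.length > 0 && ua.includes(agent)) {
        if (specific === null) specific = seconds;
      }
    }
  }
  // A group that names the agent takes priority over the wildcard group.
  return specific !== null ? specific : wildcard;
}

const robots = "User-agent: *\nCrawl-delay: 600\n";
const delay = crawlDelayFor(robots, "ExampleBot/1.0"); // made-up agent name
```

A crawler honoring the result would wait at least `delay` seconds between requests to the same site.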
Let’s check the PHP error log, shall we, and see what kind of spambots we’ve caught in our trap? When I installed our new spambot blocking code yesterday, I made sure to log all email script accesses that didn’t pass the token verification procedure. This lets me collect a list of ...
LGF, Internet, Blogosphere, Technology, Spam, Bots, captcha, jQuery, Javascript, Security
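The posts don't spell out how the token check works, so the following is only a sketch of one common approach, not LGF's actual code: the server hands the form a signed, time-stamped token, and the email script rejects and logs any submission that doesn't return a valid one. Everything here is an assumption made for illustration, including the names `issueToken`, `verifyToken` and `checkSubmission`, the secret, and the one-hour lifetime. It is written for Node.js rather than PHP.

```javascript
const crypto = require("crypto");

// Assumptions for this sketch: a private server-side secret and a one-hour
// token lifetime. Neither value comes from the original posts.
const SECRET = "replace-with-a-private-random-value";
const MAX_AGE_MS = 60 * 60 * 1000;

// HMAC-SHA256 signature of a payload, as a hex string.
function sign(payload) {
  return crypto.createHmac("sha256", SECRET).update(payload).digest("hex");
}

// Token format: "<issue time in ms>.<signature of that time>".
function issueToken(now = Date.now()) {
  const issuedAt = String(now);
  return issuedAt + "." + sign(issuedAt);
}

// True only if the token is well-formed, correctly signed, and not expired.
function verifyToken(token, now = Date.now()) {
  if (typeof token !== "string") return false;
  const parts = token.split(".");
  if (parts.length !== 2) return false;
  const [issuedAt, signature] = parts;
  if (!/^\d+$/.test(issuedAt)) return false;

  const expected = Buffer.from(sign(issuedAt), "utf8");
  const given = Buffer.from(signature, "utf8");
  // timingSafeEqual requires equal lengths, so check that first.
  if (given.length !== expected.length) return false;
  if (!crypto.timingSafeEqual(given, expected)) return false;

  const age = now - Number(issuedAt);
  return age >= 0 && age <= MAX_AGE_MS;
}

// Record every failed attempt so the offending addresses can be reviewed later.
function checkSubmission(token, ip, failureLog, now = Date.now()) {
  if (verifyToken(token, now)) return true;
  failureLog.push({ ip: ip, time: new Date(now).toISOString() });
  return false;
}
```

A bot that posts straight to the email script without first loading the form has no valid token, so it ends up in `failureLog` instead of in the inbox.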
Here’s an update to yesterday’s report on the spambot infestation at LGF, attacking our contact form and “email an article” form: both forms have been available and active since yesterday afternoon, and not one spam email has gotten through since I installed the token-based method (using the jQuery Javascript library) ...
LGF, Internet, Blogosphere, Technology, Spam, Bots, captcha, jQuery, Javascript, Security
Oh brother. This morning a spambot of some kind finally got past the rather weak Javascript obfuscation I was using to hide the address of our contact form script, and my Inbox was filled with hundreds of porn/gambling spam emails, sent directly through the script using proxy IP addresses of ...
LGF, Internet, Blogosphere, Technology, Spam, Bots, captcha
LGF was lousy with web bots this morning, using “zombie” machines compromised by viruses, crawling around the site like crazy, probably looking for email addresses to add to spam lists. They’re not finding any, of course, but they’re running up our bandwidth for no reason, so Stinky Beaumont and I ...
LGF, Technical Info, Blogosphere, Traffic, Statistics, Web Crawlers, Bots
One of the nice things about having your web server logs stored in a database is that you can easily see where the traffic is coming from, on a real-time basis. For example, by running this query: SELECT ip, COUNT(*) AS count, referrer, useragent FROM `log` WHERE created >= '2007-05-28 ...
LGF, Technical Info, Blogosphere, Traffic, Statistics, Web Crawlers, Bots
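The same kind of per-IP tally can be sketched outside the database, for example over log rows already loaded into memory. The field names `ip`, `created` and `useragent` come from the query above; the function name `topClients`, the row format, and the ISO-style timestamps are assumptions made for this example.

```javascript
// Count hits per IP address since a cutoff time and return the busiest
// clients first. Rows are plain objects with ip, created and useragent fields.
function topClients(rows, since, limit = 10) {
  const cutoff = new Date(since).getTime();
  const counts = new Map();

  for (const row of rows) {
    if (new Date(row.created).getTime() < cutoff) continue; // too old
    const entry =
      counts.get(row.ip) || { ip: row.ip, count: 0, useragent: row.useragent };
    entry.count += 1;
    counts.set(row.ip, entry);
  }

  return [...counts.values()]
    .sort((a, b) => b.count - a.count)
    .slice(0, limit);
}

// Made-up sample rows using documentation-range IP addresses.
const sampleRows = [
  { ip: "192.0.2.10", created: "2007-05-28T09:00:00Z", useragent: "ExampleBot/1.0" },
  { ip: "192.0.2.10", created: "2007-05-28T09:00:01Z", useragent: "ExampleBot/1.0" },
  { ip: "192.0.2.20", created: "2007-05-28T09:00:02Z", useragent: "Browser" },
  { ip: "192.0.2.20", created: "2007-05-27T09:00:00Z", useragent: "Browser" },
];
const busiest = topClients(sampleRows, "2007-05-28T00:00:00Z");
```

An address that sits far above the rest of the list while sending no referrer is a likely crawler.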