Buggy Diggbot Breaks the Rules

Charles Johnson
Mon Apr 28, 2008 at 12:36 pm PDT • Views: 339

A robot from Digg.com has been rapidly running through everything at LGF, including images, with multiple hits per second. It’s doing this despite the following lines in our robots.txt file:

User-agent: *
Crawl-delay: 600

This rule is supposed to limit the number of hits from all robots to no more than one every ten minutes (600 seconds), and the Digg crawler is blatantly ignoring it. From the files it’s requesting, it looks as if it’s out of control and misreading something.
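For anyone who wants to check how a well-behaved crawler would read that rule, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name crawl_delay_for and the "Diggbot" user-agent string are assumptions for illustration only; the two robots.txt lines are the ones quoted above.

```python
from urllib.robotparser import RobotFileParser

# The two lines quoted from the robots.txt file above.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 600
"""

def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay (in seconds) that applies to user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)

# "Diggbot" is a hypothetical user-agent name; the wildcard entry covers it.
delay = crawl_delay_for(ROBOTS_TXT, "Diggbot")
print(delay)
```

A polite crawler would then wait at least that many seconds between requests to the same site.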

Now blocked. The IP address of the ill-behaved bot: 64.191.203.34.

UPDATE at 4/28/08 1:47:15 pm:

Analyzed the logs some more and figured it out: there’s nothing nefarious going on. The crawler seems to have a bug; it is not correctly reading the BASE tag in our pages, and as a result it was trying to find images in a nonexistent directory. I fixed the problem by adding a couple of mod_rewrite rules to our .htaccess file, and the Digg crawler is now unblocked.
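To illustrate the kind of mistake involved, here is a rough sketch of how a relative image path resolves with and without the BASE tag, using Python's urllib.parse.urljoin. All of the URLs below are hypothetical stand-ins on example.com, not the actual paths from our pages or logs.

```python
from urllib.parse import urljoin

# Hypothetical values for illustration only.
page_url = "http://example.com/weblog/article/123"  # the page being crawled
base_href = "http://example.com/"                   # value of <base href="..."> in the page
img_src = "images/photo.jpg"                        # relative path in an <img> tag

# Correct: relative URLs resolve against the BASE tag when one is present.
with_base = urljoin(base_href, img_src)

# Buggy: ignoring the BASE tag and resolving against the page URL instead
# points into a directory that does not exist on the server.
without_base = urljoin(page_url, img_src)

print(with_base)     # http://example.com/images/photo.jpg
print(without_base)  # http://example.com/weblog/article/images/photo.jpg
```

A crawler that takes the second path ends up requesting a file that isn't there for every image on every page, which matches the thrashing described here.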

(It was breaking the robots.txt rule, though, as it thrashed around trying to find files that didn’t exist.)
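As a rough sketch of how one could count that kind of violation from a log, the function below takes request times in seconds and counts consecutive pairs that arrive closer together than the crawl delay. The function name and the sample timestamps are made up for illustration; they are not taken from the actual server logs.

```python
def count_crawl_delay_violations(timestamps, crawl_delay):
    """Count consecutive requests spaced less than crawl_delay seconds apart."""
    ordered = sorted(timestamps)
    return sum(
        1
        for previous, current in zip(ordered, ordered[1:])
        if current - previous < crawl_delay
    )

# Hypothetical request times in seconds: three hits within one second,
# then one more roughly twelve minutes later.
sample_hits = [0.0, 0.3, 0.6, 700.0]
violations = count_crawl_delay_violations(sample_hits, crawl_delay=600)
print(violations)  # 2
```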

