LGF

Buggy Diggbot Breaks the Rules

Mon, Apr 28, 2008 at 12:36:30 pm PST

A robot from Digg.com has been rapidly running through everything at LGF, including images, with multiple hits per second. It’s doing this despite the following lines in our robots.txt file:

User-agent: *
Crawl-delay: 600

This rule is supposed to limit all robots to no more than one hit every ten minutes, and the Digg crawler is blatantly ignoring it. Judging from the files it’s requesting, it appears to be out of control and misreading something.
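For readers wondering what honoring that rule looks like on the crawler's side, here is a minimal sketch in Python using the standard library's robots.txt parser. It is not the Digg crawler's code; the user-agent string "ExampleBot" and the helper names are made up for illustration. Note also that Crawl-delay is a nonstandard extension, so honoring it is up to each crawler.

```python
from urllib.robotparser import RobotFileParser

# The two lines from the robots.txt file quoted above.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 600
"""


def crawl_delay_for(user_agent, robots_txt=ROBOTS_TXT):
    """Return the Crawl-delay (in seconds) that applies to user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


def seconds_to_wait(last_request_time, now, delay):
    """How long a polite crawler should still wait before its next request."""
    if delay is None:
        return 0.0
    return max(0.0, last_request_time + delay - now)


if __name__ == "__main__":
    # "ExampleBot" is a hypothetical user agent; the "*" rule applies to it.
    delay = crawl_delay_for("ExampleBot")
    print(delay)                                  # 600
    print(seconds_to_wait(1000.0, 1001.0, delay)) # 599.0
```

A crawler making multiple hits per second is waiting a fraction of a second between requests where the rule asks for 600.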

Now blocked. The IP address of the ill-behaved bot: 64.191.203.34.

UPDATE at 4/28/08 1:47:15 pm:

Analyzed the logs some more and figured it out: there’s nothing nefarious going on. The crawler seems to have a bug; it is not correctly reading the BASE tag in our pages, and as a result it was trying to find images in a nonexistent directory. I fixed the problem by adding a couple of mod_rewrite rules to our .htaccess file, and the Digg crawler is now unblocked.

(It was breaking the robots.txt rule, though, as it thrashed around trying to find files that didn’t exist.)
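The BASE tag problem can be illustrated with a small Python sketch. The URLs and markup here are hypothetical (example.com, an article-style path, images/logo.gif); they are not LGF's actual layout or the Digg crawler's code. The point is only that a relative link resolves to a different place depending on whether the crawler honors the BASE tag.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class BaseHrefFinder(HTMLParser):
    """Records the href of the first <base> tag found in a page."""

    def __init__(self):
        super().__init__()
        self.base_href = None

    def handle_starttag(self, tag, attrs):
        if tag == "base" and self.base_href is None:
            self.base_href = dict(attrs).get("href")


def resolve(page_url, html, link, honor_base=True):
    """Resolve a relative link found in html, optionally honoring <base>."""
    base = page_url
    if honor_base:
        finder = BaseHrefFinder()
        finder.feed(html)
        if finder.base_href:
            base = urljoin(page_url, finder.base_href)
    return urljoin(base, link)


# Hypothetical page: a rewritten article URL whose images live at the site root.
PAGE_URL = "http://example.com/article/123/Some_Title"
HTML = ('<html><head><base href="http://example.com/"></head>'
        '<body><img src="images/logo.gif"></body></html>')

if __name__ == "__main__":
    print(resolve(PAGE_URL, HTML, "images/logo.gif"))
    # -> http://example.com/images/logo.gif
    print(resolve(PAGE_URL, HTML, "images/logo.gif", honor_base=False))
    # -> http://example.com/article/123/images/logo.gif (a directory that need not exist)
```

A crawler that skips the BASE tag ends up requesting the second kind of URL for every image on every page, which matches the flood of requests for nonexistent files seen in the logs.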

86 comments

  • Comments are open and unmoderated, and do not necessarily reflect the views of Little Green Footballs.
  • Obscene, abusive, silly, or annoying remarks may be deleted, but the fact that particular comments remain on the site in no way constitutes an endorsement of their views by Little Green Footballs.
  • Posts that contain phone numbers, addresses, or other personal information will also be deleted, as will posts that consist only of a variation on the word, "First!"
  • Comments that advocate violence will be cause for immediate banning with no appeal.
  • REMEMBER: posting comments at LGF is a privilege, not a right. Abuse that privilege, and your account will be blocked.

#1 chinesearithmetic 4/28/08 12:37:32 pm 1

Digg Hits Bottom, Digs

#2 Charlie Martel 4/28/08 12:37:37 pm 0

My God!

#3 Charlie Martel 4/28/08 12:38:02 pm 0

What does all this mean? Seriously, I'm not a techie

#4 Meremortal 4/28/08 12:38:30 pm 0

Bad bot, no doughnut.

#5 Bubbaman 4/28/08 12:38:54 pm 0

The bigger question is why?

#6 Charles 4/28/08 12:39:26 pm 0

re: #3 Charlie Martel

What does all this mean? Seriously, I'm not a techie

It looks like the crawler is trying to make a snapshot of the entire site.

#7 zmdavid 4/28/08 12:39:46 pm 0

re: #3 Charlie Martel

What does all this mean? Seriously, I'm not a techie


Digg is downloading large chunks of LGF, sapping resources and slowing us down.

#8 jcm 4/28/08 12:40:13 pm 2

Don't trust the digg robot.

#9 marwan's daughter 4/28/08 12:40:23 pm 0

Ah, the intellectual fountain known as Digg.

Where stories against radical Islam, the excesses of the left, and even Scientology are buried by bury brigades.

I never liked it.

#10 joncelli 4/28/08 12:41:05 pm 0

re: #6 Charles

Is this a weird sort of DOS attack or do they just want to grab the site and do something with it?

#11 bosforus 4/28/08 12:41:13 pm 0

Big Digg is ripping LGF off!

#12 galloping granny 4/28/08 12:41:15 pm 0

re: #6 Charles

It looks like the crawler is trying to make a snapshot of the entire site.

Strange. I wonder why. I hate robots that don't follow the rules.

#13 Thanos 4/28/08 12:41:27 pm 0

Sure you don't mean 10 minutes? (600 seconds = 10 minutes)

#14 winston06 4/28/08 12:41:44 pm 0

re: #6 Charles

why would they wanna do it?

#15 jcm 4/28/08 12:41:48 pm 0

re: #9 marwan's daughter

Ah, the intellectual fountain known as Digg.

Where stories against radical Islam, the excesses of the left, and even Scientology are buried by bury brigades.

I never liked it.

Well we do find out what is important to pale pasty residents of mama's basement.

#16 ted 4/28/08 12:42:00 pm 0

BLAMMO !

#17 winston06 4/28/08 12:42:10 pm 0

re: #5 Bubbaman

Digg.com is infested with leftie kidz.

#18 Lawrence Schmerel 4/28/08 12:42:22 pm 0

But, why?

#20 Charles 4/28/08 12:42:41 pm 0

re: #10 joncelli

Is this a weird sort of DOS attack or do they just want to grab the site and do something with it?

No, it's not an attack. I don't know what they're doing, but it's going through every file it can find and requesting nonexistent files as well. Acting a lot like a spam scraper, in fact.

#21 Kragar (Proud to be Kafir) 4/28/08 12:43:21 pm 0

I never liked Digg

#22 doppelganglander 4/28/08 12:44:11 pm 0

re: #14 winston06

why would they wanna do it?

Perhaps to cherry-pick at their leisure.

#23 zmdavid 4/28/08 12:45:25 pm 0

Theft is the sincerest form of flattery.

#24 ted 4/28/08 12:45:56 pm 0

Robots can be nasty.

[Link: www.youtube.com...]

#25 EC Marm 4/28/08 12:46:25 pm 1

What if that robot collected all of the intelligence of the great minds at lgf? It could take over the world.
"I don't think you want to post that, Dave"

#26 Lawrence Schmerel 4/28/08 12:46:26 pm 2

I have a copy of the entire internet downloaded on a 3.5 floppy disc.

#27 MI DB 4/28/08 12:46:49 pm 0

I Diggbot

#28 OldLineTexan 4/28/08 12:48:25 pm 1

The Three Laws:

A robot may not injure a liberal cause or, through inaction, allow a liberal cause to come to harm.
A robot must not obey orders given to it by right-wing blogs, except where such orders would conflict with the First Law.
A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

#29 Charles 4/28/08 12:48:28 pm 0

Looked through the logs some more and I don't think there's anything nefarious going on.

I think it's out of control; it seems to be confused by the mod_rewrite trick I described a few days ago, and isn't properly using the BASE tag.

The crawler is using the PEAR HTTP_Request class, and it may have a bug.

[deleted] 4/28/08 12:49:43 pm 0

#31 HelloDare 4/28/08 12:50:00 pm 0

Kick the bot in the nuts.

#32 bosforus 4/28/08 12:50:33 pm 0

re: #29 Charles

Danger! Danger!

#33 snowcrash 4/28/08 12:50:39 pm 0

It's not a bug, it's a feature.

#34 Charles 4/28/08 12:50:51 pm 0

Sorry, shmuli - no phone numbers allowed. If you want to link to a whois lookup that's OK.

#35 Lawrence Schmerel 4/28/08 12:50:52 pm 0

The internet is out of control! Run for your lives!

#36 Cartman 4/28/08 12:51:07 pm 0

re: #26 Lawrence Schmerel

I have a copy of the entire internet downloaded on a 3.5 floppy disc.

You sure you got it all? It took me two floppies.

#37 jcm 4/28/08 12:51:11 pm 0

OT

Flying Pig Moment! Ex-Pink Floyd Bassist Roger Waters Unleashes School Bus-Sized Pro-Obama Inflated Pig at Coachella.

#38 Kragar (Proud to be Kafir) 4/28/08 12:51:29 pm 0

re: #36 Cartman

You sure you got it all? It took me two floppies.

Got to zip it

#39 shmuli 4/28/08 12:51:54 pm 0

Charles --

Sorry. Will do it right next time.

Did not mean to make more work for you.

Shmuli

#40 Cartman 4/28/08 12:52:58 pm 0

re: #38 Kragar (Proud to be Kafir)

Got to zip it

Sheesh. I knew I forgot something.

#41 zmdavid 4/28/08 12:53:35 pm 0

re: #27 MI DB

I Diggbot

DB - Diggbot?

Michigan Diggbot?

#42 Lawrence Schmerel 4/28/08 12:53:53 pm 0

re: #36 Cartman

I haven't finished downloading littlegreenfootballs.com, yet.

#43 The Other Les 4/28/08 12:54:15 pm 0

re: #31 HelloDare

Kick the bot in the nuts.

Take him to Detroit!

[Link: www.youtube.com...]

#44 CyanSnowHawk 4/28/08 12:54:45 pm 0

re: #42 Lawrence Schmerel

I haven't finished downloading littlegreenfootballs.com, yet.

Does someone keep blocking your robot?

#45 Kragar (Proud to be Kafir) 4/28/08 12:54:50 pm 0

re: #37 jcm

OT

Flying Pig Moment! Ex-Pink Floyd Bassist Roger Waters Unleashes School Bus-Sized Pro-Obama Inflated Pig at Coachella.

Saw this report on it

the swine floated into the night sky. Waters said sadly and comically, "That's my pig."

Never to be seen again, hopefully BHO pulls the same act

#46 yochanan 4/28/08 12:55:06 pm 0

re: #37 jcm

OT

Flying Pig Moment! Ex-Pink Floyd Bassist Roger Waters Unleashes School Bus-Sized Pro-Obama Inflated Pig at Coachella.

short bus?

#47 Cartman 4/28/08 12:55:09 pm 0

re: #42 Lawrence Schmerel

I haven't finished downloading littlegreenfootballs.com, yet.

Bypass the Tag Storm, and it goes a lot quicker.

;)

#48 Lawrence Schmerel 4/28/08 12:55:39 pm 0

re: #44 CyanSnowHawk

Do you know something that I don't?

#49 Kulhwch 4/28/08 12:55:54 pm 0

re: #20 Charles

Acting a lot like a spam scraper, in fact.

That's what they get for buying their droids from Sand People.

}:)     [ ... so to speak ... ]

#50 taxfreekiller 4/28/08 12:55:54 pm -1

Most likely they want to put up a mirror site and lure lgf posters there to get the e-mail addresses of posters here and do not-nice things.

I, tfk, say be careful of this.

#51 Kragar (Proud to be Kafir) 4/28/08 12:56:00 pm 0

re: #47 Cartman

Bypass the Tag Storm, and it goes a lot quicker.

;)

Plus, turn off any porn filters when you get to the throbbing memo

#52 taxfreekiller 4/28/08 12:56:39 pm 0

Charles,

re-direct these things to say the

NSA site.

#53 Lawrence Schmerel 4/28/08 12:58:05 pm 0

re: #47 Cartman

You can get really stuck in that Tag Storm.

#54 rabidsquirrel 4/28/08 12:58:22 pm 1

re: #28 OldLineTexan

The Three Laws:

A robot may not injure a liberal cause or, through inaction, allow a liberal cause to come to harm.
A robot must not obey orders given to it by right-wing blogs, except where such orders would conflict with the First Law.
A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

If only Asimov knew the horrible truth.

#55 Cartman 4/28/08 12:58:25 pm 0

re: #51 Kragar (Proud to be Kafir)

Plus, turn off any porn filters when you get to the throbbing memo

True dat. The floppy drive doesn't like the throbbing memo.

#56 RememberSekhmet? 4/28/08 12:59:01 pm 0

Hey guys, can you please knock it off?

#57 rabidsquirrel 4/28/08 1:00:10 pm 0

re: #55 Cartman

True dat. The floppy drive doesn't like the throbbing memo.

I almost feel compelled to make a 'hard disk' joke...

#58 Da_Beerfreak 4/28/08 1:00:21 pm 0

re: #29 Charles

Looked through the logs some more and I don't think there's anything nefarious going on.

I think it's out of control; it seems to be confused by the mod_rewrite trick I described a few days ago, and isn't properly using the BASE tag.

The crawler is using the PEAR HTTP_Request class, and it may have a bug.

So in other words, your custom code for the site has driven this poor Diggbot insane.
[insert evil laughter here]

#59 CyanSnowHawk 4/28/08 1:00:25 pm 0

re: #48 Lawrence Schmerel

Do you know something that I don't?

In a thread about a runaway robot trying to download all of LGF, and Charles blocking the IP of said robot, you make a joke about not being finished downloading LGF, and you miss that reference?

I'm glad I'm not the only one that does things like that.

#60 taxfreekiller 4/28/08 1:01:04 pm 0

Understand, lgf is hurting a loon commie, one the msm and other power mad commies like Soros want in the White House; it's not a game to these power hungry loons.

#61 Cygnus 4/28/08 1:02:52 pm 0

re: #19 bosforus

Never trust a robot.

Now, that is one weird sci-fi series. Never heard of it before.

#62 Cartman 4/28/08 1:03:27 pm 0

re: #58 Da_Beerfreak

So in other words, your custom code for the site has driven this poor Diggbot insane.
[insert evil laughter here]

Considering the bot's source, it was probably 3/4 of the way there already.

#63 Ben Hur 4/28/08 1:05:16 pm 0

IDF: Palestinian family killed by terrorist bomb

Wasn't Israel.

But the damage is already done.

#64 The Other Les 4/28/08 1:05:36 pm 0

re: #60 taxfreekiller

Understand, lgf is hurting a loon commie, one the msm and other power mad commies like Soros want in the White House; it's not a game to these power hungry loons.

I can't repeat and emphasize enough the fact that for this bunch:

POWER IS LIFE!

#65 Lawrence Schmerel 4/28/08 1:05:39 pm 0

re: #59 CyanSnowHawk

I have no idea what you are talking about. I am going back to work. I'm going to copy this sucker if I have to do it one page at a time.

#66 hayseed 4/28/08 1:06:10 pm 0

re: #61 Cygnus

Now, that is one weird sci-fi series. Never heard of it before.

I have an Angela Android

adult ears

#67 taxfreekiller 4/28/08 1:06:27 pm 0

Charles send to the address, this

[Link: www.youtube.com...]

#68 Silhouette 4/28/08 1:06:37 pm 0

re: #49 Kulhwch

That's what they get for buying their droids from Sand People.

Jawas, jawas, jawas, jawas!

Uh, did I just expose my inner-geek?

#69 jcm 4/28/08 1:06:38 pm 0

re: #63 Ben Hur

IDF: Palestinian family killed by terrorist bomb

Wasn't Israel.

But the damage is already done.

I told junior not to bring work home.

#70 Cartman 4/28/08 1:06:55 pm 1

Well, time to take a spin on the new Harley. Better get it in while I can. It's supposed to snow here tomorrow. Global warming my ass. Hrrumph.

#71 Fat Jolly Penguin 4/28/08 1:08:18 pm 0

re: #68 Silhouette

Jawas, jawas, jawas, jawas!

Uh, did I just expose my inner-geek?

Nah, you'd be doing that if you asked if the Diggbot speaks Bocce.

/

#72 Ben Hur 4/28/08 1:11:16 pm 0

re: #63 Ben Hur

IDF: Palestinian family killed by terrorist bomb

Wasn't Israel.

But the damage is already done.

It's amazing how fast YNET goes back and edits and changes Ali Waked's articles once the truth comes out.

This has happened more than just a few times.

He reports what his "sources" tell him, then it comes out that it was complete BS, and they quickly and quietly change the article online.

Is that normal?

#73 LeftJustAintRight 4/28/08 1:14:21 pm 0

Union Bots Never do more work than coded
Same with Union Workers
Charles
Unionize the bots and every 10 minutes they will need a 20 minute break
LOL

#74 Cap'n DOC 4/28/08 1:18:02 pm 0

re: #26 Lawrence Schmerel

Are you an Al Gore Foolower?

#75 Lawrence Schmerel 4/28/08 1:21:17 pm 0

re: #74 Cap'n DOC

You know, you have Al Gore to thank for the fact that you can use your computer to ask me that question.

#76 Charles 4/28/08 1:48:48 pm 0

See update above - nothing bad about the Diggbot, it has a bug. And it was ignoring robots.txt.

#77 Lawrence Schmerel 4/28/08 2:14:27 pm 0

re: #76 Charles

You sure know how to ruin a good comment thread.

#78 madjadbad 4/28/08 2:26:30 pm 0

Interesting. I think Digg maintains a mirror site of the submitted links because if the article becomes popular, the traffic deluge often causes the original link's server to have a heart attack.
It sounds like the mirroring code may be buggy.
I did a search at Digg for "littlegreenfootballs" and it looks like almost every story on LGF has been dugg at least once, but the ones with the most diggs in the last 2 days only have 32 diggs. None of them made the front page.
I don't know how many diggs it takes before they mirror the story.

#79 NomadOfNorad 4/28/08 2:30:36 pm 0

On a different note... I heard a coupla months back about a new rival to Digg, that did the same sorts of vote-for-a-site stuff that Digg does, but didn't have some of the flaws of Digg... but I couldn't remember the name of the place. So, I did a Google search for "digg rival" and it turns out there's more than one wanna-be Digg out there.

I think Mixx might have been the one I'd heard about, though, but there's also one called NewsPond, and another called Reddit, and it seems Netscape (now calling themselves Propeller) have come back as a Digg competitor, too. Even Yahoo! seem to be getting into the act with their My Web 2.0. And there's another thing called Drupal or Drigg or something of the sort, apparently.

Wow!

I suppose LGF should add at least one of these new guys to the set of buttons we can click on to call attention to specific articles like we do now with Digg and del.icio.us and stuff. Problem is, where do you stop in adding these guys? (There are probably even more of them out there that I haven't found in that cursory search.)

#80 BingoBunny 4/28/08 2:39:38 pm 1

Damn robots.. next they'll want the vote.

#81 Spiny Norman 4/28/08 2:40:23 pm 0

(It was breaking the robots.txt rule, though, as it thrashed around trying to find files that didn’t exist.)

Sounds like in MiB when the Edgar-Bug is thrashing around smashing display cases in the jewelry store looking for the galaxy: Where is it?!?!

Seems funny to me...

#82 Miles 4/28/08 2:51:43 pm 0

Charles, you should read Michael Crichton's Prey, if you haven't, yet.

We are building our own demise...

#83 Mathew1977 4/28/08 3:03:23 pm 0

I've found Digg to be rife with robots: their users.

#84 NomadOfNorad 4/28/08 3:09:24 pm 0

re: #83 Mathew1977

Hehe!

#85 pittboy 4/28/08 4:59:12 pm 0

I hope it didn't scare the gerbils.

#86 Mathew1977 4/30/08 12:55:09 am 0

re: #84 NomadOfNorad

Glad you liked it.


This entry has been archived.
Comments are closed.
