thistlechaser: (Sleepy Ken)
[personal profile] thistlechaser
There is such a thing as too much of a good thing. In the last 13 days, Google's spider has been to my site (Catlove) 473 times and has used 41.64 MB of bandwidth. The next most frequent spider came 29 times and used 1.11 MB of bandwidth.
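
To put those figures on a per-day basis, here is a rough sketch of the arithmetic. The function name `crawl_rates` is made up for illustration, and it assumes 1 MB = 1024 KB; the visit counts and bandwidth totals are the ones quoted above.

```python
def crawl_rates(visits: int, megabytes: float, days: int) -> dict:
    """Turn raw spider totals into per-day and per-visit rates."""
    return {
        "visits_per_day": visits / days,
        "mb_per_day": megabytes / days,
        # Assumption: 1 MB = 1024 KB.
        "kb_per_visit": megabytes * 1024 / visits,
    }


# Totals from the 13-day log period described in the post.
google = crawl_rates(473, 41.64, 13)
runner_up = crawl_rates(29, 1.11, 13)

for name, rates in (("Google", google), ("next spider", runner_up)):
    print(name, {key: round(value, 2) for key, value in rates.items()})
```

By this estimate Google is stopping by dozens of times a day, while the runner-up manages only a couple of visits a day.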

As far as I can tell, robots.txt will only keep a spider out or let it in. I thought I recalled that you could restrict them to like "once a month" or something like that, but I can't find any reference to that now...

Can you tell I'm really bored?

Date: 2003-11-13 11:47 am (UTC)
From: [identity profile] alchemia.livejournal.com
Really, yikes! Google only uses about 200K each month on intertexius. I have it set, though, to index only the main page and not follow any links past that.

How much of your site do you want Google to archive? Just the main page? Pages but not images? Etc.? You can be very specific.

The robots exclusion protocol can be found here; it works for Google and other search engines:
http://www.robotstxt.org/wc/exclusion.html

Google-specific info is here: http://www.google.com/webmasters/faq.html

If you need further help beyond that, ask =)
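
As a rough illustration of the "pages but not images" case, here is a small sketch that checks a robots.txt with Python's standard `urllib.robotparser` module. The `/images/` directory, the `example.com` URLs, and the `SomeOtherBot` name are all assumptions for the example, not anything from Catlove itself. Note that it only covers keeping a spider out of part of a site; it does nothing about how often the spider visits.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the /images/ path is an assumption for illustration.
# Googlebot is kept out of the images directory; everyone else may fetch anything.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /images/

User-agent: *
Disallow:
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# example.com and the file names below are placeholders.
page_allowed = parser.can_fetch("Googlebot", "http://example.com/index.html")
image_allowed = parser.can_fetch("Googlebot", "http://example.com/images/cat.jpg")
other_bot_image = parser.can_fetch("SomeOtherBot", "http://example.com/images/cat.jpg")

print("Googlebot, page:", page_allowed)
print("Googlebot, image:", image_allowed)
print("Other bot, image:", other_bot_image)
```

Checking the rules locally like this is a quick way to confirm a robots.txt says what you meant before uploading it.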

Date: 2003-11-13 12:57 pm (UTC)
From: [identity profile] thistle-chaser.livejournal.com
Yep, I read through those two links before posting. Thanks though!

How much of your site do you want Google to archive?

I'm fine with them crawling it all, but 473 visits in 13 days? That's just seriously excessive...

Date: 2003-11-13 06:31 pm (UTC)
From: [identity profile] mousapelli.livejournal.com
Thanks for the info and the link; I was having the same sort of issue but couldn't identify the problem. Stupid automated internet systems. I hope the robots.txt thing fixes it.
