What 's SEO News

Saturday, February 3, 2007

Blocking robots

Webmasters can instruct spiders not to crawl certain files or directories through the standard robots.txt file in the root directory of the domain. Additionally, a page can be explicitly excluded from a search engine's database by using a robots meta tag.

When a search engine visits a site, the robots.txt located in the root folder is the first file crawled. The robots.txt file is then parsed, and only pages not disallowed will be crawled. As a search engine crawler may keep a cached copy of this file, it may on occasion crawl pages a webmaster does not wish crawled.

Pages typically prevented from being crawled include login specific pages such as shopping carts and user-specific content such as search results from internal searches.

No comments:

SEO Chat Forum