Contrary to popular belief, the spiders sent out by the major search engines do not have to crawl everything on a site. You can keep a spider away from a page by instructing it, through a robots meta tag or a robots.txt file, not to come near the page.
Webmasters can instruct spiders not to crawl certain files or directories through the standard robots.txt file in the root directory of the domain. Additionally, a page can be explicitly excluded from a search engine’s database by using a robots meta tag. So if, for some reason, you do not want a search engine spider to crawl a page, you have the means to prevent it.
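To make the robots meta tag concrete, here is a minimal Python sketch that reads the directives from a page the way a well-behaved spider might. The sample page, the class name RobotsMetaParser and the helper robots_meta_directives are made up for illustration; only the standard library's html.parser module is used, and real search engines may handle the tag differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_meta_directives(html):
    """Return the set of robots meta directives declared in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page that asks to be left out of the search engine's database.
SAMPLE_PAGE = """<html><head>
<meta name="robots" content="noindex, nofollow">
<title>Private page</title>
</head><body>Not meant for search results.</body></html>"""

directives = robots_meta_directives(SAMPLE_PAGE)
if "noindex" in directives:
    print("This page asks not to be indexed.")
```

A page with no robots meta tag yields an empty set, which a spider would normally treat as permission to index the page.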
When a search engine visits a site, the robots.txt file located in the root folder is the first file crawled. The file is then parsed, and only pages that are not disallowed will be crawled. However, this is not always foolproof. Search engine spiders have a habit of leaving a page and then coming back to look at it a second time later. Because a search engine crawler may keep a cached copy of robots.txt, it may on occasion crawl pages that a webmaster does not wish to have crawled.
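The parse-then-filter step, and the way a stale cached copy can go wrong, can be sketched with Python's standard urllib.robotparser module. The rules, the www.example.com URLs, the user agent name ExampleBot and the helper functions below are all hypothetical; this is only an illustration of the idea, not how any particular search engine works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt as the crawler cached it some time ago.
CACHED_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical current robots.txt: the webmaster has since blocked /drafts/.
CURRENT_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser, user_agent, urls):
    """Keep only the URLs that the parsed rules do not disallow."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


candidates = [
    "https://www.example.com/index.html",
    "https://www.example.com/private/report.html",
    "https://www.example.com/drafts/secret.html",
]

# A crawler working from fresh rules skips both blocked directories.
fresh = allowed_urls(build_parser(CURRENT_ROBOTS_TXT), "ExampleBot", candidates)

# A crawler working from its cached copy still fetches the /drafts/ page.
stale = allowed_urls(build_parser(CACHED_ROBOTS_TXT), "ExampleBot", candidates)

print("fresh rules:", fresh)
print("stale cache:", stale)
```

With the fresh rules only index.html is crawled, while the stale cache also lets the /drafts/ page through, which is the situation described above.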
Pages that most...