You can specify which sections of your site you would like search
engines and web crawlers to index, and which sections they should
ignore. To do this, you create directives in a robots.txt file and place that file in your public_html document root directory.
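For example, assuming your domain is example.com and your account's document root is public_html, you would save the file as public_html/robots.txt, and crawlers would then request it at:

https://example.com/robots.txt

Crawlers only look for the file at the top level of the site, so a robots.txt placed in a subdirectory is ignored.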
The directives used in a robots.txt file are straightforward. The most commonly used are User-agent, Disallow, and Crawl-delay. Here are some examples:
To block all robots from the entire site:

User-agent: *
Disallow: /

To allow all robots complete access:

User-agent: *
Disallow:

(or just create an empty "/robots.txt" file, or don't use one at all)

To block all robots from specific directories:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/

To block a single bot (here, one identifying itself as "BadBot"):

User-agent: BadBot
Disallow: /

To allow a single bot (here, Google) and block all others:

User-agent: Google
Disallow:

User-agent: *
Disallow: /

To block everything under a single directory (here, /~joe/stuff/):

User-agent: *
Disallow: /~joe/stuff/

Alternatively, you can explicitly disallow each page:

User-agent: *
Disallow: /~joe/junk.html
Disallow: /~joe/foo.html
Disallow: /~joe/bar.html
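Crawl-delay, mentioned above, asks a crawler to wait a given number of seconds between successive requests to your site. Not all crawlers honor it, but for those that do it can reduce the load a bot places on your server. A minimal sketch, asking all crawlers to wait 10 seconds between requests (the value 10 is only an illustration; adjust it to your needs):

User-agent: *
Crawl-delay: 10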
The robots.txt website (robotstxt.org) maintains a database of known user-agents that you can reference in the User-agent directive; consult it if you need to target a specific crawler.
Why use a robots.txt file at all? First, your website might be private, and you may not want its files to appear in search engine results. Second, you may need to protect your site from unwanted bots, which can significantly increase your server's resource consumption by flooding it with requests.