When a robot visits a Web site, it first looks for a file named robots.txt in the site's root directory (for example, http://www.example.com/robots.txt).
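Because the file always lives at the root, its address can be derived from any page URL on the same site. The following is a minimal Python sketch using the standard urllib.parse module; the helper name robots_txt_url and the example.com address are illustrative assumptions, not part of any standard.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (with any port); replace the path and
    # drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.example.com/shop/index.html?item=42"))
# prints: http://www.example.com/robots.txt
```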
If the file exists, the robot parses it for records such as:
User-agent: *
Disallow: /
This record tells all robots not to visit any part of the site. A record can also exclude only selected parts of the site:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
The example above tells all robots not to visit the /cgi-bin/ and /tmp/ directories.
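To see how a robot might act on this record, here is a small sketch using Python's standard urllib.robotparser module. The robot name ExampleBot and the paths being checked are made up for illustration, and a real robot would fetch the file over HTTP rather than parse an inline string.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
"""

parser = RobotFileParser()
# Parse the text directly instead of downloading it from a server.
parser.parse(ROBOTS_TXT.splitlines())

# ExampleBot is a hypothetical robot name; it falls under the "*" record.
for path in ("/cgi-bin/search", "/tmp/scratch.html", "/index.html"):
    verdict = "allowed" if parser.can_fetch("ExampleBot", path) else "disallowed"
    print(path, "->", verdict)
```

With this record, the first two paths are reported as disallowed and /index.html as allowed.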
Note: A separate "Disallow" line is required for every URL prefix that is to be excluded. (For example, you cannot say "Disallow: /cgi-bin/ /tmp/").
A record must not contain blank lines, because blank lines are used to delimit records.
Regular expressions are not supported in either the User-agent or Disallow lines. (For example, you cannot have lines like "Disallow: /tmp/*" or "Disallow: *.gif".)
The '*' in the user-agent field is a special value meaning "any robot".
Everything not explicitly disallowed may be retrieved by the robot.
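The rules in the notes above can be sketched as a tiny matcher. This is a simplified illustration, not a reference implementation: the function names parse_records and is_allowed are hypothetical, robot names are compared as whole names ignoring case (an assumption made here for brevity), and comments and unknown fields are not handled.

```python
def parse_records(text):
    """Split robots.txt text into (agents, prefixes) records.

    Blank lines delimit records, so a blank line closes the current one.
    """
    records = []
    agents, prefixes = [], []
    for raw in text.splitlines() + [""]:  # trailing "" flushes the last record
        line = raw.strip()
        if not line:
            if agents:
                records.append((agents, prefixes))
            agents, prefixes = [], []
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            agents.append(value)
        elif field == "disallow" and value:
            # The value is kept verbatim as one URL prefix:
            # no wildcard or regular-expression expansion.
            prefixes.append(value)
    return records


def is_allowed(records, agent, path):
    """Return True unless the record chosen for agent disallows path."""
    chosen = None
    for agents, prefixes in records:
        if any(a.lower() == agent.lower() for a in agents):
            chosen = prefixes  # a record naming this robot wins
            break
        if "*" in agents and chosen is None:
            chosen = prefixes  # fall back to the "any robot" record
    if chosen is None:
        return True  # nothing applies, so nothing is disallowed
    return not any(path.startswith(prefix) for prefix in chosen)


# "/tmp/*" is treated as a literal prefix, so it does not cover /tmp/x.html.
wild = parse_records("User-agent: *\nDisallow: /tmp/*\n")
print(is_allowed(wild, "AnyBot", "/tmp/x.html"))  # True
```

The same reasoning shows why "Disallow: /cgi-bin/ /tmp/" fails: the whole value becomes a single prefix that matches neither directory.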
Here follow some examples:
To exclude all robots from the entire server:
User-agent: *
Disallow: /
To allow all robots complete access to the entire server:
User-agent: *
Disallow:
To exclude all robots from part of the server:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private/
To exclude a single robot:
User-agent: BadBot
Disallow: /
To allow a single robot:
User-agent: WebCrawler
Disallow:

User-agent: *
Disallow: /
To exclude a single file:
User-agent: *
Disallow: /dir/private.html
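The per-robot examples can be checked with Python's standard urllib.robotparser module. This is only a sketch: the helper can_visit and the robot name OtherBot are hypothetical, and the records are parsed from inline strings rather than fetched from a server.

```python
from urllib.robotparser import RobotFileParser


def can_visit(robots_txt: str, robot: str, path: str) -> bool:
    """Parse robots_txt and report whether robot may retrieve path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(robot, path)


EXCLUDE_BADBOT = "User-agent: BadBot\nDisallow: /\n"
ALLOW_ONLY_WEBCRAWLER = (
    "User-agent: WebCrawler\nDisallow:\n"
    "\n"  # the blank line separates the two records
    "User-agent: *\nDisallow: /\n"
)
EXCLUDE_ONE_FILE = "User-agent: *\nDisallow: /dir/private.html\n"

# OtherBot is a made-up robot name used only for comparison.
print(can_visit(EXCLUDE_BADBOT, "BadBot", "/index.html"))            # False
print(can_visit(EXCLUDE_BADBOT, "OtherBot", "/index.html"))          # True
print(can_visit(ALLOW_ONLY_WEBCRAWLER, "WebCrawler", "/index.html"))  # True
print(can_visit(ALLOW_ONLY_WEBCRAWLER, "OtherBot", "/index.html"))    # False
print(can_visit(EXCLUDE_ONE_FILE, "OtherBot", "/dir/private.html"))   # False
print(can_visit(EXCLUDE_ONE_FILE, "OtherBot", "/dir/public.html"))    # True
```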