Create a plain text file called
robots.txt and place it at the root level of your server. In this file you
can include directives telling robots that they are barred from
accessing all or certain parts of your server. Well-behaved robots that
adhere to the robots exclusion standard will look for this file when they
visit your site.
Here's an example of what your robots.txt
could include:
User-agent: *
Disallow: /tmp
Disallow: /personal/topsecret
In the first line, the asterisk indicates
that these limitations apply to all robots; you could instead list
the names of specific robots here if you wanted the rules to apply only
to them.
The second and third lines instruct robots
not to visit any URL on the site whose path begins with /tmp or
/personal/topsecret.
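As a rough illustration of how a robot might interpret the rules above, the following Python sketch feeds the example file to the standard library's urllib.robotparser module and checks a few URLs against it. The host www.example.com, the robot name ExampleBot, and the sample paths are made up for illustration; real robots may implement the standard somewhat differently.

```python
from urllib.robotparser import RobotFileParser

# The example rules from above, exactly as they would appear in robots.txt.
RULES = """\
User-agent: *
Disallow: /tmp
Disallow: /personal/topsecret
"""

# Hypothetical site and robot name, used only for this illustration.
SITE = "http://www.example.com"
ROBOT = "ExampleBot"

# Parse the rules from memory instead of fetching them over the network.
parser = RobotFileParser()
parser.parse(RULES.splitlines())


def is_allowed(path: str) -> bool:
    """Return True if ROBOT may fetch the given path under RULES."""
    return parser.can_fetch(ROBOT, SITE + path)


if __name__ == "__main__":
    sample_paths = [
        "/index.html",
        "/tmp/cache.html",
        "/tmpfile.html",  # also blocked: rules match by path prefix
        "/personal/topsecret/plan.txt",
        "/personal/public.html",
    ]
    for path in sample_paths:
        verdict = "allowed" if is_allowed(path) else "blocked"
        print(f"{path}: {verdict}")
```

Because this parser compares each Disallow value against the start of the URL path, /tmpfile.html is blocked along with everything under /tmp/, while /personal/public.html remains accessible.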
To see how web sites use the robots.txt
file, point your browser at the robots.txt file at the top level of any
site, for instance:
http://www.whitehouse.gov/robots.txt
http://www.sun.com/robots.txt
To create your own robots.txt file,
use a basic text editor (rather than a word processor) and follow the
examples that you find on other web sites.
There is also a META Tag that controls robot access to
web pages at the individual file level. Not surprisingly, it is called
the Robots META Tag.