Robots.txt

When SharePoint is used for public-facing websites, there are many files and locations that should not be crawled by Search Engines. Most Search Engines respect the rules defined in a special file called robots.txt, which identifies the areas that should not be crawled.

Search Engines expect to find the robots.txt file at the root of the site, e.g. http://blog.eardley.org.uk/robots.txt.
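Because crawlers only look at the root, the robots.txt location depends on the scheme, host name and port of a page, never on its path. Here is a minimal Python sketch of that lookup using the standard urllib.parse module. The function name robots_txt_url is hypothetical, and apart from the host name, the page paths are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would request for page_url's host."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port); replace the path with
    # /robots.txt and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("http://blog.eardley.org.uk/Pages/Default.aspx?id=1#top"))
```

For a SharePoint site collection under a managed path such as /sites/marketing/ (a hypothetical example), this means the rules belong in the single robots.txt at the root of the host name. A robots.txt placed inside the site collection is not consulted.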

When a robots.txt file is defined for SharePoint, there are several locations that should be excluded because they implicitly require authentication to be accessed. An example of a SharePoint robots.txt file is as follows:

User-Agent: *
Disallow: /_layouts/
Disallow: /SiteAssets/
Disallow: /Lists/
Disallow: /_catalogs/
Disallow: /WorkflowTasks/
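One way to check which URLs a rule set like this blocks is to run it through Python's standard urllib.robotparser module. The sketch below is only an illustration of how a crawler that honours these rules behaves. The helper is_crawlable is hypothetical, and the page paths are invented for the example. It writes "_layouts" in lowercase because that is how SharePoint emits the path in URLs, and this parser matches paths case-sensitively.

```python
from urllib.robotparser import RobotFileParser

# The SharePoint rules from this post, one robots.txt line per entry.
SHAREPOINT_RULES = (
    "User-Agent: *",
    "Disallow: /_layouts/",
    "Disallow: /SiteAssets/",
    "Disallow: /Lists/",
    "Disallow: /_catalogs/",
    "Disallow: /WorkflowTasks/",
)


def is_crawlable(url: str, rules=SHAREPOINT_RULES, agent: str = "*") -> bool:
    """Return True if a crawler that honours the rules may fetch url."""
    parser = RobotFileParser()
    # parse() also sets the parser's last-checked time, which can_fetch() requires.
    parser.parse(rules)
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Illustrative page paths; only the host name comes from the post.
    for url in (
        "http://blog.eardley.org.uk/Pages/Default.aspx",
        "http://blog.eardley.org.uk/_layouts/15/start.aspx",
        "http://blog.eardley.org.uk/Lists/Tasks/AllItems.aspx",
    ):
        print(url, "->", "crawl" if is_crawlable(url) else "skip")
```

Each Disallow value is treated as a path prefix, so /Lists/ excludes every list URL beneath it. Because the comparison is case-sensitive, a link written as /_Layouts/ would not be caught by a /_layouts/ rule.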

Each line of the file uses one of two fields:

  • User-Agent – This identifies the crawlers (robots) that the following rules apply to; * matches every crawler
  • Disallow – This defines a URL path prefix that those crawlers should not crawl; any URL whose path starts with this value is excluded
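To show how User-Agent and Disallow lines combine, the sketch below again uses Python's urllib.robotparser, this time with two records separated by a blank line: one for a hypothetical crawler called ExampleBot that is blocked from the whole site, and the * record that every other crawler falls back to. The helper can_crawl and the page paths are hypothetical; this illustrates the matching rules under those assumptions and is not how any particular search engine is implemented.

```python
from urllib.robotparser import RobotFileParser

# Two records separated by a blank line. "ExampleBot" is a hypothetical
# crawler name; its record blocks it from the whole site.
ROBOTS_TXT = """\
User-Agent: ExampleBot
Disallow: /

User-Agent: *
Disallow: /_layouts/
"""


def can_crawl(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Apply the record whose User-Agent matches agent, else the * record."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "http://blog.eardley.org.uk/Pages/Default.aspx"
    layouts = "http://blog.eardley.org.uk/_layouts/15/start.aspx"
    for agent in ("ExampleBot/1.0", "OtherBot/2.0"):
        print(agent, can_crawl(agent, page), can_crawl(agent, layouts))
```

A crawler uses the most specific record that names it and ignores the * record. That is why ExampleBot is refused everything, while OtherBot is refused only the /_layouts/ path.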

Further information regarding robots.txt can be found at http://www.robotstxt.org/orig.html
