When SharePoint is used for public-facing websites, there are many files and locations that should not be crawled by search engines. Most search engines respect the rules defined in a special file called robots.txt, which identifies the areas of a site that should not be crawled.
Search engines expect to find the robots.txt file at the root of the site, e.g. https://blog.eardley.org.uk/robots.txt
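As a quick check that the file is reachable from the site root, it can be requested directly. The sketch below uses Python's standard library only; the site address is a placeholder and should be replaced with the public-facing SharePoint URL.

from urllib.request import urlopen

# Placeholder address; replace with the public-facing SharePoint site root.
site_root = "https://www.example.com"

with urlopen(site_root + "/robots.txt") as response:
    # A 200 status means crawlers can retrieve the file anonymously.
    print(response.status)
    print(response.read().decode("utf-8"))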
When a robots.txt file is defined for SharePoint, there are several locations that should be excluded because they require authentication to be accessed. An example robots.txt file for SharePoint is as follows:
User-Agent: *
Disallow: /_Layouts/
Disallow: /SiteAssets/
Disallow: /Lists/
Disallow: /_catalogs/
Disallow: /WorkflowTasks/
The content of the file is made up of two types of directive:
- User-Agent – identifies the crawlers (browsers/engines) that the following rules apply to; * means all crawlers
- Disallow – each Disallow line defines a location that search engines should not crawl or index (see the example below the list)
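As a rough illustration of how a crawler interprets these rules, the sketch below feeds the example file into Python's standard urllib.robotparser module and checks a couple of paths; the paths tested are purely illustrative.

from urllib import robotparser

# The example robots.txt rules from above.
rules = [
    "User-Agent: *",
    "Disallow: /_Layouts/",
    "Disallow: /SiteAssets/",
    "Disallow: /Lists/",
    "Disallow: /_catalogs/",
    "Disallow: /WorkflowTasks/",
]

parser = robotparser.RobotFileParser()
parser.parse(rules)

# Paths under a Disallow entry are blocked for all crawlers ("*").
print(parser.can_fetch("*", "/_Layouts/settings.aspx"))  # False
print(parser.can_fetch("*", "/Pages/default.aspx"))       # True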
Further information regarding robots.txt can be found at http://www.robotstxt.org/orig.html