The Robots.txt File
The robots.txt file is a simple (optional) text file that contains directives for compliant search engines. It tells search engines which parts of your site they may and may not crawl. When a search engine visits your site, it looks for the robots.txt file in the root of the site. If the file is present, the search engine reads the directives; if it is not, it carries on and crawls your site as normal. Note that you can only have one robots.txt file per site, and it must be in the root directory. For example, for my site the correct location is www.build-your-website.co.uk/robots.txt, not www.build-your-website.co.uk/newsletter/robots.txt.
What it can Do
Using simple directives you can:
Why You Should Use It.
Format
You should note that, unless explicitly disallowed, search engines are allowed to crawl your site and its pages. The file consists of a series of record entries, with each entry separated from the next by a blank line. Each record consists of a User-agent directive followed by one or more Disallow directives. Examples:
This single record entry stops all search engines from crawling the site:
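A minimal sketch of that record, assuming the usual `User-agent: *` / `Disallow: /` form, checked with Python's standard-library `urllib.robotparser` module. The crawler name `ExampleBot` and the `www.example.com` URL are hypothetical and used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# One record: "*" matches every user agent, and "Disallow: /"
# blocks every path from the root of the site downwards.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical crawler and page: any agent, any page is refused.
print(parser.can_fetch("ExampleBot", "http://www.example.com/index.html"))
```

This prints `False`: with this file in place, a compliant crawler should not request any page on the site.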
This two-record entry stops the search engine called badrobot from crawling the site, and stops all search engines from crawling the directories called apps and private:
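A sketch of such a two-record file, again parsed with `urllib.robotparser`. Writing the directories as `/apps/` and `/private/` (with trailing slashes) is an assumption, as are the `ExampleBot` crawler name and the `www.example.com` site.

```python
from urllib.robotparser import RobotFileParser

# Record 1: the crawler called "badrobot" is blocked from the whole site.
# Record 2: every other crawler is blocked from /apps/ and /private/ only.
# A blank line separates the two records.
ROBOTS_TXT = """\
User-agent: badrobot
Disallow: /

User-agent: *
Disallow: /apps/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

base = "http://www.example.com"  # hypothetical site
for agent, path in [
    ("badrobot", "/index.html"),
    ("ExampleBot", "/index.html"),
    ("ExampleBot", "/apps/report.html"),
    ("ExampleBot", "/private/notes.html"),
]:
    print(agent, path, parser.can_fetch(agent, base + path))
```

Only the second line of output is `True`: badrobot is refused everywhere, while other crawlers are refused only under the two named directories.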
Note that the Sitemap directive is independent of the user agent, so no User-agent directive is required for it. The Sitemap directive can appear anywhere in the robots.txt file.
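To illustrate, here is a sketch with a Sitemap line placed before any User-agent record. It relies on `RobotFileParser.site_maps()`, which needs Python 3.8 or later. The sitemap URL is a hypothetical placeholder, not the address used on this site.

```python
from urllib.robotparser import RobotFileParser

# The Sitemap line sits outside any record: it needs no User-agent line.
ROBOTS_TXT = """\
Sitemap: http://www.example.com/sitemap.xml

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.site_maps())  # list of sitemap URLs found in the file
print(parser.can_fetch("ExampleBot", "http://www.example.com/private/page.html"))
```

The Sitemap line is picked up, and the Disallow rule in the record below it still applies as usual.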
Robots.txt File Creation and Checking
You can create a robots.txt file using any text editor, such as Notepad. However, if you are uneasy about writing it by hand, there are a number of free online tools that will create the file for you, and others that will check its syntax to make sure there are no errors.
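As a rough idea of what a syntax checker looks for, here is a minimal sketch. The helper name `check_robots_syntax` is hypothetical, and the set of accepted field names is an assumption, since crawlers differ in which extensions (such as Crawl-delay) they honour. It flags lines without a colon, unknown field names (often typos), and Disallow/Allow lines that appear before any User-agent line.

```python
from typing import List, Tuple

# Assumed set of field names; real crawlers may accept more or fewer.
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def check_robots_syntax(text: str) -> List[Tuple[int, str]]:
    """Return (line number, message) pairs for suspicious lines (hypothetical helper)."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue  # blank lines separate records and are fine
        if ":" not in line:
            problems.append((number, "missing ':' separator"))
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS:
            problems.append((number, f"unknown field {field!r}"))
        elif field == "user-agent":
            seen_user_agent = True
        elif field in ("disallow", "allow") and not seen_user_agent:
            problems.append((number, f"{field} before any User-agent line"))
    return problems


sample = """\
User-agent: *
Disalow: /private/
Disallow /apps/
"""
for number, message in check_robots_syntax(sample):
    print(number, message)
```

Here line 2 is reported for the misspelt field name and line 3 for the missing colon. A real checker would do more, but mistakes like these are easy to make in a hand-edited file.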
You should also use Google Webmaster Tools (Webmaster tools overview) to check that the robots.txt file will perform as expected. I feel this is a must, because an incorrectly configured robots.txt file could be disastrous for your site.
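Before uploading a new file, you can also run a quick local check of the URLs you care about. The sketch below uses a hypothetical helper, `unexpected_results`, built on `urllib.robotparser`. It lists every (crawler, URL) pair whose outcome differs from what you intended. Note that `urllib.robotparser` uses simple prefix matching, so it will not reproduce how Google treats wildcard patterns. Treat it as a complement to Google's own tool, not a replacement.

```python
from urllib.robotparser import RobotFileParser


def unexpected_results(robots_txt, expectations):
    """Return (agent, url, expected) cases the file does not satisfy.

    Hypothetical helper: `expectations` maps (user agent, URL) to True if
    the URL should be crawlable and False if it should be blocked.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        (agent, url, expected)
        for (agent, url), expected in expectations.items()
        if parser.can_fetch(agent, url) != expected
    ]


ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical site and intended behaviour.
expectations = {
    ("Googlebot", "http://www.example.com/"): True,
    ("Googlebot", "http://www.example.com/private/"): False,
    ("Googlebot", "http://www.example.com/newsletter/"): True,
}
print(unexpected_results(ROBOTS_TXT, expectations))
```

An empty list means the file behaves as intended for every URL you listed. Any entry in the list points to a rule that needs another look before the file goes live.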