Saturday 21 March 2015

Know robots.txt to Allow or Disallow search engines to crawl your website

Robots.txt is a plain text file saved in the root folder of your website. It tells search engine crawlers which files and folders they may crawl and which are off limits. In this post I will explain best practices for writing a robots.txt file and some of the directives that allow or disallow crawling. Well-behaved search engine crawlers follow the directions in your robots.txt file and crawl only the pages it permits. Keep in mind that the file is advisory: it asks crawlers to stay away but does not enforce anything.

When a search engine crawler visits your website, it checks robots.txt first, then crawls only the pages in allowed directories and stays out of restricted areas.

You can use the following syntax to allow search engines to crawl your whole website:

Syntax to allow:

User-agent: *
Allow: /

If you do not want search engines to crawl your website or index its links, use the syntax below:

Syntax to disallow:

User-agent: *
Disallow: /
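The two rule sets above can be checked locally before you upload them. Below is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed, the bot name AnyBot and the example.com URL are illustrative assumptions, not part of the original post.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as an iterable of lines
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The two rule sets from the post
ALLOW_ALL = "User-agent: *\nAllow: /\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# Hypothetical URL and bot name, used only for illustration
URL = "https://www.example.com/blog/post.html"

print(is_allowed(ALLOW_ALL, "AnyBot", URL))  # True
print(is_allowed(BLOCK_ALL, "AnyBot", URL))  # False
```

Python's parser is only one implementation of the rules, so treat the result as a sanity check rather than a guarantee of how every search engine will behave.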

To disallow particular directories, use the syntax below. These rules block the listed folder paths for all search engines:

User-agent: *
Disallow: /admin/
Disallow: /includes/
Disallow: /private/

To disallow particular file:

User-agent: *
Disallow: /includes/config.php

With the rule above, search engines are told not to crawl the config.php file.
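Directory and file rules both work by matching the start of the URL path. The sketch below combines two of the directory rules with the file rule to show which paths end up blocked. The /includes/ directory rule is left out on purpose so the file-level rule is visible. The helper name blocked, the bot name ExampleBot and the example.com host are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Directory rules and the single-file rule from the post, in one group
RULES = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /includes/config.php
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def blocked(path: str) -> bool:
    """Return True if the rules above forbid crawling the given path."""
    # ExampleBot and example.com are hypothetical placeholders
    return not parser.can_fetch("ExampleBot", "https://www.example.com" + path)


print(blocked("/admin/login.php"))      # True: inside a disallowed folder
print(blocked("/includes/config.php"))  # True: the disallowed file itself
print(blocked("/includes/header.php"))  # False: only config.php is listed
print(blocked("/index.html"))           # False: no rule matches
```

Because matching is by prefix, Disallow: /admin/ covers everything under that folder, while Disallow: /includes/config.php leaves the other files in /includes/ crawlable.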

Robots meta: <meta name="robots" content="noindex">

You can also tell robots not to index a page by using a meta tag in its HTML. Meta tags are machine-readable and are not displayed on the page, so this tag passes the instruction directly to search engine robots. The crawler has to fetch the page to see the tag, so do not block the same page in robots.txt.
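To confirm that a page really carries the robots meta tag, you can scan its HTML. This is a rough sketch using Python's standard html.parser module. The class name RobotsMetaParser, the function robots_directives and the sample page are hypothetical and only illustrate the idea.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        # content may hold several comma-separated values, e.g. "noindex, nofollow"
        content = attributes.get("content") or ""
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives present in the HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical sample page
PAGE = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'
print("noindex" in robots_directives(PAGE))  # True
```

A page without the tag returns an empty set, which means no robots meta instruction was given.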

Disallow / allow a particular search engine bot to crawl:

The robots.txt file lets you grant crawling rights to your preferred search engine bots and deny others by bot name. A complete and up-to-date list of search engine bots is available here.

User-agent: Googlebot
Disallow: /

With this rule you are disallowing Googlebot from crawling your website. Other bots are not affected by it.
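A per-bot rule only applies to the crawler it names. The sketch below, again using Python's standard urllib.robotparser, checks the Googlebot rule against two user agents. Bingbot is used here only as an example of a different crawler, and the example.com URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# The per-bot rule from the post
RULES = """\
User-agent: Googlebot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Hypothetical URL, used only for illustration
URL = "https://www.example.com/page.html"

print(parser.can_fetch("Googlebot", URL))  # False: blocked by name
print(parser.can_fetch("Bingbot", URL))    # True: no rule applies to it
```

Since the file has no User-agent: * group, any bot that is not named falls back to being allowed.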

I hope you like this post on the robots.txt file. It is very useful for your website. Please don’t forget to leave your feedback in the comment box, and do share it with your friends.

SeoAx
