Don't let search engines crawl your site too early

Search engines are a major source of internet traffic these days. If you have started a new blog or site, you probably expect search engines to index your pages correctly.

One big mistake when setting up a website or blog is to build everything on the live server and let search engines crawl it before there is any real content on it.

Let me explain with my own example.

I wanted to create a blog using WordPress (an open source content management system). I downloaded it, uploaded it to the server and installed it, and I was happy to see that everything worked without a flaw.

WordPress created some default content (posts and pages) during installation. I forgot to remove or modify it and went to sleep. The next day I started putting real content on the blog, but I was too late: Google had already indexed my pages (the dummy test pages created by WordPress).

See the sample search result from Google below.


[Image: wp-search.png]

This default dummy content makes no sense on any site, yet search engines may index it before you have time to remove it. Such meaningless content can hurt how search engines rank your site, and it can sometimes take a month for the dummy links to disappear from search results. So it is always a good idea not to let search engines crawl your site until you are ready.

I don't mean that you can't start building your site on the live server. There is a way to keep search engines from indexing your site even when a crawler reaches it.

robots.txt is a simple file, but it has many uses.

A robots.txt file can be used to block crawlers from reaching your entire site or just a portion of it.
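As a rough illustration of blocking only a portion of a site, here is a small Python sketch that uses the standard library's urllib.robotparser to check which paths a set of rules would block. The folder names (/wp-admin/ and /drafts/), the PARTIAL_RULES constant and the is_allowed helper are made up for this example; use whatever paths you actually want to hide.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep crawlers out of two example folders only.
PARTIAL_RULES = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /drafts/
"""


def is_allowed(rules: str, path: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Print each sample path with whether a crawler may fetch it.
    for path in ("/drafts/first-post", "/wp-admin/", "/about"):
        print(path, is_allowed(PARTIAL_RULES, path))
```

Anything under the two listed folders is reported as blocked, while the rest of the site stays open to crawlers.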

The robots.txt file must be placed in the web root folder. Content management systems such as Joomla and WordPress often create this file during installation, so you may only need to edit it.

To prevent robots from crawling your entire site, place the following robots.txt file in your web server root:

User-agent: *
Disallow: /
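If you want to double-check the two lines above before uploading them, a minimal Python sketch like the following may help. It feeds the rules to the standard library's urllib.robotparser and confirms that no sample path is open to any sample crawler. The blocks_everything helper, the sample paths and the crawler names are only assumptions for this check, not anything robots.txt itself requires.

```python
from urllib.robotparser import RobotFileParser

# The block-everything rules from above, as they would appear in robots.txt.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Illustrative values only: a few paths and crawler names to test against.
SAMPLE_PATHS = ["/", "/hello-world/", "/about/", "/wp-admin/"]
SAMPLE_AGENTS = ["Googlebot", "Bingbot", "SomeOtherBot"]


def blocks_everything(rules, paths, agents):
    """Return True if none of `agents` may fetch any of `paths`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not any(parser.can_fetch(a, p) for a in agents for p in paths)


if __name__ == "__main__":
    print("everything blocked:", blocks_everything(BLOCK_ALL, SAMPLE_PATHS, SAMPLE_AGENTS))
```

Keep in mind that robots.txt is only a request: well-behaved crawlers follow it, but nothing forces a crawler to obey.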

The bottom line: if you are building a site on the live server from the start, your first job is to create a robots.txt file that blocks all search engine spiders (robots).

After you finish building, make sure you allow search engine robots to index your site again.
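One common way to open the site again is to leave the Disallow value empty, which tells robots that nothing is off limits. The Python sketch below, again using the standard library's urllib.robotparser, is a small launch check for catching a forgotten block. The ALLOW_ALL constant and the home_page_blocked helper are names invented for this example.

```python
from urllib.robotparser import RobotFileParser

# An empty Disallow value means no path is blocked.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def home_page_blocked(rules: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` is still kept away from the home page."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch(agent, "/")


if __name__ == "__main__":
    # Compare the re-opened rules with the block-everything rules.
    print("allow-all blocks home page:", home_page_blocked(ALLOW_ALL))
    print("block-all blocks home page:", home_page_blocked("User-agent: *\nDisallow: /\n"))
```

Running a check like this against the text of your live robots.txt on launch day is a cheap way to make sure the site is not still hidden.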
