Google’s John Mueller answers a question about using robots.txt to block special files, including .css and .htaccess.
This topic was discussed in some detail in the latest edition of the Ask Google Webmasters video series on YouTube.
Here is the question that was submitted:
“Regarding robots.txt, should I ‘disallow: /*.css$’, ‘disallow: /php.ini’, or even ‘disallow: /.htaccess’?”
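For reference, the directives from the question would look like this in a robots.txt file (shown only to illustrate the syntax, not as a recommendation):

```
User-agent: *
Disallow: /*.css$
Disallow: /php.ini
Disallow: /.htaccess
```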
In response, Mueller says Google can’t stop site owners from disallowing those files, although doing so is certainly not recommended.
“No. I can’t disallow you from disallowing those files. But that sounds like a bad idea. You mention a few special cases so let’s take a look.”
In some cases blocking special files is simply redundant, although in other cases it could seriously impact Googlebot’s ability to crawl a site.
Here’s an explanation of what will happen when each type of special file is blocked.
Crawling CSS is absolutely critical as it allows Googlebot to properly render pages.
Site owners may feel it’s necessary to block CSS files so the files don’t get indexed on their own, but Mueller says that usually doesn’t happen.
Google needs the file regardless, so even if a CSS file ends up getting indexed it won’t do as much harm as blocking it would.
This is Mueller’s response:
“‘*.css’ would block all CSS files. We need to be able to access CSS files so that we can properly render your pages.
This is critical so that we can recognize when a page is mobile-friendly, for example.
CSS files generally won’t get indexed on their own, but we need to be able to crawl them.”
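To see exactly what a rule like ‘disallow: /*.css$’ would block, here is a minimal sketch of Google-style robots.txt pattern matching, where ‘*’ matches any sequence of characters and a trailing ‘$’ anchors the rule to the end of the URL path. (This is an illustrative implementation, not Google’s actual parser; the function name is our own.)

```python
import re

def google_rule_matches(rule: str, path: str) -> bool:
    """Check whether a URL path matches a robots.txt Disallow rule,
    using Google's wildcard extensions: '*' matches any sequence of
    characters and a trailing '$' anchors the match to the end."""
    pattern = re.escape(rule).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"
    return re.match(pattern, path) is not None

# The rule from the question would block every stylesheet on the site.
rule = "/*.css$"
print(google_rule_matches(rule, "/styles/main.css"))  # True: blocked
print(google_rule_matches(rule, "/index.html"))       # False: crawlable
```

As the output shows, the single rule blocks every path ending in .css, which is exactly why Mueller warns against it: Googlebot would lose access to all the stylesheets it needs to render pages.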
Using robots.txt to block php.ini isn’t necessary because it’s not a file that can be readily accessed anyway.
This file should be locked down, which prevents even Googlebot from accessing it. And that’s perfectly fine.
Blocking php.ini is redundant, as Mueller explains:
“You also mentioned PHP.ini – this is a configuration file for PHP. In general, this file should be locked down, or in a special location so nobody can access it.
And if nobody can access it then that includes Googlebot too. So, again, no need to disallow crawling of that.”
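As an illustration of what “locked down” means at the server level (rather than in robots.txt), an Apache configuration could deny all access to php.ini with something like the following sketch, assuming Apache 2.4 syntax:

```
<Files "php.ini">
    Require all denied
</Files>
```

With a rule like this in place, requests for the file return an error for everyone, Googlebot included, so there is nothing left for robots.txt to do.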
Like php.ini, .htaccess is a locked-down file. That means it can’t be accessed externally, even by Googlebot.
It does not need to be disallowed because it can’t be crawled in the first place.
“Finally, you mentioned .htaccess. This is a special control file that cannot be accessed externally by default. Like other locked down files you don’t need to explicitly disallow it from crawling since it cannot be accessed at all.”
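This default behavior comes from the web server itself: Apache ships with a rule along these lines in its stock configuration, which is why .htaccess files are unreachable from the web out of the box:

```
<Files ".ht*">
    Require all denied
</Files>
```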
Mueller capped off the video with a few short words on how site owners should go about creating a robots.txt file.
Site owners tend to run into problems when they copy another site’s robots.txt file and use it as their own.
Mueller advises against that. Instead, think critically about which parts of your site you do not want to be crawled and only disallow those.
“My recommendation is to not just reuse someone else’s robots.txt file and assume it’ll work. Instead, think about which parts of your site you really don’t want to have crawled and just disallow crawling of those.”
Matt Southern has been the lead news writer at Search Engine Journal since 2013. With a degree in communications, Matt has an uncanny ability to make the most complex subject matter easy to understand. When he’s not ferociously following and covering the search industry, he’s busy writing SEO-friendly copy that converts.