500 Response on Robots.txt Fetch Can Impact Rich Results


Google’s John Mueller received feedback about a bug in how Search Console validates rich results. Google will drop images from rich results when a CDN hosting the images mishandles a request for a non-existent robots.txt, returning a server error instead of a 404. The bug is that Search Console and Google’s Rich Results Test fail to alert the publisher to the error and instead report the structured data as successfully validated.

In programming, a bug is when software behaves in an unexpected manner. A bug isn’t always a flaw in the code itself; as in this case, it can be a failure to anticipate an issue, which in turn leads to unintended results.

The publisher asking the question tried to use Google’s tools to diagnose why their rich results were disappearing and was surprised to find the tools were of no use for this particular error.

While this issue affected the image preview in Google’s recipe rich results, it could arise in other situations as well.

So it’s good to be aware of this problem as it might surface in other ways.


Recipe Rich Results Image Previews Disappeared

The person asking the question related what happened:

“We ran into a bit of a tiger trap, I would say, in terms of rich recipe results.

We have hundreds of thousands of recipes which are indexed and there’s lots of traffic coming through from the recipe gallery.

And then… over a period of time it stopped.

And all of the metadata checked out and Google Search Console was saying …this is all rich recipe content, it’s all good, it can be shown.

We finally noticed that in the preview, when you preview the result, the image was missing.


And it seems that there was a change at Google and that if a robots.txt was required in order for images to be retrieved, then nothing we could see in the tools was actually saying anything was invalid.

And so it’s a bit awkward right, when you check something to say “is this a valid rich recipe result?” and it says yeah, it’s great, it’s absolutely great, we’ve got all the metadata.

And you check all the URLs and all the images are right, but it turns out behind the scenes, there was a new requirement that you have a robots.txt.”

John Mueller asked:

“How do you mean that you had to have a robots.txt?”

The person asking the question responded:

“What we found is, if you requested the robots.txt from our CDN, it gave you like a 500.

When we put a robots.txt there, immediately the previews started appearing correctly.


And that involves crawling and putting it onto a static site, I think.

So operationally, we found adding that robots.txt did the job.”

John Mueller nodded his head and said:

“Yeah, okay.

So from our point of view, it’s not that a robots.txt file is required. But it has to have a proper result code.

So if you don’t have one, it should return 404.

If you do have one, then we can obviously read that.


But if you return a server error for the robots.txt file, then our systems will assume that maybe there is an issue with the server and we won’t crawl.

And that’s kind of something that’s been like that since the beginning.

But these kinds of issues where especially when you are on a CDN and it’s on a separate hostname, sometimes that’s really hard to spot.

And I imagine the rich results test, at least as far as I know, focuses on the content that is on the HTML page.

So the JSON-LD markup that you have there, it probably doesn’t check to see if the images are actually fetchable.

And then if they can’t be fetched then, of course, we can’t use them in the carousel, too.


So that might be something that we need to figure out how to highlight better.”

500 Error Response for CDN Robots.txt Can Cause Issues

This is one of those show-stopping SEO problems that are hard to diagnose but, as the person asking the question noted, can cause a lot of damage.

Normally, a crawl for a non-existent robots.txt should result in a 404 response code, which means the robots.txt does not exist.

So if the request for a robots.txt file generates a 500 response code, that’s an indication that something on the server or the CMS is misconfigured.
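As a quick diagnostic, you can fetch robots.txt from the image hostname yourself and inspect the status code. The sketch below is a minimal illustration using Python’s standard library. To keep it self-contained, a throwaway local server stands in for the misconfigured CDN; in practice you would point the check at the hostname your images are actually served from.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen
from urllib.error import HTTPError

def robots_txt_status(base_url: str) -> int:
    """Return the HTTP status code for <base_url>/robots.txt."""
    try:
        with urlopen(f"{base_url}/robots.txt", timeout=10) as resp:
            return resp.status
    except HTTPError as err:
        return err.code  # 404 and 5xx responses arrive as exceptions

class MisconfiguredCDN(BaseHTTPRequestHandler):
    # Simulates the bug: a server error for a robots.txt that doesn't exist.
    def do_GET(self):
        self.send_response(500 if self.path == "/robots.txt" else 200)
        self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), MisconfiguredCDN)
threading.Thread(target=server.serve_forever, daemon=True).start()

status = robots_txt_status(f"http://127.0.0.1:{server.server_port}")
print(status)  # 500: Google assumes a server issue and stops crawling the host
server.shutdown()
```

A 404 here would be harmless; it is the 5xx range that signals Google to back off crawling the entire hostname.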

The short term solution is to upload a robots.txt file.

But it might be a good idea to dive into the CMS or server to check what the underlying issue is.
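For reference, the stop-gap the publisher applied, serving a robots.txt from the CDN hostname, can be as small as a file that allows all crawling. This is a generic, permissive example, not the publisher’s actual file:

```txt
User-agent: *
Allow: /
```

As Mueller noted, a 404 for a missing robots.txt would also be acceptable; the point is to avoid returning 5xx responses for the file.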


500 Response Code for a Robots.txt Fetch

The negative consequences for the recipe rich results preview caused by a CDN that returns a 500 error response might be a rare issue.

A 500 server error response code sometimes happens when something unexpected or missing in the code causes the server to stop processing and return the 500 response.

For example, if you edit a PHP file and forget to close a block of code, the server may give up processing the code and throw a 500 response.

Whatever the reason for the error response when Google tried to fetch the robots.txt, this is a good issue to keep in mind for the rare situation when it happens to you.

Citation

CDN for Images and Recipe Rich Results Bug

Watch at the 51:45 Minute Mark




