How an SEO Fixed a Weird Crawled Currently Not Indexed Issue via @sejournal, @martinibuster

A technical SEO published a case study of how he solved a curious Crawled Currently Not Indexed problem on his site. While the solution he found might not be universal to others experiencing this problem, his method for identifying the problem and solving it presents a useful walkthrough for solving technical SEO problems.

What happened to his site's indexing was really weird, but his solution was straightforward and makes sense.

I discovered a description of this problem in a tweet by Adam Gent (@Adoubleagent).

Crawled – Currently Not Indexed

There are many anecdotal reports of Crawled Currently Not Indexed on Facebook, Twitter and even in John Mueller’s Office-hours hangouts.

In a recent Office-hours hangout someone asked why Google Search Console (GSC) was reporting pages as Crawled Currently Not Indexed when, on clicking through, those pages turned out to be indexed. John Mueller answered that it's just a lag between reports.

And in another Office-hours hangout John Mueller pointed out that it's entirely normal for a site to have many pages that are not indexed.

He noted:

“…if you have a smaller site and you’re seeing a significant part of your pages are not being indexed, then I would take a step back and try to reconsider the overall quality of the website and not focus so much on technical issues for those pages.

The other thing to keep in mind with regards to indexing, is it’s completely normal that we don’t index everything off of the website.

And over time, when you get to like 200 pages on your website and we index 180 of them, then that percentage gets a little bit smaller.”

While both of those are good reasons to explain why the Crawled Not Indexed issue is happening to some people, that is not the reason Adam Gent discovered.

Adam Gent discovered an entirely different problem that appeared to be an algorithm issue at Google itself. There was nothing wrong with the site; the problem was with Google's indexing.

Why Crawled – Currently Not Indexed

Adam reviewed the GSC Index Coverage report and discovered that Google was crawling and indexing his feeds as if they were HTML pages.

He took random words from those pages and did a site: search with those words and discovered that the feed page content was indeed indexed.
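That check is easy to repeat by hand, but a minimal sketch of it might look like the following. The function name `build_site_query`, the choice to search for a quoted run of consecutive words, and the example domain are my own assumptions for illustration and are not taken from Adam's post.

```python
import random
import re
from typing import Optional
from urllib.parse import urlencode


def build_site_query(domain: str, page_text: str,
                     n_words: int = 6, seed: Optional[int] = None) -> str:
    """Build a Google search URL for a site: query.

    A run of `n_words` consecutive words is picked at random from
    `page_text` and searched as a quoted phrase, restricted to `domain`.
    """
    words = re.findall(r"[A-Za-z0-9']+", page_text)
    if len(words) < n_words:
        raise ValueError("page_text is too short for the requested phrase length")
    rng = random.Random(seed)
    # Pick a random starting position that leaves room for n_words words.
    start = rng.randrange(len(words) - n_words + 1)
    phrase = " ".join(words[start:start + n_words])
    query = f'site:{domain} "{phrase}"'
    return "https://www.google.com/search?" + urlencode({"q": query})
```

Opening the returned URL in a browser and looking at which URL Google shows for the phrase (the feed or the real page) is the part that still has to be done by eye.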

To make matters worse, Google had apparently canonicalized the content on the RSS feed over the actual web page, accounting for why the real web pages were crawled but not indexed.

The RSS Feed Was Generated by WordPress

An odd thing about this case is that when you look at the feed page it renders like a web page and not how an XML file usually renders.

Screenshot of Cache of RSS Feed

I might be wrong but that doesn’t look like a normal RSS feed. It looks like an HTML page.

Although the underlying code really is XML, that's not how most feeds normally look.

Could that have played a role in why Google chose to canonicalize the feed?

It’s hard to understand how that could happen because there are so many signals like internal linking that under usual circumstances would cause Google to favor the HTML pages as canonical.

How Adam Fixed the Problem

After Adam figured out what had happened, he removed those WordPress-generated feed pages, submitted the feed URLs for a crawl and then 404'd the pages.

After those pages were dropped from the index, he submitted the correct URLs to Google, and within a few days the problem was fixed.
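Before resubmitting anything, it may be worth confirming that the removed feed URLs really do return a 404. Here is a rough sketch of such a check using only the Python standard library. The helper names `fetch_status` and `feeds_still_live`, the user-agent string, and the example URLs are hypothetical and are not part of Adam's write-up.

```python
from typing import Callable, Dict, Iterable
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a GET request to `url`.

    urlopen follows redirects, so this is the status of the final response.
    Network failures (DNS errors, timeouts) are not caught and will raise.
    """
    req = Request(url, headers={"User-Agent": "feed-status-check/0.1"})
    try:
        with urlopen(req, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        return err.code


def feeds_still_live(urls: Iterable[str],
                     get_status: Callable[[str], int] = fetch_status) -> Dict[str, int]:
    """Return {url: status} for every URL that does not answer with 404."""
    live = {}
    for url in urls:
        status = get_status(url)
        if status != 404:
            live[url] = status
    return live
```

Calling `feeds_still_live(["https://example.com/feed/", "https://example.com/some-post/feed/"])` would return an empty dictionary once every listed feed URL answers with a 404. The `get_status` parameter exists only so the check can be exercised without touching the network.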

What Caused the Problem?

Adam wrote that the problem appears to be on Google’s side.

I asked around and someone told me that apparently a few years ago Google started indexing feeds but that he thought this problem had been fixed.

I’m not an expert on XML but it seems unusual that the feed resembles an HTML page instead of the normal XML layout that shows up without HTML styling.

The feed doesn't look normal, so whatever is making it render that way might be an underlying cause.
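One thing that can make an XML feed render like a styled page in a browser is an `xml-stylesheet` processing instruction at the top of the file. I don't know whether that is what happened on Adam's site, and his post doesn't say, so treat the following as a sketch of one thing to look at rather than a diagnosis. The function name `stylesheet_instructions` is made up for this example.

```python
from typing import List
from xml.dom import minidom


def stylesheet_instructions(xml_text: str) -> List[str]:
    """Return the data of each top-level xml-stylesheet processing instruction.

    An empty list means the document does not reference a stylesheet this way.
    Only run this on feeds you trust: minidom is not hardened against
    maliciously constructed XML.
    """
    doc = minidom.parseString(xml_text)
    return [
        node.data
        for node in doc.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
        and node.target == "xml-stylesheet"
    ]
```

Running this against the saved source of a feed shows whether the feed asks the browser to apply a stylesheet. It says nothing about how Google treats such a feed.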

Regardless, if you’re having Crawled Currently Not Indexed problems, this is one more thing to check in case it’s also happening to you.

Citation

Read the original post that walks through solving the problem:

A Curious Case of Canonicalization

Searchenginejournal.com
