This advice was provided on Reddit in response to a thread in r/TechSEO where someone was looking for help with hreflang.
“I have an international site that I’m managing for a client that has pages being indexed for different languages. The problem is that some pages are still in english (ex: example.com/ru still has english content).
Since the X-default is english, there’s no need to have these other URLs, yet they are unfortunately being created and indexed as new content gets added, with the only difference being the URL.
Reading Google’s guidelines (if I understood correctly, which I may not have), they suggest blocking these URLs in the robots.txt file, but should these pages that are already indexed be removed, redirected, canonicalised, or noindexed? I’m stumped, so any help is appreciated.”
John Mueller’s Response
Mueller says the pages that have already been created should definitely not be blocked or disallowed in robots.txt. If they’re disallowed from crawling, Google can’t see their content and won’t be able to canonicalize them at all.
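As a rough way to check whether existing language URLs are currently blocked, the sketch below uses Python’s standard-library robots.txt parser. The robots.txt rules and the URL list are made up for illustration, and the standard-library parser only approximates how Googlebot interprets robots.txt, so treat the output as a first pass rather than a definitive answer.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt rules disallow for user_agent."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical rules and URLs, not taken from the Reddit thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /ru/
"""

URLS = [
    "https://example.com/",
    "https://example.com/ru/",
    "https://example.com/ru/pricing",
    "https://example.com/de/",
]

blocked = blocked_urls(ROBOTS_TXT, URLS)
for url in blocked:
    print("Disallowed, so Google cannot crawl or canonicalize it:", url)
```

Any URL this reports would need its disallow rule lifted before Google could crawl the page and fold it into the correct canonical.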
When hreflang is thrown into the mix, things can get overly complicated, Mueller explains:
“It’s easy to dig into endless pits of complexity with hreflang. “Let’s create all languages! Let’s make pages for all countries! What if someone in Japan wants to read it in Swahili? Let’s make even more pages!” My guess is most of these “pages created because you can” get very little traffic, add very little value, and they add a significant overhead (crawling, indexing, canonicalization, ranking, maintenance, hreflang, structured data, etc.).”
Ideally, site owners should limit the number of multi-language pages to those that are necessary for the site to achieve its goals.
“My recommendation would be first to limit the number of pages you create to those that are absolutely critical & valuable — maybe that already cuts the pages you’re thinking about. Think big here; if you’re talking about individual pages within a medium-sized site, it’s probably a non-issue. On the other hand, if you’re considering copying your whole site into 20 languages x 10 countries, that’s something else.”
When it comes to hreflang in particular, Mueller recommends focusing on pages that are receiving wrong-language traffic.
“Past that, for hreflang, I’d focus first on pages where you’re seeing wrong-language traffic — often these are pages that get a lot of global, branded queries, where it’s hard to determine which language content they want. A search for “google” can match a lot of language pages, hreflang can help to differentiate. On the other hand, a search for “search engine” is pretty clear & matches pages where you write about “search engine” already, so pages like that don’t need as much help being language-targeted.”
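For the handful of pages that do warrant annotations, the markup itself is small. The sketch below builds the `<link rel="alternate" hreflang="...">` tags for one page’s language versions plus an optional x-default. The function name, the language codes, and the URLs are assumptions for illustration. Every language version of the page would normally carry the same full set of tags, including one pointing at itself.

```python
from html import escape
from typing import Dict, List, Optional


def hreflang_links(alternates: Dict[str, str], x_default: Optional[str] = None) -> List[str]:
    """Build hreflang link tags for one page.

    alternates maps a language (or language-region) code to the URL of
    that version of the page. x_default, if given, is the fallback URL.
    """
    links = [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}" />'
        for code, url in sorted(alternates.items())
    ]
    if x_default is not None:
        links.append(
            f'<link rel="alternate" hreflang="x-default" href="{escape(x_default)}" />'
        )
    return links


# Hypothetical page with a genuinely translated Russian version.
links = hreflang_links(
    {"en": "https://example.com/", "ru": "https://example.com/ru/"},
    x_default="https://example.com/",
)
print("\n".join(links))
```

Keeping the mapping explicit per page makes it easy to annotate only the pages that actually see wrong-language traffic, rather than generating tags for every URL on the site.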
Lastly, Mueller notes that it can be difficult to determine how to balance the “just do it everywhere” approach with the “save effort by thinking” approach.
Matt Southern has been the lead news writer at Search Engine Journal since 2013. With a degree in communications, Matt has an uncanny ability to make the most complex subject matter easy to understand. When he’s not ferociously following and covering the search industry, he’s busy writing SEO-friendly copy that converts.