Connect with us

SEO

What It Is & How It Works

Published

on

Canonicalization is the process that search engines use to determine the main version of a page. That is the page that will be indexed and shown to users. The chosen version is canonical, and ranking signals like links will consolidate to that page. This process is sometimes referred to as standardization or normalization.

According to Google Webmaster Trends Analyst Gary Illyes, ~60% of the internet is duplicate content.

Canonicalization is complex and often misunderstood. I don’t think most of the duplicates are nefarious. It’s mostly going to be technical issues that cause them. We’ll look at this more in a bit. I’m going to talk about how the canonicalization process works as well as:

A lot of different signals go into the canonicalization process. These include:

  • Duplicates
  • Canonical link elements
  • Sitemap URLs
  • Internal links
  • Redirects

Google looks at all the different signals and weighs them to determine what the canonical version should be. That’s the version of the page they will index and what they usually show to users.

A potential scenario when Google decides on the canonical based on internal links and the canonical URL.

A potential scenario when Google decides on the canonical based on internal links and the canonical URL.

Advertisement

Duplicates

With duplicate content, Google will pick a canonical version to index. All the eligible pages form a cluster of pages, and the signals that go to the pages in that cluster will consolidate at the chosen canonical. That canonical may even change over time.

How duplicate signals consolidateHow duplicate signals consolidate

Some SEOs believe there is a duplicate content penalty, but that’s not true. Generally, you’re going to have one version or another indexed. It may not be the version you want to be indexed, but it will be indexed and rank just as well as any other version of the same page.

Here are some examples of what can cause duplicate pages and sometimes canonicalization issues:

  • HTTP and HTTPS variants (e.g., http://www.example.com and https://www.example.com)
  • Non-www and www variants (e.g., http://example.com and http://www.example.com)
  • URLs with and without trailing slashes (e.g., https://example.com/page/ and https://example.com/page)
  • URLs with and without capital letters (e.g., https://example.com/page/ and https://example.com/Page/)
  • Default versions of the page such as index pages (e.g., https://www.example.com/, https://www.example.com/index.htm, https://www.example.com/index.html, https://www.example.com/index.php, https://www.example.com/default.htm, etc.)
  • Alternate versions of pages. This could include mobile versions (e.g., example.com and m.example.com), AMP versions (e.g., example.com/page and amp.example.com/page), print versions (e.g., example.com/page and example.com /page/print), alternate versions meant for other countries but containing the same content (e.g., example.com/en-us/, example.com/en-gb/, example.com/en-au/), or versions in a dev or staging site (e.g., dev.example.com).
  • URL parameters (e.g., example.com?parameter=whatever). These may exist because of tracking codes, faceted navigation, sorting content, session IDs, etc. There are some instances where parameters may change the page’s content so that it’s not a duplicate.
  • Other pages showing the full content. Google may choose the wrong canonical when another page displays the content in full. This may include the main blog page, paginated pages, tag pages, category pages, or feed pages.
  • Scraped or syndicated content. Content syndication best practices generally recommend having a canonical tag back to the original content or at least a link to the original content. That’s because the canonical chosen can be a completely different domain. They try to select the original source as the canonical, but in some cases, they choose the wrong page.

Most of these aren’t usually issues. As I mentioned, Google will usually choose one version or another as the canonical. There are a few exceptions to this.

  1. Sometimes with content syndication, the original source isn’t chosen as the canonical. This is a real problem. How would you feel if someone else started ranking for an article you wrote?
  2. Hreflang does not solve duplication on international sites. Google will generally try to swap to show the correct version, but it’s not guaranteed, and this setup often breaks. When this happens, users see pages from the wrong country. It’s best to avoid having the same content on multiple pages for international websites.
  3. With some JavaScript sites (typically app shell models), the initial code for the pages can look like other pages or even the code from other websites. Sometimes these pages get canonicalized to other pages on the same or even different websites.

I believe part of the problem with both hreflang and the JavaScript content is that Google may be running the duplicate detection via crawl algorithms that detect duplication patterns, again after just seeing the code, and yet again after rendering the pages.

Google’s render path marked up where I believe duplicate detection systems are run.Google’s render path marked up where I believe duplicate detection systems are run.

Google’s render path marked up where I believe duplicate detection systems are run.

Google’s render path marked up where I believe duplicate detection systems are run.

With the pages using hreflang, if they decide that the pages are duplicates without crawling them, they may not be able to swap them properly.

Before a page is even rendered, it may “look” like another page based on the HTML content. Google may choose the canonical based on this initial version and may not prioritize it for rendering because it’s already deemed a duplicate page. This usually resolves itself after rendering, but it can take some time to clear up.

Advertisement

Google has a couple of rules they generally follow when it comes to canonicalization of duplicates.

1. They prefer HTTPS pages over HTTP pages

They will generally index the HTTPS version, but there are a few issues or conflicting signals that may cause them to choose the HTTP version instead, such as:

  • Having an invalid security certificate
  • HTTPS page links to HTTP resources on the page (excludes images)
  • HTTPS redirecting to HTTP
  • HTTPS page having a rel=“canonical” link element pointing to the HTTP page

2. They prefer shorter URLs over longer URLs

This has been misconstrued over the years by SEOs to say that all your URLs should be shorter. But that’s not what was meant by the original statement. What Google said was that if you had, for instance, a clean short version of a URL and a longer version with parameters attached, they would generally choose the shorter version of the URL without the parameter as the canonical version.

Canonical link element

This is also commonly referred to as a canonical tag. It looks like this:

<link rel=”canonical” https://www.example.com />

The canonical tag is sometimes referred to as a hint because it’s just one canonicalization signal. Google ignores it if other signals are stronger.

If the canonical tag is respected, all signals like links will pass. However, if the canonical is ignored, no value is passed. The value isn’t lost; it stays with the original page or goes to whatever page Google chooses as the canonical.

Advertisement

A canonical link element can be implemented in two different ways. It can be in the <head> section or the HTTP header.

A fun anecdote. Google’s SEO Starter Guide used to be a PDF. They didn’t have a canonical tag set in the HTTP header, and people used to “steal” the listing with their own duplicate version.

Sometimes the <head> section of a page will end before it should. This is usually caused by a tag in the <head> not closed out properly. When that happens, a canonical tag may be put into the <body> section instead. If that happens, your canonical tag won’t be respected.

Invalid canonical tag located in the<body></noscript><img class=

Invalid canonical tag located in the <body> section

Sitemap URLs

The URLs you include in your sitemap are also a canonicalization signal. Most of the time, you only want to include URLs of pages that you want to be indexed.

There are some exceptions to this because sitemap URLs also help with crawling. After a website migration, you should create a sitemap that still lists the old pages, even though they aren’t canonical. This will help the redirects be processed faster. You’ll want to delete this sitemap after most of the redirects have been picked up and processed.

Advertisement

Internal links

It matters how you link to pages. Internal links are another canonicalization signal.

Generally, you should link to the version of a page you want to be canonical and update the links to any URLs that may have changed. However, there are exceptions to this, such as with faceted navigation. In some cases like this, what is best for users may trump what is best for SEO.

Redirects

There are several different types of redirects, and they’re all canonicalization signals. They pass PageRank and help determine which URL gets shown in Google’s index.

301s and 308s send signals forward to the new URL. 302s and some 307s send signals backwards to the redirected URL. If a 302 is left in place long enough or the URL it’s redirected to already exists, it may be treated as a 301 and send signals forward instead. It requires enough signals to flip the scale we saw earlier for canonicalization signals. As links build up, internal links are changed, sitemap URLs are updated, etc., more signals point to the new URL than the old URL, and the flip occurs.

At some point the scale flips for 302sAt some point the scale flips for 302s

At some point the scale flips for 302s

A 307 has two different cases. In cases where it’s a temporary redirect, it will be treated the same as a 302 and attempt to consolidate backward. When web servers require clients to only use HTTPS connections (HSTS policy), Google won’t see the 307 because it’s cached in the browser. The initial hit (without cache) will have a server response code that’s likely a 301 or a 302. But your browser will show you a 307 for subsequent requests.

Advertisement

There are also other types of redirects like those implemented with JavaScript. These are also canonicalization signals and pass the full value just like other redirects as long as they can be seen and processed by Google. They’re fine to use in most cases.

How to check the canonical

Your main source of truth for what Google chose as the canonical will be the URL Inspection tool in Google Search Console. Enter the URL, and it will show what the declared canonical is and what Google chose as the canonical.

The declared and Google-selected canonical via Google Search ConsoleThe declared and Google-selected canonical via Google Search Console

If you don’t have access to Google Search Console, the recommended way to check the version of a page Google has indexed is to paste the URL into Google. The top result is usually the canonical.

Similarly, if you check the cached version of a page in Google and a different page is shown, Google has selected a different version of the page.

Warning: Don’t use site: searches for checking canonicals. It shows what Google knows about, not necessarily what’s indexed or the selected canonical.

Advertisement

Within Site Audit, we show many issues related to canonicalization. Keep in mind that we’re flagging best practices in most cases. Because the canonical is a hint, Google and other search engines will have to choose which version of a page to index.

Canonicalization issues in Ahrefs' Site AuditCanonicalization issues in Ahrefs' Site Audit

Even if your website has lots of issues related to canonicalization, search engines may be able to figure out what version should be indexed and where they should consolidate signals. It may not create any real problems for them.

Fun fact. When running a Site Audit, we only count the canonical version of pages as crawl credits. Some other tools count every version of a page towards the credits. On many sites, this can eat multiple credits per page!

There’s a lot that can go wrong with canonicalization. Let’s look at some common mistakes.

Mistake #1: Blocking the canonicalized URL via robots.txt

Blocking a URL in robots.txt prevents Google from crawling it, meaning that they cannot see any canonical tags on that page. That, in turn, prevents them from transferring any “link equity” from the non-canonical to the canonical.

Unless you have a crawl budget issue, it’s probably better to let all the signals consolidate. Even if you’re going to block or noindex some versions, you still may want to check for versions with links that you should canonicalize instead. However, as Google tends to crawl non-canonical pages less over time, you may just want to wait.

Mistake #2: Setting the canonicalized URL to ‘noindex’

Never mix noindex and rel=canonical. They’re contradictory instructions.

Advertisement

As John Mueller states, Google will usually prioritize the canonical tag over the ‘noindex’ tag.

Mistake #3: Setting a 4XX HTTP status code for the canonicalized URL

Setting a 4XX HTTP status code for a canonicalized URL has the same effect as using the ‘noindex’ tag: Google will be unable to see the canonical tag and transfer “link equity” to the canonical version.

Mistake #4: Canonicalizing all paginated pages to the root page

Paginated pages should not be canonicalized to the first paginated page in the series. Instead, self-referencing canonicals should be used on all paginated pages.

Why? As Google’s John Mueller stated on Reddit, this is improper use of the rel=canonical.

The main thing to avoid, since this post is about canonicalization, is to use the rel=canonical on page 2 pointing to page 1. Page 2 isn’t equivalent to page 1, so the rel=canonical like that would be incorrect. 

John MuellerJohn Mueller

We have a guide on pagination for SEO and best practices if you’re interested.

Mistake #5: Don’t use the URL removal tool in Google Search Console for canonicalization.

This can remove all versions of a URL, effectively deindexing your page from search.

Mistake #6: Not keeping canonicalization signals consistent.

As we talked about earlier, there are many different canonicalization signals.

Advertisement

Having different signals suggest different canonicals means that you will be relying on Google to select a canonical for you. The more consistent signals you show them with your preferred version, the more likely it is that version will be the chosen canonical.

Mistake #7: Not using canonical tags with hreflang

Hreflang tags specify the language and geographical targeting of a webpage.

Google states that when using hreflang, you should “specify a canonical page in the same language, or the best possible substitute language if a canonical doesn’t exist for the same language.”

Mistake #8: Having multiple rel=canonical tags

Having multiple rel=canonical tags will usually cause Google to ignore them. In many cases, this happens because tags are inserted into a system at different points, such as by the CMS, the theme, and plugin(s). This is why many plugins have an overwrite option meant to ensure they are the only source for canonical tags.

Another area where this might be a problem is with canonicals added with JavaScript. If you have no canonical URL specified in the HTML response and then add a rel=canonical tag with JavaScript, it should be respected when Google renders the page. However, if you have a canonical specified in HTML and swap the preferred version with JavaScript, you send mixed signals to Google.

Mistake #9: Rel=canonical in the <body>

Rel=canonical should only appear in the <head> of a document. A canonical tag in the <body> section of a page will be ignored.

Advertisement

Where this can become a problem is with the parsing of a document. Even if the page’s source code has the rel=canonical tag in the correct place, many different things such as unclosed tags, JavaScript injected, or <iframes> in the <head> section can cause the <head> to end prematurely while rendering. In these cases, a canonical tag may be accidentally thrown into the <body> of a rendered page where it will not be respected.

Final thoughts

Many of the tools SEOs had for handling canonicalization have been taken away, such as the URL Parameters Tool and Preferred Domain setting in Google Search Console. However, there are still plenty of other signals to help Google choose a canonical.

If you have questions, message me on Twitter.

Source link

Keep an eye on what we are doing
Be the first to get latest updates and exclusive content straight to your email inbox.
We promise not to spam you. You can unsubscribe at any time.
Invalid email address

SEO

Google March 2024 Core Update Officially Completed A Week Ago

Published

on

By

Graphic depicting the Google logo with colorful segments on a blue circuit board background, accompanied by the text "Google March 2024 Core Update.

Google has officially completed its March 2024 Core Update, ending over a month of ranking volatility across the web.

However, Google didn’t confirm the rollout’s conclusion on its data anomaly page until April 26—a whole week after the update was completed on April 19.

Many in the SEO community had been speculating for days about whether the turbulent update had wrapped up.

The delayed transparency exemplifies Google’s communication issues with publishers and the need for clarity during core updates

Google March 2024 Core Update Timeline & Status

First announced on March 5, the core algorithm update is complete as of April 19. It took 45 days to complete.

Advertisement

Unlike more routine core refreshes, Google warned this one was more complex.

Google’s documentation reads:

“As this is a complex update, the rollout may take up to a month. It’s likely there will be more fluctuations in rankings than with a regular core update, as different systems get fully updated and reinforce each other.”

The aftershocks were tangible, with some websites reporting losses of over 60% of their organic search traffic, according to data from industry observers.

The ripple effects also led to the deindexing of hundreds of sites that were allegedly violating Google’s guidelines.

Addressing Manipulation Attempts

In its official guidance, Google highlighted the criteria it looks for when targeting link spam and manipulation attempts:

  • Creating “low-value content” purely to garner manipulative links and inflate rankings.
  • Links intended to boost sites’ rankings artificially, including manipulative outgoing links.
  • The “repurposing” of expired domains with radically different content to game search visibility.

The updated guidelines warn:

“Any links that are intended to manipulate rankings in Google Search results may be considered link spam. This includes any behavior that manipulates links to your site or outgoing links from your site.”

John Mueller, a Search Advocate at Google, responded to the turbulence by advising publishers not to make rash changes while the core update was ongoing.

Advertisement

However, he suggested sites could proactively fix issues like unnatural paid links.

Mueller stated on Reddit:

“If you have noticed things that are worth improving on your site, I’d go ahead and get things done. The idea is not to make changes just for search engines, right? Your users will be happy if you can make things better even if search engines haven’t updated their view of your site yet.”

Emphasizing Quality Over Links

The core update made notable changes to how Google ranks websites.

Most significantly, Google reduced the importance of links in determining a website’s ranking.

In contrast to the description of links as “an important factor in determining relevancy,” Google’s updated spam policies stripped away the “important” designation, simply calling links “a factor.”

This change aligns with Google’s Gary Illyes’ statements that links aren’t among the top three most influential ranking signals.

Advertisement

Instead, Google is giving more weight to quality, credibility, and substantive content.

Consequently, long-running campaigns favoring low-quality link acquisition and keyword optimizations have been demoted.

With the update complete, SEOs and publishers are left to audit their strategies and websites to ensure alignment with Google’s new perspective on ranking.

Core Update Feedback

Google has opened a ranking feedback form related to this core update.

You can use this form until May 31 to provide feedback to Google’s Search team about any issues noticed after the core update.

While the feedback provided won’t be used to make changes for specific queries or websites, Google says it may help inform general improvements to its search ranking systems for future updates.

Advertisement

Google also updated its help documentation on “Debugging drops in Google Search traffic” to help people understand ranking changes after a core update.


Featured Image: Rohit-Tripathi/Shutterstock

FAQ

After the update, what steps should websites take to align with Google’s new ranking criteria?

After Google’s March 2024 Core Update, websites should:

  • Improve the quality, trustworthiness, and depth of their website content.
  • Stop heavily focusing on getting as many links as possible and prioritize relevant, high-quality links instead.
  • Fix any shady or spam-like SEO tactics on their sites.
  • Carefully review their SEO strategies to ensure they follow Google’s new guidelines.

Source link

Advertisement
Keep an eye on what we are doing
Be the first to get latest updates and exclusive content straight to your email inbox.
We promise not to spam you. You can unsubscribe at any time.
Invalid email address
Continue Reading

SEO

Google Declares It The “Gemini Era” As Revenue Grows 15%

Published

on

By

A person holding a smartphone displaying the Google Gemini Era logo, with a blurred background of stock market charts.

Alphabet Inc., Google’s parent company, announced its first quarter 2024 financial results today.

While Google reported double-digit growth in key revenue areas, the focus was on its AI developments, dubbed the “Gemini era” by CEO Sundar Pichai.

The Numbers: 15% Revenue Growth, Operating Margins Expand

Alphabet reported Q1 revenues of $80.5 billion, a 15% increase year-over-year, exceeding Wall Street’s projections.

Net income was $23.7 billion, with diluted earnings per share of $1.89. Operating margins expanded to 32%, up from 25% in the prior year.

Ruth Porat, Alphabet’s President and CFO, stated:

Advertisement

“Our strong financial results reflect revenue strength across the company and ongoing efforts to durably reengineer our cost base.”

Google’s core advertising units, such as Search and YouTube, drove growth. Google advertising revenues hit $61.7 billion for the quarter.

The Cloud division also maintained momentum, with revenues of $9.6 billion, up 28% year-over-year.

Pichai highlighted that YouTube and Cloud are expected to exit 2024 at a combined $100 billion annual revenue run rate.

Generative AI Integration in Search

Google experimented with AI-powered features in Search Labs before recently introducing AI overviews into the main search results page.

Regarding the gradual rollout, Pichai states:

“We are being measured in how we do this, focusing on areas where gen AI can improve the Search experience, while also prioritizing traffic to websites and merchants.”

Pichai reports that Google’s generative AI features have answered over a billion queries already:

Advertisement

“We’ve already served billions of queries with our generative AI features. It’s enabling people to access new information, to ask questions in new ways, and to ask more complex questions.”

Google reports increased Search usage and user satisfaction among those interacting with the new AI overview results.

The company also highlighted its “Circle to Search” feature on Android, which allows users to circle objects on their screen or in videos to get instant AI-powered answers via Google Lens.

Reorganizing For The “Gemini Era”

As part of the AI roadmap, Alphabet is consolidating all teams building AI models under the Google DeepMind umbrella.

Pichai revealed that, through hardware and software improvements, the company has reduced machine costs associated with its generative AI search results by 80% over the past year.

He states:

“Our data centers are some of the most high-performing, secure, reliable and efficient in the world. We’ve developed new AI models and algorithms that are more than one hundred times more efficient than they were 18 months ago.

How Will Google Make Money With AI?

Alphabet sees opportunities to monetize AI through its advertising products, Cloud offerings, and subscription services.

Advertisement

Google is integrating Gemini into ad products like Performance Max. The company’s Cloud division is bringing “the best of Google AI” to enterprise customers worldwide.

Google One, the company’s subscription service, surpassed 100 million paid subscribers in Q1 and introduced a new premium plan featuring advanced generative AI capabilities powered by Gemini models.

Future Outlook

Pichai outlined six key advantages positioning Alphabet to lead the “next wave of AI innovation”:

  1. Research leadership in AI breakthroughs like the multimodal Gemini model
  2. Robust AI infrastructure and custom TPU chips
  3. Integrating generative AI into Search to enhance the user experience
  4. A global product footprint reaching billions
  5. Streamlined teams and improved execution velocity
  6. Multiple revenue streams to monetize AI through advertising and cloud

With upcoming events like Google I/O and Google Marketing Live, the company is expected to share further updates on its AI initiatives and product roadmap.


Featured Image: Sergei Elagin/Shutterstock

Source link

Keep an eye on what we are doing
Be the first to get latest updates and exclusive content straight to your email inbox.
We promise not to spam you. You can unsubscribe at any time.
Invalid email address
Continue Reading

SEO

brightonSEO Live Blog

Published

on

brightonSEO Live Blog

Hello everyone. It’s April again, so I’m back in Brighton for another two days of sun, sea, and SEO!

Being the introvert I am, my idea of fun isn’t hanging around our booth all day explaining we’ve run out of t-shirts (seriously, you need to be fast if you want swag!). So I decided to do something useful and live-blog the event instead.

Follow below for talk takeaways and (very) mildly humorous commentary. 

Advertisement

Source link

Keep an eye on what we are doing
Be the first to get latest updates and exclusive content straight to your email inbox.
We promise not to spam you. You can unsubscribe at any time.
Invalid email address
Continue Reading

Trending

Follow by Email
RSS