
SEO

At Least 66.5% of Links to Sites in the Last 9 Years Are Dead (Ahrefs Study on Link Rot)

The web is constantly changing, and pages get removed or redirected. When that happens, links to those pages lead to a broken page or to a page that no longer matches the original. This phenomenon is called link rot.

Since January 2013, 66.5% of the links pointing to the 2,062,173 websites we sampled have rotted. Another 6.45% returned temporary errors, so we don’t know whether those links still exist.

From an SEO perspective, the picture is even worse: another 1.55% of links have other issues that prevent them from being counted for ranking purposes.

That means a total of 74.5% of the links in our study are considered lost, and at least 66.5% have rotted.
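The headline figures add up as follows. This is only a quick Python check of the arithmetic, using the percentages quoted in this section; the variable names are illustrative.

```python
# Shares of all sampled links since January 2013, as reported above (percent).
rotted = 66.5             # links counted as link rot
temporary_errors = 6.45   # crawl errors: the links may or may not still exist
other_seo_issues = 1.55   # other issues that stop links counting for ranking

# Everything considered "lost" for SEO purposes.
total_lost = rotted + temporary_errors + other_seo_issues
print(f"Total lost: {total_lost:.2f}%")  # Total lost: 74.50%
```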

Often, the links that no longer work are important. Check out this example of a website that was referenced in a U.S. Supreme Court case. Someone bought the domain and used it to make a statement.

Image showing that a page referenced in a U.S. Supreme Court case has been removed

A 2014 study of legal journals and citations found that 70% of the links within the journals and 50% of the URLs in U.S. Supreme Court decisions no longer led to the originally cited material.

Another study from 2012 found that 30% of social media links were dead within two years.

Most previous studies are fairly small and cover older parts of the web. I suspect that much of the older web, if not most of it, is already gone. For example, most sites stopped using extensions like .html in URLs years ago in favor of clean URLs, and most have also moved from HTTP to HTTPS.

With that in mind, we set out to run the largest link rot study ever. It’s also one of the few that covers the more recent web.

Let’s dig into the data.

About the data

Ahrefs has been crawling the web since 2010, but for this study, we’re only looking at data from January 2013 onward.

You can use the Backlinks report in Ahrefs’ Site Explorer to check the data for your own site. For Ahrefs, 26.9 million out of 174.3 million links have been lost. Just compare the numbers with the “Lost” filter applied vs. the numbers with the “All” filter applied.

Gif showing how to check for lost backlinks in Ahrefs
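To turn that comparison into a single number, here is a minimal Python sketch that takes the two counts read off the report. The function name is hypothetical; it is not part of any Ahrefs API.

```python
def lost_share(lost_links: float, all_links: float) -> float:
    """Percentage of backlinks tagged "Lost" out of all backlinks (the "All" filter)."""
    if all_links <= 0:
        raise ValueError("all_links must be positive")
    return 100 * lost_links / all_links

# Ahrefs' own figures from above, in millions of links.
print(f"{lost_share(26.9, 174.3):.1f}%")  # 15.4%
```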

There are a few cases that we tag as lost but don’t count as link rot. I’ll cover those below.

As I mentioned in the intro, at least 66.5% of links to the sampled websites have rotted in the last nine years.

The web is complex and messy, and some things change faster than others. I wanted to see how many sites have link rot—and what percentage of their links experience link rot. This is the distribution for the percentage of link rot by domain across the dataset.

Histogram showing the link rot percentage that occurs by number of domains

There are a lot of small sites that don’t have much link rot. If we take out the smallest sites and only look at those with more than 10 live links, you’ll see that larger sites seem to have quite a bit of link rot.

Histogram showing the link rot percentage that occurs by number of domains, filtered to greater than 10 live links
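The histograms above bucket domains by the share of their links that rotted, optionally keeping only domains with more than 10 live links. Here is a minimal Python sketch of that bucketing, assuming per-domain counts are already available; the sample rows and the 10-point bin width are hypothetical, not Ahrefs data.

```python
from collections import Counter

# Hypothetical per-domain counts: (domain, rotted_links, total_links, live_links).
DOMAINS = [
    ("a.example", 0, 3, 3),
    ("b.example", 45, 60, 15),
    ("c.example", 700, 1000, 300),
    ("d.example", 2, 5, 3),
]

def rot_histogram(rows, min_live=None, bin_width=10):
    """Count domains per link-rot bin: 0 means 0-9%, 10 means 10-19%, ..., 100 means 100%."""
    hist = Counter()
    for _domain, rotted, total, live in rows:
        if total == 0:
            continue  # no links, so no rot percentage
        if min_live is not None and live <= min_live:
            continue  # keep only domains with MORE than `min_live` live links
        pct = 100 * rotted / total
        hist[int(pct // bin_width) * bin_width] += 1
    return dict(sorted(hist.items()))

print(rot_histogram(DOMAINS))               # {0: 1, 40: 1, 70: 2}
print(rot_histogram(DOMAINS, min_live=10))  # {70: 2}
```

Filtering on live links, as the study does, removes tiny sites whose rot percentage swings wildly from only a handful of links.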

As I mentioned in the intro, the number of links we consider lost when it comes to SEO is even higher—percentage-wise, it’s 74.5%. I also wanted to see the distribution for these across the dataset.

Histogram showing lost link percentage by domain

There are a lot of small sites that don’t have many lost links. If we take out the smallest sites and only look at those with more than 10 live links, you’ll see that larger sites seem to have lost quite a lot of their links.

Histogram showing lost link percentage by domain, filtered to greater than 10 live links

Links can be lost for many reasons, and at Ahrefs we classify lost links into several categories. Here are the most common reasons that links are lost, with each reason’s share of all lost links:

  • Dropped (47.7%)
  • Link removed (34.2%)
  • Crawl error (6.45%)
  • 301/302 (5.99%)
  • Not found (4.11%)
  • Not canonical (0.82%)
  • Noindex (0.73%)
  • Broken redirect (0%)

Pie chart showing the main reasons links are lost
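The reasons above can be encoded as a simple lookup that also records which ones count toward link rot. This Python sketch uses the listed shares; the key names are hypothetical, and excluding "not canonical" from rot is an assumption inferred from the 1.55% "other issues" figure in the intro (the sections below only state the exclusion explicitly for crawl errors and noindex).

```python
# Share of lost links by reason, as listed above (percent of lost links).
LOST_REASONS = {
    "dropped": 47.7,
    "link_removed": 34.2,
    "crawl_error": 6.45,
    "redirect_301_302": 5.99,
    "not_found": 4.11,
    "not_canonical": 0.82,
    "noindex": 0.73,
    "broken_redirect": 0.0,
}

# Reasons excluded from the link rot count. Crawl errors and noindex are
# excluded explicitly in the article; treating "not_canonical" the same way
# is an assumption.
NOT_ROT = {"crawl_error", "noindex", "not_canonical"}

def counts_as_rot(reason: str) -> bool:
    """True if a lost link with this reason is counted as link rot."""
    if reason not in LOST_REASONS:
        raise KeyError(f"unknown reason: {reason}")
    return reason not in NOT_ROT

print(f"{sum(LOST_REASONS.values()):.2f}")  # 100.00
```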

Let’s look at each of those and why they happen.

47.7% of lost links are from dropped pages

These pages are removed from our index for various reasons.

Example of link dropped

Pages may be dropped because they can’t be crawled or indexed. In some cases, a domain may not exist anymore.

34.2% of lost links were removed

In this case, the pages still exist; they just no longer link to you.

Example of link removed

It could be that someone removed the link during a content refresh, replaced your link with a different one, or removed the link due to company policies. Another possibility is that a competitor decided to no longer link to you.

6.45% of lost links are from crawl errors

When we encounter an error while trying to crawl a page, the links on that page are put into this bucket.

Link lost due to crawl error

If the page is accessible when it’s crawled again and the link is still there, it will be counted as live. If the page continues to “error,” we may drop it from the index.

We chose not to count crawl errors toward the link rot total. Some of these links likely no longer exist, but others still do.
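The recrawl behavior described above can be modeled as a small state machine. This Python sketch assumes a hypothetical threshold of three consecutive errors before a page is dropped; Ahrefs doesn’t publish its actual rule, and the state names are illustrative.

```python
from dataclasses import dataclass

MAX_CONSECUTIVE_ERRORS = 3  # hypothetical threshold; not Ahrefs' real value

@dataclass
class LinkState:
    status: str = "live"  # "live", "crawl_error", "link_removed", or "dropped"
    consecutive_errors: int = 0

def recrawl(state: LinkState, fetched_ok: bool, link_present: bool) -> LinkState:
    """Return the link's new state after one recrawl of the linking page."""
    if fetched_ok:
        # Page reachable again: live if the link is still there, otherwise removed.
        return LinkState("live" if link_present else "link_removed", 0)
    errors = state.consecutive_errors + 1
    if errors >= MAX_CONSECUTIVE_ERRORS:
        return LinkState("dropped", errors)  # page keeps erroring: drop it
    return LinkState("crawl_error", errors)  # not counted as link rot yet

s = recrawl(LinkState(), fetched_ok=False, link_present=False)
print(s.status)  # crawl_error
s = recrawl(s, fetched_ok=True, link_present=True)
print(s.status)  # live
```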

5.99% of lost links are due to redirected pages

The page containing the link has been redirected somewhere else.

Link lost due to 301 redirect

Pages change locations for all kinds of reasons. Commonly, this is the result of some kind of website migration.

4.11% of lost links are on pages that are no longer found

In this case, the linking page has been deleted. The content, including the link, is missing.

Page not found

Occasionally, these pages may become live again or be redirected; in such situations, they will be added back or placed in the redirect bucket.
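Putting the redirect and not-found cases together with live and removed links, a recrawl result can be mapped to a bucket. This is a hypothetical Python scheme, not Ahrefs’ actual classifier: treating 5xx responses as crawl errors and including 307/308 alongside 301/302 are both assumptions.

```python
from typing import Optional

def classify_linking_page(status: Optional[int], link_present: bool) -> str:
    """Map one recrawl of a linking page to a bucket (hypothetical scheme).

    status is the HTTP status code, or None if the request failed outright.
    """
    if status is None or status >= 500:
        return "crawl_error"   # possibly temporary; recheck later
    if status in (301, 302, 307, 308):  # 307/308 included as an assumption
        return "redirect"      # linking page moved elsewhere
    if status in (404, 410):
        return "not_found"     # linking page deleted
    if 200 <= status < 300:
        return "live" if link_present else "link_removed"
    return "other"
```

If a not-found page later comes back or starts redirecting, calling the function again with the new status moves it into the matching bucket, as described above.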

0.82% of lost links are on pages that are no longer canonical

The canonical specified by the page has changed.

Page not canonical anymore

The linking page has a “rel=canonical” tag pointing to some other URL. This could be a change from HTTP to HTTPS or some kind of standardization involving trailing slashes or parameters. It’s usually nothing to worry about: the page is simply changing how it wants to be indexed, and the links have just shifted from one URL to another.
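To check a linking page yourself, you can read its canonical tag and compare it with the page’s own URL. This Python sketch uses only the standard library’s `html.parser`; the class and function names are hypothetical. It compares URLs exactly, so an HTTP vs. HTTPS or trailing-slash difference counts as “pointing elsewhere,” matching the cases described above.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin

class CanonicalParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr_map.get("href"):
            self.canonical = attr_map["href"]

def canonical_points_elsewhere(page_url: str, html: str) -> bool:
    """True if the page declares a canonical URL other than its own URL."""
    parser = CanonicalParser()
    parser.feed(html)
    if parser.canonical is None:
        return False  # no canonical tag: treat the page as its own canonical
    # Resolve relative hrefs, then compare exactly.
    return urljoin(page_url, parser.canonical) != page_url
```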

0.73% of lost links are on pages marked “noindex”

The linking page is marked “noindex,” so we don’t count the links from it. 

Page marked as noindex

We did not count links on pages marked noindex toward link rot. The link technically exists, but the page it’s on won’t appear in search engines and won’t pass any value.
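A page can be marked noindex either in a robots meta tag or in an `X-Robots-Tag` HTTP header. Here is a simplified Python sketch that checks both, assuming you already have the page’s HTML and response headers; it ignores user-agent-specific header values such as `googlebot: noindex`, and the names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Mapping, Optional

class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() == "robots":
            content = attr_map.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))

def is_noindex(html: str, headers: Optional[Mapping[str, str]] = None) -> bool:
    """True if the page opts out of indexing via meta robots or X-Robots-Tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    directives = set(parser.directives)
    for name, value in (headers or {}).items():
        if name.lower() == "x-robots-tag":
            # Simplification: user-agent prefixes like "googlebot:" are not parsed.
            directives.update(d.strip().lower() for d in value.split(","))
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in directives or "none" in directives
```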

A small number of links are lost due to broken redirects

In this case, we previously saw the link pass through a chain of redirects, and now one of those redirects is broken. As a result, the link no longer connects to the target.

Redirect broken because destination changed

This happens if:

  • The redirect chain is broken – If any of the pages in the redirect chain fails to respond, it gets reported as a lost link.
  • The redirect no longer exists (or is changed) – Let’s say you had a link from Site A → Site B, but the link was first redirected through one or more other URLs (e.g., Site A → Site C → Site B). If the linking site swapped this link out so that it linked directly (rather than going through a redirect chain), it would be reported as a lost link. The same applies if the final URL of the redirect is changed to redirect elsewhere.
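Both cases can be expressed by following the chain again and comparing the result with what was recorded before. This Python sketch uses in-memory dictionaries in place of real HTTP requests; the function names and status strings are hypothetical.

```python
from typing import Dict, Optional, Set

def resolve(url: str, redirects: Dict[str, str], live: Set[str],
            max_hops: int = 10) -> Optional[str]:
    """Follow a redirect chain; return the final live URL, or None if it breaks."""
    seen = set()
    for _ in range(max_hops + 1):
        if url in live:
            return url
        if url in seen or url not in redirects:
            return None  # redirect loop, or a hop that neither redirects nor responds
        seen.add(url)
        url = redirects[url]
    return None  # too many hops

def redirected_link_status(old_href: str, new_href: str, target: str,
                           redirects: Dict[str, str], live: Set[str]) -> str:
    """Classify a link that previously reached `target` through a redirect chain."""
    if new_href != old_href:
        return "lost: href changed"         # e.g., now links to the target directly
    final = resolve(new_href, redirects, live)
    if final is None:
        return "lost: chain broken"
    if final != target:
        return "lost: destination changed"  # chain now ends somewhere else
    return "live"

# Site A links to Site C, which redirects to Site B.
chain = {"c.example/old": "b.example/page"}
print(redirected_link_status("c.example/old", "c.example/old",
                             "b.example/page", chain, {"b.example/page"}))  # live
```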

What can you do about link rot?

Many of the links you earn may be lost over time. One way to get some of them back is link reclamation.

In many cases, your old URLs have links from other websites. If they’re not redirected to the current pages, then those links are lost and no longer count for your pages. It’s not too late to do these redirects, and you can quickly reclaim any lost value. Think of this as the fastest link building you will ever do.

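Here’s how to find those opportunities: open the Best by links report in Ahrefs’ Site Explorer and filter it to pages that return a 404 status code.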

I usually sort this by “Referring domains.”

Best by links report filtered to 404 status code to show redirect opportunities
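If you export that report, the same filter and sort can be scripted. This Python sketch assumes a CSV with columns named `url`, `http_code`, and `referring_domains`; those names and the sample rows are hypothetical, and a real export’s headers may differ.

```python
import csv
import io

# Hypothetical export; adjust the column names to match your actual file.
SAMPLE_EXPORT = """url,http_code,referring_domains
https://example.com/old-guide.html,404,57
https://example.com/blog/,200,310
https://example.com/2015/promo,404,3
https://example.com/tools.html,404,21
"""

def redirect_candidates(csv_text: str):
    """Return (url, referring_domains) for 404 pages, most-linked first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    broken = [
        (row["url"], int(row["referring_domains"]))
        for row in rows
        if row["http_code"].strip() == "404"
    ]
    return sorted(broken, key=lambda pair: pair[1], reverse=True)

for url, rds in redirect_candidates(SAMPLE_EXPORT):
    print(rds, url)
```

Each URL at the top of the list is a candidate for a 301 redirect to the most relevant live page.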

You can even use link rot to your advantage. Broken link building is a tactic that involves finding resources in your niche that are no longer live, then reaching out to site owners and letting them know about a resource you have that can replace the broken link.

Want to know how to do this for your site? Our head of content, Joshua Hardwick, has you covered with a process-oriented guide to broken link building.

Another way to help with link rot is to fix broken links on your own website. These are easily identified in the Site Audit Links report. Just remove the links or update the reference to a relevant page that exists.

Broken internal links
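If you have your own crawl data, the same check is a simple join between the links you found and the status code of each target. A minimal Python sketch, assuming hypothetical `(source, target)` pairs and a status map from your crawler:

```python
from typing import Dict, List, Tuple

def broken_internal_links(links: List[Tuple[str, str]],
                          status: Dict[str, int]) -> Dict[str, List[str]]:
    """Group internal links whose target returns an HTTP error, keyed by target URL.

    links:  (source_page, target_page) pairs found while crawling your site.
    status: HTTP status code observed for each crawled page.
    """
    broken: Dict[str, List[str]] = {}
    for source, target in links:
        code = status.get(target)
        if code is not None and code >= 400:
            broken.setdefault(target, []).append(source)
    return broken

links = [("/", "/about"), ("/", "/old-pricing"), ("/blog", "/old-pricing")]
status = {"/about": 200, "/old-pricing": 404}
print(broken_internal_links(links, status))  # {'/old-pricing': ['/', '/blog']}
```

Grouping by target shows every page you need to edit when one URL goes away.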

You may also want to fix broken links from your site that point to other sites. I find it hard to make an SEO case for this and generally treat it as a low-priority website health and maintenance task.

However, clicking a broken link is a poor user experience, so you may want to prioritize fixing the links that get clicked most often.

The list of broken links to external pages can also be found in the Links report. If you see zero broken external links as I do, it’s probably because you didn’t enable “Check HTTP status of external links” in your Site Audit crawl settings.

Site Audit settings need to have "Check HTTP status of external links" turned on
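To prioritize broken external links by how often they’re clicked, you can combine the broken-link list with outbound click counts. This Python sketch assumes you can get click counts per link (for example, from your analytics); the record fields are hypothetical.

```python
from typing import List, NamedTuple

class ExternalLink(NamedTuple):
    source_page: str
    href: str
    status: int  # HTTP status of the external URL; 0 if the request failed
    clicks: int  # outbound clicks, e.g., from your analytics (assumed available)

def fix_queue(links: List[ExternalLink]) -> List[ExternalLink]:
    """Broken external links, most-clicked first."""
    broken = [link for link in links if link.status == 0 or link.status >= 400]
    return sorted(broken, key=lambda link: link.clicks, reverse=True)

links = [
    ExternalLink("/guide", "https://a.example/x", 404, 12),
    ExternalLink("/guide", "https://b.example/y", 200, 90),
    ExternalLink("/faq", "https://c.example/z", 0, 40),
]
for link in fix_queue(links):
    print(link.clicks, link.href)
```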

Final thoughts

Some companies and technologies have tried to help with link rot. Many of these solutions don’t really solve the problem of broken links or a changing web. Instead, they rely on archiving what was on the web so it can still be seen. For example, the Internet Archive has a Chrome extension that will show archives of pages if they’re broken.

Similarly, the CDN provider Cloudflare has an Always Online option that first looks for its own archived copy of a page that’s offline. If that doesn’t exist, it pulls the most recent version from the Internet Archive.

If you use the Brave browser, a broken page will show a message that lets you check for an archived version at archive.org.
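You can run a similar archive lookup yourself. This Python sketch queries the Internet Archive’s Wayback Machine Availability API, assuming its documented response shape (`archived_snapshots.closest` with `available` and `url` fields). It is not how the tools above are necessarily implemented, and only `find_archived_copy` makes a network request.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_API = "https://archive.org/wayback/available"

def availability_url(page_url: str, timestamp: Optional[str] = None) -> str:
    """Build a Wayback Machine Availability API request URL."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp  # YYYYMMDDhhmmss; asks for the closest snapshot
    return f"{AVAILABILITY_API}?{urlencode(params)}"

def closest_snapshot(response_json: str) -> Optional[str]:
    """Extract the closest archived snapshot URL from an API response, if any."""
    data = json.loads(response_json)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None

def find_archived_copy(page_url: str) -> Optional[str]:
    """Network call: ask the Wayback Machine for an archived copy of a dead page."""
    with urlopen(availability_url(page_url), timeout=10) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```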

The Law Library of Congress implemented an external archiving solution for the problem of link and reference rot in its legal research reports.

As always, message me on Twitter if you have any questions.




Google Search Leak: Conflicting Signals, Unanswered Questions

An apparent leak of Google Search API documentation has sparked intense debate within the SEO community, with some claiming it proves Google’s dishonesty and others urging caution in interpreting the information.

As the industry grapples with the allegations, a balanced examination of Google’s statements and the perspectives of SEO experts is crucial to understanding the whole picture.

Leaked Documents Vs. Google’s Public Statements

Over the years, Google has consistently maintained that specific ranking signals, such as click data and user engagement metrics, aren’t used directly in its search algorithms.

In public statements and interviews, Google representatives have emphasized the importance of relevance, quality, and user experience while denying the use of specific metrics like click-through rates or bounce rates as ranking-related factors.

However, the leaked API documentation appears to contradict these statements.

It contains references to features like “goodClicks,” “badClicks,” “lastLongestClicks,” impressions, and unicorn clicks, tied to systems called Navboost and Glue, which Google VP Pandu Nayak confirmed in DOJ testimony are parts of Google’s ranking systems.

The documentation also appears to indicate that Google calculates several metrics using Chrome browser data on individual pages and entire domains, suggesting that the full clickstream of Chrome users is being leveraged to influence search rankings.

This contradicts past Google statements that Chrome data isn’t used for organic searches.

The Leak’s Origins & Authenticity

Erfan Azimi, CEO of digital marketing agency EA Eagle Digital, alleges he obtained the documents and shared them with Rand Fishkin and Mike King.

Azimi claims to have spoken with ex-Google Search employees who confirmed the authenticity of the information but declined to go on record due to the situation’s sensitivity.

While the leak’s origins remain somewhat ambiguous, several ex-Googlers who reviewed the documents have stated they appear legitimate.

Fishkin states:

“A critical next step in the process was verifying the authenticity of the API Content Warehouse documents. So, I reached out to some ex-Googler friends, shared the leaked docs, and asked for their thoughts.”

Three ex-Googlers responded, with one stating, “It has all the hallmarks of an internal Google API.”

However, without direct confirmation from Google, the authenticity of the leaked information is still debatable. Google has not yet publicly commented on the leak.

It’s important to note that, according to Fishkin’s article, none of the ex-Googlers confirmed that the leaked data was from Google Search, only that it appears to have originated from within Google.

Industry Perspectives & Analysis

Many in the SEO community have long suspected that Google’s public statements don’t tell the whole story. The leaked API documentation has only fueled these suspicions.

Fishkin and King argue that if the information is accurate, it could have significant implications for SEO strategies and website search optimization.

Key takeaways from their analysis include:

  • Navboost and the use of clicks, CTR, long vs. short clicks, and user data from Chrome appear to be among Google’s most powerful ranking signals.
  • Google employs safelists for sensitive topics like COVID-19, elections, and travel to control what sites appear.
  • Google uses Quality Rater feedback and ratings in its ranking systems, not just as a training set.
  • Click data influences how Google weights links for ranking purposes.
  • Classic ranking factors like PageRank and anchor text are losing influence compared to more user-centric signals.
  • Building a brand and generating search demand is more critical than ever for SEO success.

However, just because something is mentioned in API documentation doesn’t mean it’s being used to rank search results.

Other industry experts urge caution when interpreting the leaked documents.

They point out that Google may use the information for testing purposes or apply it only to specific search verticals rather than use it as active ranking signals.

There are also open questions about how much weight these signals carry compared to other ranking factors. The leak doesn’t provide the full context or algorithm details.

Unanswered Questions & Future Implications

As the SEO community continues to analyze the leaked documents, many questions remain unanswered.

Without official confirmation from Google, the authenticity and context of the information are still a matter of debate.

Key open questions include:

  • How much of this documented data is actively used to rank search results?
  • What is the relative weighting and importance of these signals compared to other ranking factors?
  • How have Google’s systems and use of this data evolved?
  • Will Google change its public messaging and be more transparent about using behavioral data?

As the debate surrounding the leak continues, it’s wise to approach the information with a balanced, objective mindset.

Unquestioningly accepting the leak as gospel truth or completely dismissing it are both shortsighted reactions. The reality likely lies somewhere in between.

Potential Implications For SEO Strategies and Website Optimization

It would be highly inadvisable to act on information shared from this supposed ‘leak’ without confirming whether it’s an actual Google search document.

Further, even if the content originates from search, the information is a year old and could have changed. Any insights derived from the leaked documentation should not be considered actionable now.

With that in mind, while the full implications remain unknown, here’s what we can glean from the leaked information.

1. Emphasis On User Engagement Metrics

If click data and user engagement metrics are direct ranking factors, as the leaked documents suggest, it could place greater emphasis on optimizing for these metrics.

This means crafting compelling titles and meta descriptions to increase click-through rates, ensuring fast page loads and intuitive navigation to reduce bounces, and strategically linking to keep users engaged on your site.

Driving traffic through other channels like social media and email can also help generate positive engagement signals.

However, it’s important to note that optimizing for user engagement shouldn’t come at the expense of creating reader-focused content. Gaming engagement metrics is unlikely to be a sustainable, long-term strategy.

Google has consistently emphasized the importance of quality and relevance in its public statements, and based on the leaked information, this will likely remain a key focus. Engagement optimization should support and enhance quality content, not replace it.

2. Potential Changes To Link-Building Strategies

The leaked documents contain information about how Google treats different types of links and their impact on search rankings.

This includes details about the use of anchor text, the classification of links into different quality tiers based on traffic to the linking page, and the potential for links to be ignored or demoted based on various spam factors.

If this information is accurate, it could influence how SEO professionals approach link building and the types of links they prioritize.

Links that drive real click-throughs may carry more weight than links on rarely visited pages.

The fundamentals of good link building still apply—create link-worthy content, build genuine relationships, and seek natural, editorially placed links that drive qualified referral traffic.

The leaked information doesn’t change this core approach but offers some additional nuance to be aware of.

3. Increased Focus On Brand Building and Driving Search Demand

The leaked documents suggest that Google uses brand-related signals and offline popularity as ranking factors. This could include metrics like brand mentions, searches for the brand name, and overall brand authority.

As a result, SEO strategies may emphasize building brand awareness and authority through both online and offline channels.

Tactics could include:

  • Securing brand mentions and links from authoritative media sources.
  • Investing in traditional PR, advertising, and sponsorships to increase brand awareness.
  • Encouraging branded searches through other marketing channels.
  • Growing search volume for your brand name relative to unbranded keywords.
  • Building engaged social media communities around your brand.
  • Establishing thought leadership through original research, data, and industry contributions.

The idea is to make your brand synonymous with your niche and build an audience that seeks you out directly. The more people search for and engage with your brand, the stronger those brand signals may become in Google’s systems.

4. Adaptation To Vertical-Specific Ranking Factors

Some leaked information suggests that Google may use different ranking factors or algorithms for specific search verticals, such as news, local search, travel, or e-commerce.

If this is the case, SEO strategies may need to adapt to each vertical’s unique ranking signals and user intents.

For example, local search optimization may focus more heavily on factors like Google Business Profile (formerly Google My Business) listings, local reviews, and location-specific content.

Travel SEO could emphasize collecting reviews, optimizing images, and directly providing booking/pricing information on your site.

News SEO requires focusing on timely, newsworthy content and optimized article structure.

While the core principles of search optimization still apply, understanding your particular vertical’s nuances, based on the leaked information and real-world testing, could give you a competitive advantage.

Conclusion

The Google API documentation leak has created a vigorous discussion about Google’s ranking systems.

As the SEO community continues to analyze and debate the leaked information, it’s important to remember a few key things:

  1. The information isn’t fully verified and lacks context. Drawing definitive conclusions at this stage is premature.
  2. Google’s ranking algorithms are complex and constantly evolving. Even if entirely accurate, this leak only represents a snapshot in time.
  3. The fundamentals of good SEO – creating high-quality, relevant, user-centric content and promoting it effectively – still apply regardless of the specific ranking factors at play.
  4. Real-world testing and results should always precede theorizing based on incomplete information.

What To Do Next

For SEO professionals, the best course of action is to stay informed about the leak.

Because details about the document remain unknown, it’s not a good idea to consider any takeaways actionable.

Most importantly, remember that chasing algorithms is a losing battle.

The only winning strategy in SEO is to make your website the best result for your message and audience. That’s Google’s endgame, and that’s where your focus should be, regardless of what any particular leaked document suggests.




Google’s AI Overviews Shake Up Ecommerce Search Visibility

An analysis of 25,000 ecommerce queries by Bartosz Góralewicz, founder of Onely, reveals the impact of Google’s AI overviews on search visibility for online retailers.

The study found that 16% of ecommerce queries now return an AI overview in search results, accounting for 13% of total search volume in this sector.

Notably, 80% of the sources listed in these AI overviews do not rank organically for the original query.

“Ranking #1-3 gives you only an 8% chance of being a source in AI overviews,” Góralewicz stated.
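Figures like these come from comparing, per query, the sources cited in the AI overview with the organic rankings. This Python sketch shows one way to compute similar shares from your own SERP data; it is not Onely’s methodology. The toy data is hypothetical, and measuring the “#1-3” figure only on queries that show an AI overview is an assumption.

```python
from typing import FrozenSet, List, NamedTuple, Tuple

class QueryResult(NamedTuple):
    keyword: str
    volume: int                  # monthly search volume
    aio_sources: FrozenSet[str]  # domains cited in the AI overview (empty if none)
    organic: Tuple[str, ...]     # organically ranking domains, position 1 first

def aio_stats(results: List[QueryResult]) -> dict:
    with_aio = [r for r in results if r.aio_sources]
    if not with_aio:
        raise ValueError("no queries with an AI overview")
    total_volume = sum(r.volume for r in results)
    # Every (query, cited domain) pair across AI overviews.
    cited = [(r, d) for r in with_aio for d in r.aio_sources]
    # Every (query, domain ranking #1-3) pair on queries that show an AI overview.
    top3 = [(r, d) for r in with_aio for d in r.organic[:3]]
    return {
        "query_share": len(with_aio) / len(results),
        "volume_share": sum(r.volume for r in with_aio) / total_volume,
        "non_ranking_source_share": sum(d not in r.organic for r, d in cited) / len(cited),
        "top3_cited_share": (sum(d in r.aio_sources for r, d in top3) / len(top3)
                             if top3 else 0.0),
    }

RESULTS = [
    QueryResult("q1", 1000, frozenset({"x.com", "y.com"}), ("a.com", "b.com", "x.com", "c.com")),
    QueryResult("q2", 500, frozenset(), ("a.com", "b.com", "c.com")),
    QueryResult("q3", 300, frozenset({"z.com"}), ("a.com", "d.com", "e.com")),
    QueryResult("q4", 200, frozenset(), ("b.com", "c.com", "d.com")),
]
print(aio_stats(RESULTS))
```

A query share higher than the volume share, as in the study’s 16% vs. 13%, simply means AI overviews appear more often on lower-volume queries.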

Shift Toward “Accelerated” Product Experiences

International SEO consultant Aleyda Solis analyzed the disconnect between traditional organic ranking and inclusion in AI overviews.

According to Solis, for product-related queries, Google is prioritizing an “accelerated” approach over summarizing currently ranking pages.

Commenting on Góralewicz’s findings, she stated:

“… rather than providing high level summaries of what’s already ranked organically below, what Google does with e-commerce is “accelerate” the experience by already showcasing what the user would get next.”

Solis explains that for queries where Google previously ranked category pages, reviews, and buying guides, it’s now bypassing this level of results with AI overviews.

Assessing AI Overview Traffic Impact

To help retailers evaluate their exposure, Solis has shared a spreadsheet that analyzes the potential traffic impact of AI overviews.

Góralewicz notes that this could be an initial rollout. Based on data showing that AI overviews are currently excluded for high cost-per-click keywords, he speculates that “Google will expand AI overviews for high-cost queries when enabling ads.”
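One way to check the cost-per-click pattern in your own keyword set is to bucket queries by CPC and compare how often each bucket shows an AI overview. In this Python sketch, the $1 and $5 bucket edges and the sample rows are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, Iterable, Sequence, Tuple

def aio_rate_by_cpc(rows: Iterable[Tuple[float, bool]],
                    edges: Sequence[float] = (1.0, 5.0)) -> Dict[int, float]:
    """Share of queries showing an AI overview, per CPC bucket.

    With the default edges, bucket 0 is CPC < $1, bucket 1 is $1 to < $5,
    and bucket 2 is $5 and up. Empty buckets are omitted.
    """
    counts = [0] * (len(edges) + 1)
    hits = [0] * (len(edges) + 1)
    for cpc, has_aio in rows:
        bucket = bisect_right(edges, cpc)
        counts[bucket] += 1
        hits[bucket] += int(has_aio)
    return {b: hits[b] / counts[b] for b in range(len(counts)) if counts[b]}

rows = [(0.5, True), (0.8, True), (2.0, True), (3.0, False), (12.0, False)]
print(aio_rate_by_cpc(rows))  # {0: 1.0, 1: 0.5, 2: 0.0}
```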

An in-depth report across ecommerce and publishing is expected soon from Góralewicz and Onely, with additional insights into this search trend.

Why SEJ Cares

AI overviews represent a shift in how search visibility is achieved for ecommerce websites.

With most overviews currently pulling product data from non-ranking sources, the traditional connection between organic rankings and search traffic is being disrupted.

Retailers may need to adapt their SEO strategies for this new search environment.

How This Can Benefit You

While unsettling for established brands, AI overviews create new opportunities for retailers to gain visibility without competing for the most commercially valuable keywords.

Ecommerce sites can potentially circumvent traditional ranking barriers by optimizing product data and detail pages for Google’s “accelerated” product displays.

The detailed assessment framework provided by Solis enables merchants to audit their exposure and prioritize optimization needs accordingly.


FAQ

What are the key findings from the analysis of AI overviews & ecommerce queries?

Góralewicz’s analysis of 25,000 ecommerce queries found:

  • 16% of ecommerce queries now return an AI overview in the search results.
  • 80% of the sources listed in these AI overviews do not rank organically for the original query.
  • Ranking in positions #1-3 provides only an 8% chance of being a source in AI overviews.

These insights reveal significant shifts in how ecommerce sites need to approach search visibility.

Why are AI overviews pulling product data from non-ranking sources, and what does this mean for retailers?

Google’s AI overviews prioritize “accelerated” experiences over summarizing currently ranked pages for product-related queries.

This shift focuses on showcasing directly what users seek instead of traditional organic results.

For retailers, this means:

  • A need to optimize product pages beyond traditional SEO practices, catering to the data requirements of AI overviews.
  • Opportunities to gain visibility without necessarily holding top organic rankings.
  • Potential to bypass traditional ranking barriers by focusing on enhanced product data integration.

Retailers must adapt quickly to remain competitive in this evolving search environment.

What practical steps can retailers take to evaluate and improve their search visibility in light of AI overview disruptions?

Retailers can take several practical steps to evaluate and improve their search visibility:

  • Utilize the spreadsheet provided by Aleyda Solis to assess the potential traffic impact of AI overviews.
  • Optimize product and detail pages to align with the data and presentation style preferred by AI overviews.
  • Continuously monitor changes and updates to AI overviews, adapting strategies based on new data and trends.

These steps can help retailers navigate the impact of AI overviews and maintain or improve their search visibility.


Featured Image: Marco Lazzarini/Shutterstock




Google’s AI Overviews Go Viral, Draw Mainstream Media Scrutiny

Google’s rollout of AI-generated overviews in US search results is taking a disastrous turn, with mainstream media outlets like The New York Times, BBC, and CNBC reporting on numerous inaccuracies and bizarre responses.

On social media, users are sharing endless examples of the feature’s nonsensical and sometimes dangerous output.

From recommending non-toxic glue on pizza to suggesting that eating rocks provides nutritional benefits, the blunders would be amusing if they weren’t so alarming.

Mainstream Media Coverage

As reported by The New York Times, Google’s AI overviews struggle with basic facts, claiming that Barack Obama was the first Muslim president of the United States and stating that Andrew Jackson graduated from college in 2005.

These errors undermine trust in Google’s search engine, which more than two billion people rely on for authoritative information worldwide.

Manual Removal & System Refinements

As reported by The Verge, Google is now scrambling to manually remove the bizarre AI-generated responses and improve its systems.

A Google spokesperson confirmed that the company is taking “swift action” to remove problematic responses and using the examples to refine its AI overview feature.

Google’s Rush To AI Integration

The flawed rollout of AI overviews isn’t an isolated incident for Google.

As CNBC notes in its report, Google made several missteps in a rush to integrate AI into its products.

In February, Google was forced to pause its Gemini chatbot after it generated inaccurate images of historical figures and refused to depict white people in most instances.

Before that, the company’s Bard chatbot faced ridicule for sharing incorrect information about outer space, leading to a $100 billion drop in Google’s market value.

Despite these setbacks, industry experts cited by The New York Times suggest that Google has little choice but to continue advancing AI integration to remain competitive.

However, the challenges of taming large language models, which ingest false information and satirical posts, are now more apparent.

The Debate Over AI In Search

The controversy surrounding AI overviews adds fuel to the debate over the risks and limitations of AI.

While the technology holds potential, these missteps remind everyone that more testing is needed before unleashing it on the public.

The BBC notes that Google’s rivals face similar backlash over their attempts to cram more AI tools into their consumer-facing products.

The UK’s data watchdog is investigating Microsoft after it announced a feature that would take continuous screenshots of users’ online activity.

At the same time, actress Scarlett Johansson criticized OpenAI for using a voice likened to her own without permission.

What This Means For Websites & SEO Professionals

Mainstream media coverage of Google’s erroneous AI overviews brings the issue of declining search quality to public attention.

As the company works to address inaccuracies, the incident serves as a cautionary tale for the entire industry.

Important takeaway: Prioritize responsible use of AI technology to ensure the benefits outweigh its risks.


