Google On Percentage That Represents Duplicate Content
Google’s John Mueller recently answered a question of whether there’s a percentage threshold of content duplication that Google uses to identify and filter out duplicate content.
What Percentage Equals Duplicate Content?
The conversation actually started on Facebook when Duane Forrester (@DuaneForrester) asked if anyone knew if any search engine has published a percentage of content overlap at which content is considered duplicate.
Bill Hartzer (@bhartzer) turned to Twitter to ask John Mueller and received a near-immediate response.
Hey @johnmu is there a percentage that represents duplicate content?
For example, should we be trying to make sure pages are at least 72.6 percent unique than other pages on our site?
Does Google even measure it?
— Bill Hartzer (@bhartzer) September 23, 2022
Google’s John Mueller responded:
There is no number (also how do you measure it anyway?)
— 🌽〈link href=//johnmu.com rel=canonical 〉🌽 (@JohnMu) September 23, 2022
How Does Google Detect Duplicate Content?
Google’s methodology for detecting duplicate content has remained remarkably similar for many years.
Back in 2013, Matt Cutts (@mattcutts), at the time a software engineer at Google, published an official Google video describing how Google detects duplicate content.
He started the video by stating that a great deal of Internet content is duplicate and that it’s a normal thing to happen.
“It’s important to realize that if you look at content on the web, something like 25% or 30% of all the web’s content is duplicate content.
…People will quote a paragraph of a blog and then link to the blog, that sort of thing.”
He went on to say that because so much duplicate content is innocent and without spammy intent, Google won’t penalize that content.
Penalizing webpages for having some duplicate content, he said, would have a negative effect on the quality of the search results.
What Google does when it finds duplicate content is:
“…try to group it all together and treat it as if it’s just one piece of content.”
Matt continued:
“It’s just treated as something that we need to cluster appropriately. And we need to make sure that it ranks correctly.”
He explained that Google then chooses which page to show in the search results and that it filters out the duplicate pages in order to improve the user experience.
How Google Handles Duplicate Content – 2020 Version
Fast forward to 2020 and Google published a Search Off the Record podcast episode where the same topic is described in remarkably similar language.
Here is the relevant section of that podcast, starting at 06:44 into the episode:
“Gary Illyes: And now we ended up with the next step, which is actually canonicalization and dupe detection.
Martin Splitt: Isn’t that the same, dupe detection and canonicalization, kind of?
Gary Illyes: [00:06:56] Well, it’s not, right? Because first you have to detect the dupes, basically cluster them together, saying that all of these pages are dupes of each other,
and then you have to basically find a leader page for all of them.…And that is canonicalization.
So, you have the duplication, which is the whole term, but within that you have cluster building, like dupe cluster building, and canonicalization.”
Gary next explains in technical terms exactly how they do this. Basically, Google isn’t looking at percentages at all; it’s comparing checksums.
A checksum is a compact representation of content as a sequence of numbers and letters. So if two pieces of content are duplicates, their checksums will match.
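To make the idea concrete, here is a minimal sketch of reducing content to a checksum and comparing checksums. This is not Google’s actual implementation; it uses Python’s standard `hashlib` with an MD5 hash, and the whitespace/case normalization is a simplifying assumption of this example.

```python
import hashlib

def checksum(content: str) -> str:
    # Normalize case and collapse whitespace so trivial formatting
    # differences don't change the checksum (a simplifying assumption).
    normalized = " ".join(content.lower().split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

page_a = "The quick brown fox jumps over the lazy dog."
page_b = "The  quick brown fox\njumps over the lazy dog."
page_c = "A completely different piece of content."

print(checksum(page_a) == checksum(page_b))  # True: same content, same checksum
print(checksum(page_a) == checksum(page_c))  # False: different content
```

Because each page is reduced to a short fixed-length value, comparing two pages costs a single string comparison rather than a full text diff, which is the “easier” Gary refers to.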
This is how Gary explained it:
“So, for dupe detection what we do is, well, we try to detect dupes.
And how we do that is perhaps how most people at other search engines do it, which is, basically, reducing the content into a hash or checksum and then comparing the checksums.”
Gary said Google does it that way because it’s easier (and obviously accurate).
Google Detects Duplicate Content with Checksums
So duplicate content is probably not a matter of a percentage threshold, some number at which content is declared duplicate.
Rather, each page’s content is reduced to a checksum representation, and those checksums are compared.
An additional takeaway is that there appears to be a distinction between when part of the content is duplicate and all of the content is duplicate.
Featured image by Shutterstock/Ezume Images