Connect with us

SEO

Semantic Keyword Clustering For 10,000+ Keywords [With Script]

Published

on

Semantic Keyword Clustering For 10,000+ Keywords [With Script]

Semantic keyword clustering can help take your keyword research to the next level.

In this article, you’ll learn how to use a Google Colaboratory sheet shared exclusively with Search Engine Journal readers.

This article will walk you through using the Google Colab sheet, a high-level view of how it works under the hood, and how to make adjustments to suit your needs.

But first, why cluster keywords at all?

Common Use Cases For Keyword Clustering

Here are a few use cases for clustering keywords.

Advertisement

Faster Keyword Research:

  • Filter out branded keywords or keywords with no commercial value.
  • Group related keywords together to create more in-depth articles.
  • Group related questions and answers together for FAQ creation.

Paid Search Campaigns:

  • Create negative keyword lists for Ads using large datasets faster – stop wasting money on junk keywords!
  • Group similar keywords into campaign ideas for Ads.

Here’s an example of the script clustering similar questions together, perfect for an in-depth article!

Screenshot from Microsoft Excel, February 2022

Issues With Earlier Versions Of This Tool

If you’ve been following my work on Twitter, you’ll know I’ve been experimenting with keyword clustering for a while now.

Earlier versions of this script were based on the excellent PolyFuzz library using TF-IDF matching.

While it got the job done, there were always some head-scratching clusters which I felt the original result could be improved on.

Words that shared a similar pattern of letters would be clustered even if they were unrelated semantically.

For example, it was unable to cluster words like “Bike” with “Bicycle”.

Earlier versions of the script also had other issues:

Advertisement
  • It didn’t work well in languages other than English.
  • It created a high number of groups that were unable to be clustered.
  • There wasn’t much control over how the clusters were created.
  • The script was limited to ~10,000 rows before it timed out due to a lack of resources.

Semantic Keyword Clustering Using Deep Learning Natural Language Processing (NLP)

Fast forward four months to the latest release which has been completely rewritten to utilize state-of-the-art, deep learning sentence embeddings.

Check out some of these awesome semantic clusters!

Notice that heated, thermal, and warm are contained within the same cluster of keywords?

excel sheet showing an example of semantic keyword clusteringScreenshot from Microsoft Excel, February 2022

Or how about, Wholesale and Bulk?

excel sheet showing another example of semantic keyword clusteringScreenshot from Microsoft Excel, February 2022

Dog and Dachshund, Xmas and Christmas?

excel sheet showing another example of semantic keyword clustering. Showing that Dachshund and dogs have been grouped together.Screenshot from Microsoft Excel, February 2022

It can even cluster keywords in over one hundred different languages!

excel sheet showing another example of semantic keyword clustering in FrenchScreenshot from Microsoft Excel, February 2022

Features Of The New Script Versus Earlier Iterations

In addition to semantic keyword grouping, the following improvements have been added to the latest version of this script.

  • Support for clustering 10,000+ keywords at once.
  • Reduced no cluster groups.
  • Ability to choose different pre-trained models (although the default model works fine!).
  • Ability to choose how closely related clusters should be.
  • Choice of the minimum number of keywords to use per cluster.
  • Automatic detection of character encoding and CSV delimiters.
  • Multi-lingual clustering.
  • Works with many common keyword exports out of the box. (Search Console Data, AdWords or third-party keyword tools like Ahrefs and Semrush).
  • Works with any CSV file with a column named “Keyword.”
  • Simple to use (The script works by inserting a new column called Cluster Name to any list of keywords uploaded).

How To Use The Script In Five Steps (Quick Start)

To get started, you will need to click this link, and then choose the option, Open in Colab as shown below.

How to Open Google Colab from GithubScreenshot from Google Colaboratory, February 2022

Change the Runtime type to GPU by selecting Runtime > Change Runtime Type.

Google Collab, How to change settings to use the GPUScreenshot from Google Colaboratory, February 2022

Select Runtime > Run all from the top navigation from within Google Colaboratory, (Or just press Ctrl+F9).

How to run all cell in Google ColabScreenshot from Google Colaboratory, February 2022

Upload a .csv file containing a column called “Keyword” when prompted.

How to upload a file using Google ColabScreenshot from Google Colaboratory, February 2022

Clustering should be fairly quick, but ultimately it depends on the number of keywords, and the model used.

Generally speaking, you should be good for 50,000 keywords.

If you see a Cuda Out of Memory Error, you’re trying to cluster too many keywords at the same time!

(It’s worth noting that this script can easily be adapter to run on a local machine without the confines of Google Colaboratory.)

The Script Output

The script will run and append clusters to your original file to a new column called Cluster Name.

Advertisement

Cluster names are assigned using the shortest length keyword in the cluster.

For example, the cluster name for the following group of keywords has been set as “alpaca socks” because that is the shortest keyword in the cluster.

Demonstration of the example output from the script showing alpaca socks have been grouped together Screenshot from Microsoft Excel, February 2022

Once clustering has been completed, a new file is automatically saved, with clustered appended in a new column to the original file.

How The Key Clustering Tool Works

This script is based upon the Fast Clustering algorithm and uses models which have been pre-trained at scale on large amounts of data.

This makes it easy to compute the semantic relationships between keywords using off-the-shelf models.

(You don’t have to be a data scientist to use it!)

In fact, whilst I’ve made it customizable for those who like to tinker and experiment, I’ve chosen some balanced defaults which should be reasonable for most people’s use cases.

Advertisement

Different models can be swapped in and out of the script depending on the requirements, (faster clustering, better multi-language support, better semantic performance, and so on).

After a lot of testing, I found the perfect balance of speed and accuracy using the all-MiniLM-L6-v2 transformer which provided a great balance between speed and accuracy.

If you prefer to use your own, you can just experiment, you can replace the existing pre-trained model with any of the models listed here or on the Hugging Face Model Hub.

Swapping In Pre-Trained Models

Swapping in models is as easy as replacing the variable with the name of your preferred transformer.

For example, you can change the default model all-miniLM-L6-v2 to all-mpnet-base-v2 by editing:

transformer = ‘all-miniLM-L6-v2’

to

Advertisement

transformer = ‘all-mpnet-base-v2

Here’s where you would edit it in the Google Colaboratory sheet.

How to choose a sentence transformer for keyword clusteringScreenshot from Google Colaboratory, February 2022

The Trade-off Between Cluster Accuracy And No Cluster Groups

A common complaint with previous iterations of this script is that it resulted in a high number of unclustered results.

Unfortunately, it will always be a balancing act between cluster accuracy versus the number of clusters.

A higher cluster accuracy setting will result in a higher number of unclustered results.

There are two variables that can directly influence the size and accuracy of all clusters:

min_cluster_size

and

cluster accuracy

I have set a default of 85 (/100) for cluster accuracy and a minimum cluster size of 2.

Advertisement

In testing, I found this to be the sweet spot, but feel free to experiment!

Here’s where to set those variables in the script.

How to set the minimum sentence size and keyword cluster accuracyScreenshot from Google Colaboratory, February 2022

That’s it! I hope this keyword clustering script is useful to your work.

More resources:


Featured Image: Graphic Grid/Shutterstock




Source link

Keep an eye on what we are doing
Be the first to get latest updates and exclusive content straight to your email inbox.
We promise not to spam you. You can unsubscribe at any time.
Invalid email address

SEO

Measuring Content Impact Across The Customer Journey

Published

on

By

Measuring Content Impact Across The Customer Journey

Understanding the impact of your content at every touchpoint of the customer journey is essential – but that’s easier said than done. From attracting potential leads to nurturing them into loyal customers, there are many touchpoints to look into.

So how do you identify and take advantage of these opportunities for growth?

Watch this on-demand webinar and learn a comprehensive approach for measuring the value of your content initiatives, so you can optimize resource allocation for maximum impact.

You’ll learn:

  • Fresh methods for measuring your content’s impact.
  • Fascinating insights using first-touch attribution, and how it differs from the usual last-touch perspective.
  • Ways to persuade decision-makers to invest in more content by showcasing its value convincingly.

With Bill Franklin and Oliver Tani of DAC Group, we unravel the nuances of attribution modeling, emphasizing the significance of layering first-touch and last-touch attribution within your measurement strategy. 

Check out these insights to help you craft compelling content tailored to each stage, using an approach rooted in first-hand experience to ensure your content resonates.

Advertisement

Whether you’re a seasoned marketer or new to content measurement, this webinar promises valuable insights and actionable tactics to elevate your SEO game and optimize your content initiatives for success. 

View the slides below or check out the full webinar for all the details.

Source link

Keep an eye on what we are doing
Be the first to get latest updates and exclusive content straight to your email inbox.
We promise not to spam you. You can unsubscribe at any time.
Invalid email address
Continue Reading

SEO

How to Find and Use Competitor Keywords

Published

on

How to Find and Use Competitor Keywords

Competitor keywords are the keywords your rivals rank for in Google’s search results. They may rank organically or pay for Google Ads to rank in the paid results.

Knowing your competitors’ keywords is the easiest form of keyword research. If your competitors rank for or target particular keywords, it might be worth it for you to target them, too.

There is no way to see your competitors’ keywords without a tool like Ahrefs, which has a database of keywords and the sites that rank for them. As far as we know, Ahrefs has the biggest database of these keywords.

How to find all the keywords your competitor ranks for

  1. Go to Ahrefs’ Site Explorer
  2. Enter your competitor’s domain
  3. Go to the Organic keywords report

The report is sorted by traffic to show you the keywords sending your competitor the most visits. For example, Mailchimp gets most of its organic traffic from the keyword “mailchimp.”

Mailchimp gets most of its organic traffic from the keyword, “mailchimp”.Mailchimp gets most of its organic traffic from the keyword, “mailchimp”.

Since you’re unlikely to rank for your competitor’s brand, you might want to exclude branded keywords from the report. You can do this by adding a Keyword > Doesn’t contain filter. In this example, we’ll filter out keywords containing “mailchimp” or any potential misspellings:

Filtering out branded keywords in Organic keywords reportFiltering out branded keywords in Organic keywords report

If you’re a new brand competing with one that’s established, you might also want to look for popular low-difficulty keywords. You can do this by setting the Volume filter to a minimum of 500 and the KD filter to a maximum of 10.

Finding popular, low-difficulty keywords in Organic keywordsFinding popular, low-difficulty keywords in Organic keywords

How to find keywords your competitor ranks for, but you don’t

  1. Go to Competitive Analysis
  2. Enter your domain in the This target doesn’t rank for section
  3. Enter your competitor’s domain in the But these competitors do section
Competitive analysis reportCompetitive analysis report

Hit “Show keyword opportunities,” and you’ll see all the keywords your competitor ranks for, but you don’t.

Content gap reportContent gap report

You can also add a Volume and KD filter to find popular, low-difficulty keywords in this report.

Volume and KD filter in Content gapVolume and KD filter in Content gap

How to find keywords multiple competitors rank for, but you don’t

  1. Go to Competitive Analysis
  2. Enter your domain in the This target doesn’t rank for section
  3. Enter the domains of multiple competitors in the But these competitors do section
Competitive analysis report with multiple competitorsCompetitive analysis report with multiple competitors

You’ll see all the keywords that at least one of these competitors ranks for, but you don’t.

Content gap report with multiple competitorsContent gap report with multiple competitors

You can also narrow the list down to keywords that all competitors rank for. Click on the Competitors’ positions filter and choose All 3 competitors:

Selecting all 3 competitors to see keywords all 3 competitors rank forSelecting all 3 competitors to see keywords all 3 competitors rank for
  1. Go to Ahrefs’ Site Explorer
  2. Enter your competitor’s domain
  3. Go to the Paid keywords report
Paid keywords reportPaid keywords report

This report shows you the keywords your competitors are targeting via Google Ads.

Since your competitor is paying for traffic from these keywords, it may indicate that they’re profitable for them—and could be for you, too.

Advertisement

You know what keywords your competitors are ranking for or bidding on. But what do you do with them? There are basically three options.

1. Create pages to target these keywords

You can only rank for keywords if you have content about them. So, the most straightforward thing you can do for competitors’ keywords you want to rank for is to create pages to target them.

However, before you do this, it’s worth clustering your competitor’s keywords by Parent Topic. This will group keywords that mean the same or similar things so you can target them all with one page.

Here’s how to do that:

  1. Export your competitor’s keywords, either from the Organic Keywords or Content Gap report
  2. Paste them into Keywords Explorer
  3. Click the “Clusters by Parent Topic” tab
Clustering keywords by Parent TopicClustering keywords by Parent Topic

For example, MailChimp ranks for keywords like “what is digital marketing” and “digital marketing definition.” These and many others get clustered under the Parent Topic of “digital marketing” because people searching for them are all looking for the same thing: a definition of digital marketing. You only need to create one page to potentially rank for all these keywords.

Keywords under the cluster of "digital marketing"Keywords under the cluster of "digital marketing"

2. Optimize existing content by filling subtopics

You don’t always need to create new content to rank for competitors’ keywords. Sometimes, you can optimize the content you already have to rank for them.

How do you know which keywords you can do this for? Try this:

Advertisement
  1. Export your competitor’s keywords
  2. Paste them into Keywords Explorer
  3. Click the “Clusters by Parent Topic” tab
  4. Look for Parent Topics you already have content about

For example, if we analyze our competitor, we can see that seven keywords they rank for fall under the Parent Topic of “press release template.”

Our competitor ranks for seven keywords that fall under the "press release template" clusterOur competitor ranks for seven keywords that fall under the "press release template" cluster

If we search our site, we see that we already have a page about this topic.

Site search finds that we already have a blog post on press release templatesSite search finds that we already have a blog post on press release templates

If we click the caret and check the keywords in the cluster, we see keywords like “press release example” and “press release format.”

Keywords under the cluster of "press release template"Keywords under the cluster of "press release template"

To rank for the keywords in the cluster, we can probably optimize the page we already have by adding sections about the subtopics of “press release examples” and “press release format.”

3. Target these keywords with Google Ads

Paid keywords are the simplest—look through the report and see if there are any relevant keywords you might want to target, too.

For example, Mailchimp is bidding for the keyword “how to create a newsletter.”

Mailchimp is bidding for the keyword “how to create a newsletter”Mailchimp is bidding for the keyword “how to create a newsletter”

If you’re ConvertKit, you may also want to target this keyword since it’s relevant.

If you decide to target the same keyword via Google Ads, you can hover over the magnifying glass to see the ads your competitor is using.

Mailchimp's Google Ad for the keyword “how to create a newsletter”Mailchimp's Google Ad for the keyword “how to create a newsletter”

You can also see the landing page your competitor directs ad traffic to under the URL column.

The landing page Mailchimp is directing traffic to for “how to create a newsletter”The landing page Mailchimp is directing traffic to for “how to create a newsletter”

Learn more

Check out more tutorials on how to do competitor keyword analysis:

Source link

Keep an eye on what we are doing
Be the first to get latest updates and exclusive content straight to your email inbox.
We promise not to spam you. You can unsubscribe at any time.
Invalid email address
Continue Reading

SEO

Google Confirms Links Are Not That Important

Published

on

By

Google confirms that links are not that important anymore

Google’s Gary Illyes confirmed at a recent search marketing conference that Google needs very few links, adding to the growing body of evidence that publishers need to focus on other factors. Gary tweeted confirmation that he indeed say those words.

Background Of Links For Ranking

Links were discovered in the late 1990’s to be a good signal for search engines to use for validating how authoritative a website is and then Google discovered soon after that anchor text could be used to provide semantic signals about what a webpage was about.

One of the most important research papers was Authoritative Sources in a Hyperlinked Environment by Jon M. Kleinberg, published around 1998 (link to research paper at the end of the article). The main discovery of this research paper is that there is too many web pages and there was no objective way to filter search results for quality in order to rank web pages for a subjective idea of relevance.

The author of the research paper discovered that links could be used as an objective filter for authoritativeness.

Kleinberg wrote:

Advertisement

“To provide effective search methods under these conditions, one needs a way to filter, from among a huge collection of relevant pages, a small set of the most “authoritative” or ‘definitive’ ones.”

This is the most influential research paper on links because it kick-started more research on ways to use links beyond as an authority metric but as a subjective metric for relevance.

Objective is something factual. Subjective is something that’s closer to an opinion. The founders of Google discovered how to use the subjective opinions of the Internet as a relevance metric for what to rank in the search results.

What Larry Page and Sergey Brin discovered and shared in their research paper (The Anatomy of a Large-Scale Hypertextual Web Search Engine – link at end of this article) was that it was possible to harness the power of anchor text to determine the subjective opinion of relevance from actual humans. It was essentially crowdsourcing the opinions of millions of website expressed through the link structure between each webpage.

What Did Gary Illyes Say About Links In 2024?

At a recent search conference in Bulgaria, Google’s Gary Illyes made a comment about how Google doesn’t really need that many links and how Google has made links less important.

Patrick Stox tweeted about what he heard at the search conference:

” ‘We need very few links to rank pages… Over the years we’ve made links less important.’ @methode #serpconf2024″

Google’s Gary Illyes tweeted a confirmation of that statement:

Advertisement

“I shouldn’t have said that… I definitely shouldn’t have said that”

Why Links Matter Less

The initial state of anchor text when Google first used links for ranking purposes was absolutely non-spammy, which is why it was so useful. Hyperlinks were primarily used as a way to send traffic from one website to another website.

But by 2004 or 2005 Google was using statistical analysis to detect manipulated links, then around 2004 “powered-by” links in website footers stopped passing anchor text value, and by 2006 links close to the words “advertising” stopped passing link value, links from directories stopped passing ranking value and by 2012 Google deployed a massive link algorithm called Penguin that destroyed the rankings of likely millions of websites, many of which were using guest posting.

The link signal eventually became so bad that Google decided in 2019 to selectively use nofollow links for ranking purposes. Google’s Gary Illyes confirmed that the change to nofollow was made because of the link signal.

Google Explicitly Confirms That Links Matter Less

In 2023 Google’s Gary Illyes shared at a PubCon Austin that links were not even in the top 3 of ranking factors. Then in March 2024, coinciding with the March 2024 Core Algorithm Update, Google updated their spam policies documentation to downplay the importance of links for ranking purposes.

Google March 2024 Core Update: 4 Changes To Link Signal

The documentation previously said:

Advertisement

“Google uses links as an important factor in determining the relevancy of web pages.”

The update to the documentation that mentioned links was updated to remove the word important.

Links are not just listed as just another factor:

“Google uses links as a factor in determining the relevancy of web pages.”

At the beginning of April Google’s John Mueller advised that there are more useful SEO activities to engage on than links.

Mueller explained:

“There are more important things for websites nowadays, and over-focusing on links will often result in you wasting your time doing things that don’t make your website better overall”

Finally, Gary Illyes explicitly said that Google needs very few links to rank webpages and confirmed it.

Why Google Doesn’t Need Links

The reason why Google doesn’t need many links is likely because of the extent of AI and natural language undertanding that Google uses in their algorithms. Google must be highly confident in its algorithm to be able to explicitly say that they don’t need it.

Way back when Google implemented the nofollow into the algorithm there were many link builders who sold comment spam links who continued to lie that comment spam still worked. As someone who started link building at the very beginning of modern SEO (I was the moderator of the link building forum at the #1 SEO forum of that time), I can say with confidence that links have stopped playing much of a role in rankings beginning several years ago, which is why I stopped about five or six years ago.

Read the research papers

Authoritative Sources in a Hyperlinked Environment – Jon M. Kleinberg (PDF)

The Anatomy of a Large-Scale Hypertextual Web Search Engine

Featured Image by Shutterstock/RYO Alexandre

Advertisement



Source link

Keep an eye on what we are doing
Be the first to get latest updates and exclusive content straight to your email inbox.
We promise not to spam you. You can unsubscribe at any time.
Invalid email address
Continue Reading

Trending

Follow by Email
RSS