

Semantic Keyword Clustering For 10,000+ Keywords [With Script]



Semantic keyword clustering can help take your keyword research to the next level.

In this article, you’ll learn how to use a Google Colaboratory sheet shared exclusively with Search Engine Journal readers.

This article will walk you through using the Google Colab sheet, a high-level view of how it works under the hood, and how to make adjustments to suit your needs.

But first, why cluster keywords at all?

Common Use Cases For Keyword Clustering

Here are a few use cases for clustering keywords.

Faster Keyword Research:

  • Filter out branded keywords or keywords with no commercial value.
  • Group related keywords together to create more in-depth articles.
  • Group related questions and answers together for FAQ creation.

Paid Search Campaigns:

  • Create negative keyword lists for Ads using large datasets faster. Stop wasting money on junk keywords!
  • Group similar keywords into campaign ideas for Ads.

Here’s an example of the script clustering similar questions together, perfect for an in-depth article!

Screenshot from Microsoft Excel, February 2022

Issues With Earlier Versions Of This Tool

If you’ve been following my work on Twitter, you’ll know I’ve been experimenting with keyword clustering for a while now.

Earlier versions of this script were based on the excellent PolyFuzz library using TF-IDF matching.

While it got the job done, there were always some head-scratching clusters, and I felt the results could be improved.

Words that shared a similar pattern of letters would be clustered even if they were unrelated semantically.

For example, it was unable to cluster words like “Bike” with “Bicycle”.
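To see why letter-pattern matching struggles here, consider a rough illustration. It is not the PolyFuzz implementation: it swaps TF-IDF for a plain Jaccard overlap of character trigrams, and the helper names are made up for this sketch. The effect is the same in spirit, though: “bike” looks closer to the unrelated “hike” than to “bicycle”.

```python
def char_ngrams(text: str, n: int = 3) -> set:
    """Return the set of character n-grams in a lowercased string."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap between the character n-gram sets of two strings."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


# "bike" and "hike" share the trigram "ike"; "bike" and "bicycle" share none.
print(ngram_similarity("bike", "hike"))
print(ngram_similarity("bike", "bicycle"))
```

Any approach that only compares spelling will score synonyms with different letters as unrelated, which is the gap sentence embeddings are meant to close.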

Earlier versions of the script also had other issues:

  • It didn’t work well in languages other than English.
  • It left a high number of keywords that could not be clustered.
  • There wasn’t much control over how the clusters were created.
  • The script was limited to ~10,000 rows before it timed out due to a lack of resources.

Semantic Keyword Clustering Using Deep Learning Natural Language Processing (NLP)

Fast forward four months to the latest release which has been completely rewritten to utilize state-of-the-art, deep learning sentence embeddings.

Check out some of these awesome semantic clusters!

Notice that heated, thermal, and warm are contained within the same cluster of keywords?

Excel sheet showing an example of semantic keyword clustering. Screenshot from Microsoft Excel, February 2022

Or how about, Wholesale and Bulk?

Excel sheet showing another example of semantic keyword clustering. Screenshot from Microsoft Excel, February 2022

Dog and Dachshund, Xmas and Christmas?

Excel sheet showing another example of semantic keyword clustering, with Dachshund and dogs grouped together. Screenshot from Microsoft Excel, February 2022

It can even cluster keywords in over one hundred different languages!

Excel sheet showing another example of semantic keyword clustering in French. Screenshot from Microsoft Excel, February 2022

Features Of The New Script Versus Earlier Iterations

In addition to semantic keyword grouping, the following improvements have been added to the latest version of this script.

  • Support for clustering 10,000+ keywords at once.
  • Fewer “no cluster” groups.
  • Ability to choose different pre-trained models (although the default model works fine!).
  • Ability to choose how closely related clusters should be.
  • Choice of the minimum number of keywords to use per cluster.
  • Automatic detection of character encoding and CSV delimiters.
  • Multi-lingual clustering.
  • Works with many common keyword exports out of the box. (Search Console Data, AdWords or third-party keyword tools like Ahrefs and Semrush).
  • Works with any CSV file with a column named “Keyword.”
  • Simple to use (The script works by inserting a new column called Cluster Name to any list of keywords uploaded).
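As a rough idea of what delimiter detection and the “Keyword” column lookup can look like, here is a minimal sketch using only Python's standard library. The function name read_keywords and the sample data are hypothetical, the shared script may well do this differently, and character-encoding detection is left out.

```python
import csv
import io


def read_keywords(csv_text: str, column: str = "Keyword") -> list:
    """Guess the delimiter, then return the values of the keyword column."""
    # csv.Sniffer inspects a sample of the text and guesses the dialect.
    dialect = csv.Sniffer().sniff(csv_text[:2048], delimiters=",;\t|")
    reader = csv.DictReader(io.StringIO(csv_text), dialect=dialect)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"No column named {column!r} found")
    return [row[column] for row in reader]


# Hypothetical semicolon-delimited export.
sample = "Keyword;Volume\nalpaca socks;100\nwarm alpaca socks;40\nthermal socks;90\n"
print(read_keywords(sample))
```

The same function handles a comma-delimited export without any change, since the delimiter is guessed from the file itself.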

How To Use The Script In Five Steps (Quick Start)

To get started, you will need to click this link, and then choose the option, Open in Colab as shown below.

How to open Google Colab from GitHub. Screenshot from Google Colaboratory, February 2022

Change the Runtime type to GPU by selecting Runtime > Change Runtime Type.

Google Colab: how to change settings to use the GPU. Screenshot from Google Colaboratory, February 2022

Select Runtime > Run all from the top navigation within Google Colaboratory (or just press Ctrl+F9).

How to run all cells in Google Colab. Screenshot from Google Colaboratory, February 2022

Upload a .csv file containing a column called “Keyword” when prompted.

How to upload a file using Google Colab. Screenshot from Google Colaboratory, February 2022

Clustering should be fairly quick, but ultimately it depends on the number of keywords, and the model used.

Generally speaking, you should be able to cluster around 50,000 keywords in a single run.

If you see a CUDA Out of Memory Error, you’re trying to cluster too many keywords at the same time!

(It’s worth noting that this script can easily be adapted to run on a local machine without the confines of Google Colaboratory.)

The Script Output

The script will run and append the clusters to your original file in a new column called Cluster Name.

Cluster names are assigned using the shortest keyword in the cluster.

For example, the cluster name for the following group of keywords has been set as “alpaca socks” because that is the shortest keyword in the cluster.

Demonstration of the example output from the script, showing alpaca socks grouped together. Screenshot from Microsoft Excel, February 2022
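The naming rule is simple enough to sketch in a few lines. The keywords below are made up for illustration, and name_cluster is a hypothetical helper rather than a function taken from the script.

```python
def name_cluster(keywords: list) -> str:
    """Pick the shortest keyword (by character count) as the cluster name."""
    return min(keywords, key=len)


# Hypothetical cluster of related keywords.
cluster = [
    "warm alpaca wool socks",
    "alpaca socks for men",
    "alpaca socks",
    "best alpaca socks uk",
]

cluster_name = name_cluster(cluster)

# Each keyword gets the same value in the Cluster Name column.
rows = [{"Keyword": kw, "Cluster Name": cluster_name} for kw in cluster]
for row in rows:
    print(row)
```

If two keywords tie for the shortest length, min returns whichever comes first in the list.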

Once clustering has been completed, a new file is automatically saved, with the clusters appended in a new column to the original file.

How The Keyword Clustering Tool Works

This script is based upon the Fast Clustering algorithm and uses models which have been pre-trained at scale on large amounts of data.

This makes it easy to compute the semantic relationships between keywords using off-the-shelf models.

(You don’t have to be a data scientist to use it!)

In fact, whilst I’ve made it customizable for those who like to tinker and experiment, I’ve chosen some balanced defaults which should be reasonable for most people’s use cases.

Different models can be swapped in and out of the script depending on the requirements, (faster clustering, better multi-language support, better semantic performance, and so on).

After a lot of testing, I found that the all-MiniLM-L6-v2 transformer provided the best balance between speed and accuracy.

If you prefer to experiment, you can replace the existing pre-trained model with any of the models listed here or on the Hugging Face Model Hub.
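For readers who want a feel for the clustering step described above, here is a simplified, pure-Python sketch of threshold-based greedy clustering over cosine similarity. It is an approximation of the general idea, not the script's code or the library's implementation. The three-dimensional vectors are hand-made stand-ins for real sentence embeddings, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def fast_cluster(vectors, threshold=0.85, min_size=2):
    """Return clusters as lists of indices into `vectors`."""
    n = len(vectors)
    # Candidate group for each item: everything at least `threshold` similar to it.
    candidates = []
    for i in range(n):
        members = [j for j in range(n) if cosine(vectors[i], vectors[j]) >= threshold]
        if len(members) >= min_size:
            candidates.append(members)
    # Largest candidates claim their members first; already-claimed items are dropped.
    candidates.sort(key=len, reverse=True)
    assigned, clusters = set(), []
    for members in candidates:
        remaining = [j for j in members if j not in assigned]
        if len(remaining) >= min_size:
            clusters.append(remaining)
            assigned.update(remaining)
    return clusters


keywords = ["heated socks", "thermal socks", "warm socks",
            "dog bed", "dachshund bed", "garden hose"]
# Toy "embeddings": similar meanings point in similar directions.
vectors = [(1.0, 0.1, 0.0), (0.9, 0.2, 0.0), (1.0, 0.0, 0.0),
           (0.1, 1.0, 0.0), (0.2, 0.9, 0.0), (0.0, 0.0, 1.0)]

clusters = fast_cluster(vectors)
for group in clusters:
    print([keywords[i] for i in group])
```

Here “garden hose” has no neighbour above the threshold, so it ends up in no cluster at all.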

Swapping In Pre-Trained Models

Swapping in models is as easy as replacing the variable with the name of your preferred transformer.

For example, you can change the default model all-MiniLM-L6-v2 to all-mpnet-base-v2 by editing:

transformer = 'all-MiniLM-L6-v2'

to:

transformer = 'all-mpnet-base-v2'

Here’s where you would edit it in the Google Colaboratory sheet.

How to choose a sentence transformer for keyword clustering. Screenshot from Google Colaboratory, February 2022
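Outside the notebook, the same variable would typically be handed to the sentence-transformers library. The sketch below assumes that library is installed. The embed_keywords helper is hypothetical and may not match how the script is organised, and the import is kept inside the function so the snippet loads even where the library is missing.

```python
transformer = "all-mpnet-base-v2"  # or "all-MiniLM-L6-v2", the default named above


def embed_keywords(keywords, model_name=transformer):
    """Encode keywords into sentence embeddings with the chosen model."""
    # Lazy import: requires `pip install sentence-transformers`.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)  # downloads the model on first use
    return model.encode(keywords)


# Example call (needs the library and a network connection on first run):
# embeddings = embed_keywords(["alpaca socks", "wool socks"])
```

Larger models generally take longer to encode the same list, so it is worth timing a small sample before running a full export.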

The Trade-off Between Cluster Accuracy And No Cluster Groups

A common complaint with previous iterations of this script was that they resulted in a high number of unclustered results.

Unfortunately, it will always be a balancing act between cluster accuracy and the number of clusters.

A higher cluster accuracy setting will result in a higher number of unclustered results.
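A small, made-up example shows the effect. It assumes the accuracy setting (0 to 100) is treated as a similarity threshold between 0 and 1, which is an assumption about the script rather than something confirmed here. The similarity scores are invented for illustration.

```python
# Hypothetical best-match similarity of each keyword to any other keyword.
best_match = {
    "alpaca socks": 0.97,
    "warm alpaca socks": 0.97,
    "thermal socks": 0.88,
    "heated socks": 0.88,
    "garden hose": 0.41,
}


def unclustered(scores, cluster_accuracy):
    """Keywords with no neighbour at or above the accuracy setting (0-100)."""
    threshold = cluster_accuracy / 100
    return [kw for kw, score in scores.items() if score < threshold]


for accuracy in (85, 95):
    print(accuracy, unclustered(best_match, accuracy))
```

At 85, only “garden hose” is left without a cluster. At 95, the two sock keywords with a best match of 0.88 drop out as well.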

There are two variables that can directly influence the size and accuracy of all clusters:

  • minimum cluster size
  • cluster accuracy

I have set a default of 85 (out of 100) for cluster accuracy and a minimum cluster size of 2.

In testing, I found this to be the sweet spot, but feel free to experiment!

Here’s where to set those variables in the script.

How to set the minimum sentence size and keyword cluster accuracy. Screenshot from Google Colaboratory, February 2022

That’s it! I hope this keyword clustering script is useful to your work.


Featured Image: Graphic Grid/Shutterstock



Microsoft Makes 3 Predictions For PPC Trends In The New Year

Microsoft makes three predictions for product categories that will see increased ad clicks in the new year and offers advice on how to optimize campaigns accordingly.

According to a global study conducted by Opeepl, people’s most popular New Year’s resolution is to get healthier, which they plan to achieve through diet and exercise.

Looking at upcoming health trends, Microsoft Advertising shares ways to optimize campaigns for the top three product categories.

1. “Organic Food” Up 20%

Microsoft Advertising predicts that clicks on ads for organic food will increase during the week of January 14, resulting in 20% growth from the same week in December.

To take advantage of this trend, Microsoft Advertising suggests the following:

“Target users searching for healthy, nutritious food options in January with In-market Audiences. Our internal forecast data suggests that click volume will reach its winter peak on January 14, so while you should increase your budget after the holidays, make sure you don’t run out mid-month.”

2. “Sportswear” Up From Early December Through January

Microsoft Advertising predicts that searches for sportswear will begin to rise in early December and continue through January.

In a blog post, Microsoft Advertising shares the following advice:

“Use Shopping Campaigns to showcase your sports and fitness apparel in late November and early December during holiday shopping. Microsoft’s internal data estimates that consumers will be looking for gear the most between the weeks of November 26 and December 3, but activity will remain high through January.”

3. “Fitness & Nutrition” Searches Come In Waves

Unsurprisingly, searches for fitness and nutrition are expected to rise in the new year.

However, Microsoft Advertising recommends an “always-on” strategy for targeting this category, as search interest will rise several times throughout the year.

“Using 2021 data as a comparison for what we can expect activity-wise next year, we can assume that clicks for nutrition and fitness will peak in January, May, July, and October. Consider an always-on approach, as audience ads are shown to lead users down the funnel to search tactics.”

Source: Microsoft Advertising

Featured Image: SeaStudio/Shutterstock

