

Semantic Keyword Clustering For 10,000+ Keywords [With Script]




Semantic keyword clustering can help take your keyword research to the next level.

In this article, you’ll learn how to use a Google Colaboratory sheet shared exclusively with Search Engine Journal readers.

This article walks you through using the Google Colab sheet, gives a high-level view of how it works under the hood, and shows how to make adjustments to suit your needs.

But first, why cluster keywords at all?

Common Use Cases For Keyword Clustering

Here are a few use cases for clustering keywords.

Faster Keyword Research:

  • Filter out branded keywords or keywords with no commercial value.
  • Group related keywords together to create more in-depth articles.
  • Group related questions and answers together for FAQ creation.

Paid Search Campaigns:

  • Create negative keyword lists for Ads using large datasets faster – stop wasting money on junk keywords!
  • Group similar keywords into campaign ideas for Ads.

Here’s an example of the script clustering similar questions together, perfect for an in-depth article!

Screenshot from Microsoft Excel, February 2022

Issues With Earlier Versions Of This Tool

If you’ve been following my work on Twitter, you’ll know I’ve been experimenting with keyword clustering for a while now.

Earlier versions of this script were based on the excellent PolyFuzz library using TF-IDF matching.


While it got the job done, there were always some head-scratching clusters, and I felt the results could be improved.

Words that shared a similar pattern of letters would be clustered even if they were unrelated semantically.

Conversely, it was unable to cluster semantically related words that are spelled differently, like “Bike” and “Bicycle”.
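To illustrate why matching on letter patterns struggles with pairs like these, here is a minimal sketch that scores two words by how many character trigrams they share. It uses plain Jaccard overlap as a simplified stand-in for TF-IDF n-gram matching. It is not the PolyFuzz implementation, and the function names are hypothetical.

```python
def char_ngrams(word, n=3):
    """Return the set of character n-grams in a lowercased word."""
    word = word.lower()
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard overlap between the character n-gram sets of two words."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# "Bike" and "Hike" share letters but not meaning
lookalike_score = ngram_similarity("Bike", "Hike")
# "Bike" and "Bicycle" share meaning but have no trigram in common
synonym_score = ngram_similarity("Bike", "Bicycle")
```

In this toy setup the look-alike pair scores higher than the synonym pair, which is the opposite of what a keyword researcher wants.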

Earlier versions of the script also had other issues:

  • It didn’t work well in languages other than English.
  • It left a high number of keywords that could not be clustered.
  • There wasn’t much control over how the clusters were created.
  • The script was limited to ~10,000 rows before it timed out due to a lack of resources.

Semantic Keyword Clustering Using Deep Learning Natural Language Processing (NLP)

Fast forward four months to the latest release, which has been completely rewritten to use state-of-the-art deep learning sentence embeddings.

Check out some of these awesome semantic clusters!

Notice that heated, thermal, and warm are contained within the same cluster of keywords?

Excel sheet showing an example of semantic keyword clustering. Screenshot from Microsoft Excel, February 2022

Or how about, Wholesale and Bulk?

Excel sheet showing another example of semantic keyword clustering. Screenshot from Microsoft Excel, February 2022

Dog and Dachshund, Xmas and Christmas?

Excel sheet showing another example of semantic keyword clustering, in which Dachshund and dogs have been grouped together. Screenshot from Microsoft Excel, February 2022

It can even cluster keywords in over one hundred different languages!

Excel sheet showing another example of semantic keyword clustering in French. Screenshot from Microsoft Excel, February 2022

Features Of The New Script Versus Earlier Iterations

In addition to semantic keyword grouping, the following improvements have been added to the latest version of this script.

  • Support for clustering 10,000+ keywords at once.
  • Fewer “no cluster” groups.
  • Ability to choose different pre-trained models (although the default model works fine!).
  • Ability to choose how closely related clusters should be.
  • Choice of the minimum number of keywords to use per cluster.
  • Automatic detection of character encoding and CSV delimiters.
  • Multi-lingual clustering.
  • Works with many common keyword exports out of the box. (Search Console Data, AdWords or third-party keyword tools like Ahrefs and Semrush).
  • Works with any CSV file with a column named “Keyword.”
  • Simple to use (The script works by inserting a new column called Cluster Name to any list of keywords uploaded).
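As a rough illustration of the delimiter detection and “Keyword” column handling listed above, here is a minimal sketch using Python's standard `csv.Sniffer`. The function name `read_keywords` and the list of candidate delimiters are assumptions for this example, not taken from the script. Character encoding detection is not covered here.

```python
import csv
import io


def read_keywords(text, column="Keyword"):
    """Sniff the delimiter, then return the values of the keyword column."""
    # Restrict the guess to a few common delimiters (an assumption)
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=",;\t|")
    reader = csv.DictReader(io.StringIO(text), dialect=dialect)
    if column not in (reader.fieldnames or []):
        raise ValueError(f"No column named {column!r} found")
    # Skip rows where the keyword cell is missing or empty
    return [row[column].strip() for row in reader if row[column]]


# Two hypothetical exports that differ only in their delimiter
semicolon_export = "Keyword;Search Volume\nalpaca socks;1200\nwool socks;900\n"
comma_export = "Keyword,Clicks\nheated gloves,40\nwarm gloves,12\n"

keywords_a = read_keywords(semicolon_export)
keywords_b = read_keywords(comma_export)
```

Both samples yield a plain list of keywords regardless of which delimiter the export used.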

How To Use The Script In Five Steps (Quick Start)

To get started, you will need to click this link, and then choose the option, Open in Colab as shown below.

How to open Google Colab from GitHub. Screenshot from Google Colaboratory, February 2022

Change the Runtime type to GPU by selecting Runtime > Change Runtime Type.

Google Colab: how to change settings to use the GPU. Screenshot from Google Colaboratory, February 2022

Select Runtime > Run all from the top navigation within Google Colaboratory (or just press Ctrl+F9).

How to run all cells in Google Colab. Screenshot from Google Colaboratory, February 2022

Upload a .csv file containing a column called “Keyword” when prompted.

How to upload a file using Google Colab. Screenshot from Google Colaboratory, February 2022

Clustering should be fairly quick, but ultimately it depends on the number of keywords, and the model used.


Generally speaking, the script should handle up to around 50,000 keywords.

If you see a CUDA out of memory error, you’re trying to cluster too many keywords at the same time!


(It’s worth noting that this script can easily be adapted to run on a local machine without the confines of Google Colaboratory.)

The Script Output

The script will run and append clusters to your original file in a new column called Cluster Name.

Cluster names are assigned using the shortest length keyword in the cluster.

For example, the cluster name for the following group of keywords has been set as “alpaca socks” because that is the shortest keyword in the cluster.

Demonstration of the example output from the script, showing alpaca socks keywords grouped together. Screenshot from Microsoft Excel, February 2022
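The naming rule can be sketched in a few lines of Python. This is an illustration of the shortest-keyword rule only, not the script's own code. The sample keywords, the numeric cluster ids, and the function name `name_clusters` are hypothetical, and breaking ties by first occurrence is an assumption.

```python
from collections import defaultdict


def name_clusters(rows):
    """Take (keyword, cluster_id) pairs and attach a Cluster Name to each."""
    groups = defaultdict(list)
    for keyword, cluster_id in rows:
        groups[cluster_id].append(keyword)
    # Shortest keyword by character count; min() keeps the first one on ties
    names = {cid: min(kws, key=len) for cid, kws in groups.items()}
    return [{"Keyword": kw, "Cluster Name": names[cid]} for kw, cid in rows]


# Hypothetical clustered keywords
rows = [
    ("alpaca wool socks", 0),
    ("alpaca socks", 0),
    ("best alpaca socks for hiking", 0),
    ("wholesale candles", 1),
    ("bulk candles", 1),
]
named = name_clusters(rows)
```

Every row in cluster 0 ends up labelled “alpaca socks”, mirroring the new column the script adds to the uploaded file.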

Once clustering has been completed, a new file is automatically saved, with clusters appended in a new column to the original file.

How The Keyword Clustering Tool Works

This script is based upon the Fast Clustering algorithm and uses models which have been pre-trained at scale on large amounts of data.

This makes it easy to compute the semantic relationships between keywords using off-the-shelf models.

(You don’t have to be a data scientist to use it!)

In fact, whilst I’ve made it customizable for those who like to tinker and experiment, I’ve chosen some balanced defaults which should be reasonable for most people’s use cases.
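For readers who like to see the idea in code, here is a simplified, pure-Python sketch of threshold-based community detection over cosine similarity, which is the general idea behind this style of clustering. It is not the script's implementation. The three-dimensional vectors are hand-made stand-ins for real sentence embeddings, and the function names and default values are assumptions for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cluster_embeddings(embeddings, threshold=0.85, min_cluster_size=2):
    """Group item indices around centre items.

    Each item proposes a community of everything within `threshold`
    of it; larger communities are accepted first, and items already
    claimed by an accepted community are not reused.
    """
    n = len(embeddings)
    candidates = []
    for i in range(n):
        members = [j for j in range(n)
                   if cosine(embeddings[i], embeddings[j]) >= threshold]
        if len(members) >= min_cluster_size:
            candidates.append(members)

    candidates.sort(key=len, reverse=True)  # stable: ties keep input order
    assigned, clusters = set(), []
    for members in candidates:
        fresh = [j for j in members if j not in assigned]
        if len(fresh) >= min_cluster_size:
            clusters.append(fresh)
            assigned.update(fresh)
    return clusters


keywords = ["heated gloves", "thermal gloves", "warm gloves",
            "bulk candles", "wholesale candles", "dachshund"]
# Toy vectors standing in for model-generated embeddings
toy_embeddings = [
    [1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [0.95, 0.15, 0.05],
    [0.0, 1.0, 0.1], [0.1, 0.9, 0.0], [0.0, 0.0, 1.0],
]
clusters = cluster_embeddings(toy_embeddings)
```

With these toy vectors, the three glove keywords form one cluster, the two candle keywords form another, and “dachshund” is left unclustered because nothing else is close enough to it.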


Different models can be swapped in and out of the script depending on your requirements (faster clustering, better multi-language support, better semantic performance, and so on).

After a lot of testing, I found that the all-MiniLM-L6-v2 transformer provided the best balance between speed and accuracy.


If you prefer to use your own, or just want to experiment, you can replace the existing pre-trained model with any of the models listed here or on the Hugging Face Model Hub.

Swapping In Pre-Trained Models

Swapping in models is as easy as replacing the variable with the name of your preferred transformer.

For example, you can change the default model all-MiniLM-L6-v2 to all-mpnet-base-v2 by editing:

transformer = 'all-MiniLM-L6-v2'

to

transformer = 'all-mpnet-base-v2'

Here’s where you would edit it in the Google Colaboratory sheet.

How to choose a sentence transformer for keyword clustering. Screenshot from Google Colaboratory, February 2022

The Trade-off Between Cluster Accuracy And No Cluster Groups

A common complaint with previous iterations of this script was that they produced a high number of unclustered results.

Unfortunately, it will always be a balancing act between cluster accuracy and the number of clusters.


A higher cluster accuracy setting will result in a higher number of unclustered results.

There are two variables that can directly influence the size and accuracy of all clusters:



  • minimum cluster size
  • cluster accuracy

I have set a default of 85 (out of 100) for cluster accuracy and a minimum cluster size of 2.

In testing, I found this to be the sweet spot, but feel free to experiment!

Here’s where to set those variables in the script.

How to set the minimum sentence size and keyword cluster accuracy. Screenshot from Google Colaboratory, February 2022
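To get a feel for the trade-off, here is a small sketch that counts how many keywords would be left without a cluster at different thresholds. The similarity scores are made up for illustration. Treating an accuracy of 85 as a similarity threshold of 0.85 is an assumption, and the neighbour-counting rule is a simplification of real clustering.

```python
keywords = ["heated gloves", "thermal gloves", "warm gloves",
            "winter mittens", "bulk candles"]

# Hypothetical pairwise similarity scores (0 to 1), keyed by index pair
similarity = {
    (0, 1): 0.93, (0, 2): 0.90, (1, 2): 0.96,
    (0, 3): 0.70, (1, 3): 0.72, (2, 3): 0.75,
    (0, 4): 0.10, (1, 4): 0.12, (2, 4): 0.08, (3, 4): 0.15,
}


def count_unclustered(threshold, min_cluster_size=2):
    """Count keywords with too few close neighbours to form a cluster."""
    unclustered = 0
    for i in range(len(keywords)):
        neighbours = sum(
            1 for (a, b), score in similarity.items()
            if i in (a, b) and score >= threshold
        )
        # The keyword itself counts towards the cluster size
        if neighbours + 1 < min_cluster_size:
            unclustered += 1
    return unclustered


# Unclustered keyword count at a loose, default-like, and strict threshold
results = {t: count_unclustered(t) for t in (0.60, 0.85, 0.95)}
```

On this toy data the number of unclustered keywords rises from 1 to 2 to 3 as the threshold tightens, and raising the minimum cluster size pushes it higher still.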

That’s it! I hope this keyword clustering script is useful to your work.


Featured Image: Graphic Grid/Shutterstock





Search Engine Journal Promotes Miranda Miller To Senior Managing Editor




It is with tremendous pride and excitement that I announce the promotion of Miranda Miller, an editorial and content strategy champion, to Senior Managing Editor.

While this promotion actually happened in January, having recently joined SEJ myself, I wanted to celebrate Miranda’s growth and leadership within the organization.

In this advanced role, her areas of ownership will include core elements of SEJ’s editorial operations, including its rich educational and evergreen content.

“The word that comes to mind is blossom,” said Jenise Uehara, CEO of SEJ’s parent company Alpha Brand Media.

“Miranda dove into organizational management, cross-department collaboration, and business process design, while also tackling inefficiencies and challenges.”

An exceptional writer and editor, Miranda also brings considerable experience and expertise in content strategy.

Over the past several months, Miranda has truly shone as an organizational leader, working to grow SEJ’s editorial blueprint, talent, and operations exponentially.


“And all the while, Miranda somehow kept the publishing crank turning out exceptional content and breaking news,” Jenise added.

“I’ve been thrilled to watch Miranda so quickly engage and take on a meatier leadership role.”

Having followed her work since joining SEJ, I can personally speak to the diverse wealth of knowledge Miranda brings to the editorial team – not to mention the publication at large.

“I joined SEJ early in 2021 to help lead the editorial team in a period of great growth and opportunity,” Miranda said.

“We’ve been able to increase expert and educational content production by over 60% even while introducing a data-driven approach to content strategy and optimization,” she added.


This year, we’ll publish 50% more ebooks than in 2021, and our contributing author program has grown to over 130 digital marketing and SEO experts with her guidance.

Prior to coming in-house, Miranda spent over 15 years leading her own marketing agency which served clients as wide-ranging as the world’s leading polar expeditions company, fintech and app startups, Fortune 100 companies, and several government agencies.

She’s been a prolific ghostwriter for brands and executives in SEO, tech, and finance, and has authored thousands of articles for clients that have appeared in the best-known B2B publications, technical journals, and mainstream media.


As Editor-in-Chief, I’m beyond excited to watch Miranda’s leadership undoubtedly contribute to SEJ’s ongoing substantial growth.

SEJ’s brand and content offerings continue to evolve, to serve an audience of marketers and business leaders – and there has never been a more inspiring or energizing time to cultivate, promote, and be part of this remarkably talented team.

“Content quality has always been my number one priority, and so it’s refreshing to see that this is a shared value across the SEJ team,” Miranda said.

“I love to see the new ebook formats and article types we’re creating now and am really looking forward to continuing to innovate and teach, to bring the most useful and helpful educational content possible to the SEJ audience.”

Miranda is a part-time digital nomad and runs a location-independent content studio for enterprise organizations.

Three to four months of the year, she works in coworking spaces and cafes around Europe, Latin America, and the Caribbean.


“But not at the beach,” Miranda adds. “That’s a myth. Ever had sand in your laptop? No bueno.”

At her home base on Georgian Bay in Canada, she’s the wife of a talented chef; mother bear to two teens, two Shepherds, and three budgies; and a patron of the local live music and literary scenes.


She’s also a huge fan of adrienne maree brown, Lizzo, Phoebe Waller-Bridge, Margaret Atwood, AOC – and hockey.

Featured Image: Miranda Miller

