

Semantic Keyword Clustering For 10,000+ Keywords [With Script]





Semantic keyword clustering can help take your keyword research to the next level.

In this article, you’ll learn how to use a Google Colaboratory sheet shared exclusively with Search Engine Journal readers.

This article walks you through using the Google Colab sheet, gives a high-level view of how it works under the hood, and explains how to adjust it to suit your needs.

But first, why cluster keywords at all?

Common Use Cases For Keyword Clustering

Here are a few use cases for clustering keywords.

Faster Keyword Research:

  • Filter out branded keywords or keywords with no commercial value.
  • Group related keywords together to create more in-depth articles.
  • Group related questions and answers together for FAQ creation.

Paid Search Campaigns:

  • Create negative keyword lists for Ads using large datasets faster – stop wasting money on junk keywords!
  • Group similar keywords into campaign ideas for Ads.

Here’s an example of the script clustering similar questions together, perfect for an in-depth article!

Screenshot from Microsoft Excel, February 2022

Issues With Earlier Versions Of This Tool

If you’ve been following my work on Twitter, you’ll know I’ve been experimenting with keyword clustering for a while now.

Earlier versions of this script were based on the excellent PolyFuzz library using TF-IDF matching.


While it got the job done, there were always some head-scratching clusters, and I felt the results could be improved.

Words that shared a similar pattern of letters would be clustered even if they were unrelated semantically.

The reverse also happened: it was unable to cluster semantically related words like “Bike” and “Bicycle” because they share too few letter patterns.
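To make the letter-pattern problem concrete, here is a minimal sketch that compares keywords by their character trigrams. It is a simplified stand-in (plain trigram counts with cosine similarity), not PolyFuzz's actual TF-IDF implementation, and the function names `char_ngrams` and `ngram_similarity` are made up for this illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count the overlapping character n-grams in a lowercased string."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_similarity(a, b, n=3):
    """Cosine similarity between the character n-gram counts of two strings."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    # Only n-grams present in both strings contribute to the dot product.
    dot = sum(count * grams_b[gram] for gram, count in grams_a.items())
    norm_a = sqrt(sum(c * c for c in grams_a.values()))
    norm_b = sqrt(sum(c * c for c in grams_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Related meaning, but no shared trigrams.
print(ngram_similarity("bike", "bicycle"))
# Unrelated meaning, but almost identical letter patterns.
print(ngram_similarity("heated", "cheated"))
```

In this sketch, "bike" and "bicycle" score 0.0 because they share no trigrams, while "heated" and "cheated" score roughly 0.89 despite being unrelated.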

Earlier versions of the script also had other issues:

  • It didn’t work well in languages other than English.
  • It left a high number of keywords that could not be clustered.
  • There wasn’t much control over how the clusters were created.
  • The script was limited to ~10,000 rows before it timed out due to a lack of resources.

Semantic Keyword Clustering Using Deep Learning Natural Language Processing (NLP)

Fast forward four months to the latest release which has been completely rewritten to utilize state-of-the-art, deep learning sentence embeddings.

Check out some of these awesome semantic clusters!

Notice that heated, thermal, and warm are contained within the same cluster of keywords?

Excel sheet showing an example of semantic keyword clustering. Screenshot from Microsoft Excel, February 2022

Or how about, Wholesale and Bulk?

Excel sheet showing another example of semantic keyword clustering. Screenshot from Microsoft Excel, February 2022

Dog and Dachshund, Xmas and Christmas?

Excel sheet showing another example of semantic keyword clustering, in which Dachshund and dogs have been grouped together. Screenshot from Microsoft Excel, February 2022

It can even cluster keywords in over one hundred different languages!

Excel sheet showing another example of semantic keyword clustering in French. Screenshot from Microsoft Excel, February 2022

Features Of The New Script Versus Earlier Iterations

In addition to semantic keyword grouping, the following improvements have been added to the latest version of this script.

  • Support for clustering 10,000+ keywords at once.
  • Fewer “no cluster” groups.
  • Ability to choose different pre-trained models (although the default model works fine!).
  • Ability to choose how closely related clusters should be.
  • Choice of the minimum number of keywords to use per cluster.
  • Automatic detection of character encoding and CSV delimiters.
  • Multi-lingual clustering.
  • Works with many common keyword exports out of the box (Search Console data, AdWords, or third-party keyword tools like Ahrefs and Semrush).
  • Works with any CSV file with a column named “Keyword.”
  • Simple to use (the script works by inserting a new column called Cluster Name into any list of keywords uploaded).
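As a rough illustration of delimiter detection and the “Keyword” column requirement, the sketch below uses Python's standard `csv.Sniffer`. It is not the script's own code: the helper name `read_keywords` and the sample data are hypothetical, and character-encoding detection is left out because it usually relies on a third-party library.

```python
import csv
import io


def read_keywords(raw_text, column="Keyword"):
    """Guess the delimiter of CSV text and return the values of one column."""
    # csv.Sniffer inspects a sample of the text and guesses the dialect.
    dialect = csv.Sniffer().sniff(raw_text[:4096], delimiters=",;\t|")
    reader = csv.DictReader(io.StringIO(raw_text), dialect=dialect)
    if column not in (reader.fieldnames or []):
        raise ValueError(f"No column named {column!r} found")
    # Skip rows where the keyword cell is empty.
    return [row[column] for row in reader if row[column]]


# Hypothetical export using semicolons instead of commas.
sample = "Keyword;Volume\nalpaca socks;1200\nwarm alpaca socks;300\n"
print(read_keywords(sample))
```

The same function handles a comma-separated export without any change, which is the point of sniffing the delimiter rather than hard-coding it.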

How To Use The Script In Five Steps (Quick Start)

To get started, click this link and then choose the Open in Colab option, as shown below.

How to open Google Colab from GitHub. Screenshot from Google Colaboratory, February 2022

Change the Runtime type to GPU by selecting Runtime > Change Runtime Type.

Google Colab: how to change settings to use the GPU. Screenshot from Google Colaboratory, February 2022

Select Runtime > Run all from the top navigation in Google Colaboratory (or just press Ctrl+F9).

How to run all cells in Google Colab. Screenshot from Google Colaboratory, February 2022

Upload a .csv file containing a column called “Keyword” when prompted.

How to upload a file using Google Colab. Screenshot from Google Colaboratory, February 2022

Clustering should be fairly quick, but ultimately it depends on the number of keywords, and the model used.

Generally speaking, you should be good for 50,000 keywords.

If you see a CUDA out of memory error, you’re trying to cluster too many keywords at the same time!


(It’s worth noting that this script can easily be adapted to run on a local machine without the confines of Google Colaboratory.)

The Script Output

The script will run and append the clusters to your original file in a new column called Cluster Name.

Cluster names are assigned using the shortest keyword in the cluster.

For example, the cluster name for the following group of keywords has been set as “alpaca socks” because that is the shortest keyword in the cluster.

Demonstration of the example output from the script, showing that alpaca socks keywords have been grouped together. Screenshot from Microsoft Excel, February 2022
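The naming rule is simple enough to sketch in a couple of lines. The helper name `name_cluster` and every keyword in the list except “alpaca socks” are hypothetical, and how the real script breaks ties between equally short keywords is not stated, so the tie behavior here is an assumption.

```python
def name_cluster(keywords):
    """Return the shortest keyword in a cluster to use as the cluster name."""
    # min() returns the first match on ties, so list order decides between
    # keywords of equal length (an assumption, not the script's documented rule).
    return min(keywords, key=len)


cluster = ["warm alpaca socks", "alpaca socks", "alpaca socks for men"]
print(name_cluster(cluster))
```

For this list the function returns "alpaca socks", matching the example above.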

Once clustering has been completed, a new file is automatically saved, with the clusters appended in a new column to the original file.

How The Keyword Clustering Tool Works

This script is based upon the Fast Clustering algorithm and uses models which have been pre-trained at scale on large amounts of data.

This makes it easy to compute the semantic relationships between keywords using off-the-shelf models.

(You don’t have to be a data scientist to use it!)
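For readers who want a feel for the mechanics, here is a simplified, pure-Python sketch of threshold-based clustering on embeddings. It is an illustration of the general idea, not the script's actual code: the function names `cosine` and `fast_cluster` are made up, and the three-dimensional vectors are hand-written toy values standing in for the sentence embeddings a pre-trained model would produce.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def fast_cluster(vectors, threshold=0.85, min_cluster_size=2):
    """Greedy threshold clustering: the largest neighbourhoods claim members first."""
    n = len(vectors)
    # For every item, collect all items that are at least `threshold`-similar
    # to it (each item is always in its own neighbourhood).
    neighbourhoods = [
        [j for j in range(n) if cosine(vectors[i], vectors[j]) >= threshold]
        for i in range(n)
    ]
    # Keep only neighbourhoods that are big enough, largest first.
    candidates = sorted(
        (hood for hood in neighbourhoods if len(hood) >= min_cluster_size),
        key=len,
        reverse=True,
    )
    clusters, assigned = [], set()
    for hood in candidates:
        # Drop members already claimed by an earlier, larger cluster.
        members = [i for i in hood if i not in assigned]
        if len(members) >= min_cluster_size:
            clusters.append(members)
            assigned.update(members)
    return clusters


# Toy "embeddings" (hypothetical values, not real model output).
keywords = ["bike", "bicycle", "cycle", "dog", "dachshund", "tax return"]
vectors = [
    (1.0, 0.1, 0.0), (0.9, 0.2, 0.0), (0.95, 0.0, 0.1),
    (0.1, 1.0, 0.0), (0.0, 0.9, 0.2), (0.0, 0.1, 1.0),
]
for cluster in fast_cluster(vectors):
    print([keywords[i] for i in cluster])
```

With these toy vectors, the bike-related keywords form one cluster, the dog-related keywords form another, and "tax return" is left unclustered because nothing is similar enough to it.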

In fact, whilst I’ve made it customizable for those who like to tinker and experiment, I’ve chosen some balanced defaults which should be reasonable for most people’s use cases.


Different models can be swapped in and out of the script depending on the requirements (faster clustering, better multi-language support, better semantic performance, and so on).

After a lot of testing, I found that the all-MiniLM-L6-v2 transformer provided the best balance between speed and accuracy.

If you prefer to experiment, you can replace the existing pre-trained model with any of the models listed here or on the Hugging Face Model Hub.

Swapping In Pre-Trained Models

Swapping in models is as easy as replacing the variable with the name of your preferred transformer.

For example, you can change the default model all-MiniLM-L6-v2 to all-mpnet-base-v2 by editing:

transformer = 'all-MiniLM-L6-v2'

to:

transformer = 'all-mpnet-base-v2'

Here’s where you would edit it in the Google Colaboratory sheet.

How to choose a sentence transformer for keyword clustering. Screenshot from Google Colaboratory, February 2022

The Trade-off Between Cluster Accuracy And No Cluster Groups

A common complaint with previous iterations of this script is that it resulted in a high number of unclustered results.

Unfortunately, it will always be a balancing act between cluster accuracy versus the number of clusters.


A higher cluster accuracy setting will result in a higher number of unclustered results.

There are two variables that can directly influence the size and accuracy of all clusters:

  • minimum cluster size
  • cluster accuracy

I have set a default of 85 (/100) for cluster accuracy and a minimum cluster size of 2.

In testing, I found this to be the sweet spot, but feel free to experiment!
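The trade-off can be illustrated with a small sketch that counts how many keywords have too few close neighbours to join any cluster at a given setting. Everything here is an assumption made for illustration: the helper name `count_unclustered`, the hand-written similarity scores, and the idea that an accuracy of 85 maps to a 0.85 similarity cut-off. The count is also only a rough estimate, since a real clustering pass can leave additional keywords unclustered.

```python
def count_unclustered(similarity, n_items, cluster_accuracy, min_cluster_size=2):
    """Count items with too few close neighbours at an accuracy setting (0-100)."""
    threshold = cluster_accuracy / 100  # assumed mapping: 85 -> 0.85
    unclustered = 0
    for i in range(n_items):
        # Neighbours are the other items scoring at or above the threshold.
        neighbours = sum(
            1 for j in range(n_items)
            if j != i and similarity[min(i, j), max(i, j)] >= threshold
        )
        # A cluster of size k needs k - 1 neighbours besides the item itself.
        if neighbours < min_cluster_size - 1:
            unclustered += 1
    return unclustered


# Hypothetical pairwise similarity scores for four keywords.
similarity = {
    (0, 1): 0.93, (0, 2): 0.81, (0, 3): 0.40,
    (1, 2): 0.78, (1, 3): 0.35, (2, 3): 0.30,
}
for accuracy in (70, 85, 95):
    print(accuracy, count_unclustered(similarity, 4, accuracy))
```

On this toy data the number of keywords without a home rises as the accuracy setting goes up, which is the balancing act described above.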

Here’s where to set those variables in the script.

How to set the minimum sentence size and keyword cluster accuracy. Screenshot from Google Colaboratory, February 2022

That’s it! I hope this keyword clustering script is useful to your work.


Featured Image: Graphic Grid/Shutterstock






How upskilling your paid advertising skills will tackle economic downturns




30-second summary:

  • Marketing budgets are often the first to be slashed in a downturn – upskilling your existing team with digital marketing techniques can provide huge efficiencies and minimize the impact of cuts
  • Creating an upskilling program does not need to be expensive or time-consuming if a well-thought-out strategy is adopted and results are constantly measured
  • Nurturing your own in-house talent pool also increases business resilience, improves marketing innovation and creativity, and reduces reliance on third-party operators
  • Choosing the right skills for your team to acquire depends both on your immediate goals and long-term business strategy – done right you can steal a march on your competitors
  • Sarah Gilchriest, Global COO of Circus Street, discusses the key skills brands need to cultivate to stay competitive during an economic downturn

We’re entering what is likely to be a pretty tough global recession. As consumer sentiment worsens, brands will increasingly look at ways they can cut costs to protect their bottom line. Unfortunately, we all know that marketing is usually one of the first budgets to be slashed.

It is seemingly much easier to stop a campaign or give an agency notice than it is to sack a developer or reduce infrastructure costs. However, more often than not, cutting marketing is a false economy that worsens the impact of a downturn by slowing a company’s growth. So, is there a way for brands to instead maximize their digital marketing output while also freezing or reducing costs?

The answer may be found in upskilling.

Training while cutting costs?

Now, your first reaction may be that training programs are expensive luxuries that make little sense if your goal is to cut costs. There are a few things to unpack here:

  1. Size and scope of training matter. You can make an outsized impact by training one or two individuals who then share their knowledge with their wider team. The right strategy (which I’ll discuss further below) can lead to a highly targeted program that gives the most critical skills to those who will be best placed to use them immediately.
  2. Next, there are a lot of freely available supporting resources that can significantly reduce costs and help to embed learning.
  3. Finally, let’s put costs in perspective. The ROI on a well-executed training scheme pays for itself and the initial outlay pales in comparison to most other business functions. Put simply, you get a lot of bang for your buck. 

Why paid advertising skills?

Paid advertising makes a lot of sense to focus on for a number of reasons. Generally, compared to other marketing fields, paid advertising is characterized by the sheer diversity of skills and techniques needed to fully execute a campaign. It is incredibly fast-moving and often requires you to leverage a number of different tech platforms. Consequently, many brands outsource this functionality to a network of agencies and freelancers. Those that don’t outsource usually rely on one or two individual ‘power users’ or, worse, have skills haphazardly spread among a range of departments, leading to bottlenecks and single points of failure.

As such, digital advertising is usually the prime area where efficiencies, greater innovation, and marketing effectiveness can occur via upskilling. It is where your business can do much more for less. 

Identifying the right skills

Getting the right skill mix is where the rubber meets the road. A mixture of creativity, data analysis, platform knowledge, development techniques, and marketing expertise is needed. To get started, the best approach is to fully understand what capability your team has in-house. The crucial element is to remember that a lot of ability might be hidden because it is not used on a day-to-day basis. You would be surprised at how quickly a business ‘forgets’ about the previous experiences of team members after they have been hired.


Auditing team skills should expand beyond the marketing department

You don’t know what gems are lurking in other areas of your business until you start to look. This is also the perfect opportunity to identify both the potential of your employees to acquire new skills and also their individual aspirations. It is much easier to upskill someone who has a professional and personal investment in learning that particular expertise. The audit itself does not need to be complex – a simple matrix that enables people to categorize their proficiency and outline the areas where they would like to develop will suffice.

When you know what you have to work with, it’ll become much easier to define the best way forward. Deciding the best skill mix comes down to first working out how to fulfill your most immediate needs. Examples include taking a costly service in-house, plugging a weakness where a team member’s departure would severely hamper your ability to function, or closing obvious gaps in ability that prevent you from undertaking certain digital advertising activities.

Build on the compatibility between your employees’ aspirations and your commercial objectives

This is then overlaid by areas where your marketing output can most obviously be improved and your future aspirations in line with your commercial objectives. For example, in the future you may want to more heavily target users on particular social media platforms or ‘exotic’ platforms like IoT devices and digital boards. Perhaps you can see the financial benefits of adopting headless CMS tech and would like to put in place the skills needed to make that transition after the recession. Maybe you want your team to have the insight to tell you whether the Metaverse has any potential for your business.

This may sound complex but once you get started the hierarchy of skills you need more often than not becomes very obvious. Remember, one of upskilling’s great strengths is its flexibility – if your needs change or you feel you have chosen the wrong skills – it’s very easy to change track.

Getting started in a cost-efficient way

How you train your team is very much up to individual preferences – everyone learns in different ways. Speaking to your employees and specialists will enable you to build a tailored teaching structure. It can be a combination of in-house learning, online tutorials, accredited programs, or book learning. You do not have to go all in on a full program straight away. Piloting can remove a lot of the risk. Start small – one team or a handful of individuals from across your company – and continually assess the impact.

A mistake to avoid

A common mistake businesses make is they wait too long to get their team to use their new knowledge. This can hold up the process and damage ROI. The best way to embed new skills is to apply them. Ensure that your team has an opportunity to practice their newfound expertise on real initiatives. Then keep a close eye on your business metrics – including team and customer feedback – to determine the impact. Unlike many other departments, digital marketing can have very clear outputs. This will let you know quite quickly if it is working. From there, you can decide on how to roll out your training scheme. 

Marketing doesn’t end with the marketers

As I’ve mentioned, diversifying the skillset of your team builds resilience and promotes more innovation. The reason is simple: if you only have marketing skills in your marketing department, you are naturally limiting the number of people who can provide useful insights that fuel innovation. You reduce oversight and feedback loops, and your marketing output will suffer from a lack of outside perspectives.

By making your teams multidisciplinary and cross-functional you can spread useful skills throughout your business. Customer service teams can learn the fundamentals of digital marketing, marketers can learn the basic dev and data work that enables their day-to-day, and your data teams can think like marketers if they need to.


Preparing for the worst doesn’t mean losing capabilities

If the worst does happen and you do need to make cuts to your team, having key skills shared across your business means that the damage to core functions will be limited.

To finish – I should highlight that much of what I’ve discussed applies equally to business owners as it does to individual freelancers. A downturn can be a daunting prospect if you are a sole trader. Upskilling can be one of the best ways to increase your value to clients now and future-proof your business.

If you have seen business drop off, the time you now have available could be best dedicated to more training. This may sound obvious, but a mistake many people make in their careers is failing to adapt to how quickly demand for skills can change, or how new technology can come along and make those skills obsolete. Adding more strings to your own and your company’s bow is never a bad thing.

Sarah Gilchriest is the Global COO of Circus Street.
