How to Build a Keyword Strategy [Free Template]
Building a keyword strategy is the process of deciding what keywords to target and prioritize in organic search.

There are many ways to do it, but all methods pretty much boil down to finding keywords that: 

  1. Potential customers are searching for.
  2. Have value for your business.
  3. Are within your wheelhouse to rank for.
The best keywords have traffic, business, and ranking potential

Let’s go through how to do this in four simple steps.

1. Find keywords with traffic potential

There’s no point in targeting keywords that nobody types into Google because they won’t send you traffic even if you rank #1. So the first step is to find keywords that potential customers are searching for.

Let’s look at a couple of ways to do this.

A. See which of your competitors’ pages get the most traffic

If your competitor gets a lot of organic traffic to a page, the keyword it’s targeting must have traffic potential. And because they’re a competitor, such keywords are probably ones your customers are searching for too.

Here’s how to find which of your competitors’ pages get the most search traffic: 

  1. Go to Ahrefs’ Site Explorer
  2. Enter a competitor’s domain
  3. Go to the Top pages report

For example, if you are looking for keywords for an online computer parts store, you may enter newegg.com. In the report, you’ll see its top pages by estimated organic traffic and the keyword sending the most traffic to each page.

Top pages by estimated organic search traffic for newegg.com, via Ahrefs' Site Explorer

Recommendation

If you see a lot of brand mentions in the “Top keyword” column, filter out keywords containing those words. 

Excluding branded keywords in the Organic keywords report in Site Explorer

Make a note of any that your customers may also be searching for. 

Here are a few for our hypothetical computer parts store:

Examples of keywords with traffic potential

Note that we didn’t highlight “gaming desk” or “gaming pc” because our store sells computer parts, not accessories or ready-built computers. There’s no point in targeting these keywords, as the people searching for them aren’t our customers.

B. Use a keyword research tool

Keyword research tools are big keyword databases that you can search and filter. 

Here’s how to use our keyword tool to find keywords with traffic potential:

  1. Go to Ahrefs’ Keywords Explorer
  2. Enter a few broad keywords related to your site (these are known as your “seeds”)
  3. Go to the Matching terms report
  4. Filter for keywords with Traffic Potential (TP)

For example, for a computer parts store, you can enter seed keywords like pc, computer, computers, motherboard, motherboards, amd, and intel. Then you’ll filter the Matching terms report for keywords with Traffic Potential.

Filtering for keywords with Traffic Potential in Ahrefs' Keywords Explorer

Recommendation

Traffic Potential is the estimated monthly organic search traffic to the top-ranking page for a keyword. It tends to be a more reliable estimate of a keyword’s true traffic potential than search volume because pages tend to rank for many keywords, not just one. Learn more here

Now it’s just a case of eyeballing the report for keywords potential customers are searching for.

Examples of keywords customers of a computer parts store would be searching for

2. Check their value for your business

Each keyword your potential customers are searching for has some value for your business. But some have more value than others.

Take these three keywords, for example:

  • “graphics card” – Searchers are shopping around. They’re nowhere near ready to buy. 
  • “b550 vs x570 motherboard” – Searchers have done some shopping around and narrowed down their options. They’re almost ready to buy. 
  • “amd ryzen 5 3600” – Searchers have decided what they want. They’re ready to buy. 

Given that some of these searchers are closer to buying a computer part than others, some keywords arguably have more “business potential” for a computer parts store. These are the ones you should prioritize in your keyword strategy.

Here’s a quick cheat sheet for scoring the “business potential” of keywords:

How to score a keyword's business potential

Sidenote.

This is based on the scoring system we came up with when assessing keywords for our blog. Read more about this in our guide to keyword research.

Just keep in mind that the way you score a keyword may differ from how someone else scores it. It depends on how valuable it is for your business. For example, if you don’t sell the “amd ryzen 5 3600,” that keyword has a “business potential” score of 0 for you—not 3.
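To make this concrete, here’s a minimal (and entirely hypothetical) sketch of how you might score business potential programmatically if you keep a list of the products you actually stock. The product names and the fallback rule for loosely related keywords are made up for illustration, not part of our scoring system:

```python
import pandas as pd

# Hypothetical list of products this store actually sells
products_we_sell = {'amd ryzen 5 3600', 'b550 motherboard'}

def business_potential(keyword: str) -> int:
    # 3 = searcher wants a product we stock; 0 = not our product at all
    if keyword in products_we_sell:
        return 3
    # Loosely related to one of our product lines (illustrative heuristic)
    if any(p.split()[0] in keyword for p in products_we_sell):
        return 1
    return 0

keywords = pd.DataFrame({'Keyword': ['amd ryzen 5 3600', 'graphics card', 'b550 vs x570 motherboard']})
keywords['BP'] = keywords['Keyword'].map(business_potential)
```

The point isn’t the exact heuristic; it’s that the score depends on *your* inventory, so the same keyword can score 3 for one store and 0 for another.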

3. Check ranking difficulty

Keywords vary in ranking difficulty for a number of reasons. This doesn’t mean you should avoid those that are harder to rank for than others, but it’s important to take ranking difficulty into account when building your keyword strategy.

Here are four things we recommend you look at to assess ranking difficulty:

Backlinks

Backlinks are one of Google’s top ranking factors. The more high-quality backlinks the current top-ranking pages have, the harder it’ll be to compete with them. 

For a rough idea of how many backlinks you’ll need to crack the top 10, check the Keyword Difficulty (KD) score in Ahrefs’ Keywords Explorer. This is based on the number of links to the top-ranking pages.

Example of a keyword with medium keyword difficulty in Ahrefs' Keywords Explorer

For a more thorough assessment, scroll to the SERP overview and check the “Domains” column to see the number of linking websites to each page. 

Linking domains to the top-ranking pages for "micro atx case"

Just keep in mind that these numbers only tell you the quantity of links to each page. To understand link quality, you’ll have to review each page’s backlink profile. You can get to this by clicking on the number in the “Backlinks” column. 

Backlinks report for one of the top-ranking pages for "micro atx case," via Ahrefs' Site Explorer

Learn more: How to Do a Basic Backlink Audit

Authority

Many SEOs believe that popular websites have an easier time ranking on Google. For that reason, most take a website authority metric like Domain Rating (DR) into account when assessing a keyword’s ranking difficulty. 

Google representatives have said many times that Google doesn’t evaluate a site’s authority. But there are a couple of ways the so-called authority of a site could indirectly contribute to rankings.

  1. Internal links – High-DR sites tend to have more high-authority pages, and internal links from those pages may help other pages to rank higher.
  2. Familiar brands – Searchers likely want to see household names for some queries. If that’s not you, you may have a harder time ranking. 

If you are in the camp that thinks Google takes site authority into account or just want to err on the safe side, check the top-ranking pages’ DR scores in Keywords Explorer. If they’re all much higher than your site’s DR, you may want to prioritize other keywords.

Domain Rating (DR) of the top-ranking sites for "3090 graphics card"

Recommendation

To find your site’s DR score, enter your domain into Site Explorer.

Domain Rating (DR) in Ahrefs' Site Explorer

Search intent

People are looking for something specific when they search. This is known as their search intent. As Google wants to give searchers what they want, you’ll struggle to rank for keywords unless you can create content that aligns with intent. 

For example, people searching for “backlink checker” are clearly looking for a free tool. We know this because all of the top-ranking results are free tools.

People searching for "backlink checker" are looking for a free tool

To stand virtually any chance of ranking for this keyword, you’ll need to create a free tool. Unless you have a backlink database like ours, that will be almost impossible. 

For that reason, we recommend analyzing the top-ranking results for the three Cs of search intent to figure out a keyword’s ranking difficulty for you.

  1. Content type – Are they blog posts, landing pages, product pages, or something else?
  2. Content format – Are they listicles, how-tos, recipes, tools, or something else?
  3. Content angle – Is there a dominant selling point, like how easy it is?

Learn more: What Is Search Intent? A Complete Guide for Beginners

Quality

If the top-ranking pages for your keyword are high quality, it’ll take more time and effort to compete. 

For example, the folks ranking #1 for “best air purifier” tested 47 air purifiers over eight years to make their recommendation. It’s going to cost an awful lot of time, effort, and money to compete on content quality.

It would be hard to compete with the top-ranking page for "best air purifier" on content

Compare this to the top results for many other queries that have no background information on how recommendations were chosen. It’ll likely be much easier to beat these on content quality than the former. 

Based on your assessment of the four attributes above, you can give keywords a “ranking potential” score like so:

How to score a keyword's ranking potential

4. Combine everything into one document

Creating a keyword strategy from this process means combining everything into one document. You can use our free template for this. It’s a simple spreadsheet with the following data points and conditional formatting:

  • Keyword
  • Traffic Potential (TP)
  • Business Potential (BP)
  • Ranking Potential (RP)
Example keyword strategy

This strategy document lets you see the most promising keywords at a glance.

For example, if we assess a few keywords for our hypothetical computer parts store, the keyword “amd ryzen 5 3600” has high TP, BP, and RP. So the row is all green.

Example of a good keyword to prioritize

On the other hand, “how to install ram on pc” has high TP but low BP and RP. So the row is mostly red. 

Example of a not-so-good keyword

Your keyword strategy from here is simple: prioritize targeting keywords with the most traffic, business, and ranking potential. 
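If you’d rather keep your strategy in code than in a spreadsheet, the same prioritization can be sketched in pandas. The keywords and scores below are illustrative, not from our template:

```python
import pandas as pd

# Illustrative strategy sheet: TP = Traffic Potential, BP = Business
# Potential, RP = Ranking Potential (each scored 0-3)
strategy = pd.DataFrame({
    'Keyword': ['amd ryzen 5 3600', 'how to install ram on pc', 'micro atx case'],
    'TP': [3, 3, 2],
    'BP': [3, 1, 2],
    'RP': [3, 1, 2],
})

# Prioritize keywords that score well across all three dimensions
strategy['Score'] = strategy[['TP', 'BP', 'RP']].sum(axis=1)
strategy = strategy.sort_values('Score', ascending=False)
```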

Keep learning

If you want to learn more about finding, prioritizing, and ranking for keywords, read these:



Essential Functions For SEO Data Analysis

Learning to code, whether with Python, JavaScript, or another programming language, has a whole host of benefits, including the ability to work with larger datasets and automate repetitive tasks.

But despite the benefits, many SEO professionals are yet to make the transition – and I completely understand why! It isn’t an essential skill for SEO, and we’re all busy people.

If you’re pressed for time, and you already know how to accomplish a task within Excel or Google Sheets, then changing tack can feel like reinventing the wheel.

When I first started coding, I only used Python for tasks that I couldn’t accomplish in Excel – and it’s taken several years to get to the point where it’s my de facto choice for data processing.

Looking back, I’m incredibly glad that I persisted, but at times it was a frustrating experience, with many an hour spent scanning threads on Stack Overflow.

This post is designed to spare other SEO pros the same fate.

Within it, we’ll cover the Python equivalents of the most commonly used Excel formulas and features for SEO data analysis – all of which are available within a Google Colab notebook linked in the summary.

Specifically, you’ll learn the equivalents of:

  • LEN.
  • Drop Duplicates.
  • Text to Columns.
  • SEARCH/FIND.
  • CONCATENATE.
  • Find and Replace.
  • LEFT/MID/RIGHT.
  • IF.
  • IFS.
  • VLOOKUP.
  • COUNTIF/SUMIF/AVERAGEIF.
  • Pivot Tables.

Amazingly, to accomplish all of this, we’ll primarily be using a single library – Pandas – with a little help in places from its big brother, NumPy.

Prerequisites

For the sake of brevity, there are a few things we won’t be covering today, including:

  • Installing Python.
  • Basic Pandas, like importing CSVs, filtering, and previewing dataframes.

If you’re unsure about any of this, then Hamlet’s guide on Python data analysis for SEO is the perfect primer.

Now, without further ado, let’s jump in.

LEN

LEN provides a count of the number of characters within a string of text.

For SEO specifically, a common use case is to measure the length of title tags or meta descriptions to determine whether they’ll be truncated in search results.

Within Excel, if we wanted to count the second cell of column A, we’d enter:

=LEN(A2)
Screenshot from Microsoft Excel, November 2022

Python isn’t too dissimilar, as we can rely on the inbuilt len function, which can be combined with Pandas’ loc[] to access a specific row of data within a column:

len(df['Title'].loc[0])

In this example, we’re getting the length of the first row in the “Title” column of our dataframe.

len function in Python. Screenshot from VS Code, November 2022

Finding the length of a cell isn’t that useful for SEO, though. Normally, we’d want to apply a function to an entire column!

In Excel, this would be achieved by selecting the formula cell on the bottom right-hand corner and either dragging it down or double-clicking.

When working with a Pandas dataframe, we can use str.len to calculate the length of rows within a series, then store the results in a new column:

df['Length'] = df['Title'].str.len()

Str.len is a ‘vectorized’ operation, which is designed to be applied simultaneously to a series of values. We’ll use these operations extensively throughout this article, as they almost universally end up being faster than a loop.
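As a quick illustration of the SEO use case above, the vectorized output can feed straight into a Boolean flag for titles that may be truncated. Note the 60-character cutoff is a rough assumption for illustration; Google actually truncates based on pixel width, not character count:

```python
import pandas as pd

df = pd.DataFrame({'Title': ['Short title', 'A' * 70]})
df['Length'] = df['Title'].str.len()

# 60 characters is a rough proxy; truncation really depends on pixel width
df['Possibly Truncated'] = df['Length'] > 60
```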

Another common application of LEN is to combine it with SUBSTITUTE to count the number of words in a cell:

=LEN(TRIM(A2))-LEN(SUBSTITUTE(A2," ",""))+1

In Pandas, we can achieve this by combining the str.split and str.len functions together:

df['No. Words'] = df['Title'].str.split().str.len()

We’ll cover str.split in more detail later, but essentially, what we’re doing is splitting our data based upon whitespaces within the string, then counting the number of component parts.

Word count in Python. Screenshot from VS Code, November 2022

Dropping Duplicates

Excel’s ‘Remove Duplicates’ feature provides an easy way to remove duplicate values within a dataset, either by deleting entirely duplicate rows (when all columns are selected) or removing rows with the same values in specific columns.

Remove Duplicates in Excel. Screenshot from Microsoft Excel, November 2022

In Pandas, this functionality is provided by drop_duplicates.

To drop duplicate rows within a dataframe type:

df.drop_duplicates(inplace=True)

To drop rows based on duplicates within a singular column, include the subset parameter:

df.drop_duplicates(subset="column", inplace=True)

Or specify multiple columns within a list:

df.drop_duplicates(subset=['column','column2'], inplace=True)

One addition above that’s worth calling out is the presence of the inplace parameter. Including inplace=True allows us to overwrite our existing dataframe without needing to create a new one.

There are, of course, times when we want to preserve our raw data. In this case, we can assign our deduped dataframe to a different variable:

df2 = df.drop_duplicates(subset="column")

Text To Columns

Another everyday essential, the ‘text to columns’ feature can be used to split a text string based on a delimiter, such as a slash, comma, or whitespace.

As an example, splitting a URL into its domain and individual subfolders.

Text to Columns in Excel. Screenshot from Microsoft Excel, November 2022

When dealing with a dataframe, we can use the str.split function, which creates a list for each entry within a series. This can be converted into multiple columns by setting the expand parameter to True:

df['URL'].str.split(pat="/", expand=True)
str.split in Python. Screenshot from VS Code, November 2022

As is often the case, our URLs in the image above have been broken up into inconsistent columns, because they don’t feature the same number of folders.

This can make things tricky when we want to save our data within an existing dataframe.

Specifying the n parameter limits the number of splits, allowing us to create a specific number of columns:

df[['Domain', 'Folder1', 'Folder2', 'Folder3']] = df['URL'].str.split(pat="/", expand=True, n=3)

Another option is to use pop to remove your column from the dataframe, perform the split, and then re-add it with the join function:

df = df.join(df.pop('Split').str.split(pat="/", expand=True))

Duplicating the URL to a new column before the split allows us to preserve the full URL. We can then rename the new columns:

df['Split'] = df['URL']

df = df.join(df.pop('Split').str.split(pat="/", expand=True))

df.rename(columns = {0:'Domain', 1:'Folder1', 2:'Folder2', 3:'Folder3', 4:'Parameter'}, inplace=True)
Split, pop, and join functions in Python. Screenshot from VS Code, November 2022

CONCATENATE

The CONCAT function allows users to combine multiple strings of text, such as when generating a list of keywords by adding different modifiers.

In this case, we’re adding “mens” and whitespace to column A’s list of product types:

=CONCAT($F$1," ",A2)
CONCAT in Excel. Screenshot from Microsoft Excel, November 2022

Assuming we’re dealing with strings, the same can be achieved in Python using the arithmetic operator:

df['Combined'] = 'mens' + ' ' + df['Keyword']

Or specify multiple columns of data:

df['Combined'] = df['Subdomain'] + df['URL']
Concatenation in Python. Screenshot from VS Code, November 2022
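One caveat worth flagging: this only works when both columns are strings. If one of the columns is numeric (a position or volume column, say), cast it with astype(str) first, otherwise pandas will raise a TypeError. A minimal sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'Keyword': ['gaming pc'], 'Position': [4]})

# 'Position' is an integer column, so cast it before concatenating
df['Label'] = df['Keyword'] + ' (position ' + df['Position'].astype(str) + ')'
```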

Pandas has a dedicated concat function, but this is more useful when trying to combine multiple dataframes with the same columns.

For instance, if we had multiple exports from our favorite link analysis tool:

df = pd.read_csv('data.csv')
df2 = pd.read_csv('data2.csv')
df3 = pd.read_csv('data3.csv')

dflist = [df, df2, df3]

df = pd.concat(dflist, ignore_index=True)

SEARCH/FIND

The SEARCH and FIND formulas provide a way of locating a substring within a text string.

These commands are commonly combined with ISNUMBER to create a Boolean column that helps filter down a dataset, which can be extremely helpful when performing tasks like log file analysis, as explained in this guide. E.g.:

=ISNUMBER(SEARCH("searchthis",A2))
ISNUMBER and SEARCH in Excel. Screenshot from Microsoft Excel, November 2022

The difference between SEARCH and FIND is that FIND is case-sensitive.

The equivalent Pandas function, str.contains, is case-sensitive by default:

df['Journal'] = df['URL'].str.contains('engine', na=False)

Case insensitivity can be enabled by setting the case parameter to False:

df['Journal'] = df['URL'].str.contains('engine', case=False, na=False)

In either scenario, including na=False will prevent null values from being returned within the Boolean column.

One massive advantage of using Pandas here is that, unlike Excel, regex is natively supported by this function – as it is in Google Sheets via REGEXMATCH.

Chain together multiple substrings by using the pipe character, also known as the OR operator:

df['Journal'] = df['URL'].str.contains('engine|search', na=False)
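One related gotcha: because str.contains treats its pattern as regex by default, searching for a literal string that contains metacharacters (a “?”, a “.”, and so on) needs regex=False. For example, flagging URLs with query parameters (the column names here are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'URL': ['https://example.com/search?q=ssd', 'https://example.com/ssd']})

# '?' is a regex metacharacter; regex=False matches it as a literal character
df['Has Parameter'] = df['URL'].str.contains('?', regex=False, na=False)
```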

Find And Replace

Excel’s “Find and Replace” feature provides an easy way to individually or bulk replace one substring with another.

Find and Replace in Excel. Screenshot from Microsoft Excel, November 2022

When processing data for SEO, we’re most likely to select an entire column and “Replace All.”

The SUBSTITUTE formula provides another option here and is useful if you don’t want to overwrite the existing column.

As an example, we can change the protocol of a URL from HTTP to HTTPS, or remove it by replacing it with nothing.

When working with dataframes in Python, we can use str.replace:

df['URL'] = df['URL'].str.replace('http://', 'https://')

Or:

df['URL'] = df['URL'].str.replace('http://', '') # replace with nothing

Again, unlike Excel, regex can be used – like with Google Sheets’ REGEXREPLACE:

df['URL'] = df['URL'].str.replace('http://|https://', '', regex=True)

Alternatively, if you want to replace multiple substrings with different values, you can use Python’s replace method and provide a list.

This prevents you from having to chain multiple str.replace functions:

df['URL'] = df['URL'].replace(['http://', 'https://'], ['https://www.', 'https://www.'], regex=True)

LEFT/MID/RIGHT

Extracting a substring within Excel requires the usage of the LEFT, MID, or RIGHT functions, depending on where the substring is located within a cell.

Let’s say we want to extract the root domain and subdomain from a URL:

=MID(A2,FIND(":",A2,4)+3,FIND("/",A2,9)-FIND(":",A2,4)-3)
LEFT, MID, and RIGHT in Excel. Screenshot from Microsoft Excel, November 2022

Using a combination of MID and multiple FIND functions, this formula is ugly, to say the least – and things get a lot worse for more complex extractions.

Again, Google Sheets does this better than Excel, because it has REGEXEXTRACT.

What a shame that when you feed it larger datasets, it melts faster than a Babybel on a hot radiator.

Thankfully, Pandas offers str.extract, which works in a similar way:

df['Domain'] = df['URL'].str.extract('.*://?([^/]+)')
str.extract in Python. Screenshot from VS Code, November 2022

Combine with fillna to prevent null values, as you would in Excel with IFERROR:

df['Domain'] = df['URL'].str.extract('.*://?([^/]+)').fillna('-')

IF

IF statements allow you to return different values, depending on whether or not a condition is met.

To illustrate, suppose that we want to create a label for keywords that are ranking within the top three positions.

IF in Excel. Screenshot from Microsoft Excel, November 2022

Rather than using Pandas in this instance, we can lean on NumPy and the where function (remember to import NumPy, if you haven’t already):

df['Top 3'] = np.where(df['Position'] <= 3, 'Top 3', 'Not Top 3')

Multiple conditions can be used for the same evaluation by using the AND/OR operators, and enclosing the individual criteria within round brackets:

df['Top 3'] = np.where((df['Position'] <= 3) & (df['Position'] != 0), 'Top 3', 'Not Top 3')

In the above, we’re returning “Top 3” for any keywords with a ranking less than or equal to three, excluding any keywords ranking in position zero.

IFS

Sometimes, rather than specifying multiple conditions for the same evaluation, you may want multiple conditions that return different values.

In this case, the best solution is using IFS:

=IFS(B2<=3,"Top 3",B2<=10,"Top 10",B2<=20,"Top 20")
IFS in Excel. Screenshot from Microsoft Excel, November 2022

Again, NumPy provides us with the best solution when working with dataframes, via its select function.

With select, we can create a list of conditions, choices, and an optional value for when all of the conditions are false:

conditions = [df['Position'] <= 3, df['Position'] <= 10, df['Position'] <=20]

choices = ['Top 3', 'Top 10', 'Top 20']

df['Rank'] = np.select(conditions, choices, 'Not Top 20')

It’s also possible to have multiple conditions for each of the evaluations.

Let’s say we’re working with an ecommerce retailer with product listing pages (PLPs) and product display pages (PDPs), and we want to label the type of branded pages ranking within the top 10 results.

The easiest solution here is to look for specific URL patterns, such as a subfolder or extension, but what if competitors have similar patterns?

In this scenario, we could do something like this:

conditions = [(df['URL'].str.contains('/category/')) & (df['Brand Rank'] > 0),
(df['URL'].str.contains('/product/')) & (df['Brand Rank'] > 0),
(~df['URL'].str.contains('/product/')) & (~df['URL'].str.contains('/category/')) & (df['Brand Rank'] > 0)]

choices = ['PLP', 'PDP', 'Other']

df['Brand Page Type'] = np.select(conditions, choices, None)

Above, we’re using str.contains to evaluate whether or not a URL in the top 10 matches our brand’s pattern, then using the “Brand Rank” column to exclude any competitors.

In this example, the tilde sign (~) indicates a negative match. In other words, we’re saying we want every brand URL that doesn’t match the pattern for a “PDP” or “PLP” to match the criteria for ‘Other.’

Lastly, None is included because we want non-brand results to return a null value.

np.select in Python. Screenshot from VS Code, November 2022

VLOOKUP

VLOOKUP is an essential tool for joining together two distinct datasets on a common column.

In this case, adding the URLs within column N to the keyword, position, and search volume data in columns A-C, using the shared “Keyword” column:

=VLOOKUP(A2,M:N,2,FALSE)
VLOOKUP in Excel. Screenshot from Microsoft Excel, November 2022

To do something similar with Pandas, we can use merge.

Replicating the functionality of an SQL join, merge is an incredibly powerful function that supports a variety of different join types.

For our purposes, we want to use a left join, which will maintain our first dataframe and only merge in matching values from our second dataframe:

mergeddf = df.merge(df2, how='left', on='Keyword')

One added advantage of performing a merge over a VLOOKUP is that, unlike with VLOOKUP, the shared data doesn’t have to be in the first column of the second dataset – a limitation the newer XLOOKUP also removes.

It will also pull in multiple rows of data, rather than just the first match it finds.

One common issue when using the function is for unwanted columns to be duplicated. This occurs when multiple shared columns exist, but you attempt to match using one.

To prevent this – and improve the accuracy of your matches – you can specify a list of columns:

mergeddf = df.merge(df2, how='left', on=['Keyword', 'Search Volume'])

In certain scenarios, you may actively want these columns to be included. For instance, when attempting to merge multiple monthly ranking reports:

mergeddf = (df.merge(df2, on='Keyword', how='left', suffixes=('', '_october'))
    .merge(df3, on='Keyword', how='left', suffixes=('', '_september')))

The above code snippet executes two merges to join together three dataframes with the same columns – which are our rankings for November, October, and September.

By labeling the months within the suffix parameters, we end up with a much cleaner dataframe that clearly displays the month, as opposed to the defaults of _x and _y seen in the earlier example.

Multiple merges in Python. Screenshot from VS Code, November 2022

COUNTIF/SUMIF/AVERAGEIF

In Excel, if you want to perform a statistical function based on a condition, you’re likely to use either COUNTIF, SUMIF, or AVERAGEIF.

Commonly, COUNTIF is used to determine how many times a specific string appears within a dataset, such as a URL.

We can accomplish this by declaring the ‘URL’ column as our range, then the URL within an individual cell as our criteria:

=COUNTIF(D:D,D2)
COUNTIF in Excel. Screenshot from Microsoft Excel, November 2022

In Pandas, we can achieve the same outcome by using the groupby function:

df.groupby('URL')['URL'].count()
groupby in Python. Screenshot from VS Code, November 2022

Here, the column declared within the round brackets indicates the individual groups, and the column listed in the square brackets is where the aggregation (i.e., the count) is performed.

The output we’re receiving isn’t perfect for this use case, though, because it’s consolidated the data.

Typically, when using Excel, we’d have the URL count inline within our dataset. Then we can use it to filter to the most frequently listed URLs.

To do this, use transform and store the output in a column:

df['URL Count'] = df.groupby('URL')['URL'].transform('count')
groupby with transform in Python. Screenshot from VS Code, November 2022

You can also apply custom functions to groups of data by using a lambda (anonymous) function:

df['Google Count'] = df.groupby(['URL'])['URL'].transform(lambda x: x[x.str.contains('google')].count())

In our examples so far, we’ve been using the same column for our grouping and aggregations, but we don’t have to. Similarly to COUNTIFS/SUMIFS/AVERAGEIFS in Excel, it’s possible to group using one column, then apply our statistical function to another.
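To show what that looks like in practice, here are quick sketches of the SUMIF and AVERAGEIF equivalents using the same transform approach. The column names (“Search Volume”, “Position”) are assumptions for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'URL': ['/a', '/a', '/b'],
    'Search Volume': [100, 50, 200],
    'Position': [1, 3, 2],
})

# SUMIF equivalent: total search volume per URL, stored inline
df['Volume Sum'] = df.groupby('URL')['Search Volume'].transform('sum')

# AVERAGEIF equivalent: mean ranking position per URL
df['Avg Position'] = df.groupby('URL')['Position'].transform('mean')
```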

Going back to the earlier search engine results page (SERP) example, we may want to count all ranking PDPs on a per-keyword basis and return this number alongside our existing data:

df['PDP Count'] = df.groupby(['Keyword'])['URL'].transform(lambda x: x[x.str.contains('/product/|/prd/|/pd/')].count())
COUNTIFS equivalent with groupby in Python. Screenshot from VS Code, November 2022

Which in Excel parlance, would look something like this:

=SUM(COUNTIFS(A:A,[@Keyword],D:D,{"*/product/*","*/prd/*","*/pd/*"}))

Pivot Tables

Last, but by no means least, it’s time to talk pivot tables.

In Excel, a pivot table is likely to be our first port of call if we want to summarise a large dataset.

For instance, when working with ranking data, we may want to identify which URLs appear most frequently, and their average ranking position.

Pivot table in Excel. Screenshot from Microsoft Excel, November 2022

Again, Pandas has its own pivot tables equivalent – but if all you want is a count of unique values within a column, this can be accomplished using the value_counts function:

count = df['URL'].value_counts()

Using groupby is also an option.

Earlier in the article, performing a groupby that aggregated our data wasn’t what we wanted – but it’s precisely what’s required here:

grouped = df.groupby('URL').agg(
     url_frequency=('Keyword', 'count'),
     avg_position=('Position', 'mean'),
     )

grouped.reset_index(inplace=True)
groupby as a pivot table in Python. Screenshot from VS Code, November 2022

Two aggregate functions have been applied in the example above, but this could easily be expanded upon, and 13 different types are available.
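For example, assuming our ranking export also has a “Search Volume” column (an assumption for illustration), we could extend the same named-aggregation pattern with min and sum:

```python
import pandas as pd

df = pd.DataFrame({
    'URL': ['/a', '/a', '/b'],
    'Keyword': ['k1', 'k2', 'k3'],
    'Position': [1, 3, 2],
    'Search Volume': [100, 50, 200],  # assumed column name
})

# Several aggregations at once, each producing a named output column
grouped = df.groupby('URL').agg(
    url_frequency=('Keyword', 'count'),
    avg_position=('Position', 'mean'),
    best_position=('Position', 'min'),
    total_volume=('Search Volume', 'sum'),
).reset_index()
```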

There are, of course, times when we do want to use pivot_table, such as when performing multi-dimensional operations.

To illustrate what this means, let’s reuse the ranking groupings we made earlier using conditional statements (stored here in a “Grouping” column) and attempt to display the number of times a URL ranks within each group.

ranking_groupings = df.groupby(['URL', 'Grouping']).agg(
     url_frequency=('Keyword', 'count'),
     )
groupby with multiple groupings in Python. Screenshot from VS Code, November 2022

This isn’t the best format to use, as multiple rows have been created for each URL.

Instead, we can use pivot_table, which will display the data in different columns:

pivot = pd.pivot_table(df,
index=['URL'],
columns=['Grouping'],
aggfunc="size",
fill_value=0,
)
pivot_table in Python. Screenshot from VS Code, November 2022

Final Thoughts

Whether you’re looking for inspiration to start learning Python, or are already leveraging it in your SEO workflows, I hope that the above examples help you along on your journey.

As promised, you can find a Google Colab notebook with all of the code snippets here.

In truth, we’ve barely scratched the surface of what’s possible, but understanding the basics of Python data analysis will give you a solid base upon which to build.

More resources:


Featured Image: mapo_japan/Shutterstock


