How to Do Keyword Optimization for SEO (3 Steps)

Keyword optimization is the process of increasing the relevance of a webpage’s content to a given search query.

It’s a fundamental process in SEO because Google aims to serve the most relevant content to its users.

In this post, you’ll learn how to optimize your new and existing content for any keyword.

Step 1. Make sure you’re optimizing for the right keyword 

Whether you’re optimizing existing or new content, you need to make sure that keyword optimization is worth the effort and that your chances of ranking are good. 

This step is arguably the hardest part of the process, so here are some considerations to think about right from the start. 

Search traffic potential

Measuring the potential of a keyword to bring you traffic can be tricky. Most SEO tools try to solve this with search volume, but that’s not enough:

  • Some searches don’t result in clicks on pages – For example, searchers may click on ads instead, or the SERP itself may provide a sufficient answer. 
  • Pages can rank for hundreds of keywords while being optimized just for one – So you can actually get more traffic than the search volume indicates. 

A better way is to estimate the traffic that the ranking pages get. In Ahrefs, this is automatically calculated with the Traffic Potential (TP) metric. 

The TP metric sums up traffic estimations from all keywords that the top-ranking page for your target keyword ranks for. This shows you how much traffic you could be looking at if you outranked this page.

Traffic Potential metric in Ahrefs
The TP for the keyword “submit website to search engines” is almost 10 times higher than the search volume.
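
To make the idea concrete, here is a minimal sketch of the arithmetic behind a traffic-potential style figure. It is not Ahrefs' actual calculation; the keywords and traffic numbers are hypothetical placeholders chosen only to show the summing step.

```python
# Hypothetical traffic estimates for every keyword the top-ranking page ranks for.
# All figures are made-up placeholders, not real Ahrefs data.
estimated_traffic_by_keyword = {
    "submit website to search engines": 150,
    "submit url to google": 400,
    "add site to google": 350,
    "submit site to bing": 100,
}

target_keyword = "submit website to search engines"

# Traffic potential: total estimated traffic across all of the page's keywords.
traffic_potential = sum(estimated_traffic_by_keyword.values())

# Traffic attributed to the target keyword alone, for comparison.
target_traffic = estimated_traffic_by_keyword[target_keyword]

print(f"Target keyword traffic: {target_traffic}")
print(f"Traffic potential: {traffic_potential}")
```

The gap between the two numbers is the point: a page optimized for one keyword can collect traffic from many related ones.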

Value to your website 

Optimize for keywords that can bring you valuable traffic. 

When picking a target keyword, ask yourself what the practical use of attracting those searchers is. Is it direct sales, brand awareness, or building a readership? 

You can map each keyword on a scale that matches your overall goal. For example, your strategy may be to create content that helps the reader solve their problems using your product (aka product-led content). Then, your scale can look something like this: 

Business potential score

So while there will be no harm in generating traffic with 0 business value from time to time, you may want to focus on optimizing content with a high business potential score. 

Keyword difficulty 

Some keywords will be harder to rank for than others. 

To get a quick overview of the ranking difficulty of a keyword, look at the number of unique domains linking to the top 10 ranking pages. The more linking domains, the harder it will likely be to rank because backlinks are still one of the most impactful ranking signals for Google.

In Ahrefs, Keyword Difficulty (KD) is calculated automatically based on backlinks on a scale of 0 to 100. 

Keywords with shoes, different keyword difficulty

So for example, if your website is new and doesn’t have a strong backlink profile yet, you may want to focus on low-competition keywords below KD 20.

Other factors can come into play too, such as familiarity with the brand. Learn more about estimating keyword difficulty here.

Search intent 

Search intent is the reason behind the search. Usually, searchers either want to learn something, buy something, or find a specific website. 

Search intent matters for keyword optimization because Google tends to rank content that matches the dominating intent behind the query. 

Your task here is to identify what searchers are after and decide whether you can offer that and whether it’s worth it for you. 

To illustrate, it could be tough for a “non e-commerce” website to break the mold for a query like “women’s shoes.” It’s product pages from top to bottom. 

Top-ranking pages with the same search intent
The entire first page for “women’s shoes” shows product pages.

We’ll talk about search intent more in the next step of this guide. 

Your expertise 

In Google Search, the messenger is at least as important as the message. 

Google expressed its philosophy on this through its E-A-T principles (expertise, authoritativeness, and trustworthiness). E-A-T is known to have a significant impact on queries in the Your Money or Your Life domain (e.g., health, finance, and safety topics).

Google further emphasized the role of the website’s authority (and maybe even gave it more significance) in the recent “helpful content” update:

Does your content clearly demonstrate first-hand expertise and a depth of knowledge (for example, expertise that comes from having actually used a product or service, or visiting a place)?

Did you decide to enter some niche topic area without any real expertise, but instead mainly because you thought you’d get search traffic?

Google wants to show quality content to its users. Knowing that something comes from a trusted source simply makes it easier for a search engine to recognize quality content. 

So for example, a blog on health should ideally be written or at least reviewed by someone with formal medical training. Also, it should be reviewed and updated on a regular basis. 

Health article reviewed by an expert

Sidenote.

If you’d like to learn more about finding, choosing, and prioritizing your keywords, we’ve got a full guide on that.

Step 2. Align with the search intent 

Let’s ask Google what kind of content searchers want to see. We call this analyzing the three Cs of search intent. 

Content type

Content type refers to the goal the searcher is after. The content type will usually be one of the following:

  • Blog post/article
  • Product page
  • Category page
  • Landing page

The task here is to take top-ranking pages for your keyword and look for the dominating type of content among them. The top three ranking pages and SERP features (People Also Ask box, featured snippet) will be most impactful here. Then match that with your content. 

For example, for “best all season car tires,” we see almost only articles (except for the #1 result). So if you were to compete for this keyword, your best chance to do so would be with an article too because that’s the dominating content type.

SERP for "best all season car tires"
9/10 results are articles.

Content format 

Content format refers to how users seemingly prefer information served to them. The content format will usually be one of the following:

  • “How-to” guide
  • Step-by-step tutorial
  • List post
  • Opinion piece
  • Review
  • Comparison
  • Product page (homepage or subpage)

For example, “home decor tips” is dominated by listicles; most of them have numbers in titles and/or the main content is structured in ordered lists. 

SERP for "home decor tips"

As with the other Cs of search intent, the idea is to identify what content format dominates the SERPs and use it for your page. 

TIP

Note: SERPs are not always this obvious. Sometimes, Google ranks different types and formats of content. 

One reason for this may be that Google moves to serve search journeys rather than search queries. 

In SEO, this is called mixed or fractured search intent. See what you can do in such a situation: 

You may come across a chance to rank a different type of content from the dominating one. This usually happens in broad terms where people can look for different things. Indications of this can be found in:

  • Questions in the PAA box.
  • Presence of certain rich search results, such as the “Things to know.” 

There’s an interesting analysis of the “coffee” keyword by Kayle Larkin. I highly recommend it if you want to get a hint on how to spot these kinds of opportunities. A bet on that tactic, however, may be riskier.

Content angle 

Content angle is the unique selling point of a page. It should catch the attention of the searcher and indicate what is special about the page.

To illustrate, consider the query “how to become rich.” Some angles for this query are:

  • Before 30
  • In a smart way
  • Fast 
  • According to experts
  • Best
  • From nothing 
  • In five years
SERP for query "how to become rich"

Makes sense, right? That’s why the content angle should be tightly matched with the topic. A topic may require the freshest view of said topic, while another may require a list of free online tools. SERPs are again the best place to look for that information.

For example, it won’t make sense to use “before 30” or “in five years” for a query like “how to peel a banana.” We can see on the SERPs that what seems to be valuable to users is learning how to do it the right way (i.e., like a monkey).

SERP for "how to peel a banana"

Step 3. Follow on-page SEO best practices 

Once we have picked our target keyword and identified the search intent for it, it’s time to write our content with SEO in mind. 

For this, we need the so-called on-page SEO. In other words, tried and tested things that you can do on the page itself to help Google and searchers better understand and digest your content. 

If you’re optimizing old content, it’s a good idea to go through the process with a tool like Ahrefs’ Site Audit (also available for free in Ahrefs Webmaster Tools). It will help you catch all the missing tags, unoptimized images, and more.

All issues from the Content report in Site Audit

Give searchers what they want 

You may have a completely unique opinion on your topic. You may want to approach it in an unconventional fashion. That’s all fine because Google wants unique content. But if you want your content to rank, you need to meet searchers’ expectations too. Google is quite clear about it:

Provide an appropriate amount of content for your subject … . So, for example, if you describe your page as a recipe, provide a complete recipe that is easy to follow, rather than just a set of ingredients or a basic description of the dish.

You can get a pretty good idea of what searchers want by looking at the topics covered by the top-ranking pages. The more commonalities between pages, the higher the probability that a given subtopic is important to the searchers.

You can automate this process using Ahrefs’ Content Gap tool. Simply enter the URLs of top-ranking pages and get the keywords that they rank for. The keywords will indicate subtopics that you should consider including in your content. 

Content Gap tool in Ahrefs
Step 1. Make sure to leave the last input blank.
Results from Content Gap tool
Step 2. Look at the keywords to spot topics and patterns.

You can then also adjust the “Intersect” settings to pick only the biggest commonalities. 

"Intersect" settings to filter Content Gap report results

This report also makes it easier to optimize existing content. You can add your page in the last field to see keywords that your content doesn’t rank for compared to your competitors. 

Content Gap tool, comparing to existing content

Recommendation

Sometimes, you can smell an “optimized” text from a mile away because keywords are shoehorned into every second sentence. This kind of quasi-optimization is something you should avoid, as it’s based on two SEO myths: LSI and TF-IDF keywords. 

Here’s the thing. You don’t need to tactically sprinkle closely related keywords (the idea behind LSI), nor do you need to repeat them a certain number of times (TF-IDF). 

But mentioning related keywords, phrases, and entities in your text can boost your SEO. It has nothing to do with gaming the system. Rather, it’s about understanding what type of information searchers may be looking for. The difference may sound subtle, so feel free to learn more here.

Make your content easy to digest 

Easy-to-digest content in the SEO world means these three things:

  1. Writing in simple words, avoiding complex sentences – Of course, you can and probably should use technical terms when the topics require them. 
  2. Making content skimmable – Two reasons: (1) Most people aren’t here for the whole thing, just specific info, and (2) people skim content to decide whether it’s worth their time.
  3. Using images – They make content easier to comprehend and break up walls of text. 

Imagine Google serving results that most people can’t digest. If you were Google, that’s the kind of results you’d like to avoid. 

Learn more: Flesch Reading Ease: Does It Matter for SEO? (Data Study) 

Optimize page title 

Both searchers and Google use the title of the page to understand the context of the page. So you need to optimize the page title for both parties:

  • Make your target keyword part of the title – Just to be clear, Google is advanced enough to rank relevant pages that don’t use the search query in the title. But including the keyword in the title tag is your best bet here. 
  • Make the title informative yet attractive to the reader 
  • Not too short, not too long – Use a tool like SERPsim to check your titles before you publish. 
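
If you want a quick first pass before opening a SERP preview tool, the first and last checks above can be sketched in a few lines of Python. The `check_title` helper and the 60-character limit are assumptions for illustration only; Google truncates titles by pixel width, so a character count is just a rough proxy.

```python
def check_title(title: str, keyword: str, max_chars: int = 60) -> dict:
    """Run two basic checks on a page title.

    max_chars is an assumed rule-of-thumb limit, not an official Google value;
    Google truncates by pixel width, so a SERP preview tool is more reliable.
    """
    return {
        "has_keyword": keyword.lower() in title.lower(),
        "length": len(title),
        "within_limit": len(title) <= max_chars,
    }


result = check_title(
    "How to Do Keyword Optimization for SEO (3 Steps)",
    "keyword optimization",
)
print(result)
```

Whether the title is informative and attractive is still a judgment call that no length check can make for you.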

Learn more: How to Craft the Perfect SEO Title Tag (Our 4-Step Process) 

Match the H1 tag with the title tag

The consensus among SEOs seems to be that your H1 tag should be consistent with the title tag. This means two things.

First, these tags can be slightly different. However, it’s best if the H1 also contains the target keyword. 

For example, a product page can have a title tag that describes the value proposition of the product, while the H1 tag can be just a heading for the content that follows below. 

Similar title and H1 shown by Ahrefs' SEO Toolbar

But it’s perfectly fine if both the H1 and title tag are the same. This is a rule you may want to go with for blog posts. 

Write a compelling meta description

In case you’re wondering, what you put inside the meta description tag most probably won’t impact rankings. 

But it’s still a good idea to give that little piece of content some thought because it may entice readers to choose your page among others on the SERP.

Here are a few tips on crafting your meta descriptions:

  • Think about what searchers expect from a page found on the first page of Google – This comes back to search intent. It helps if you check your meta description on a mobile device. It’s more prominent there, so you’ll instantly know if the meta is enticing and helpful.
  • Mind the length – Use something like SERPsim. 
  • Write in newspaper headlines – For instance, compare this “TireHeaven has a wide variety of tires and wheels in stock. We have all of the top brands of passenger and truck tires, along with lawn, trailer, and tire …” to “Tires and wheels for all vehicles. Top brands. Fast and free shipping to an installer near you.” 
  • Take cues from descriptions on search ads – Marketers actually spend a lot of time tweaking those. 
  • Have a unique meta description for each page

Learn more: How to Write the Perfect Meta Description 

Sidenote.

Google may still replace your title tag (study) and meta description (study) with something that, according to the system, fits the search query better. But writing your own title and meta is your best bet for displaying what you want and not what Google wants.

Use H2–H6 tags for subheadings

Here, the solution is straightforward: The best use you can make of tags H2–H6 is for subheadings. 

Subheadings are good for creating a skimmable hierarchy in a document. A good hierarchy should allow the reader to understand what they can find in the text just by skimming through the page. 

Create a user-friendly URL

Although John Mueller said not to worry about URLs, I think Google said it all with this in its SEO guide: 

Quote from Google's SEO guide

Bad URLs are a slant against Google’s grand design of serving helpful results. 

URLs do appear in the search results, and some users may read them to make sure they’re clicking on legit pages. But since Google doesn’t always show the full URL on the SERPs, I guess this is not something to ponder too much about. Just a clear, simple, and human-readable structure is all you need. 

So do this:

/how-to-peel-a-banana

Instead of this:

/how-to-peel-a-banana-like-a-monkey-the-right-way-10-2022 
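
If you generate URLs from titles, a small helper can keep them short and readable. The `slugify` function below is a hypothetical sketch rather than a standard library feature, and it only handles unaccented Latin letters and digits; other characters are simply replaced by hyphens.

```python
import re


def slugify(title: str) -> str:
    """Turn a title into a lowercase, hyphen-separated URL slug."""
    slug = title.lower()
    # Replace every run of characters that are not a-z or 0-9 with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", slug)
    # Drop hyphens left over at the start or end.
    return slug.strip("-")


print("/" + slugify("How to Peel a Banana"))
```

Trimming extra words such as dates or modifiers is still best done by hand before the title goes into the helper.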

Optimize images (filenames and alt tags)

It isn’t just text that’s important for keyword optimization. Images help Google understand what a page is about too. 

Makes sense if you think about it. If Google finds a lot of images about dogs on a page, it has a very good reason to think that the page is about dogs. 

Moreover, images can rank in Google Image Search. Additionally, your images may show up as previews in Search or in Google Discover.

Google looks at a number of things when it comes to images. You can find the entire list here. In short: 

  • Use relevant images – It’s best if they’re original. 
  • Use descriptive, succinct filenames and alt tags – Avoid generic names, and don’t make them too long. Something like “house-on-a-hill.jpg” is better than “image1.jpg”.
  • Place your images close to relevant text 

Learn more: Alt Text for SEO: How to Optimize Your Images 

Link to relevant internal and external resources

When the content demands a link to some other page, don’t hold back. Both internal and external links help Google understand the context, and that’s a good thing for keyword optimization. They can also help establish E-A-T—just make sure you link to pages that you trust. 

Healthline linking to sources

But where you link to is not just about the context. Internal links can be used tactically to boost rankings because they are known to pass link equity. 

Learn more: Internal Links for SEO: An Actionable Guide

Optimize for featured snippets 

Featured snippets are bits of content that Google pulls from pages to answer search queries. 

Featured snippet example

Basically, when Google thinks there is a short and sweet answer to the question, it tends to show it right on the SERPs without making people click on anything. 

Optimizing for featured snippets is basically about:

  1. Providing the answer to the main question early in the text. 
  2. Making the answer succinct.
  3. Structuring your content in an organized, clear way.
  4. Using easy-to-understand language (avoiding jargon too). 

If you plug in your target keyword in Google, you will see right away if there’s a featured snippet you can optimize for. But if you’re working with a bigger list of keywords, you may want to use a tool like Ahrefs’ Keywords Explorer. Simply enter all your keywords and set the “SERP features” filter to “Featured snippet.” You can do the same with your existing content and Ahrefs’ Site Explorer.

 "Featured snippet" filter in Ahrefs

Learn more: How to Optimize for Google’s Featured Snippets 

Optimize for rich results 

A rich result is any type of visually enhanced search result with information pulled from relevant structured data. Rich results likely don’t impact rankings, but they can make your page more eye-catching. 

Rich results example

Some content formats are eligible for special types of search results, such as this recipe carousel.

Rich results carousel

To make a page eligible for rich results, you need to add some simple code called schema markup. Each content format that supports rich results has its own set of markup properties. 

Here’s the process. You should: 

  1. Check available properties for your content in Google’s documentation.
  2. Deploy the code. Use a markup generator or write it yourself.
  3. Test the code using this Rich Results Test tool.
  4. Use the URL Inspection tool in Google Search Console to see if the site looks OK. If there are no issues, Google recommends using the request indexing tool to let it know about changes.
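
For the "write it yourself" route in step 2, one option is to build the markup as a dictionary and serialize it to JSON-LD. The sketch below uses the schema.org Recipe type with hypothetical recipe data; the property set is deliberately incomplete, so check Google's documentation for the properties your content type actually requires.

```python
import json

# Hypothetical recipe data; replace with your page's real content.
recipe = {
    "@context": "https://schema.org",
    "@type": "Recipe",
    "name": "Simple Banana Bread",
    "recipeIngredient": ["3 ripe bananas", "2 cups flour", "1 egg"],
    "recipeInstructions": [
        {"@type": "HowToStep", "text": "Mash the bananas."},
        {"@type": "HowToStep", "text": "Mix in the flour and egg, then bake."},
    ],
}

# Wrap the JSON-LD in the script tag that goes into the page's HTML.
json_ld = json.dumps(recipe, indent=2)
script_tag = f'<script type="application/ld+json">\n{json_ld}\n</script>'
print(script_tag)
```

The printed tag can be pasted into the Rich Results Test as a code snippet before you deploy it.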

Final thoughts 

Keep in mind that the aim of keyword optimization is not to game the system in some cyberpunk fashion. The goal is to help Google and searchers find and understand your content.

So once you’re done with all the points from this guide, it’s a good idea to circle back and take this self-assessment test to make sure your content is “helpful, reliable and people-first.”

Once your content is live, here are two things you can do next:

  1. Build links to your content to boost rankings. Check our guide to link building to start off on the right foot. 
  2. Monitor your ranking progress to check if your tactics are working or when to update the content. But don’t do it manually on Google; rather, use a rank tracker. Here’s why.

Got questions? Ping me on Twitter



Essential Functions For SEO Data Analysis

Learning to code, whether with Python, JavaScript, or another programming language, has a whole host of benefits, including the ability to work with larger datasets and automate repetitive tasks.

But despite the benefits, many SEO professionals are yet to make the transition – and I completely understand why! It isn’t an essential skill for SEO, and we’re all busy people.

If you’re pressed for time, and you already know how to accomplish a task within Excel or Google Sheets, then changing tack can feel like reinventing the wheel.

When I first started coding, I initially only used Python for tasks that I couldn’t accomplish in Excel – and it’s taken several years to get to the point where it’s my de facto choice for data processing.

Looking back, I’m incredibly glad that I persisted, but at times it was a frustrating experience, with many an hour spent scanning threads on Stack Overflow.

This post is designed to spare other SEO pros the same fate.

Within it, we’ll cover the Python equivalents of the most commonly used Excel formulas and features for SEO data analysis – all of which are available within a Google Colab notebook linked in the summary.

Specifically, you’ll learn the equivalents of:

  • LEN.
  • Drop Duplicates.
  • Text to Columns.
  • SEARCH/FIND.
  • CONCATENATE.
  • Find and Replace.
  • LEFT/MID/RIGHT.
  • IF.
  • IFS.
  • VLOOKUP.
  • COUNTIF/SUMIF/AVERAGEIF.
  • Pivot Tables.

Amazingly, to accomplish all of this, we’ll primarily be using a singular library – Pandas – with a little help in places from its big brother, NumPy.

Prerequisites

For the sake of brevity, there are a few things we won’t be covering today, including:

  • Installing Python.
  • Basic Pandas, like importing CSVs, filtering, and previewing dataframes.

If you’re unsure about any of this, then Hamlet’s guide on Python data analysis for SEO is the perfect primer.

Now, without further ado, let’s jump in.

LEN

LEN provides a count of the number of characters within a string of text.

For SEO specifically, a common use case is to measure the length of title tags or meta descriptions to determine whether they’ll be truncated in search results.

Within Excel, if we wanted to count the second cell of column A, we’d enter:

=LEN(A2)
Screenshot from Microsoft Excel, November 2022

Python isn’t too dissimilar, as we can rely on the inbuilt len function, which can be combined with Pandas’ loc[] to access a specific row of data within a column:

len(df['Title'].loc[0])

In this example, we’re getting the length of the first row in the “Title” column of our dataframe.

len function python
Screenshot of VS Code, November, 2022

Finding the length of a cell isn’t that useful for SEO, though. Normally, we’d want to apply a function to an entire column!

In Excel, this would be achieved by selecting the formula cell, then either dragging its bottom right-hand corner down or double-clicking it.

When working with a Pandas dataframe, we can use str.len to calculate the length of rows within a series, then store the results in a new column:

df['Length'] = df['Title'].str.len()

The str.len method is a ‘vectorized’ operation, which is designed to be applied simultaneously to a series of values. We’ll use these operations extensively throughout this article, as they almost universally end up being faster than a loop.
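
Coming back to the truncation use case, a length column can be turned into a simple flag. This is a minimal sketch with hypothetical sample titles; the 60-character threshold is an assumed rule of thumb rather than an official limit, since search results truncate by pixel width.

```python
import pandas as pd

# Hypothetical sample data standing in for a crawl export.
df = pd.DataFrame({
    "Title": [
        "How to Peel a Banana",
        "The Complete Guide to Choosing the Best All Season Car Tires for Every Budget",
    ]
})

MAX_TITLE_CHARS = 60  # assumed rule-of-thumb limit, not an official Google value

df["Length"] = df["Title"].str.len()
# Boolean column: True where the title exceeds the assumed limit.
df["Too Long"] = df["Length"] > MAX_TITLE_CHARS

print(df[["Title", "Length", "Too Long"]])
```

The comparison is vectorized as well, so it runs across the whole column without a loop.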

Another common application of LEN is to combine it with SUBSTITUTE to count the number of words in a cell:

=LEN(TRIM(A2))-LEN(SUBSTITUTE(A2," ",""))+1

In Pandas, we can achieve this by combining the str.split and str.len functions together:

df['No. Words'] = df['Title'].str.split().str.len()

We’ll cover str.split in more detail later, but essentially, what we’re doing is splitting our data based upon whitespaces within the string, then counting the number of component parts.

word count Python
Screenshot from VS Code, November 2022

Dropping Duplicates

Excel’s ‘Remove Duplicates’ feature provides an easy way to remove duplicate values within a dataset, either by deleting entirely duplicate rows (when all columns are selected) or removing rows with the same values in specific columns.

Excel drop duplicates
Screenshot from Microsoft Excel, November 2022

In Pandas, this functionality is provided by drop_duplicates.

To drop duplicate rows within a dataframe, type:

df.drop_duplicates(inplace=True)

To drop rows based on duplicates within a singular column, include the subset parameter:

df.drop_duplicates(subset="column", inplace=True)

Or specify multiple columns within a list:

df.drop_duplicates(subset=['column','column2'], inplace=True)

One addition above that’s worth calling out is the presence of the inplace parameter. Including inplace=True allows us to overwrite our existing dataframe without needing to create a new one.

There are, of course, times when we want to preserve our raw data. In this case, we can assign our deduped dataframe to a different variable:

df2 = df.drop_duplicates(subset="column")
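
By default, drop_duplicates keeps the first occurrence it finds, which can be controlled with the keep parameter. A common pattern, sketched here with hypothetical ranking data, is to sort first so that the row you care about is the one that survives.

```python
import pandas as pd

# Hypothetical ranking export where the same URL appears for several keywords.
df = pd.DataFrame({
    "URL": ["/shoes", "/shoes", "/boots"],
    "Keyword": ["womens shoes", "shoes for women", "womens boots"],
    "Position": [8, 3, 5],
})

# Sort so the best (lowest) position comes first, then keep that first row per URL.
best = (
    df.sort_values("Position")
      .drop_duplicates(subset="URL", keep="first")
      .reset_index(drop=True)
)
print(best)
```

Because the result is assigned to a new variable, the original dataframe is left untouched.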

Text To Columns

Another everyday essential, the ‘text to columns’ feature can be used to split a text string based on a delimiter, such as a slash, comma, or whitespace.

As an example, splitting a URL into its domain and individual subfolders.

Excel drop duplicates
Screenshot from Microsoft Excel, November 2022

When dealing with a dataframe, we can use the str.split function, which creates a list for each entry within a series. This can be converted into multiple columns by setting the expand parameter to True:

df['URL'].str.split(pat="/", expand=True)
str split Python
Screenshot from VS Code, November 2022

As is often the case, our URLs in the image above have been broken up into inconsistent columns, because they don’t feature the same number of folders.

This can make things tricky when we want to save our data within an existing dataframe.

Specifying the n parameter limits the number of splits, allowing us to create a specific number of columns:

df[['Domain', 'Folder1', 'Folder2', 'Folder3']] = df['URL'].str.split(pat="/", expand=True, n=3)
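
To see what the n parameter does to URLs of different depths, here is a small sketch with two hypothetical URLs (without a protocol, so the first segment is the domain). Anything beyond the third split stays together in the last column, and shorter URLs are padded with missing values.

```python
import pandas as pd

# Hypothetical URLs without a protocol, with differing folder depths.
df = pd.DataFrame({"URL": ["example.com/a/b/c/d", "example.com/a"]})

# n=3 allows at most three splits, so at most four columns are produced.
parts = df["URL"].str.split(pat="/", expand=True, n=3)
parts.columns = ["Domain", "Folder1", "Folder2", "Folder3"]

print(parts)
```

Note that the column assignment assumes at least one URL is deep enough to produce all four columns; if none is, fewer columns come back and the assignment raises an error.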

Another option is to use pop to remove your column from the dataframe, perform the split, and then re-add it with the join function:

df = df.join(df.pop('Split').str.split(pat="/", expand=True))

Duplicating the URL to a new column before the split allows us to preserve the full URL. We can then rename the new columns:

df['Split'] = df['URL']

df = df.join(df.pop('Split').str.split(pat="/", expand=True))

df.rename(columns = {0:'Domain', 1:'Folder1', 2:'Folder2', 3:'Folder3', 4:'Parameter'}, inplace=True)
Split pop join functions Python
Screenshot from VS Code, November 2022

CONCATENATE

The CONCAT function allows users to combine multiple strings of text, such as when generating a list of keywords by adding different modifiers.

In this case, we’re adding “mens” and whitespace to column A’s list of product types:

=CONCAT($F$1," ",A2)
concat Excel
Screenshot from Microsoft Excel, November 2022

Assuming we’re dealing with strings, the same can be achieved in Python using the + arithmetic operator:

df['Combined'] = 'mens' + ' ' + df['Keyword']

Or specify multiple columns of data:

df['Combined'] = df['Subdomain'] + df['URL']
concat Python
Screenshot from VS Code, November 2022
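
When there are several modifiers rather than one, the same idea can be extended by pairing every modifier with every product type. This sketch uses itertools.product from the standard library; the modifier and product lists are hypothetical.

```python
from itertools import product

import pandas as pd

# Hypothetical modifiers and product types.
modifiers = ["mens", "womens"]
products = ["shoes", "boots", "sandals"]

# Build every modifier + product combination, one keyword per row.
keywords = pd.DataFrame(
    [f"{modifier} {item}" for modifier, item in product(modifiers, products)],
    columns=["Keyword"],
)
print(keywords)
```

Two modifiers and three product types give six keywords, and the list grows multiplicatively as either input gets longer.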

Pandas has a dedicated concat function, but this is more useful when trying to combine multiple dataframes with the same columns.

For instance, if we had multiple exports from our favorite link analysis tool:

df = pd.read_csv('data.csv')
df2 = pd.read_csv('data2.csv')
df3 = pd.read_csv('data3.csv')

dflist = [df, df2, df3]

df = pd.concat(dflist, ignore_index=True)

SEARCH/FIND

The SEARCH and FIND formulas provide a way of locating a substring within a text string.

These commands are commonly combined with ISNUMBER to create a Boolean column that helps filter down a dataset, which can be extremely helpful when performing tasks like log file analysis, as explained in this guide. E.g.:

=ISNUMBER(SEARCH("searchthis",A2))
isnumber search Excel
Screenshot from Microsoft Excel, November 2022

The difference between SEARCH and FIND is that FIND is case-sensitive.

The equivalent Pandas function, str.contains, is case-sensitive by default:

df['Journal'] = df['URL'].str.contains('engine', na=False)

Case insensitivity can be enabled by setting the case parameter to False:

df['Journal'] = df['URL'].str.contains('engine', case=False, na=False)

In either scenario, including na=False will prevent null values from being returned within the Boolean column.

One massive advantage of using Pandas here is that, unlike Excel, regex is natively supported by this function – as it is in Google Sheets via REGEXMATCH.

Chain together multiple substrings by using the pipe character, also known as the OR operator:

df['Journal'] = df['URL'].str.contains('engine|search', na=False)
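
One side effect of regex support is that characters such as ? or . have a special meaning, so searching for a literal "?" with the default settings raises an error. Two possible workarounds are sketched below with hypothetical URLs and column names: turning regex off via the regex parameter, or escaping each substring with re.escape before joining them with the pipe.

```python
import re

import pandas as pd

# Hypothetical URLs; one contains a query string and one value is missing.
df = pd.DataFrame({"URL": ["/search?q=seo", "/blog/search-engine-basics", None]})

# Option 1: switch regex off so "?" is matched literally.
df["Has Param"] = df["URL"].str.contains("?", regex=False, na=False)

# Option 2: keep regex on, but escape each literal substring before joining with "|".
substrings = ["?q=", ".php"]
pattern = "|".join(re.escape(s) for s in substrings)
df["Flagged"] = df["URL"].str.contains(pattern, na=False)

print(df)
```

Option 2 keeps the convenience of matching several substrings at once without having to remember which characters need a backslash.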

Find And Replace

Excel’s “Find and Replace” feature provides an easy way to individually or bulk replace one substring with another.

find replace Excel
Screenshot from Microsoft Excel, November 2022

When processing data for SEO, we’re most likely to select an entire column and “Replace All.”

The SUBSTITUTE formula provides another option here and is useful if you don’t want to overwrite the existing column.

As an example, we can change the protocol of a URL from HTTP to HTTPS, or remove it by replacing it with nothing.

When working with dataframes in Python, we can use str.replace:

df['URL'] = df['URL'].str.replace('http://', 'https://')

Or:

df['URL'] = df['URL'].str.replace('http://', '') # replace with nothing

Again, unlike Excel, regex can be used – like with Google Sheets’ REGEXREPLACE:

df['URL'] = df['URL'].str.replace('http://|https://', '', regex=True) # regex=True is required in pandas 2.0+, where literal matching is the default

Alternatively, if you want to replace multiple substrings with different values, you can use Pandas’ replace method and provide a list.

This prevents you from having to chain multiple str.replace functions:

df['URL'] = df['URL'].replace(['http://', 'https://'], ['https://www.', 'https://www.'], regex=True)

LEFT/MID/RIGHT

Extracting a substring within Excel requires the usage of the LEFT, MID, or RIGHT functions, depending on where the substring is located within a cell.

Let’s say we want to extract the root domain and subdomain from a URL:

=MID(A2,FIND(":",A2,4)+3,FIND("/",A2,9)-FIND(":",A2,4)-3)
left mid right Excel
Screenshot from Microsoft Excel, November 2022

Using a combination of MID and multiple FIND functions, this formula is ugly, to say the least – and things get a lot worse for more complex extractions.

Again, Google Sheets does this better than Excel, because it has REGEXEXTRACT.

What a shame that when you feed it larger datasets, it melts faster than a Babybel on a hot radiator.

Thankfully, Pandas offers str.extract, which works in a similar way:

df['Domain'] = df['URL'].str.extract('.*://?([^/]+)')
str extract Python
Screenshot from VS Code, November 2022

Combine with fillna to prevent null values, as you would in Excel with IFERROR:

df['Domain'] = df['URL'].str.extract('.*://?([^/]+)').fillna('-')
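
When more than one part of the string is needed, str.extract also accepts several capture groups, and named groups become column names. The sketch below uses hypothetical URLs and a simplified pattern that only recognizes http and https; the third URL has no protocol, so it falls through to the fillna value.

```python
import pandas as pd

# Hypothetical URLs; the last one has no protocol, so the pattern will not match.
df = pd.DataFrame({
    "URL": ["https://www.example.com/blog/post", "http://example.org/", "example.net/page"]
})

# Each named group becomes a column in the result.
extracted = df["URL"].str.extract(r"(?P<Protocol>https?)://(?P<Domain>[^/]+)")
df = df.join(extracted.fillna("-"))

print(df)
```

This avoids running a separate extraction for each piece of the URL.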

IF

IF statements allow you to return different values, depending on whether or not a condition is met.

To illustrate, suppose that we want to create a label for keywords that are ranking within the top three positions.

Excel IF
Screenshot from Microsoft Excel, November 2022

Rather than using Pandas in this instance, we can lean on NumPy and the where function (remember to import NumPy, if you haven’t already):

df['Top 3'] = np.where(df['Position'] <= 3, 'Top 3', 'Not Top 3')

Multiple conditions can be used for the same evaluation by combining them with the & (AND) or | (OR) operators, and enclosing the individual criteria within round brackets:

df['Top 3'] = np.where((df['Position'] <= 3) & (df['Position'] != 0), 'Top 3', 'Not Top 3')

In the above, we’re returning “Top 3” for any keywords with a ranking less than or equal to three, excluding any keywords ranking in position zero.
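Here is a small self-contained sketch of the same idea, using invented keywords and positions. It assumes that a position of zero is how unranked keywords are recorded, as in the example above.

```python
import numpy as np
import pandas as pd

# Hypothetical ranking data; 0 stands in for "position zero".
df = pd.DataFrame({
    'Keyword': ['seo tools', 'keyword research', 'link building', 'site audit'],
    'Position': [0, 2, 3, 7],
})

# Each criterion sits in its own round brackets, joined with & (AND).
# Swapping & for | would return 'Top 3' when either criterion is met.
in_top_three = (df['Position'] <= 3) & (df['Position'] != 0)
df['Top 3'] = np.where(in_top_three, 'Top 3', 'Not Top 3')

print(df)
```

The brackets matter because & and | bind more tightly than comparison operators in Python, so leaving them out changes what gets compared.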

IFS

Sometimes, rather than specifying multiple conditions for the same evaluation, you may want multiple conditions that return different values.

In this case, the best solution is using IFS:

=IFS(B2<=3,"Top 3",B2<=10,"Top 10",B2<=20,"Top 20")
Image: IFS Excel. Screenshot from Microsoft Excel, November 2022

Again, NumPy provides us with the best solution when working with dataframes, via its select function.

With select, we can create a list of conditions, choices, and an optional value for when all of the conditions are false:

conditions = [df['Position'] <= 3, df['Position'] <= 10, df['Position'] <= 20]

choices = ['Top 3', 'Top 10', 'Top 20']

df['Rank'] = np.select(conditions, choices, 'Not Top 20')
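One detail worth seeing in action is that select, like IFS, returns the choice for the first condition that is true, so the order of the list matters. The sketch below uses made-up positions to show this.

```python
import numpy as np
import pandas as pd

# Hypothetical positions chosen to hit every bucket.
df = pd.DataFrame({'Position': [1, 5, 15, 40]})

# Conditions are checked in order and the first match wins, so the
# narrowest range has to come first.
conditions = [
    df['Position'] <= 3,
    df['Position'] <= 10,
    df['Position'] <= 20,
]
choices = ['Top 3', 'Top 10', 'Top 20']

# The third argument is the value used when no condition is true.
df['Rank'] = np.select(conditions, choices, 'Not Top 20')

print(df)
```

If the list were reversed, every position of 20 or better would be labeled "Top 20", because that condition would match before the narrower ones were checked.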

It’s also possible to have multiple conditions for each of the evaluations.

Let’s say we’re working with an ecommerce retailer with product listing pages (PLPs) and product display pages (PDPs), and we want to label the type of branded pages ranking within the top 10 results.

The easiest solution here is to look for specific URL patterns, such as a subfolder or extension, but what if competitors have similar patterns?

In this scenario, we could do something like this:

conditions = [(df['URL'].str.contains('/category/')) & (df['Brand Rank'] > 0),
(df['URL'].str.contains('/product/')) & (df['Brand Rank'] > 0),
(~df['URL'].str.contains('/product/')) & (~df['URL'].str.contains('/category/')) & (df['Brand Rank'] > 0)]

choices = ['PLP', 'PDP', 'Other']

df['Brand Page Type'] = np.select(conditions, choices, None)

Above, we’re using str.contains to evaluate whether or not a URL in the top 10 matches our brand’s pattern, then using the “Brand Rank” column to exclude any competitors.

In this example, the tilde sign (~) indicates a negative match. In other words, we’re saying we want every brand URL that doesn’t match the pattern for a “PDP” or “PLP” to match the criteria for “Other.”

Lastly, None is included because we want non-brand results to return a null value.

Image: np select Python. Screenshot from VS Code, November 2022
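The same logic can be sketched end to end with a handful of invented URLs. Naming the intermediate masks (is_brand, is_plp, is_pdp) and passing na=False to str.contains are my own choices rather than part of the original snippet. The latter keeps the tilde from failing if a URL is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical SERP rows: a Brand Rank of 0 marks a competitor URL.
df = pd.DataFrame({
    'URL': ['https://brand.example/category/shoes',
            'https://brand.example/product/1',
            'https://brand.example/about',
            'https://rival.example/product/9'],
    'Brand Rank': [1, 2, 3, 0],
})

# Named masks keep the combined conditions readable.
is_brand = df['Brand Rank'] > 0
is_plp = df['URL'].str.contains('/category/', na=False)
is_pdp = df['URL'].str.contains('/product/', na=False)

conditions = [is_plp & is_brand, is_pdp & is_brand, ~is_plp & ~is_pdp & is_brand]
choices = ['PLP', 'PDP', 'Other']

# None leaves competitor rows empty instead of giving them a label.
df['Brand Page Type'] = np.select(conditions, choices, None)

print(df)
```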

VLOOKUP

VLOOKUP is an essential tool for joining together two distinct datasets on a common column.

In this case, we’re adding the URLs within column N to the keyword, position, and search volume data in columns A-C, using the shared “Keyword” column:

=VLOOKUP(A2,M:N,2,FALSE)
Image: vlookup Excel. Screenshot from Microsoft Excel, November 2022

To do something similar with Pandas, we can use merge.

Replicating the functionality of an SQL join, merge is an incredibly powerful function that supports a variety of different join types.

For our purposes, we want to use a left join, which will maintain our first dataframe and only merge in matching values from our second dataframe:

mergeddf = df.merge(df2, how='left', on='Keyword')

One added advantage of performing a merge over a VLOOKUP is that the shared data doesn’t have to be in the first column of the second dataset. The newer XLOOKUP lifts the same restriction.

It will also pull in every matching row of data, rather than only the first match it finds.
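The multiple-match behavior is easiest to see on a tiny example. In the sketch below, both dataframes are invented, and the keyword "seo tools" appears twice in the second one.

```python
import pandas as pd

# Hypothetical ranking data (left) and URL data (right).
df = pd.DataFrame({
    'Keyword': ['seo tools', 'keyword research', 'link building'],
    'Position': [1, 4, 9],
})
df2 = pd.DataFrame({
    'Keyword': ['seo tools', 'seo tools', 'keyword research'],
    'URL': ['/tools/', '/tools/free/', '/research/'],
})

# A left join keeps every row of df. Keywords with several matches in
# df2 are repeated, and keywords with no match get NaN in 'URL'.
mergeddf = df.merge(df2, how='left', on='Keyword')

print(mergeddf)
```

Three input rows become four output rows here. If duplicated keys would be a data error in your own reports, merge also accepts a validate argument (for example validate='many_to_one') that raises an exception instead of silently adding rows.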

One common issue when using the function is for unwanted columns to be duplicated. This occurs when multiple shared columns exist, but you attempt to match using one.

To prevent this – and improve the accuracy of your matches – you can specify a list of columns:

mergeddf = df.merge(df2, how='left', on=['Keyword', 'Search Volume'])

In certain scenarios, you may actively want these columns to be included. For instance, when attempting to merge multiple monthly ranking reports:

mergeddf = (
    df.merge(df2, on='Keyword', how='left', suffixes=('', '_october'))
    .merge(df3, on='Keyword', how='left', suffixes=('', '_september'))
)

The above code snippet executes two merges to join together three dataframes with the same columns – which are our rankings for November, October, and September.

By labeling the months within the suffix parameters, we end up with a much cleaner dataframe that clearly displays the month, as opposed to the defaults of _x and _y seen in the earlier example.

Image: multi merge Python. Screenshot from VS Code, November 2022
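For a runnable version of the chained merge, the sketch below builds three small, made-up monthly reports. The nov, oct_report, and sep_report names are placeholders, and the chain is wrapped in round brackets so that Python accepts the line break.

```python
import pandas as pd

# Hypothetical monthly ranking reports sharing the same columns.
nov = pd.DataFrame({'Keyword': ['seo tools', 'site audit'], 'Position': [1, 5]})
oct_report = pd.DataFrame({'Keyword': ['seo tools', 'site audit'], 'Position': [2, 4]})
sep_report = pd.DataFrame({'Keyword': ['seo tools', 'site audit'], 'Position': [3, 6]})

# The empty left suffix leaves the November column name untouched,
# while each merged-in report is labeled with its month.
mergeddf = (
    nov.merge(oct_report, on='Keyword', how='left', suffixes=('', '_october'))
    .merge(sep_report, on='Keyword', how='left', suffixes=('', '_september'))
)

print(mergeddf)
```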

COUNTIF/SUMIF/AVERAGEIF

In Excel, if you want to perform a statistical function based on a condition, you’re likely to use either COUNTIF, SUMIF, or AVERAGEIF.

Commonly, COUNTIF is used to determine how many times a specific string appears within a dataset, such as a URL.

We can accomplish this by declaring the ‘URL’ column as our range, then the URL within an individual cell as our criteria:

=COUNTIF(D:D,D2)
Image: Excel countif. Screenshot from Microsoft Excel, November 2022

In Pandas, we can achieve the same outcome by using the groupby function:

df.groupby('URL')['URL'].count()
Image: Python groupby. Screenshot from VS Code, November 2022

Here, the column declared within the round brackets indicates the individual groups, and the column listed in the square brackets is where the aggregation (i.e., the count) is performed.

The output we’re receiving isn’t perfect for this use case, though, because it’s consolidated the data.

Typically, when using Excel, we’d have the URL count inline within our dataset. Then we can use it to filter to the most frequently listed URLs.

To do this, use transform and store the output in a column:

df['URL Count'] = df.groupby('URL')['URL'].transform('count')
Image: Python groupby transform. Screenshot from VS Code, November 2022
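To show how the inline count can then be used as a filter, here is a short sketch with invented data. The threshold of two is arbitrary.

```python
import pandas as pd

# Hypothetical ranking data where one URL ranks for several keywords.
df = pd.DataFrame({
    'Keyword': ['seo tools', 'free seo tools', 'site audit', 'seo software'],
    'URL': ['/tools/', '/tools/', '/audit/', '/tools/'],
})

# transform returns one value per original row, so the count sits
# alongside the data rather than collapsing it into one row per URL.
df['URL Count'] = df.groupby('URL')['URL'].transform('count')

# Keep only the rows whose URL appears at least twice.
frequent = df[df['URL Count'] >= 2]

print(frequent)
```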

You can also apply custom functions to groups of data by using a lambda (anonymous) function:

df['Google Count'] = df.groupby(['URL'])['URL'].transform(lambda x: x[x.str.contains('google')].count())

In our examples so far, we’ve been using the same column for our grouping and aggregations, but we don’t have to. Similarly to COUNTIFS/SUMIFS/AVERAGEIFS in Excel, it’s possible to group using one column, then apply our statistical function to another.

Going back to the earlier search engine results page (SERP) example, we may want to count all ranking PDPs on a per-keyword basis and return this number alongside our existing data:

df['PDP Count'] = df.groupby(['Keyword'])['URL'].transform(lambda x: x[x.str.contains('/product/|/prd/|/pd/')].count())
Image: Python groupby countifs. Screenshot from VS Code, November 2022

In Excel parlance, this would look something like this:

=SUM(COUNTIFS(A:A,[@Keyword],D:D,{"*/product/*","*/prd/*","*/pd/*"}))
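On larger datasets, a lambda inside transform can be slow because it runs once per group. One possible alternative, sketched here with invented data, is to build the true/false flag first and then sum it per keyword. The is_pdp name is a placeholder of my own.

```python
import pandas as pd

# Hypothetical SERP data: several ranking URLs per keyword.
df = pd.DataFrame({
    'Keyword': ['shoes', 'shoes', 'shoes', 'boots', 'boots'],
    'URL': ['https://a.example/product/1',
            'https://b.example/category/shoes',
            'https://c.example/prd/2',
            'https://a.example/category/boots',
            'https://d.example/pd/3'],
})

# Flag PDP-style URLs once for the whole column.
is_pdp = df['URL'].str.contains('/product/|/prd/|/pd/', na=False)

# Summing True/False values per keyword gives the conditional count,
# and transform broadcasts it back to every row of that keyword.
df['PDP Count'] = is_pdp.groupby(df['Keyword']).transform('sum')

print(df)
```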

Pivot Tables

Last, but by no means least, it’s time to talk pivot tables.

In Excel, a pivot table is likely to be our first port of call if we want to summarise a large dataset.

For instance, when working with ranking data, we may want to identify which URLs appear most frequently, and their average ranking position.

Image: pivot table Excel. Screenshot from Microsoft Excel, November 2022

Again, Pandas has its own pivot table equivalent, but if all you want is a count of unique values within a column, this can be accomplished using the value_counts function:

count = df['URL'].value_counts()

Using groupby is also an option.

Earlier in the article, performing a groupby that aggregated our data wasn’t what we wanted – but it’s precisely what’s required here:

grouped = df.groupby('URL').agg(
     url_frequency=('Keyword', 'count'),
     avg_position=('Position', 'mean'),
     )

grouped.reset_index(inplace=True)
Image: groupby-pivot Python. Screenshot from VS Code, November 2022

Two aggregate functions have been applied in the example above, but this could easily be expanded upon, and 13 different types are available.
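As a runnable illustration of this kind of summary, the sketch below uses a few invented rows and adds a sort so that the most frequent URLs appear first. The sort step is my own addition.

```python
import pandas as pd

# Hypothetical ranking data.
df = pd.DataFrame({
    'Keyword': ['seo tools', 'free seo tools', 'site audit', 'seo software'],
    'URL': ['/tools/', '/tools/', '/audit/', '/tools/'],
    'Position': [1, 3, 8, 5],
})

# Each keyword argument names an output column and pairs the source
# column with the aggregate function to apply to it.
grouped = (
    df.groupby('URL')
    .agg(
        url_frequency=('Keyword', 'count'),
        avg_position=('Position', 'mean'),
    )
    .reset_index()
    .sort_values('url_frequency', ascending=False)
)

print(grouped)
```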

There are, of course, times when we do want to use pivot_table, such as when performing multi-dimensional operations.

To illustrate what this means, let’s reuse the ranking groupings we made using conditional statements (stored here in a “Grouping” column) and attempt to display the number of times a URL ranks within each group.

ranking_groupings = df.groupby(['URL', 'Grouping']).agg(
     url_frequency=('Keyword', 'count'),
     )
Image: python groupby grouping. Screenshot from VS Code, November 2022

This isn’t the best format to use, as multiple rows have been created for each URL.

Instead, we can use pivot_table, which will display the data in different columns:

pivot = pd.pivot_table(
    df,
    index=['URL'],
    columns=['Grouping'],
    aggfunc="size",
    fill_value=0,
)
Image: pivot table Python. Screenshot from VS Code, November 2022
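Here is a minimal sketch of the same reshaping on invented data. It is a variation rather than a copy of the snippet above: it passes values='Keyword' with a count aggregate, which is an assumption on my part to make explicit which column is being counted.

```python
import pandas as pd

# Hypothetical ranking data with a pre-computed 'Grouping' label.
df = pd.DataFrame({
    'Keyword': ['k1', 'k2', 'k3', 'k4', 'k5'],
    'URL': ['/tools/', '/tools/', '/audit/', '/tools/', '/audit/'],
    'Grouping': ['Top 3', 'Top 10', 'Top 3', 'Top 3', 'Top 20'],
})

# One row per URL, one column per ranking group. fill_value=0 replaces
# the gaps left by URL and group combinations that never occur.
pivot = pd.pivot_table(
    df,
    index='URL',
    columns='Grouping',
    values='Keyword',
    aggfunc='count',
    fill_value=0,
)

print(pivot)
```

Each URL now occupies a single row, with the number of keywords in each ranking group spread across the columns.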

Final Thoughts

Whether you’re looking for inspiration to start learning Python, or are already leveraging it in your SEO workflows, I hope that the above examples help you along on your journey.

As promised, you can find a Google Colab notebook with all of the code snippets here.

In truth, we’ve barely scratched the surface of what’s possible, but understanding the basics of Python data analysis will give you a solid base upon which to build.


Featured Image: mapo_japan/Shutterstock


