
How To Perform A SEO SWOT Analysis


For most organizations, implementing an effective SEO (search engine optimization) strategy involves collecting and analyzing significant amounts of keywords, content, analytics, and competitive data from various sources.

SEO professionals then need to use this data to prioritize keyword, content, structural, and/or linking tasks to address issues or build on existing organic search authority.

One familiar method of prioritization, which lends itself well to helping focus attention and often maximize limited SEO and marketing resources, is the SWOT (Strengths, Weaknesses, Opportunities, Threats) framework.

A SWOT, by definition, is geared to help identify items with the biggest potential impact on growth – or the most dangerous threats.

The following breakdown of organizational SEO priorities assumes keyword research has already been done and is being used to gather the website, SERP (Search Engine Results Page), and competitive data that will form the foundation of an effective SWOT.

Keyword research alone is often deserving of its own SWOT process.

Strengths

One of the primary factors search engines use in determining your organic search visibility is an organization’s relative strength and authority for a topical group of keywords.

Identifying those keywords for which the organization already has some authority – or, as some like to call it, “momentum” in the eyes of the search engines – is an excellent place to begin focusing your attention.

Authority is generally difficult to come by and takes time to establish, so why not build on what you already have?

Your first question should be, “Which pieces of content do I have that rank well (let’s say in the top 20 results) in the search engines for my primary keyword groups?”

You can leverage your existing strength in three ways:

  1. Look for opportunities to link out from or to your strongest pieces of content. This can have the dual effect of reinforcing your original piece of content by linking to more comprehensive answers to your audiences’ questions and borrowing from the authority of the strongest piece.
  2. Perform full-page keyword, technical, and link audits on all webpages that rank between positions five and 20 to see where any improvements can be made to move them higher in the SERPs. This may mean adjusting title tags, headings, or updating links to more current or relevant sources.
  3. Determine whether the “right” landing pages rank for the keywords you want to be found for. While it may seem great to have your homepage rank for several of your keywords, this is not optimal.

Searchers who land on your homepage looking for something specific will have to spend more time clicking or searching again to find the exact answer to their question.

Identify the pages you have that provide answers, and focus on having them usurp the position currently maintained by the homepage.

If you determine such pages don’t exist, then it’s time to create them.
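The second and third checks above can be sketched in a few lines of Python with Pandas. This is only an illustration: the Keyword, URL, and Position columns, the sample rows, and the example.com homepage are assumptions standing in for an export from your own rank-tracking tool.

```python
import pandas as pd

# Hypothetical ranking export; column names and values are assumptions.
rankings = pd.DataFrame({
    "Keyword": ["seo audit", "swot analysis", "keyword research", "link building"],
    "URL": [
        "https://example.com/",
        "https://example.com/blog/swot",
        "https://example.com/guides/keywords",
        "https://example.com/",
    ],
    "Position": [4, 12, 7, 35],
})

# Pages ranking between positions five and 20: candidates for a full-page audit.
audit_candidates = rankings[rankings["Position"].between(5, 20)]

# Top-20 keywords where the homepage ranks instead of a dedicated page.
homepage = "https://example.com/"
homepage_rankings = rankings[
    (rankings["URL"] == homepage) & (rankings["Position"] <= 20)
]

print(audit_candidates)
print(homepage_rankings)
```

The two filtered tables give a starting list of pages to audit and of keywords that may need a dedicated landing page.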

Be sure to also pay attention to the types and characteristics of your strongest content pieces as signals to what content to create moving forward.

For example, if you have videos ranking well on Google and/or YouTube, by all means, create more videos.

If long-form blog posts dominate the top of the search results for your primary keywords, this is your cue to publish and share more of the same.

Weaknesses

We all have our weaknesses; when it comes to SEO, recognizing and admitting them early on can save us a great deal of effort, time, money, and lost business.

Keywords And Content

While there are undoubtedly keyword groups we feel we must be found for, it’s important to let go of those which will require too much time and/or effort to establish authority for.

Generally, a quick review of the search engine results will reveal keywords that are out of reach based on your competitors’ size, age, reputation, and quality of content.

In this case, it may be necessary to look at more specific long-tail and intent-driven keyword alternatives, or to consider other avenues (including paid) to generate visibility, traffic, and conversions.

Sometimes, the best strategy is to employ complementary paid search tactics until you can establish organic search authority.

Technical Audit

Another area of weakness, and one you can more readily control, may be the quality of your own website and content from a technical/structural, keyword relevance, or depth perspective.

You can begin identifying areas of weakness by conducting an SEO audit.

There are several excellent free and paid tools available, including Google Lighthouse and Search Console (specifically the Core Web Vitals Report and Mobile-Friendly Test), which will provide a prioritized list of issues and/or errors found in the title and heading tags, internal and external links, website code, keyword usage/density, and a myriad of mobile-friendly factors.

Screenshot of Lighthouse in Chrome Dev Tools, July 2022

As noted above, you should start by focusing on and fixing any issues found on those pages for which you already have some authority based on search engine results.

Optimizing these pages can only help improve their chances of moving up the SERPs.

You can move on to other priority web pages based on website analytics data or strategic importance.

Backlinks

Organically obtained, relevant, quality backlinks (aka inbound links) are still a search engine ranking factor as they speak to, and can enhance, the authority of the site to which they link.

As with site auditing, many good third-party backlink tools can reveal where you maintain backlinks. These are particularly useful for looking at the backlink sources of your strongest-known competitors.

Where appropriate, you may want to reach out to obtain links from the same relevant sources to leverage their authority.

Opportunities

In SEO, opportunities abound for those who know how and where to look, and who take the time to do so.

SEO is really about moving from one opportunity to the next.

Once optimization is deemed successful for one group of keywords or pieces of content, it’s time to move to the next topic upon which authority can be established or reinforced.

Keywords And Content

Several keyword research tools like Ahrefs, Semrush, and others can discover both keyword and content opportunities or gaps based on providing your website domain, the domains of your known competitors, or a targeted list of keywords.

Most provide prioritized lists of potentially high-value keywords based on estimated monthly search volumes, organic traffic, and/or relative competition.

In other words: Which high-value keywords are your competitors ranking for which you are not?

As with the Weaknesses above, part of this analysis should consider the level of effort required to obtain authority relative to the potential return on establishing organic visibility.

Is it a worthwhile opportunity?
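If you have keyword exports for your own site and for a competitor, the same gap question can be approximated by hand. The sketch below is a rough illustration in Python with Pandas, not how any particular tool works; the dataframes, column names, and search volumes are made-up assumptions.

```python
import pandas as pd

# Hypothetical keyword exports; column names and values are assumptions.
ours = pd.DataFrame({"Keyword": ["seo audit", "swot analysis"], "Position": [8, 3]})
theirs = pd.DataFrame({
    "Keyword": ["seo audit", "keyword gap", "content refresh"],
    "Position": [2, 5, 9],
    "Search Volume": [1900, 720, 480],
})

# Keep competitor keywords that do not appear in our own export.
gap = theirs[~theirs["Keyword"].isin(ours["Keyword"])]

# List the highest estimated search volume first.
gap = gap.sort_values("Search Volume", ascending=False).reset_index(drop=True)
print(gap)
```

Sorting by volume is only one way to prioritize; the level of effort discussed above still has to be judged separately.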

Screenshot of Semrush Keyword Gap tool, July 2022

A more manual process for discovering keyword and content opportunities is to run a reverse website audit on competitors’ websites.

Or, spend some time simply reviewing your top competitors’ primary pages, paying particular attention to the keywords used in title tags, headings, and internal link anchor text.

These are presumably the keywords that matter most to them.

However, be careful, as this strategy assumes the competition has conducted their own keyword research and has been following SEO best practices, which may or may not always be the case.

Focusing on those competitors who rank well for your primary keywords should single out the ones who are intentionally optimizing for search.

Content Refresh

Another opportunity within a web presence is the refresh of top-performing or complementary content.

First, scan the SERPs or a preferred keyword tool to identify older content that is ranking for target keywords or serves to support other primary content pages.

Then, review this content to see where there may be opportunities to update text, images, internal/external links, or any other components.

Perhaps there’s an opportunity to enhance the piece by creating and adding images or videos.

Finally, re-share this content via appropriate channels, and perhaps consider identifying new avenues – as a previously popular piece of content will likely perform well again.

Existing content offers an excellent opportunity to build authority, often with just a little extra effort.

Backlinks

While typically a manually intensive process, there is long-term value in seeking out backlinks.

Ideally, you want to identify relevant, authoritative websites/domains from which high-quality inbound links can be obtained.

There are several sources you can use to start looking for inbound links:

  1. The SERPs for your primary keywords are a natural backlink research starting point, as the websites found here are, by definition, considered “relevant” and “authoritative” by the search engines. Of particular interest are those sites which rank ahead of yours because they presumably have higher authority upon which you can piggyback. Look for any non-competitive backlinking opportunities such as directories, association listings, or articles and blog posts that you may be able to contribute to, get mentioned in, or comment on.
  2. The Google Search Console Links Report is the next best resource for backlink research, as it indicates what Google recognizes as the domains linking to your content. Here you can validate the quality and accuracy of the links you already have, as well as determine if there are any other opportunities to obtain additional links from these same domains.
  3. Referral sources in Google Analytics represent external sites that send you traffic but may or may not be providing an organic search boost. Review these domains/sites regularly to see other linking opportunities.
  4. As noted under Weaknesses, several third-party backlink tools can be used to identify potential backlink sources where links to your competitors can be found. Some will even help by authority ranking and prioritizing the value of each existing and potential source, which can save significant time.

Threats

Whether done intentionally or not, there are more than a few things which can threaten organic authority in the eyes of the search engines and should be prioritized to avoid potentially damaging penalties.

Content

The primary content threat most are familiar with is duplicate content, which, as the name suggests, is content repurposed on a website without proper attribution to the original source.

To avoid being penalized for using this type of content, be sure to include rel="canonical" link tags, which reference the source content, in the <head> of pages containing the duplicate content.

In other words: It’s okay to have some duplicate content on a website, as long as the original source is properly identified.

Backlinks

While relevant, high-quality backlinks can help boost your authority, irrelevant, low-quality inbound links from non-reputable sites (particularly those that are part of paid link schemes) can do long-lasting harm and even get you tagged with a manual penalty.

The threat here is a potential loss of organic visibility and traffic.

Further, recovering from a manual penalty is not an easy or quick process.

Simply put, you should never pay for backlinks, and you should ensure that any backlinks you acquire have not been purchased on your behalf by a third party, like a marketing agency.

As such, you should regularly review the Google Search Console Links report or other backlink reporting sources for questionable domains or those you don’t recognize as relevant.

Competitors

All online competitors creating their own content represent threats to your authority.

Even if you maintain strong organic visibility and traffic relative to your “known” competitors, there is always the potential for new, aggressive, or unknown competitors to come onto the scene.

Many of the aforementioned SEO tools provide competitor discovery tools to help quickly identify domains that consistently appear in the search results for your primary keywords.

Oftentimes, there may be competitors here you’ve never considered. You’ll naturally want to pay attention to these competitors and use the tactics noted above to see what you can learn from them.

Search engines love and reward fresh, relevant content, and Google even has a freshness algorithm to identify it.

As such, you should regularly monitor the search engine results for new entrants, which may, over time, challenge your authority and position.

Of course, the best way to combat this type of threat is by continuing to publish and update your own comprehensive content, which will give the search engines less reason to question your authority.

Actioning On The SWOT

The detailed SWOT outputs will map prioritized actions to protect and/or improve online authority, visibility, and resulting traffic, leads, and revenue.

Proactive search marketers should conduct these analyses on at least a bi-annual, if not quarterly, basis, depending on how competitive the industry is and how active the competitors are.

A well-structured SWOT can provide an excellent roadmap for where, when, and how often action needs to be taken or content needs to be created and shared to boost your organization’s primary SEO goals.



Featured Image: Rawpixel.com/Shutterstock




Essential Functions For SEO Data Analysis


Learning to code, whether with Python, JavaScript, or another programming language, has a whole host of benefits, including the ability to work with larger datasets and automate repetitive tasks.

But despite the benefits, many SEO professionals are yet to make the transition – and I completely understand why! It isn’t an essential skill for SEO, and we’re all busy people.

If you’re pressed for time, and you already know how to accomplish a task within Excel or Google Sheets, then changing tack can feel like reinventing the wheel.

When I first started coding, I initially only used Python for tasks that I couldn’t accomplish in Excel – and it’s taken several years to get to the point where it’s my de facto choice for data processing.

Looking back, I’m incredibly glad that I persisted, but at times it was a frustrating experience, with many an hour spent scanning threads on Stack Overflow.

This post is designed to spare other SEO pros the same fate.

Within it, we’ll cover the Python equivalents of the most commonly used Excel formulas and features for SEO data analysis – all of which are available within a Google Colab notebook linked in the summary.

Specifically, you’ll learn the equivalents of:

  • LEN.
  • Drop Duplicates.
  • Text to Columns.
  • SEARCH/FIND.
  • CONCATENATE.
  • Find and Replace.
  • LEFT/MID/RIGHT.
  • IF.
  • IFS.
  • VLOOKUP.
  • COUNTIF/SUMIF/AVERAGEIF.
  • Pivot Tables.

Amazingly, to accomplish all of this, we’ll primarily be using a singular library – Pandas – with a little help in places from its big brother, NumPy.

Prerequisites

For the sake of brevity, there are a few things we won’t be covering today, including:

  • Installing Python.
  • Basic Pandas, like importing CSVs, filtering, and previewing dataframes.

If you’re unsure about any of this, then Hamlet’s guide on Python data analysis for SEO is the perfect primer.

Now, without further ado, let’s jump in.

LEN

LEN provides a count of the number of characters within a string of text.

For SEO specifically, a common use case is to measure the length of title tags or meta descriptions to determine whether they’ll be truncated in search results.

Within Excel, if we wanted to count the second cell of column A, we’d enter:

=LEN(A2)
Screenshot from Microsoft Excel, November 2022

Python isn’t too dissimilar, as we can rely on the inbuilt len function, which can be combined with Pandas’ loc[] to access a specific row of data within a column:

len(df['Title'].loc[0])

In this example, we’re getting the length of the first row in the “Title” column of our dataframe.

Screenshot of VS Code, November 2022

Finding the length of a cell isn’t that useful for SEO, though. Normally, we’d want to apply a function to an entire column!

In Excel, this would be achieved by selecting the formula cell on the bottom right-hand corner and either dragging it down or double-clicking.

When working with a Pandas dataframe, we can use str.len to calculate the length of rows within a series, then store the results in a new column:

df['Length'] = df['Title'].str.len()

str.len is a ‘vectorized’ operation, which is designed to be applied simultaneously to a series of values. We’ll use these operations extensively throughout this article, as they almost universally end up being faster than a loop.
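As a small illustration of the truncation use case mentioned earlier, the length column can be compared against a character limit in one vectorized step. The 60-character limit and the sample titles below are assumptions for the sake of the example, not a documented cut-off.

```python
import pandas as pd

df = pd.DataFrame({"Title": [
    "Short title",
    "A much longer title tag that keeps going well past the usual display limit",
]})

# Assumed character limit; adjust to whatever threshold you work to.
MAX_TITLE_LENGTH = 60

# Vectorized length calculation, then a vectorized comparison.
df["Length"] = df["Title"].str.len()
df["Too Long"] = df["Length"] > MAX_TITLE_LENGTH
print(df)
```

The resulting Boolean column can then be used to filter the dataframe down to the titles that need attention.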

Another common application of LEN is to combine it with SUBSTITUTE to count the number of words in a cell:

=LEN(TRIM(A2))-LEN(SUBSTITUTE(A2," ",""))+1

In Pandas, we can achieve this by combining the str.split and str.len functions together:

df['No. Words'] = df['Title'].str.split().str.len()

We’ll cover str.split in more detail later, but essentially, what we’re doing is splitting our data based upon whitespaces within the string, then counting the number of component parts.

Screenshot from VS Code, November 2022

Dropping Duplicates

Excel’s ‘Remove Duplicates’ feature provides an easy way to remove duplicate values within a dataset, either by deleting entirely duplicate rows (when all columns are selected) or removing rows with the same values in specific columns.

Screenshot from Microsoft Excel, November 2022

In Pandas, this functionality is provided by drop_duplicates.

To drop duplicate rows within a dataframe, type:

df.drop_duplicates(inplace=True)

To drop rows based on duplicates within a singular column, include the subset parameter:

df.drop_duplicates(subset="column", inplace=True)

Or specify multiple columns within a list:

df.drop_duplicates(subset=['column','column2'], inplace=True)

One addition above that’s worth calling out is the presence of the inplace parameter. Including inplace=True allows us to overwrite our existing dataframe without needing to create a new one.

There are, of course, times when we want to preserve our raw data. In this case, we can assign our deduped dataframe to a different variable:

df2 = df.drop_duplicates(subset="column")
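To see the difference between the two approaches, here is a minimal sketch on a made-up dataframe (the URL and Keyword values are placeholders):

```python
import pandas as pd

df = pd.DataFrame({
    "URL": ["/a", "/a", "/b", "/b"],
    "Keyword": ["shoes", "shoes", "boots", "sandals"],
})

# Entirely duplicate rows only: the second "/a" row is removed.
full_dedupe = df.drop_duplicates()

# Duplicates within one column: the first row for each URL is kept.
url_dedupe = df.drop_duplicates(subset="URL")

print(len(df), len(full_dedupe), len(url_dedupe))
```

Because neither call uses inplace=True, the original dataframe is left untouched.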

Text To Columns

Another everyday essential, the ‘text to columns’ feature can be used to split a text string based on a delimiter, such as a slash, comma, or whitespace.

As an example, splitting a URL into its domain and individual subfolders.

Screenshot from Microsoft Excel, November 2022

When dealing with a dataframe, we can use the str.split function, which creates a list for each entry within a series. This can be converted into multiple columns by setting the expand parameter to True:

df['URL'].str.split(pat="/", expand=True)
Screenshot from VS Code, November 2022

As is often the case, our URLs in the image above have been broken up into inconsistent columns, because they don’t feature the same number of folders.

This can make things tricky when we want to save our data within an existing dataframe.

Specifying the n parameter limits the number of splits, allowing us to create a specific number of columns:

df[['Domain', 'Folder1', 'Folder2', 'Folder3']] = df['URL'].str.split(pat="/", expand=True, n=3)
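It helps to see what n does to URLs of different depths. In this sketch (the URLs are invented and written without a protocol), anything beyond the third split stays together in the last column, and shorter URLs are padded with missing values:

```python
import pandas as pd

df = pd.DataFrame({"URL": [
    "example.com/mens/shoes/trainers/red",
    "example.com/womens",
]})

# n=3 allows at most three splits, so at most four columns are produced.
parts = df["URL"].str.split(pat="/", expand=True, n=3)
print(parts)
```

Here the first row ends with "trainers/red" in the fourth column, while the second row has missing values in the third and fourth columns.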

Another option is to use pop to remove your column from the dataframe, perform the split, and then re-add it with the join function:

df = df.join(df.pop('Split').str.split(pat="/", expand=True))

Duplicating the URL to a new column before the split allows us to preserve the full URL. We can then rename the new columns:

df['Split'] = df['URL']

df = df.join(df.pop('Split').str.split(pat="/", expand=True))

df.rename(columns = {0:'Domain', 1:'Folder1', 2:'Folder2', 3:'Folder3', 4:'Parameter'}, inplace=True)
Screenshot from VS Code, November 2022

CONCATENATE

The CONCAT function allows users to combine multiple strings of text, such as when generating a list of keywords by adding different modifiers.

In this case, we’re adding “mens” and whitespace to column A’s list of product types:

=CONCAT($F$1," ",A2)
Screenshot from Microsoft Excel, November 2022

Assuming we’re dealing with strings, the same can be achieved in Python using the addition (+) operator:

df['Combined'] = 'mens' + ' ' + df['Keyword']

Or specify multiple columns of data:

df['Combined'] = df['Subdomain'] + df['URL']
Screenshot from VS Code, November 2022

Pandas has a dedicated concat function, but this is more useful when trying to combine multiple dataframes with the same columns.

For instance, if we had multiple exports from our favorite link analysis tool:

df = pd.read_csv('data.csv')
df2 = pd.read_csv('data2.csv')
df3 = pd.read_csv('data3.csv')

dflist = [df, df2, df3]

df = pd.concat(dflist, ignore_index=True)
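The same pattern can be tried without any CSV files. In this sketch, two small in-memory dataframes stand in for the exports (the domains and link counts are made up), and ignore_index=True gives the combined dataframe a fresh, continuous index:

```python
import pandas as pd

# Stand-ins for separate exports with identical columns (values are made up).
df = pd.DataFrame({"Domain": ["a.com", "b.com"], "Links": [10, 5]})
df2 = pd.DataFrame({"Domain": ["c.com"], "Links": [7]})

combined = pd.concat([df, df2], ignore_index=True)
print(combined.index.tolist())
```

Without ignore_index=True, each dataframe would keep its own index labels, so the label 0 would appear twice.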

SEARCH/FIND

The SEARCH and FIND formulas provide a way of locating a substring within a text string.

These commands are commonly combined with ISNUMBER to create a Boolean column that helps filter down a dataset, which can be extremely helpful when performing tasks like log file analysis, as explained in this guide. E.g.:

=ISNUMBER(SEARCH("searchthis",A2))
Screenshot from Microsoft Excel, November 2022

The difference between SEARCH and FIND is that FIND is case-sensitive.

The equivalent Pandas function, str.contains, is case-sensitive by default:

df['Journal'] = df['URL'].str.contains('engine', na=False)

Case insensitivity can be enabled by setting the case parameter to False:

df['Journal'] = df['URL'].str.contains('engine', case=False, na=False)

In either scenario, including na=False will prevent null values from being returned within the Boolean column.

One massive advantage of using Pandas here is that, unlike Excel, regex is natively supported by this function – as it is in Google Sheets via REGEXMATCH.

Chain together multiple substrings by using the pipe character, also known as the OR operator:

df['Journal'] = df['URL'].str.contains('engine|search', na=False)
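Putting the pieces together, the Boolean column can be used directly as a filter. This is a small sketch on invented URLs, with a hypothetical Match column, combining the pipe pattern with case=False and na=False:

```python
import pandas as pd

df = pd.DataFrame({"URL": [
    "https://example.com/search-engine-news",
    "https://example.com/Search/results",
    "https://example.com/about",
    None,
]})

# True where the URL contains either substring, ignoring case; nulls become False.
df["Match"] = df["URL"].str.contains("engine|search", case=False, na=False)

# Use the Boolean column to filter the dataframe.
filtered = df[df["Match"]]
print(filtered)
```

The missing URL is treated as a non-match rather than producing a null in the Boolean column.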

Find And Replace

Excel’s “Find and Replace” feature provides an easy way to individually or bulk replace one substring with another.

Screenshot from Microsoft Excel, November 2022

When processing data for SEO, we’re most likely to select an entire column and “Replace All.”

The SUBSTITUTE formula provides another option here and is useful if you don’t want to overwrite the existing column.

As an example, we can change the protocol of a URL from HTTP to HTTPS, or remove it by replacing it with nothing.

When working with dataframes in Python, we can use str.replace:

df['URL'] = df['URL'].str.replace('http://', 'https://')

Or:

df['URL'] = df['URL'].str.replace('http://', '') # replace with nothing

Again, unlike Excel, regex can be used – like with Google Sheets’ REGEXREPLACE:

df['URL'] = df['URL'].str.replace('http://|https://', '', regex=True) # regex=True is required in recent Pandas versions

Alternatively, if you want to replace multiple substrings with different values, you can use the Pandas replace method and provide lists of patterns and replacements.

This prevents you from having to chain multiple str.replace functions:

df['URL'] = df['URL'].replace(['http://', 'https://'], ['https://www.', 'https://www.'], regex=True)

LEFT/MID/RIGHT

Extracting a substring within Excel requires the usage of the LEFT, MID, or RIGHT functions, depending on where the substring is located within a cell.

Let’s say we want to extract the root domain and subdomain from a URL:

=MID(A2,FIND(":",A2,4)+3,FIND("/",A2,9)-FIND(":",A2,4)-3)
Screenshot from Microsoft Excel, November 2022

Using a combination of MID and multiple FIND functions, this formula is ugly, to say the least – and things get a lot worse for more complex extractions.

Again, Google Sheets does this better than Excel, because it has REGEXEXTRACT.

What a shame that when you feed it larger datasets, it melts faster than a Babybel on a hot radiator.

Thankfully, Pandas offers str.extract, which works in a similar way:

df['Domain'] = df['URL'].str.extract('.*://?([^/]+)')
Screenshot from VS Code, November 2022

Combine with fillna to prevent null values, as you would in Excel with IFERROR:

df['Domain'] = df['URL'].str.extract('.*://?([^/]+)').fillna('-')
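Here is the same pattern run against a few invented URLs, including one that doesn’t match. Note that this sketch passes expand=False, which is an addition to the snippet above so that str.extract returns a single series rather than a one-column dataframe:

```python
import pandas as pd

df = pd.DataFrame({"URL": [
    "https://www.example.com/category/shoes",
    "http://blog.example.com/",
    "not a url",
]})

# Capture everything between the protocol and the next slash.
# Rows that don't match return a null, which fillna swaps for a dash.
df["Domain"] = df["URL"].str.extract(r".*://?([^/]+)", expand=False).fillna("-")
print(df["Domain"].tolist())
```

The row without a protocol has nothing for the pattern to latch onto, so it falls through to the fillna value.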

IF

IF statements allow you to return different values, depending on whether or not a condition is met.

To illustrate, suppose that we want to create a label for keywords that are ranking within the top three positions.

Screenshot from Microsoft Excel, November 2022

Rather than using Pandas in this instance, we can lean on NumPy and the where function (remember to import NumPy, if you haven’t already):

df['Top 3'] = np.where(df['Position'] <= 3, 'Top 3', 'Not Top 3')

Multiple conditions can be used for the same evaluation by combining them with the & (AND) or | (OR) operators and enclosing the individual criteria within round brackets:

df['Top 3'] = np.where((df['Position'] <= 3) & (df['Position'] != 0), 'Top 3', 'Not Top 3')

In the above, we’re returning “Top 3” for any keywords with a ranking less than or equal to three, excluding any keywords ranking in position zero.
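The OR operator works the same way. As a hedged illustration (the Status column, its labels, and the sample positions are hypothetical), this sketch flags keywords that are either in position zero or outside the top 10:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Keyword": ["a", "b", "c", "d"], "Position": [0, 1, 3, 18]})

# Either condition being true is enough to return the first label.
df["Status"] = np.where(
    (df["Position"] == 0) | (df["Position"] > 10), "Review", "OK"
)
print(df)
```

The round brackets around each condition matter, because & and | bind more tightly than comparison operators in Python.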

IFS

Sometimes, rather than specifying multiple conditions for the same evaluation, you may want multiple conditions that return different values.

In this case, the best solution is using IFS:

=IFS(B2<=3,"Top 3",B2<=10,"Top 10",B2<=20,"Top 20")
Screenshot from Microsoft Excel, November 2022

Again, NumPy provides us with the best solution when working with dataframes, via its select function.

With select, we can create a list of conditions, choices, and an optional value for when all of the conditions are false:

conditions = [df['Position'] <= 3, df['Position'] <= 10, df['Position'] <=20]

choices = ['Top 3', 'Top 10', 'Top 20']

df['Rank'] = np.select(conditions, choices, 'Not Top 20')
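As with IFS, the order of the conditions matters, because np.select returns the choice for the first condition that is true. The sketch below uses made-up positions, and the Reversed column is purely illustrative, to show what happens when the list is reversed:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Position": [2, 7, 15, 42]})

conditions = [df["Position"] <= 3, df["Position"] <= 10, df["Position"] <= 20]
choices = ["Top 3", "Top 10", "Top 20"]

# Narrowest condition first: each row gets the first label it qualifies for.
df["Rank"] = np.select(conditions, choices, "Not Top 20")

# Broadest condition first: anything in the top 20 is labeled "Top 20".
df["Reversed"] = np.select(conditions[::-1], choices[::-1], "Not Top 20")
print(df)
```

Listing the narrowest condition first is what keeps a position-two keyword from being labeled “Top 20.”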

It’s also possible to have multiple conditions for each of the evaluations.

Let’s say we’re working with an ecommerce retailer with product listing pages (PLPs) and product display pages (PDPs), and we want to label the type of branded pages ranking within the top 10 results.

The easiest solution here is to look for specific URL patterns, such as a subfolder or extension, but what if competitors have similar patterns?

In this scenario, we could do something like this:

conditions = [(df['URL'].str.contains('/category/')) & (df['Brand Rank'] > 0),
(df['URL'].str.contains('/product/')) & (df['Brand Rank'] > 0),
(~df['URL'].str.contains('/product/')) & (~df['URL'].str.contains('/category/')) & (df['Brand Rank'] > 0)]

choices = ['PLP', 'PDP', 'Other']

df['Brand Page Type'] = np.select(conditions, choices, None)

Above, we’re using str.contains to evaluate whether or not a URL in the top 10 matches our brand’s pattern, then using the “Brand Rank” column to exclude any competitors.

In this example, the tilde sign (~) indicates a negative match. In other words, we’re saying we want every brand URL that doesn’t match the pattern for a “PDP” or “PLP” to match the criteria for ‘Other.’

Lastly, None is included because we want non-brand results to return a null value.

Screenshot from VS Code, November 2022

VLOOKUP

VLOOKUP is an essential tool for joining together two distinct datasets on a common column.

In this case, adding the URLs within column N to the keyword, position, and search volume data in columns A-C, using the shared “Keyword” column:

=VLOOKUP(A2,M:N,2,FALSE)
Screenshot from Microsoft Excel, November 2022

To do something similar with Pandas, we can use merge.

Replicating the functionality of an SQL join, merge is an incredibly powerful function that supports a variety of different join types.

For our purposes, we want to use a left join, which will maintain our first dataframe and only merge in matching values from our second dataframe:

mergeddf = df.merge(df2, how='left', on='Keyword')

One added advantage of performing a merge over a VLOOKUP is that you don’t have to have the shared data in the first column of the second dataset, a flexibility it shares with the newer XLOOKUP.

It will also pull in multiple rows of data rather than only the first match it finds.
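Both behaviors are easy to see on a small example. In this sketch (the keywords and URLs are invented), one keyword has two matching URLs in the second dataframe, one has none, and one row in the second dataframe has no counterpart in the first:

```python
import pandas as pd

df = pd.DataFrame({"Keyword": ["shoes", "boots"], "Position": [3, 9]})
df2 = pd.DataFrame({
    "Keyword": ["shoes", "shoes", "sandals"],
    "URL": ["/shoes", "/shoes/sale", "/sandals"],
})

# Left join: keep every row of df and pull in matching rows from df2.
mergeddf = df.merge(df2, how="left", on="Keyword")
print(mergeddf)
```

The “shoes” row appears twice (once per matching URL), “boots” is kept with a null URL, and “sandals” is dropped because it only exists in the second dataframe.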

One common issue when using the function is for unwanted columns to be duplicated. This occurs when multiple shared columns exist, but you attempt to match using one.

To prevent this – and improve the accuracy of your matches – you can specify a list of columns:

mergeddf = df.merge(df2, how='left', on=['Keyword', 'Search Volume'])

In certain scenarios, you may actively want these columns to be included. For instance, when attempting to merge multiple monthly ranking reports:

mergeddf = (
    df.merge(df2, on='Keyword', how='left', suffixes=('', '_october'))
    .merge(df3, on='Keyword', how='left', suffixes=('', '_september'))
)

The above code snippet executes two merges to join together three dataframes with the same columns – which are our rankings for November, October, and September.

By labeling the months within the suffix parameters, we end up with a much cleaner dataframe that clearly displays the month, as opposed to the defaults of _x and _y seen in the earlier example.

Screenshot from VS Code, November 2022

COUNTIF/SUMIF/AVERAGEIF

In Excel, if you want to perform a statistical function based on a condition, you’re likely to use either COUNTIF, SUMIF, or AVERAGEIF.

Commonly, COUNTIF is used to determine how many times a specific string appears within a dataset, such as a URL.

We can accomplish this by declaring the ‘URL’ column as our range, then the URL within an individual cell as our criteria:

=COUNTIF(D:D,D2)
Screenshot from Microsoft Excel, November 2022

In Pandas, we can achieve the same outcome by using the groupby function:

df.groupby('URL')['URL'].count()
Screenshot from VS Code, November 2022

Here, the column declared within the round brackets indicates the individual groups, and the column listed in the square brackets is where the aggregation (i.e., the count) is performed.

The output we’re receiving isn’t perfect for this use case, though, because it’s consolidated the data.

Typically, when using Excel, we’d have the URL count inline within our dataset. Then we can use it to filter to the most frequently listed URLs.

To do this, use transform and store the output in a column:

df['URL Count'] = df.groupby('URL')['URL'].transform('count')
Screenshot from VS Code, November 2022
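A quick sketch on made-up data shows how the inline count can then be used as a filter. The threshold of two is an arbitrary assumption:

```python
import pandas as pd

df = pd.DataFrame({
    "Keyword": ["shoes", "boots", "sandals", "trainers"],
    "URL": ["/footwear", "/footwear", "/sandals", "/footwear"],
})

# transform returns one value per original row, so it fits back into df.
df["URL Count"] = df.groupby("URL")["URL"].transform("count")

# Keep only rows whose URL appears at least twice.
top_urls = df[df["URL Count"] >= 2]
print(top_urls)
```

Unlike the plain groupby count, the dataframe keeps one row per keyword, which is what makes the filter possible.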

You can also apply custom functions to groups of data by using a lambda (anonymous) function:

df['Google Count'] = df.groupby(['URL'])['URL'].transform(lambda x: x[x.str.contains('google')].count())

In our examples so far, we’ve been using the same column for our grouping and aggregations, but we don’t have to. Similarly to COUNTIFS/SUMIFS/AVERAGEIFS in Excel, it’s possible to group using one column, then apply our statistical function to another.

Going back to the earlier search engine results page (SERP) example, we may want to count all ranking PDPs on a per-keyword basis and return this number alongside our existing data:

df['PDP Count'] = df.groupby(['Keyword'])['URL'].transform(lambda x: x[x.str.contains('/product/|/prd/|/pd/')].count())
Screenshot from VS Code, November 2022

Which in Excel parlance, would look something like this:

=SUM(COUNTIFS(A:A,[@Keyword],D:D,{"*/product/*","*/prd/*","*/pd/*"}))

Pivot Tables

Last, but by no means least, it’s time to talk pivot tables.

In Excel, a pivot table is likely to be our first port of call if we want to summarise a large dataset.

For instance, when working with ranking data, we may want to identify which URLs appear most frequently, and their average ranking position.

Screenshot from Microsoft Excel, November 2022

Again, Pandas has its own pivot tables equivalent – but if all you want is a count of unique values within a column, this can be accomplished using the value_counts function:

count = df['URL'].value_counts()

Using groupby is also an option.

Earlier in the article, performing a groupby that aggregated our data wasn’t what we wanted – but it’s precisely what’s required here:

grouped = df.groupby('URL').agg(
     url_frequency=('Keyword', 'count'),
     avg_position=('Position', 'mean'),
     )

grouped.reset_index(inplace=True)
Screenshot from VS Code, November 2022

Two aggregate functions have been applied in the example above, but this could easily be expanded upon, and 13 different types are available.
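For reference, here is the same named-aggregation pattern run end to end on a tiny invented dataset, with a sort added so the most frequently ranking URLs appear first:

```python
import pandas as pd

df = pd.DataFrame({
    "Keyword": ["shoes", "boots", "sandals"],
    "URL": ["/footwear", "/footwear", "/sandals"],
    "Position": [2, 6, 11],
})

# One output column per named aggregation: (source column, function).
grouped = df.groupby("URL").agg(
    url_frequency=("Keyword", "count"),
    avg_position=("Position", "mean"),
).reset_index()

grouped = grouped.sort_values("url_frequency", ascending=False)
print(grouped)
```

Each keyword argument to agg becomes a column name in the output, which avoids having to rename columns afterwards.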

There are, of course, times when we do want to use pivot_table, such as when performing multi-dimensional operations.

To illustrate what this means, let’s reuse the ranking groupings we made using conditional statements and attempt to display the number of times a URL ranks within each group.

ranking_groupings = df.groupby(['URL', 'Grouping']).agg(
     url_frequency=('Keyword', 'count'),
     )
Screenshot from VS Code, November 2022

This isn’t the best format to use, as multiple rows have been created for each URL.

Instead, we can use pivot_table, which will display the data in different columns:

pivot = pd.pivot_table(
    df,
    index=['URL'],
    columns=['Grouping'],
    aggfunc="size",
    fill_value=0,
)
Screenshot from VS Code, November 2022
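To make the output shape concrete, here is a minimal sketch using invented rows. The Grouping values mirror the ranking labels created earlier with np.select, but the data itself is an assumption:

```python
import pandas as pd

df = pd.DataFrame({
    "Keyword": ["shoes", "boots", "sandals", "trainers"],
    "URL": ["/a", "/a", "/a", "/b"],
    "Grouping": ["Top 3", "Top 10", "Top 3", "Top 20"],
})

# One row per URL, one column per grouping, counting rows in each cell.
pivot = pd.pivot_table(
    df,
    index=["URL"],
    columns=["Grouping"],
    aggfunc="size",
    fill_value=0,
)
print(pivot)
```

Combinations that never occur, such as "/b" in the “Top 3” group, are filled with zero rather than left as nulls.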

Final Thoughts

Whether you’re looking for inspiration to start learning Python, or are already leveraging it in your SEO workflows, I hope that the above examples help you along on your journey.

As promised, you can find a Google Colab notebook with all of the code snippets here.

In truth, we’ve barely scratched the surface of what’s possible, but understanding the basics of Python data analysis will give you a solid base upon which to build.



Featured Image: mapo_japan/Shutterstock


