10 Ways Coding Skills Can Improve SEO Efforts

It’s not necessary to know how to code to be a good SEO.

Coding skills are not a prerequisite for SEO competency, but additional skills always make one more effective.

Here are 10 ways that understanding code can help turn a good SEO into a great one.

1. HTML Coding Standards And SEO Go Together

An SEO familiar with HTML understands how a web document should be structured and is alert to the consequences of poor coding practices.

The most important building blocks of a webpage are its HTML elements, which are to a webpage what a foundation, doors, floors, and a roof are to a house.

Search engines may be unable to properly crawl a web page if HTML elements are used incorrectly.

The official HTML specifications limit what HTML elements are used in the <head> section (location of metadata that only browsers and bots see) and which HTML elements are used in the <body> section (the document itself that users see).

But when you put <body> elements (like <a> or <div>) inside the <head> section, where only metadata belongs, search engines begin rendering the webpage from that normally hidden section, and the metadata gets indexed as part of the content itself. As a result, Google fails to index the webpage the way it’s supposed to be indexed.

That error can happen when a Facebook pixel code is placed in the wrong place within the <head> section of a webpage.
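For illustration, here is a minimal sketch (using an illustrative subset of body-only tags, not the full HTML spec) that relies on Python’s built-in html.parser to flag body-level elements appearing inside the <head>:

from html.parser import HTMLParser

BODY_ONLY_TAGS = {"a", "div", "img", "p", "span"}  # illustrative subset

class HeadChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_head = False
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif self.in_head and tag in BODY_ONLY_TAGS:
            self.problems.append(tag)  # body element found inside <head>

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False

checker = HeadChecker()
checker.feed("<html><head><title>Test</title><div>pixel</div></head><body></body></html>")
print(checker.problems)  # ['div'] signals a polluted <head>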

Another example of how a lack of coding knowledge influences SEO is the 404 error response.

Some SEOs believe a 404 response code is a bad thing because they see the word “error” and instantly assume it needs to be fixed, especially when it’s displayed in Google Search Console as an error.

But an SEO who knows the HTTP standard understands that a 404 response code only means the browser requested a page that does not exist; the error is in the request, not the website.

In most cases, that’s a good thing, it’s what’s supposed to happen, and there is nothing to fix.

Knowing HTML standards makes a person a better SEO because they have the ability to spot even more problems than an SEO who lacks coding knowledge.

They are also better positioned to dismiss common SEO misinformation that springs from a lack of coding ability.

2. Structured Data

Structured data is a markup language, which means the code has rules that govern how it is written.

There are a few different ways to express Schema.org structured data, but Google’s preference, JSON-LD structured data, is arguably the easiest to understand, which makes it easier to troubleshoot.

Like HTML, JSON-LD has rules that govern how it is written, with a nested structure where you have a subject of the structured data (called a Type) and then the attributes of that subject (called a Property).
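To make the Type/Property structure concrete, here is a minimal, hypothetical Article snippet parsed with Python’s built-in json module. This is a sketch for inspecting the structure, not Google’s validator:

import json

# A hypothetical JSON-LD block; json.loads raises an error if it's malformed.
jsonld = """
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "10 Ways Coding Skills Can Improve SEO Efforts",
  "author": {"@type": "Person", "name": "Example Author"}
}
"""

data = json.loads(jsonld)
print("Type:", data["@type"])
for prop, value in data.items():
    if not prop.startswith("@"):
        print("Property:", prop, "=", value)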

Understanding JSON-LD structured data is easy, regardless of whether you know HTML or any other markup language.

The benefits of understanding how to code structured data cannot be overstated.

Correct structured data markup is essential for achieving many of the highly coveted rich results positions at the top of Google’s search engine results pages (a.k.a. SERPs).

Incorrect structured data markup will make that webpage ineligible for rich results.

One can rely on Google’s structured data markup checker to verify if the JSON-LD structured data is valid and if it’s eligible for a rich result.

But just because the tool says the code is valid doesn’t mean it’s eligible for rich results. This is where the ability to analyze JSON-LD comes into play to fix the structured data so that rich results become an option.

Manual troubleshooting ability is important because Google’s structured data checker tells you when the markup is broken and gives a general idea of where, but it doesn’t tell you how to fix it.

One can rely on plugins, of course. There are benefits to setting something and forgetting about it.

But structured data specifications constantly evolve, and plugins don’t always keep up fast enough. Also, they aren’t always specific enough for every situation.

To compete for high rankings in the search results, it’s generally best to know how to code JSON-LD structured data and gain the greatest advantage over the competition.

3. Communicate Better With Clients

Knowing how to code enables a person to simplify an explanation so that a non-coding client can understand the why of a particular problem and the solution.

One cannot explain what they do not understand.

For example, knowing how to code structured data empowers the SEO to explain not only that it’s okay to combine structured data, but also the benefits of doing so and how to do it.

Knowing how to code allows one to explain that a client only needs to drop a few lines of code into their WordPress website’s child theme functions.php file to avoid installing a bloated plugin that does the same thing.

Leaving aside that an SEO without coding skills wouldn’t even know about the functions.php file solution, a person who codes and is literate in PHP can understand when it’s better to use a plugin over the coding solution and then explain it to the client.

Knowing how to code confers the ability to look at the HTML code and zero in on why the site isn’t indexed adequately or is performing poorly.

I once audited an ecommerce site that used a custom-made template featuring a crazy level of incompetent coding. Fixing those errors sitewide enabled the site’s content to be indexed accurately.

Knowing HTML allowed me to catch the errors and then explain to the client why it was broken and how they could fix it.

4. .htaccess Knowledge Is Power

.htaccess (an Apache server configuration file) is, in my opinion, tricky to learn to write but reasonably easy to learn to use.

Simply learning what .htaccess is useful for and how to add directives to the file can generally take a person far.

For example, you can use a plugin to redirect HTTP to HTTPS, a plugin to redirect specific pages that changed, and a plugin to fix broken URLs to the correct URL.

But all that can be accomplished with a .htaccess file.
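As a sketch of what that can look like, the mod_rewrite directives below are the standard HTTP-to-HTTPS redirect pattern, assuming an Apache host that honors .htaccess files; the Python wrapper simply writes them out:

# The directives are standard Apache mod_rewrite; writing the file
# with Python here is just for illustration.
rules = """RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [L,R=301]
"""

with open(".htaccess", "w") as f:
    f.write(rules)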

Taking the time to learn .htaccess can help you understand how to improve a website without resorting to yet another plugin.

A .htaccess file can also be used to prevent other sites from linking to your images and other media files (hotlinking).

A .htaccess file can even be used to stop rogue bots from copying your content, by blocking the IP address ranges of bad bots that repeatedly access a website.

Doing something like that with a .htaccess file is significantly better than using a plugin or mod that writes the IP addresses to a database because adding tens of thousands to millions of IP addresses to a database will dramatically slow your site down.
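Along the same lines, here is a hedged sketch, assuming Apache 2.4+ and purely hypothetical IP ranges, that appends a static block list to the same file:

# Hypothetical bot IP ranges; Apache 2.4's Require directive accepts CIDR.
bad_ranges = ["192.0.2.0/24", "198.51.100.0/24"]

block = "<RequireAll>\n    Require all granted\n"
for cidr in bad_ranges:
    block += f"    Require not ip {cidr}\n"
block += "</RequireAll>\n"

with open(".htaccess", "a") as f:  # append below any existing rules
    f.write(block)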

5. Diagnose Hidden Problems

In general, coding-related problems are tucked away from view in the HTML code.

Because most sites are templated, the errors will be multiplied across every page that shares the templated structure. Learning how to use an HTML validator is straightforward, but understanding HTML is important for interpreting the results.

Coding errors can be glaring and obvious, like omitting a closing bracket (>).

Or it could be subtle, like the use of a non-standard character in the code: a smart quote, the curly type of quotation mark (“ ”), instead of the expected straight quotation mark (" ").

This error commonly occurs when someone copies code from an application that inserts smart quotes by default, such as a word processor.

The curly quotes issue can dramatically disrupt how a webpage is indexed and parsed.

That means that if you use something like this in the HTML code:

<meta name=“robots” content=“noindex”>

Google will not see it, because the curly quotes (smart quotes) stop it from recognizing the tag as a meta robots tag, and it will therefore proceed to index the content.
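A short sketch like the following, using the broken meta tag above as its input, can surface smart quotes in HTML source before they cause indexing problems:

SMART_QUOTES = "\u201c\u201d\u2018\u2019"  # curly double and single quotes

html = "<meta name=\u201crobots\u201d content=\u201cnoindex\u201d>"

for offset, char in enumerate(html):
    if char in SMART_QUOTES:
        print(f"Smart quote {char!r} at offset {offset}")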

Here’s another example.

If you code a link in this manner:

<a href="https://www.searchenginejournal.com/coding-skills-for-seo/458232/example.com/test.htm">example</a>

The link will be interpreted like this:

https://example.com/test.htm

If, however, you use curly quotes for the same code:

<a href=“example.com/test.htm”>example</a>

The link will be interpreted like this:

https://%E2%80%9Cexample.com/test.htm%E2%80%9D

Errors like these are not the kind of thing an auditing tool will automatically find and conveniently add to a list.

You need to know how to code to recognize broken code, whether on visual inspection or at scale when it shows up as an anomaly in a Screaming Frog scan.

Otherwise, the source of a crawling error will stay hidden until someone who can read HTML or understands the output from an HTML validator can inspect the site.

6. Coding Can Help Break SEO Stalemates

The word stalemate comes from the game of chess. It describes a situation in which the player whose turn it is has no legal move, bringing gameplay to a standstill; the game ends in a draw.

The same situation happens in competitive industries where everyone uses the same publishing platforms, the same optimization plugins, the same content strategies, and the same link promotion strategies.

The competition between the sites is largely equal, with no site having a clear advantage over the other.

An SEO with coding skills can break that kind of stalemate.

Coding skills allow an SEO to implement solutions that improve templates, CSS, and JavaScript.

For example, many templates ship with liberal use of headings for things that don’t require a heading element, like the navigation on the side panel.

With coding skills, it’s easy to create a child theme and fix the rogue heading elements so that they use CSS and not headings for styling on-page elements.

I’ve used my coding skills to completely change sections of a template so that it’s more user-friendly, change the colors of various on-page elements so that they’re more accessible for color-blind visitors, and use PHP to add dynamic bits of content, custom-make title tags, and remove superfluous parts of a webpage.

Coding skills help provide a ranking edge to any site and can be used to improve the user experience beyond what a template offers.

It is especially important in competitive niches where competitors are optimized to the highest degree and where squeezing out advantage is at a premium.

7. Troubleshoot A Hacked Site

Website security doesn’t seem like something an SEO should be concerned about.

But it becomes very clear that website security is indeed an SEO problem when the search rankings of a hacked site start to disappear.

Knowing how to code, particularly having a general understanding of how PHP files work within a given content management system (CMS), can help demystify a hacking event.

Just knowing the broad outlines of how PHP works and how all the parts of the CMS work together goes a long way to understanding what went wrong and how to fix the problems.

Knowledge of JavaScript is also helpful. Many hacks are based on uploading JavaScript files or injecting JavaScript into other files.

Analyzing recently modified JavaScript files can help confirm that a site has been hacked. More to the point, it can help pinpoint if a specific plugin or WordPress itself is responsible for the hacking.
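As a starting point, a short sketch like this one, assuming a standard WordPress directory layout, lists JavaScript files modified in the last seven days:

import time
from pathlib import Path

cutoff = time.time() - 7 * 24 * 3600  # seven days ago

# wp-content is the usual home of themes and plugins in WordPress.
for path in Path("wp-content").rglob("*.js"):
    modified = path.stat().st_mtime
    if modified > cutoff:
        print(path, time.ctime(modified))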

Some vulnerabilities can lie hidden for months or years before they are discovered. WordPress 5.9.2, for example, was released to address cross-site scripting vulnerabilities in the WordPress core itself.

In the case of that WordPress vulnerability, the problem arose from an arcane coding mistake: the order in which security processes were coded created a situation where a hacker could bypass those same security measures.

It illustrates how mistakes can sneak in through legitimate software and not necessarily be caught in time to prevent a hacking event.

Google might notify the site owner through Google Search Console about a hacked site, but Google Search Console won’t fix it for you.

Some knowledge of how HTML, JavaScript, and/or PHP works can go a long way toward confidently troubleshooting a hacked site.

8. Knowing How To Code Provides Control

When working in a corporate or educational environment where the templates are locked in, and one can’t plug in their way out of a predicament, knowing how to code can speed up the otherwise painful process of publishing webpages.

Whether one works in a Drupal or WordPress environment, having the ability to keep a cheat sheet of code snippets saves so much time, even with something trivial like changing a link without having to go through 10 steps using the native WYSIWYG interface and dealing with idiosyncratic code.

9. Optimize For Page Speed

The suggestions for improving page speed that Google’s PageSpeed Insights provides will no longer be cryptic once one learns how to code.

It’s not like one has to learn how to code an entire website from scratch, either.

All it takes is a general understanding of JavaScript, CSS, and HTML to make sense of what one is supposed to do to make a website work faster.

Concepts like inlining CSS, combining JavaScript, and minifying JavaScript make more sense when one understands how servers deliver webpages and how browsers render the data for site visitors.
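As a rough illustration of combining and minifying, the sketch below concatenates the files in a hypothetical js/ directory and strips blank lines and indentation. Real minifiers do far more than this, but the principle (fewer requests, fewer bytes) is the same:

from pathlib import Path

# Combine: one bundle means one HTTP request instead of many.
combined = "\n".join(p.read_text() for p in sorted(Path("js").glob("*.js")))

# "Minify" (crudely): drop blank lines and leading whitespace.
minified = "\n".join(
    line.strip() for line in combined.splitlines() if line.strip()
)

Path("bundle.min.js").write_text(minified)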

10. Master Python

Python is a programming language that can be used to automate a wide range of SEO tasks, from crawling and data analysis to natural language processing (NLP) and much more.

One of the great things about Python is that there is often no need to code a tool from scratch: many Python SEO scripts can be downloaded online, and much of the functionality needed for common SEO tasks is available as downloadable Python libraries containing the relevant modules.

A Python library is a collection of modules. Python modules are the files themselves.

According to Ruth Everett in her Introduction to Python, these are some useful Python libraries:

  • “Pandas: Used for data manipulation and analysis.
  • NumPy: Useful for scientific computing.
  • SciPy: Used for scientific and technical computing.
  • SciKit Learn: Machine learning for data mining and analysis.
  • SpaCy: A great natural language processing library.
  • Requests: A library for making HTTP requests.
  • Beautiful Soup: Used to extract data from HTML and XML files.
  • Matplotlib: For creating visualizations from data.”
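As a small taste of what these libraries enable, here is a hedged sketch pairing Requests and Beautiful Soup from the list above to fetch a placeholder URL and extract its title tag and meta description:

import requests
from bs4 import BeautifulSoup

response = requests.get("https://example.com/", timeout=10)
soup = BeautifulSoup(response.text, "html.parser")

title = soup.title.string if soup.title else None
meta = soup.find("meta", attrs={"name": "description"})
description = meta["content"] if meta and meta.has_attr("content") else None

print("Title:", title)
print("Description:", description)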

Another important Python library is TensorFlow, a free and open source library that can be used for creating machine learning applications.

With TensorFlow, a search marketer can build a neural network or a recommender system.

Directly related to SEO, TensorFlow can be used to automate the process of creating title tags at scale.

A skilled SEO who learns how to use Python will be able to scale their existing skills to new levels.

Learn How To Code

Gaining the ability to code is (arguably) optional, and one can still be a competent SEO without that knowledge.

A person who can code is not necessarily a better search marketer than one who doesn’t know how to code.

But learning how to code can make a good SEO an even better one because knowledge provides advantages.


Essential Functions For SEO Data Analysis

Learning to code, whether with Python, JavaScript, or another programming language, has a whole host of benefits, including the ability to work with larger datasets and automate repetitive tasks.

But despite the benefits, many SEO professionals are yet to make the transition – and I completely understand why! It isn’t an essential skill for SEO, and we’re all busy people.

If you’re pressed for time, and you already know how to accomplish a task within Excel or Google Sheets, then changing tack can feel like reinventing the wheel.

When I first started coding, I only used Python for tasks that I couldn’t accomplish in Excel – and it’s taken several years to get to the point where it’s my de facto choice for data processing.

Looking back, I’m incredibly glad that I persisted, but at times it was a frustrating experience, with many an hour spent scanning threads on Stack Overflow.

This post is designed to spare other SEO pros the same fate.

Within it, we’ll cover the Python equivalents of the most commonly used Excel formulas and features for SEO data analysis – all of which are available within a Google Colab notebook linked in the summary.

Specifically, you’ll learn the equivalents of:

  • LEN.
  • Drop Duplicates.
  • Text to Columns.
  • SEARCH/FIND.
  • CONCATENATE.
  • Find and Replace.
  • LEFT/MID/RIGHT.
  • IF.
  • IFS.
  • VLOOKUP.
  • COUNTIF/SUMIF/AVERAGEIF.
  • Pivot Tables.

Amazingly, to accomplish all of this, we’ll primarily be using a single library – Pandas – with a little help in places from its big brother, NumPy.

Prerequisites

For the sake of brevity, there are a few things we won’t be covering today, including:

  • Installing Python.
  • Basic Pandas, like importing CSVs, filtering, and previewing dataframes.

If you’re unsure about any of this, then Hamlet’s guide on Python data analysis for SEO is the perfect primer.

Now, without further ado, let’s jump in.

LEN

LEN provides a count of the number of characters within a string of text.

For SEO specifically, a common use case is to measure the length of title tags or meta descriptions to determine whether they’ll be truncated in search results.

Within Excel, if we wanted to count the second cell of column A, we’d enter:

=LEN(A2)

Python isn’t too dissimilar, as we can rely on the inbuilt len function, which can be combined with Pandas’ loc[] to access a specific row of data within a column:

len(df['Title'].loc[0])

In this example, we’re getting the length of the first row in the “Title” column of our dataframe.


Finding the length of a cell isn’t that useful for SEO, though. Normally, we’d want to apply a function to an entire column!

In Excel, this would be achieved by selecting the formula cell on the bottom right-hand corner and either dragging it down or double-clicking.

When working with a Pandas dataframe, we can use str.len to calculate the length of rows within a series, then store the results in a new column:

df['Length'] = df['Title'].str.len()

The str.len method is a ‘vectorized’ operation, which is designed to be applied simultaneously to a series of values. We’ll use these operations extensively throughout this article, as they almost universally end up being faster than a loop.

Another common application of LEN is to combine it with SUBSTITUTE to count the number of words in a cell:

=LEN(TRIM(A2))-LEN(SUBSTITUTE(A2," ",""))+1

In Pandas, we can achieve this by combining the str.split and str.len functions together:

df['No. Words'] = df['Title'].str.split().str.len()

We’ll cover str.split in more detail later, but essentially, what we’re doing is splitting our data based upon whitespaces within the string, then counting the number of component parts.


Dropping Duplicates

Excel’s ‘Remove Duplicates’ feature provides an easy way to remove duplicate values within a dataset, either by deleting entirely duplicate rows (when all columns are selected) or removing rows with the same values in specific columns.


In Pandas, this functionality is provided by drop_duplicates.

To drop duplicate rows within a dataframe, type:

df.drop_duplicates(inplace=True)

To drop rows based on duplicates within a singular column, include the subset parameter:

df.drop_duplicates(subset="column", inplace=True)

Or specify multiple columns within a list:

df.drop_duplicates(subset=['column','column2'], inplace=True)

One addition above that’s worth calling out is the presence of the inplace parameter. Including inplace=True allows us to overwrite our existing dataframe without needing to create a new one.

There are, of course, times when we want to preserve our raw data. In this case, we can assign our deduped dataframe to a different variable:

df2 = df.drop_duplicates(subset="column")

Text To Columns

Another everyday essential, the ‘text to columns’ feature can be used to split a text string based on a delimiter, such as a slash, comma, or whitespace.

As an example, splitting a URL into its domain and individual subfolders.


When dealing with a dataframe, we can use the str.split function, which creates a list for each entry within a series. This can be converted into multiple columns by setting the expand parameter to True:

df['URL'].str.split(pat="/", expand=True)

As is often the case, our URLs in the image above have been broken up into inconsistent columns, because they don’t feature the same number of folders.

This can make things tricky when we want to save our data within an existing dataframe.

Specifying the n parameter limits the number of splits, allowing us to create a specific number of columns:

df[['Domain', 'Folder1', 'Folder2', 'Folder3']] = df['URL'].str.split(pat="/", expand=True, n=3)

Another option is to use pop to remove your column from the dataframe, perform the split, and then re-add it with the join function:

df = df.join(df.pop('Split').str.split(pat="/", expand=True))

Duplicating the URL to a new column before the split allows us to preserve the full URL. We can then rename the new columns:

df['Split'] = df['URL']

df = df.join(df.pop('Split').str.split(pat="/", expand=True))

df.rename(columns = {0:'Domain', 1:'Folder1', 2:'Folder2', 3:'Folder3', 4:'Parameter'}, inplace=True)

CONCATENATE

The CONCAT function allows users to combine multiple strings of text, such as when generating a list of keywords by adding different modifiers.

In this case, we’re adding “mens” and whitespace to column A’s list of product types:

=CONCAT($F$1," ",A2)

Assuming we’re dealing with strings, the same can be achieved in Python using the arithmetic (+) operator:

df['Combined'] = 'mens' + ' ' + df['Keyword']

Or specify multiple columns of data:

df['Combined'] = df['Subdomain'] + df['URL']

Pandas has a dedicated concat function, but this is more useful when trying to combine multiple dataframes with the same columns.

For instance, if we had multiple exports from our favorite link analysis tool:

df = pd.read_csv('data.csv')
df2 = pd.read_csv('data2.csv')
df3 = pd.read_csv('data3.csv')

dflist = [df, df2, df3]

df = pd.concat(dflist, ignore_index=True)

SEARCH/FIND

The SEARCH and FIND formulas provide a way of locating a substring within a text string.

These commands are commonly combined with ISNUMBER to create a Boolean column that helps filter down a dataset, which can be extremely helpful when performing tasks like log file analysis, as explained in this guide. E.g.:

=ISNUMBER(SEARCH("searchthis",A2))

The difference between SEARCH and FIND is that FIND is case-sensitive.

The equivalent Pandas function, str.contains, is case-sensitive by default:

df['Journal'] = df['URL'].str.contains('engine', na=False)

Case insensitivity can be enabled by setting the case parameter to False:

df['Journal'] = df['URL'].str.contains('engine', case=False, na=False)

In either scenario, including na=False will prevent null values from being returned within the Boolean column.

One massive advantage of using Pandas here is that, unlike Excel, regex is natively supported by this function – as it is in Google Sheets via REGEXMATCH.

Chain together multiple substrings by using the pipe character, also known as the OR operator:

df['Journal'] = df['URL'].str.contains('engine|search', na=False)

Find And Replace

Excel’s “Find and Replace” feature provides an easy way to individually or bulk replace one substring with another.


When processing data for SEO, we’re most likely to select an entire column and “Replace All.”

The SUBSTITUTE formula provides another option here and is useful if you don’t want to overwrite the existing column.

As an example, we can change the protocol of a URL from HTTP to HTTPS, or remove it by replacing it with nothing.

When working with dataframes in Python, we can use str.replace:

df['URL'] = df['URL'].str.replace('http://', 'https://')

Or:

df['URL'] = df['URL'].str.replace('http://', '') # replace with nothing

Again, unlike Excel, regex can be used – like with Google Sheets’ REGEXREPLACE:

df['URL'] = df['URL'].str.replace('http://|https://', '')

Alternatively, if you want to replace multiple substrings with different values, you can use Python’s replace method and provide a list.

This prevents you from having to chain multiple str.replace functions:

df['URL'] = df['URL'].replace(['http://', 'https://'], ['https://www.', 'https://www.'], regex=True)

LEFT/MID/RIGHT

Extracting a substring within Excel requires the usage of the LEFT, MID, or RIGHT functions, depending on where the substring is located within a cell.

Let’s say we want to extract the root domain and subdomain from a URL:

=MID(A2,FIND(":",A2,4)+3,FIND("/",A2,9)-FIND(":",A2,4)-3)

Using a combination of MID and multiple FIND functions, this formula is ugly, to say the least – and things get a lot worse for more complex extractions.

Again, Google Sheets does this better than Excel, because it has REGEXEXTRACT.

What a shame that when you feed it larger datasets, it melts faster than a Babybel on a hot radiator.

Thankfully, Pandas offers str.extract, which works in a similar way:

df['Domain'] = df['URL'].str.extract('.*://?([^/]+)')

Combine with fillna to prevent null values, as you would in Excel with IFERROR:

df['Domain'] = df['URL'].str.extract('.*://?([^/]+)').fillna('-')

IF

IF statements allow you to return different values, depending on whether or not a condition is met.

To illustrate, suppose that we want to create a label for keywords that are ranking within the top three positions.


Rather than using Pandas in this instance, we can lean on NumPy and the where function (remember to import NumPy, if you haven’t already):

df['Top 3'] = np.where(df['Position'] <= 3, 'Top 3', 'Not Top 3')

Multiple conditions can be used for the same evaluation by using the AND/OR operators, and enclosing the individual criteria within round brackets:

df['Top 3'] = np.where((df['Position'] <= 3) & (df['Position'] != 0), 'Top 3', 'Not Top 3')

In the above, we’re returning “Top 3” for any keywords with a ranking less than or equal to three, excluding any keywords ranking in position zero.

IFS

Sometimes, rather than specifying multiple conditions for the same evaluation, you may want multiple conditions that return different values.

In this case, the best solution is using IFS:

=IFS(B2<=3,"Top 3",B2<=10,"Top 10",B2<=20,"Top 20")

Again, NumPy provides us with the best solution when working with dataframes, via its select function.

With select, we can create a list of conditions, choices, and an optional value for when all of the conditions are false:

conditions = [df['Position'] <= 3, df['Position'] <= 10, df['Position'] <=20]

choices = ['Top 3', 'Top 10', 'Top 20']

df['Rank'] = np.select(conditions, choices, 'Not Top 20')

It’s also possible to have multiple conditions for each of the evaluations.

Let’s say we’re working with an ecommerce retailer with product listing pages (PLPs) and product display pages (PDPs), and we want to label the type of branded pages ranking within the top 10 results.

The easiest solution here is to look for specific URL patterns, such as a subfolder or extension, but what if competitors have similar patterns?

In this scenario, we could do something like this:

conditions = [(df['URL'].str.contains('/category/')) & (df['Brand Rank'] > 0),
(df['URL'].str.contains('/product/')) & (df['Brand Rank'] > 0),
(~df['URL'].str.contains('/product/')) & (~df['URL'].str.contains('/category/')) & (df['Brand Rank'] > 0)]

choices = ['PLP', 'PDP', 'Other']

df['Brand Page Type'] = np.select(conditions, choices, None)

Above, we’re using str.contains to evaluate whether or not a URL in the top 10 matches our brand’s pattern, then using the “Brand Rank” column to exclude any competitors.

In this example, the tilde sign (~) indicates a negative match. In other words, we’re saying we want every brand URL that doesn’t match the pattern for a “PDP” or “PLP” to match the criteria for ‘Other.’

Lastly, None is included because we want non-brand results to return a null value.


VLOOKUP

VLOOKUP is an essential tool for joining together two distinct datasets on a common column.

In this case, adding the URLs within column N to the keyword, position, and search volume data in columns A-C, using the shared “Keyword” column:

=VLOOKUP(A2,M:N,2,FALSE)

To do something similar with Pandas, we can use merge.

Replicating the functionality of an SQL join, merge is an incredibly powerful function that supports a variety of different join types.

For our purposes, we want to use a left join, which will maintain our first dataframe and only merge in matching values from our second dataframe:

mergeddf = df.merge(df2, how='left', on='Keyword')

One added advantage of performing a merge over a VLOOKUP is that you don’t have to have the shared data in the first column of the second dataset, as with the newer XLOOKUP.

It will also pull in multiple rows of data rather than just the first match it finds.
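If first-match behavior is what you actually want, one approach is to deduplicate the lookup dataframe on the join key before merging; a quick sketch:

# Keep only the first row per keyword, mimicking VLOOKUP's first match.
lookup = df2.drop_duplicates(subset='Keyword')

mergeddf = df.merge(lookup, how='left', on='Keyword')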

One common issue when using the function is for unwanted columns to be duplicated. This occurs when multiple shared columns exist, but you attempt to match using one.

To prevent this – and improve the accuracy of your matches – you can specify a list of columns:

mergeddf = df.merge(df2, how='left', on=['Keyword', 'Search Volume'])

In certain scenarios, you may actively want these columns to be included. For instance, when attempting to merge multiple monthly ranking reports:

mergeddf = df.merge(df2, on='Keyword', how='left', suffixes=('', '_october'))
    .merge(df3, on='Keyword', how='left', suffixes=('', '_september'))

The above code snippet executes two merges to join together three dataframes with the same columns – which are our rankings for November, October, and September.

By labeling the months within the suffix parameters, we end up with a much cleaner dataframe that clearly displays the month, as opposed to the defaults of _x and _y seen in the earlier example.


COUNTIF/SUMIF/AVERAGEIF

In Excel, if you want to perform a statistical function based on a condition, you’re likely to use either COUNTIF, SUMIF, or AVERAGEIF.

Commonly, COUNTIF is used to determine how many times a specific string appears within a dataset, such as a URL.

We can accomplish this by declaring the ‘URL’ column as our range, then the URL within an individual cell as our criteria:

=COUNTIF(D:D,D2)

In Pandas, we can achieve the same outcome by using the groupby function:

df.groupby('URL')['URL'].count()

Here, the column declared within the round brackets indicates the individual groups, and the column listed in the square brackets is where the aggregation (i.e., the count) is performed.

The output we’re receiving isn’t perfect for this use case, though, because it’s consolidated the data.

Typically, when using Excel, we’d have the URL count inline within our dataset. Then we can use it to filter to the most frequently listed URLs.

To do this, use transform and store the output in a column:

df['URL Count'] = df.groupby('URL')['URL'].transform('count')

You can also apply custom functions to groups of data by using a lambda (anonymous) function:

df['Google Count'] = df.groupby(['URL'])['URL'].transform(lambda x: x[x.str.contains('google')].count())

In our examples so far, we’ve been using the same column for our grouping and aggregations, but we don’t have to. Similarly to COUNTIFS/SUMIFS/AVERAGEIFS in Excel, it’s possible to group using one column, then apply our statistical function to another.

Going back to the earlier search engine results page (SERP) example, we may want to count all ranking PDPs on a per-keyword basis and return this number alongside our existing data:

df['PDP Count'] = df.groupby(['Keyword'])['URL'].transform(lambda x: x[x.str.contains('/product/|/prd/|/pd/')].count())

Which in Excel parlance, would look something like this:

=SUM(COUNTIFS(A:A,[@Keyword],D:D,{"*/product/*","*/prd/*","*/pd/*"}))

Pivot Tables

Last, but by no means least, it’s time to talk pivot tables.

In Excel, a pivot table is likely to be our first port of call if we want to summarise a large dataset.

For instance, when working with ranking data, we may want to identify which URLs appear most frequently, and their average ranking position.


Again, Pandas has its own pivot tables equivalent – but if all you want is a count of unique values within a column, this can be accomplished using the value_counts function:

count = df['URL'].value_counts()

Using groupby is also an option.

Earlier in the article, performing a groupby that aggregated our data wasn’t what we wanted – but it’s precisely what’s required here:

grouped = df.groupby('URL').agg(
     url_frequency=('Keyword', 'count'),
     avg_position=('Position', 'mean'),
     )

grouped.reset_index(inplace=True)

Two aggregate functions have been applied in the example above, but this could easily be expanded upon, and 13 different types are available.

There are, of course, times when we do want to use pivot_table, such as when performing multi-dimensional operations.

To illustrate what this means, let’s reuse the ranking groupings we made using conditional statements and attempt to display the number of times a URL ranks within each group.

ranking_groupings = df.groupby(['URL', 'Grouping']).agg(
     url_frequency=('Keyword', 'count'),
     )

This isn’t the best format to use, as multiple rows have been created for each URL.

Instead, we can use pivot_table, which will display the data in different columns:

pivot = pd.pivot_table(df,
index=['URL'],
columns=['Grouping'],
aggfunc="size",
fill_value=0,
)

Final Thoughts

Whether you’re looking for inspiration to start learning Python, or are already leveraging it in your SEO workflows, I hope that the above examples help you along on your journey.

As promised, you can find a Google Colab notebook with all of the code snippets here.

In truth, we’ve barely scratched the surface of what’s possible, but understanding the basics of Python data analysis will give you a solid base upon which to build.

