What is Customer Relationship Management?


Customer relationship management (CRM) is the system that a company sets up to manage prospect and customer interactions and analyze their data to improve business relationships.

By setting up this system, the company streamlines the process of staying connected to its prospects and customers, ultimately driving up retention and sales. According to Agile CRM and Salesforce, a company can boost sales by 29% and conversion rates by up to 300% with proper CRM implementation.

This blog post is an in-depth guide on customer relationship management, how it can help your business, my software recommendations, and more.

  1. What a CRM system does
  2. How customer relationship management helps your business
  3. Types of CRM platforms
  4. My recommended CRM software
  5. Key takeaway

What a CRM system does

A CRM system consists of two things: a strategy and software. Without a strategy, the marketing and sales teams cannot get the most out of CRM software. Without software, managing customer relationships and analyzing data would be nearly impossible, especially when your customers number in the thousands.

So, what exactly does a CRM system do?

To help you visualize these functions, I will show you photos of the customer relationship management software I use: Drip.

Manages leads and customers

A customer first enters your CRM software through a sign-up for a free demo, a lead magnet download, an account creation, and so on. Their data then enters your chosen software and gets updated either through their actions or yours.

Screenshot user profile Drip

For example, this subscriber first entered last August 13 when they visited my website. They placed an order the following day, updated their information, then subscribed to my marketing emails.

They went through various Workflows and received my marketing emails after that, depending on the actions they took. And since they actively engage with my emails, they also automatically got the appropriate tags and were eventually marked by Drip as a lead.

CRM software, then, isn’t just for storing customer data. It is also for applying the strategies that the team has planned so that customers remain customers, leads are followed up on, and sales opportunities are maximized.

You can also create customer segments for a more personalized approach. For example, your marketing and sales strategy towards highly engaged leads and disengaged ones would be very different. You would want to coax back disengaged ones (or prune them) and turn the highly engaged leads into sales.

Highly engaged leads in Drip

Automates marketing

I mentioned above that our subscriber received my marketing emails depending on the actions they took. Instead of me or my marketing team having to check every single action of every single customer, we plan our strategies in advance and make sure that we are at least one or two steps ahead of our customers.

This helps us automate tasks that can be automated, such as creating loyalty tiers based on accumulated lifetime value. It also ensures that a marketing email is automatically sent to those who sign up for anything on our website, and that follow-up emails are sent to those who purchased a product or downloaded our lead magnets.

Optimizes processes through Workflows

This is perhaps my favorite feature when it comes to customer relationship management software: Workflows.

Customer relationship management Workflow

According to Pipefy, “You can think of a CRM Workflow as a line of dominoes you need to knock over to get from an initial trigger to your desired outcome.”

As you can see, I just added a trigger (Segments) and two actions (Delay and Email). Once a day, Drip will check the segment that I selected for new people, and the Workflow will automatically activate for those customers.

There are multiple uses for Workflows. You can use them to add tags to users who complete your preferred actions, send warm-up and follow-up emails to your customers, create abandoned cart strategies, and so much more.

Analytics

And of course, it’s incredibly important that we are able to see the performance of our marketing and customer relationship management efforts.

Drip analytics

Here in Drip, for example, the Analytics section shows the different kinds of data that I may need to gauge how well I’m performing. Let’s say that I need to see how well my past two emails have been performing. I just need to click Email Metrics and look at my two most recent emails.

Customer relationship management analytics

When the Analytics show me that there are issues with the emails I’m sending, then perhaps I should start shifting strategies. This is why, even if CRM software can do pretty much everything that you need it to do, it’s still up to you, the user, to come up with good strategies that will make your customer relationship management as fruitful as it can be.

Streamlines processes through integration

Lastly, CRM software makes things easier for marketing and sales teams by streamlining processes through integration with other software. Going back to our Workflow, you can see that I can choose another software and pick from the list of actions that the integration permits.

Integrations Drip

How customer relationship management helps your business

Now that you know what a CRM system does, it’s time to discuss its top benefits for your business.

Improved customer service and relations

According to SuperOffice, 74% of companies using CRM report improved customer service. This is because data analysis, segmentation, and automation give companies the ability to personalize and improve the customer experience every step of the way.

And of course, if your customer’s experience of your company is positive, then relations also improve. Because of your strategy and the CRM software that you’re using, you’re able to really show up for your customers as individuals and respond to their needs.

Increased customer retention

According to Act-On, 99% of B2B top performing companies utilize CRM for customer retention. Part of improving their experience, responding to their needs, and building your relationship with your customers is understanding their preferences and resolving their issues as fast as possible. Since you have a plethora of features to choose from in a CRM software plus your strategy, you get to attend to your customers’ needs in a way that is unparalleled.

Customer relationship management also includes reaching out to customers and ensuring that they repeat purchases through recommendations, or reconnecting to older customers who may have been neglected. According to SuperOffice, keeping an existing customer is about 6 to 7 times cheaper than acquiring new ones, so that is also something to keep in mind.

Heightened productivity

Since CRM software comes with integrations, automations, and other features, it takes some of the load off marketing and sales teams. According to destinationCRM, 91% of businesses with more than 10 employees in the US have a CRM system in place, as it automates repetitive tasks and other manual processes.

Imagine not having to manually encode customer emails, send follow ups one by one, and check individually if the customer is ready to purchase!

Increased ROI

Last but not least, proper CRM implementation can eventually yield your company an ROI exceeding 245%, although the timeline is around four months to a year, according to Martech Zone. Still, that’s a massive increase that is well worth the effort, especially since the increased ROI is a result of a productive marketing and sales team and better customer relations.

Types of CRM platforms

There are three kinds of CRM platforms: open-source CRM, on-premise CRM, and cloud-based CRM.

Open-source CRM

Open-source CRM simply means its source code is publicly available and can be “distributed, modified, and redistributed by users according to their needs.”

One major advantage of choosing an open-source CRM is the price. It’s (more often than not) free or much more affordable than its proprietary counterparts. Considering that open-source CRMs have the same functions as proprietary CRMs plus an immense amount of flexibility due to their nature, many companies opt to go for open-source.

One major disadvantage of open-source CRM is implementation. Considering that it doesn’t come “boxed” unlike proprietary CRM, you need to ensure that you have developers on your team (or at least, that the product has an active developer community) to ensure that you can maximize your software and set it up properly.

On-premise CRM

On-premise CRM means that the software is running on your office servers (hence, on-premise). This means that you have to be in your office to be able to do some customer relationship management.

On-premise CRM also needs plenty of IT attention. Since it’s on-premise, the company’s team is basically in charge of every aspect of managing the CRM platform. This is also why big businesses are usually the ones that opt for this type of CRM—they have the people, the budget, and the need.

It’s not all disadvantages, though. On-premise CRM, as it’s set up and handled by the company, can be customized and configured to fit the company’s processes better. This means that the CRM can be tweaked to fit and support a specific company’s needs.

Cloud-based CRM

On the other hand, cloud-based CRM is just that: cloud-based. Unlike on-premise CRM, the only thing you really need to access the software is an internet connection.

You also don’t need a full in-house IT team to manage the CRM in this case. If things go wrong, you can simply contact the vendor’s customer service and ask for their assistance.

Aside from that, it’s easy to set up and can be accessed by anyone in the team who has an account. Typically, there is one owner who can add member users to their account and set the privileges of those users.

What are the disadvantages? For one, faulty internet connection will make using a cloud-based CRM frustrating. There’s also less customization as most likely you are purchasing a “ready-to-go” type of CRM. Lastly, in the event that there’s a bug or some other issue with the platform, you will have to wait for their customer service to come up with a solution. I have had the unfortunate experience once where I came across a bug that the vendor’s team did not know how to fix, and it was a huge headache for everyone involved.

My recommended CRM software

I recommend two customer relationship management platforms: Drip and Overloop.

Drip: E-commerce customer relationship management

Drip is the platform that I currently use for my company. It markets itself as an ECRM—a platform that is built for the KPIs of e-commerce. According to Drip, “With ECRM, you can see what your customers are doing all the time—what links they’re clicking, what pages they’re visiting, what emails they’re clicking through, what products they’re yearning after. ”

Drip customer relationship management

Then Drip’s dashboards give you a sense of where your customer is in terms of your sales funnel.

Where Drip shines is in its segmentation capabilities.

Drip segmentation

There is a plethora of filters to choose from, which makes segmentation so much easier. According to their website, “You win with intimacy.”

I agree.

Overloop (formerly Prospect.io): Made for conversations

I reviewed Overloop back when it was still Prospect.io (they changed names in October 2021). It’s incredibly user-friendly and intuitive, and they even have an extension available in the Chrome web store that you can use to easily find prospects.

Overloop CRM

Where Overloop shines is how they center their platform on conversation. Aside from being robust in features, they also have four areas that are for connecting to and reaching out to customers: Cold Email Campaigns, Email Finder, Live Chats, and Web Forms.

Overloop features

Aside from that, instead of having to use different platforms to check your pipeline and your tasks, you can just use Overloop as they have all those things integrated, making it a full-fledged CRM platform.

Try Overloop with our referral code!

Key takeaway

Customer relationship management isn’t just about using software to keep customer data. It’s a holistic strategy that includes CRM software to ensure that your relationship and interactions with your customers are kept healthy and alive. With all the areas and features of CRM, the main benefit really is that your customers get to have a good experience with your company, which in turn increases sales and retention.

Don’t know how to implement a CRM system for your company? You can opt to let us do it for you!



Essential Functions For SEO Data Analysis

Learning to code, whether with Python, JavaScript, or another programming language, has a whole host of benefits, including the ability to work with larger datasets and automate repetitive tasks.

But despite the benefits, many SEO professionals are yet to make the transition – and I completely understand why! It isn’t an essential skill for SEO, and we’re all busy people.

If you’re pressed for time, and you already know how to accomplish a task within Excel or Google Sheets, then changing tack can feel like reinventing the wheel.

When I first started coding, I initially only used Python for tasks that I couldn’t accomplish in Excel, and it’s taken several years to get to the point where it’s my de facto choice for data processing.

Looking back, I’m incredibly glad that I persisted, but at times it was a frustrating experience, with many an hour spent scanning threads on Stack Overflow.

This post is designed to spare other SEO pros the same fate.

Within it, we’ll cover the Python equivalents of the most commonly used Excel formulas and features for SEO data analysis – all of which are available within a Google Colab notebook linked in the summary.

Specifically, you’ll learn the equivalents of:

  • LEN.
  • Drop Duplicates.
  • Text to Columns.
  • SEARCH/FIND.
  • CONCATENATE.
  • Find and Replace.
  • LEFT/MID/RIGHT.
  • IF.
  • IFS.
  • VLOOKUP.
  • COUNTIF/SUMIF/AVERAGEIF.
  • Pivot Tables.

Amazingly, to accomplish all of this, we’ll primarily be using a singular library – Pandas – with a little help in places from its big brother, NumPy.

Prerequisites

For the sake of brevity, there are a few things we won’t be covering today, including:

  • Installing Python.
  • Basic Pandas, like importing CSVs, filtering, and previewing dataframes.

If you’re unsure about any of this, then Hamlet’s guide on Python data analysis for SEO is the perfect primer.

Now, without further ado, let’s jump in.

LEN

LEN provides a count of the number of characters within a string of text.

For SEO specifically, a common use case is to measure the length of title tags or meta descriptions to determine whether they’ll be truncated in search results.

Within Excel, if we wanted to count the second cell of column A, we’d enter:

=LEN(A2)
Screenshot from Microsoft Excel, November 2022

Python isn’t too dissimilar, as we can rely on the inbuilt len function, which can be combined with Pandas’ loc[] to access a specific row of data within a column:

len(df['Title'].loc[0])

In this example, we’re getting the length of the first row in the “Title” column of our dataframe.

len function python
Screenshot of VS Code, November, 2022

Finding the length of a cell isn’t that useful for SEO, though. Normally, we’d want to apply a function to an entire column!

In Excel, this would be achieved by selecting the bottom right-hand corner of the formula cell and either dragging it down or double-clicking.

When working with a Pandas dataframe, we can use str.len to calculate the length of rows within a series, then store the results in a new column:

df['Length'] = df['Title'].str.len()

str.len is a ‘vectorized’ operation, which is designed to be applied simultaneously to a series of values. We’ll use these operations extensively throughout this article, as they almost universally end up being faster than a loop.

Another common application of LEN is to combine it with SUBSTITUTE to count the number of words in a cell:

=LEN(TRIM(A2))-LEN(SUBSTITUTE(A2," ",""))+1

In Pandas, we can achieve this by combining the str.split and str.len functions together:

df['No. Words'] = df['Title'].str.split().str.len()

We’ll cover str.split in more detail later, but essentially, what we’re doing is splitting our data based upon whitespaces within the string, then counting the number of component parts.

word count Python
Screenshot from VS Code, November 2022
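To see both operations side by side, here is a minimal sketch on a tiny, made-up dataframe. The titles and the 60-character threshold are assumptions for illustration only, not a rule taken from any search engine.

```python
import pandas as pd

# Hypothetical title tags, used purely for illustration
df = pd.DataFrame({"Title": ["Buy Mens Shoes Online", "Running  Shoes", "Sale"]})

# Character count per row (the equivalent of LEN)
df["Length"] = df["Title"].str.len()

# Word count per row: split on any run of whitespace, then count the pieces.
# Because str.split() with no pattern collapses repeated spaces, the double
# space in "Running  Shoes" does not inflate the count.
df["No. Words"] = df["Title"].str.split().str.len()

# Flag titles longer than an assumed 60-character limit
df["Too Long"] = df["Length"] > 60

print(df)
```

Note that, unlike the Excel formula, no TRIM step is needed here, since splitting without a pattern already ignores leading, trailing, and repeated whitespace.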

Dropping Duplicates

Excel’s ‘Remove Duplicates’ feature provides an easy way to remove duplicate values within a dataset, either by deleting entirely duplicate rows (when all columns are selected) or removing rows with the same values in specific columns.

Excel drop duplicates
Screenshot from Microsoft Excel, November 2022

In Pandas, this functionality is provided by drop_duplicates.

To drop duplicate rows within a dataframe type:

df.drop_duplicates(inplace=True)

To drop rows based on duplicates within a singular column, include the subset parameter:

df.drop_duplicates(subset="column", inplace=True)

Or specify multiple columns within a list:

df.drop_duplicates(subset=['column','column2'], inplace=True)

One addition above that’s worth calling out is the presence of the inplace parameter. Including inplace=True allows us to overwrite our existing dataframe without needing to create a new one.

There are, of course, times when we want to preserve our raw data. In this case, we can assign our deduped dataframe to a different variable:

df2 = df.drop_duplicates(subset="column")
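As a quick sanity check, the sketch below contrasts a full-row dedupe with a single-column dedupe. The column names and values are hypothetical, and it relies on drop_duplicates keeping the first occurrence by default.

```python
import pandas as pd

# Hypothetical data: two fully identical rows, plus two rows sharing a URL
df = pd.DataFrame({
    "URL": ["/a", "/a", "/b", "/b"],
    "Keyword": ["shoes", "shoes", "boots", "hats"],
})

# Rows must match across every column to count as duplicates
full_dedupe = df.drop_duplicates()

# Rows only need to share a URL; the first row per URL is kept
by_url = df.drop_duplicates(subset="URL")

print(len(df), len(full_dedupe), len(by_url))
```

Because neither call uses inplace=True, the original df is left untouched.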

Text To Columns

Another everyday essential, the ‘text to columns’ feature can be used to split a text string based on a delimiter, such as a slash, comma, or whitespace.

As an example, we might split a URL into its domain and individual subfolders.

Excel text to columns
Screenshot from Microsoft Excel, November 2022

When dealing with a dataframe, we can use the str.split function, which creates a list for each entry within a series. This can be converted into multiple columns by setting the expand parameter to True:

df['URL'].str.split(pat="/", expand=True)
str split Python
Screenshot from VS Code, November 2022

As is often the case, our URLs in the image above have been broken up into inconsistent columns, because they don’t feature the same number of folders.

This can make things tricky when we want to save our data within an existing dataframe.

Specifying the n parameter limits the number of splits, allowing us to create a specific number of columns:

df[['Domain', 'Folder1', 'Folder2', 'Folder3']] = df['URL'].str.split(pat="/", expand=True, n=3)
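The effect of n is easier to see on a small example. This is a sketch only: the URLs are invented and, as an assumption, have no protocol, so the first piece is the domain.

```python
import pandas as pd

# Hypothetical URLs without a protocol, with different folder depths
df = pd.DataFrame({"URL": ["example.com/shoes/mens/boots/brown", "example.com/hats"]})

# n=3 allows at most three splits, so at most four columns come back.
# Anything after the third slash stays together in the last column,
# and shorter URLs are padded with missing values.
df[["Domain", "Folder1", "Folder2", "Folder3"]] = df["URL"].str.split(
    pat="/", expand=True, n=3
)

print(df)
```

One caveat: the assignment expects exactly four columns, so if no URL in the data is deep enough to produce them, the number of columns returned will not match the names on the left.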

Another option is to use pop to remove your column from the dataframe, perform the split, and then re-add it with the join function:

df = df.join(df.pop('Split').str.split(pat="/", expand=True))

Duplicating the URL to a new column before the split allows us to preserve the full URL. We can then rename the new columns:

df['Split'] = df['URL']

df = df.join(df.pop('Split').str.split(pat="/", expand=True))

df.rename(columns = {0:'Domain', 1:'Folder1', 2:'Folder2', 3:'Folder3', 4:'Parameter'}, inplace=True)
Split pop join functions Python
Screenshot from VS Code, November 2022

CONCATENATE

The CONCAT function allows users to combine multiple strings of text, such as when generating a list of keywords by adding different modifiers.

In this case, we’re adding “mens” and whitespace to column A’s list of product types:

=CONCAT($F$1," ",A2)
concat Excel
Screenshot from Microsoft Excel, November 2022

Assuming we’re dealing with strings, the same can be achieved in Python using the + operator:

df['Combined'] = 'mens' + ' ' + df['Keyword']

Or specify multiple columns of data:

df['Combined'] = df['Subdomain'] + df['URL']
concat Python
Screenshot from VS Code, November 2022

Pandas has a dedicated concat function, but this is more useful when trying to combine multiple dataframes with the same columns.

For instance, if we had multiple exports from our favorite link analysis tool:

df = pd.read_csv('data.csv')
df2 = pd.read_csv('data2.csv')
df3 = pd.read_csv('data3.csv')

dflist = [df, df2, df3]

df = pd.concat(dflist, ignore_index=True)

SEARCH/FIND

The SEARCH and FIND formulas provide a way of locating a substring within a text string.

These commands are commonly combined with ISNUMBER to create a Boolean column that helps filter down a dataset, which can be extremely helpful when performing tasks like log file analysis, as explained in this guide. E.g.:

=ISNUMBER(SEARCH("searchthis",A2))
isnumber search Excel
Screenshot from Microsoft Excel, November 2022

The difference between SEARCH and FIND is that FIND is case-sensitive.

The equivalent Pandas function, str.contains, is case-sensitive by default:

df['Journal'] = df['URL'].str.contains('engine', na=False)

Case insensitivity can be enabled by setting the case parameter to False:

df['Journal'] = df['URL'].str.contains('engine', case=False, na=False)

In either scenario, including na=False will prevent null values from being returned within the Boolean column.

One massive advantage of using Pandas here is that, unlike Excel, regex is natively supported by this function, as it is in Google Sheets via REGEXMATCH.

Chain together multiple substrings by using the pipe character, also known as the OR operator:

df['Journal'] = df['URL'].str.contains('engine|search', na=False)
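The three variants can be compared on a small, invented set of URLs. The sketch below assumes a column containing one missing value, to show what na=False does.

```python
import pandas as pd

# Hypothetical URLs, including a missing value
df = pd.DataFrame({
    "URL": ["https://example.com/Search-Engine", "https://example.com/blog", None]
})

# Case-sensitive: "Engine" does not match "engine"
df["Exact"] = df["URL"].str.contains("engine", na=False)

# Case-insensitive match
df["Any Case"] = df["URL"].str.contains("engine", case=False, na=False)

# Regex alternation: matches either substring
df["Either"] = df["URL"].str.contains("engine|blog", case=False, na=False)

# The Boolean column can be used directly as a filter
matches = df[df["Either"]]
print(matches)
```

Without na=False, the missing URL would produce a missing value rather than False, and the final filter would raise an error instead of skipping that row.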

Find And Replace

Excel’s “Find and Replace” feature provides an easy way to individually or bulk replace one substring with another.

find replace Excel
Screenshot from Microsoft Excel, November 2022

When processing data for SEO, we’re most likely to select an entire column and “Replace All.”

The SUBSTITUTE formula provides another option here and is useful if you don’t want to overwrite the existing column.

As an example, we can change the protocol of a URL from HTTP to HTTPS, or remove it by replacing it with nothing.

When working with dataframes in Python, we can use str.replace:

df['URL'] = df['URL'].str.replace('http://', 'https://')

Or:

df['URL'] = df['URL'].str.replace('http://', '') # replace with nothing

Again, unlike Excel, regex can be used – like with Google Sheets’ REGEXREPLACE:

df['URL'] = df['URL'].str.replace('http://|https://', '', regex=True) # regex=True is needed in Pandas 2.0+, where patterns are literal by default

Alternatively, if you want to replace multiple substrings with different values, you can use Pandas’ replace method and provide a list.

This prevents you from having to chain multiple str.replace functions:

df['URL'] = df['URL'].replace(['http://', 'https://'], ['https://www.', 'https://www.'], regex=True)

LEFT/MID/RIGHT

Extracting a substring within Excel requires the usage of the LEFT, MID, or RIGHT functions, depending on where the substring is located within a cell.

Let’s say we want to extract the root domain and subdomain from a URL:

=MID(A2,FIND(":",A2,4)+3,FIND("/",A2,9)-FIND(":",A2,4)-3)
left mid right Excel
Screenshot from Microsoft Excel, November 2022

Using a combination of MID and multiple FIND functions, this formula is ugly, to say the least – and things get a lot worse for more complex extractions.

Again, Google Sheets does this better than Excel, because it has REGEXEXTRACT.

What a shame that when you feed it larger datasets, it melts faster than a Babybel on a hot radiator.

Thankfully, Pandas offers str.extract, which works in a similar way:

df['Domain'] = df['URL'].str.extract('.*://?([^/]+)')
str extract Python
Screenshot from VS Code, November 2022

Combine with fillna to prevent null values, as you would in Excel with IFERROR:

df['Domain'] = df['URL'].str.extract('.*://?([^/]+)').fillna('-')
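For simple fixed-position cases, plain string slicing through the .str accessor is a closer match to LEFT and RIGHT. The sketch below shows both slicing and extraction on invented URLs; the simplified pattern and the column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical URLs, including one value that is not a URL at all
df = pd.DataFrame({
    "URL": ["https://www.example.com/shoes/", "http://blog.example.com", "not a url"]
})

# LEFT(A2, 4): the first four characters of each value
df["Left 4"] = df["URL"].str[:4]

# RIGHT(A2, 1): the last character, used here to check for a trailing slash
df["Ends With Slash"] = df["URL"].str[-1:] == "/"

# MID-style extraction with a capture group: everything between "://" and
# the next slash. expand=False returns a Series for a single group, and
# fillna supplies a placeholder where nothing matched.
df["Domain"] = df["URL"].str.extract(r"://([^/]+)", expand=False).fillna("-")

print(df)
```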

IF

IF statements allow you to return different values, depending on whether or not a condition is met.

To illustrate, suppose that we want to create a label for keywords that are ranking within the top three positions.

Excel IF
Screenshot from Microsoft Excel, November 2022

Rather than using Pandas in this instance, we can lean on NumPy and the where function (remember to import NumPy, if you haven’t already):

df['Top 3'] = np.where(df['Position'] <= 3, 'Top 3', 'Not Top 3')

Multiple conditions can be used for the same evaluation by combining them with the & (AND) or | (OR) operators, and enclosing the individual criteria within round brackets:

df['Top 3'] = np.where((df['Position'] <= 3) & (df['Position'] != 0), 'Top 3', 'Not Top 3')

In the above, we’re returning “Top 3” for any keywords with a ranking less than or equal to three, excluding any keywords ranking in position zero.
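Here is a small runnable sketch of both operators. The positions, search volumes, and the “Priority” label are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical ranking data; position 0 stands for an unranked keyword
df = pd.DataFrame({
    "Position": [1, 0, 5, 12],
    "Search Volume": [900, 50, 1200, 30],
})

# AND: both criteria must be true
df["Top 3"] = np.where(
    (df["Position"] <= 3) & (df["Position"] != 0), "Top 3", "Not Top 3"
)

# OR: either criterion is enough
df["Priority"] = np.where(
    (df["Position"] <= 3) | (df["Search Volume"] >= 1000), "Yes", "No"
)

print(df)
```

The round brackets matter: & and | bind more tightly than comparison operators in Python, so leaving them out changes how the expression is evaluated and typically raises an error.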

IFS

Sometimes, rather than specifying multiple conditions for the same evaluation, you may want multiple conditions that return different values.

In this case, the best solution is using IFS:

=IFS(B2<=3,"Top 3",B2<=10,"Top 10",B2<=20,"Top 20")
IFS Excel
Screenshot from Microsoft Excel, November 2022

Again, NumPy provides us with the best solution when working with dataframes, via its select function.

With select, we can create a list of conditions, choices, and an optional value for when all of the conditions are false:

conditions = [df['Position'] <= 3, df['Position'] <= 10, df['Position'] <=20]

choices = ['Top 3', 'Top 10', 'Top 20']

df['Rank'] = np.select(conditions, choices, 'Not Top 20')
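One detail worth knowing is that, like IFS, select returns the choice for the first condition that is true, so the order of the list matters. A minimal sketch with invented positions:

```python
import numpy as np
import pandas as pd

# Hypothetical ranking positions
df = pd.DataFrame({"Position": [2, 7, 15, 40]})

conditions = [df["Position"] <= 3, df["Position"] <= 10, df["Position"] <= 20]
choices = ["Top 3", "Top 10", "Top 20"]

# Narrowest condition first: each row gets the tightest bucket it fits
df["Rank"] = np.select(conditions, choices, default="Not Top 20")

# Same conditions in reverse order: the broad "<= 20" check now wins
# for every ranked row, which is not what we want
df["Rank (wrong order)"] = np.select(
    conditions[::-1], choices[::-1], default="Not Top 20"
)

print(df)
```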

It’s also possible to have multiple conditions for each of the evaluations.

Let’s say we’re working with an ecommerce retailer with product listing pages (PLPs) and product display pages (PDPs), and we want to label the type of branded pages ranking within the top 10 results.

The easiest solution here is to look for specific URL patterns, such as a subfolder or extension, but what if competitors have similar patterns?

In this scenario, we could do something like this:

conditions = [(df['URL'].str.contains('/category/')) & (df['Brand Rank'] > 0),
(df['URL'].str.contains('/product/')) & (df['Brand Rank'] > 0),
(~df['URL'].str.contains('/product/')) & (~df['URL'].str.contains('/category/')) & (df['Brand Rank'] > 0)]

choices = ['PLP', 'PDP', 'Other']

df['Brand Page Type'] = np.select(conditions, choices, None)

Above, we’re using str.contains to evaluate whether or not a URL in the top 10 matches our brand’s pattern, then using the “Brand Rank” column to exclude any competitors.

In this example, the tilde sign (~) indicates a negative match. In other words, we’re saying we want every brand URL that doesn’t match the pattern for a “PDP” or “PLP” to match the criteria for ‘Other.’

Lastly, None is included because we want non-brand results to return a null value.

np select Python
Screenshot from VS Code, November 2022

VLOOKUP

VLOOKUP is an essential tool for joining together two distinct datasets on a common column.

In this case, adding the URLs within column N to the keyword, position, and search volume data in columns A-C, using the shared “Keyword” column:

=VLOOKUP(A2,M:N,2,FALSE)
vlookup Excel
Screenshot from Microsoft Excel, November 2022

To do something similar with Pandas, we can use merge.

Replicating the functionality of an SQL join, merge is an incredibly powerful function that supports a variety of different join types.

For our purposes, we want to use a left join, which will maintain our first dataframe and only merge in matching values from our second dataframe:

mergeddf = df.merge(df2, how='left', on='Keyword')

One added advantage of performing a merge over a VLOOKUP is that you don’t have to have the shared data in the first column of the second dataset, as with the newer XLOOKUP.

It will also pull in multiple rows of data rather than only the first match it finds.
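That last point can change your row count, so it is worth seeing on a small example. The sketch below uses invented dataframes; deduplicating the lookup table first is one possible way to mimic VLOOKUP's first-match behavior, not the only one.

```python
import pandas as pd

# Hypothetical keyword data and a lookup table with a repeated keyword
rankings = pd.DataFrame({"Keyword": ["shoes", "boots"], "Position": [1, 4]})
urls = pd.DataFrame({
    "Keyword": ["shoes", "shoes", "hats"],
    "URL": ["/a", "/b", "/c"],
})

# Left join: "shoes" matches twice, so it appears twice in the output;
# "boots" has no match and gets a missing URL
merged_all = rankings.merge(urls, how="left", on="Keyword")

# Keeping one row per keyword in the lookup table reproduces VLOOKUP's
# "first match only" result. validate raises an error if the right-hand
# keys are not unique, which guards against accidental row duplication.
merged_first = rankings.merge(
    urls.drop_duplicates(subset="Keyword"),
    how="left",
    on="Keyword",
    validate="many_to_one",
)

print(merged_all)
print(merged_first)
```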

One common issue when using the function is for unwanted columns to be duplicated. This occurs when multiple shared columns exist, but you attempt to match using one.

To prevent this – and improve the accuracy of your matches – you can specify a list of columns:

mergeddf = df.merge(df2, how='left', on=['Keyword', 'Search Volume'])

In certain scenarios, you may actively want these columns to be included. For instance, when attempting to merge multiple monthly ranking reports:

mergeddf = (
    df.merge(df2, on='Keyword', how='left', suffixes=('', '_october'))
    .merge(df3, on='Keyword', how='left', suffixes=('', '_september'))
)

The above code snippet executes two merges to join together three dataframes with the same columns – which are our rankings for November, October, and September.

By labeling the months within the suffix parameters, we end up with a much cleaner dataframe that clearly displays the month, as opposed to the defaults of _x and _y seen in the earlier example.

multi merge Python
Screenshot from VS Code, November 2022

COUNTIF/SUMIF/AVERAGEIF

In Excel, if you want to perform a statistical function based on a condition, you’re likely to use either COUNTIF, SUMIF, or AVERAGEIF.

Commonly, COUNTIF is used to determine how many times a specific string appears within a dataset, such as a URL.

We can accomplish this by declaring the ‘URL’ column as our range, then the URL within an individual cell as our criteria:

=COUNTIF(D:D,D2)
Excel countif
Screenshot from Microsoft Excel, November 2022

In Pandas, we can achieve the same outcome by using the groupby function:

df.groupby('URL')['URL'].count()
Python groupby
Screenshot from VS Code, November 2022

Here, the column declared within the round brackets indicates the individual groups, and the column listed in the square brackets is where the aggregation (i.e., the count) is performed.

The output we’re receiving isn’t perfect for this use case, though, because it’s consolidated the data.

Typically, when using Excel, we’d have the URL count inline within our dataset. Then we can use it to filter to the most frequently listed URLs.

To do this, use transform and store the output in a column:

df['URL Count'] = df.groupby('URL')['URL'].transform('count')
Python groupby transform
Screenshot from VS Code, November 2022

You can also apply custom functions to groups of data by using a lambda (anonymous) function:

df['Google Count'] = df.groupby(['URL'])['URL'].transform(lambda x: x[x.str.contains('google')].count())

In our examples so far, we’ve been using the same column for our grouping and aggregations, but we don’t have to. Similarly to COUNTIFS/SUMIFS/AVERAGEIFS in Excel, it’s possible to group using one column, then apply our statistical function to another.

Going back to the earlier search engine results page (SERP) example, we may want to count all ranking PDPs on a per-keyword basis and return this number alongside our existing data:

df['PDP Count'] = df.groupby(['Keyword'])['URL'].transform(lambda x: x[x.str.contains('/product/|/prd/|/pd/')].count())
Python groupby countifs
Screenshot from VS Code, November 2022

Which in Excel parlance, would look something like this:

=SUM(COUNTIFS(A:A,[@Keyword],D:D,{"*/product/*","*/prd/*","*/pd/*"}))
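The same groupby and transform pattern covers SUMIF and AVERAGEIF by swapping the aggregation name. A minimal sketch, assuming hypothetical “Clicks” and “Position” columns:

```python
import pandas as pd

# Hypothetical data: one row per keyword, with the ranking URL
df = pd.DataFrame({
    "URL": ["/a", "/b", "/a", "/a"],
    "Clicks": [10, 5, 20, 30],
    "Position": [1, 8, 3, 5],
})

# COUNTIF: how many rows share this row's URL
df["URL Count"] = df.groupby("URL")["URL"].transform("count")

# SUMIF: total clicks across rows sharing this row's URL
df["URL Clicks"] = df.groupby("URL")["Clicks"].transform("sum")

# AVERAGEIF: mean position across rows sharing this row's URL
df["URL Avg Position"] = df.groupby("URL")["Position"].transform("mean")

# With the values inline, filtering works like any other column
frequent = df[df["URL Count"] >= 2]
print(frequent)
```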

Pivot Tables

Last, but by no means least, it’s time to talk pivot tables.

In Excel, a pivot table is likely to be our first port of call if we want to summarise a large dataset.

For instance, when working with ranking data, we may want to identify which URLs appear most frequently, and their average ranking position.

pivot table Excel
Screenshot from Microsoft Excel, November 2022

Again, Pandas has its own pivot tables equivalent – but if all you want is a count of unique values within a column, this can be accomplished using the value_counts function:

count = df['URL'].value_counts()

Using groupby is also an option.

Earlier in the article, performing a groupby that aggregated our data wasn’t what we wanted – but it’s precisely what’s required here:

grouped = df.groupby('URL').agg(
     url_frequency=('Keyword', 'count'),
     avg_position=('Position', 'mean'),
     )

grouped.reset_index(inplace=True)
groupby-pivot Python
Screenshot from VS Code, November 2022

Two aggregate functions have been applied in the example above, but this could easily be expanded upon, and 13 different types are available.

There are, of course, times when we do want to use pivot_table, such as when performing multi-dimensional operations.

To illustrate what this means, let’s reuse the ranking groupings we made using conditional statements (assumed here to be stored in a “Grouping” column) and attempt to display the number of times a URL ranks within each group.

ranking_groupings = df.groupby(['URL', 'Grouping']).agg(
     url_frequency=('Keyword', 'count'),
     )
python groupby grouping
Screenshot from VS Code, November 2022

This isn’t the best format to use, as multiple rows have been created for each URL.

Instead, we can use pivot_table, which will display the data in different columns:

pivot = pd.pivot_table(
    df,
    index=['URL'],
    columns=['Grouping'],
    aggfunc="size",
    fill_value=0,
)
pivot table Python
Screenshot from VS Code, November 2022
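To make the shape of the output concrete, here is a small sketch on invented data. It also shows pd.crosstab, a different Pandas function that produces the same kind of frequency table when all you need is counts; the data and the “Grouping” labels are assumptions.

```python
import pandas as pd

# Hypothetical ranking data with a precomputed "Grouping" label
df = pd.DataFrame({
    "URL": ["/a", "/a", "/b", "/a"],
    "Keyword": ["shoes", "boots", "hats", "socks"],
    "Grouping": ["Top 3", "Top 10", "Top 3", "Top 3"],
})

# One row per URL, one column per grouping, counts in the cells.
# fill_value=0 replaces missing combinations with zero.
pivot = pd.pivot_table(
    df,
    index=["URL"],
    columns=["Grouping"],
    aggfunc="size",
    fill_value=0,
)

# crosstab is a shorthand for the same frequency table
crosstab = pd.crosstab(df["URL"], df["Grouping"])

print(pivot)
print(crosstab)
```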

Final Thoughts

Whether you’re looking for inspiration to start learning Python, or are already leveraging it in your SEO workflows, I hope that the above examples help you along on your journey.

As promised, you can find a Google Colab notebook with all of the code snippets here.

In truth, we’ve barely scratched the surface of what’s possible, but understanding the basics of Python data analysis will give you a solid base upon which to build.

Featured Image: mapo_japan/Shutterstock


