SEO

Essential Functions For SEO Data Analysis

Published

1 year ago

December 3, 2022


Learning to code, whether with Python, JavaScript, or another programming language, has a whole host of benefits, including the ability to work with larger datasets and automate repetitive tasks.

But despite the benefits, many SEO professionals are yet to make the transition – and I completely understand why! It isn’t an essential skill for SEO, and we’re all busy people.

If you’re pressed for time, and you already know how to accomplish a task within Excel or Google Sheets, then changing tack can feel like reinventing the wheel.

When I first started coding, I initially only used Python for tasks that I couldn’t accomplish in Excel – and it’s taken several years to get to the point where it’s my de facto choice for data processing.

Looking back, I’m incredibly glad that I persisted, but at times it was a frustrating experience, with many an hour spent scanning threads on Stack Overflow.

This post is designed to spare other SEO pros the same fate.

Within it, we’ll cover the Python equivalents of the most commonly used Excel formulas and features for SEO data analysis – all of which are available within a Google Colab notebook linked in the summary.

Specifically, you’ll learn the equivalents of:

LEN.
Drop Duplicates.
Text to Columns.
SEARCH/FIND.
CONCATENATE.
Find and Replace.
LEFT/MID/RIGHT.
IF.
IFS.
VLOOKUP.
COUNTIF/SUMIF/AVERAGEIF.
Pivot Tables.

Amazingly, to accomplish all of this, we’ll primarily be using a single library – Pandas – with a little help in places from its big brother, NumPy.

Prerequisites

For the sake of brevity, there are a few things we won’t be covering today, including:

Installing Python.
Basic Pandas, like importing CSVs, filtering, and previewing dataframes.

If you’re unsure about any of this, then Hamlet’s guide on Python data analysis for SEO is the perfect primer.

Now, without further ado, let’s jump in.

LEN

LEN provides a count of the number of characters within a string of text.

For SEO specifically, a common use case is to measure the length of title tags or meta descriptions to determine whether they’ll be truncated in search results.

Within Excel, if we wanted to count the second cell of column A, we’d enter:

=LEN(A2)

Screenshot from Microsoft Excel, November 2022

Python isn’t too dissimilar, as we can rely on the inbuilt len function, which can be combined with Pandas’ loc[] to access a specific row of data within a column:

len(df['Title'].loc[0])

In this example, we’re getting the length of the first row in the “Title” column of our dataframe.

Screenshot from VS Code, November 2022

Finding the length of a cell isn’t that useful for SEO, though. Normally, we’d want to apply a function to an entire column!

In Excel, this would be achieved by selecting the formula cell on the bottom right-hand corner and either dragging it down or double-clicking.

When working with a Pandas dataframe, we can use str.len to calculate the length of rows within a series, then store the results in a new column:

df['Length'] = df['Title'].str.len()

str.len is a ‘vectorized’ operation, which is designed to be applied simultaneously to a series of values. We’ll use these operations extensively throughout this article, as they almost universally end up being faster than a loop.
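As a minimal sketch of the truncation check mentioned earlier, the snippet below flags titles that exceed a cut-off using the vectorized str.len. The sample titles and the 60-character threshold are assumptions for illustration only, not a documented limit.

```python
import pandas as pd

# Hypothetical sample data; in practice this would come from a crawl export.
df = pd.DataFrame({
    "Title": [
        "Short title",
        "A much longer title tag that keeps going well past the point where it would fit",
    ]
})

MAX_TITLE_LENGTH = 60  # assumed cut-off, chosen for illustration

# str.len() runs across the whole column at once, with no explicit loop.
df["Length"] = df["Title"].str.len()

# The comparison is vectorized too, producing one True/False value per row.
df["Too Long"] = df["Length"] > MAX_TITLE_LENGTH
```

The resulting Boolean column can then be used to filter the dataframe down to the titles that need rewriting.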

Another common application of LEN is to combine it with SUBSTITUTE to count the number of words in a cell:

=LEN(TRIM(A2))-LEN(SUBSTITUTE(A2," ",""))+1

In Pandas, we can achieve this by combining the str.split and str.len functions together:

df['No. Words'] = df['Title'].str.split().str.len()

We’ll cover str.split in more detail later, but essentially, what we’re doing is splitting our data based upon whitespaces within the string, then counting the number of component parts.

Screenshot from VS Code, November 2022

Dropping Duplicates

Excel’s ‘Remove Duplicates’ feature provides an easy way to remove duplicate values within a dataset, either by deleting entirely duplicate rows (when all columns are selected) or removing rows with the same values in specific columns.

Screenshot from Microsoft Excel, November 2022

In Pandas, this functionality is provided by drop_duplicates.

To drop duplicate rows within a dataframe, type:

df.drop_duplicates(inplace=True)

To drop rows based on duplicates within a single column, include the subset parameter:

df.drop_duplicates(subset="column", inplace=True)

Or specify multiple columns within a list:

df.drop_duplicates(subset=['column','column2'], inplace=True)

One addition above that’s worth calling out is the presence of the inplace parameter. Including inplace=True allows us to overwrite our existing dataframe without needing to create a new one.

There are, of course, times when we want to preserve our raw data. In this case, we can assign our deduped dataframe to a different variable:

df2 = df.drop_duplicates(subset="column")

Text To Columns

Another everyday essential, the ‘text to columns’ feature can be used to split a text string based on a delimiter, such as a slash, comma, or whitespace.

As an example, splitting a URL into its domain and individual subfolders.

Screenshot from Microsoft Excel, November 2022

When dealing with a dataframe, we can use the str.split function, which creates a list for each entry within a series. This can be converted into multiple columns by setting the expand parameter to True:

df['URL'].str.split(pat="/", expand=True)

Screenshot from VS Code, November 2022

As is often the case, our URLs in the image above have been broken up into inconsistent columns, because they don’t feature the same number of folders.

This can make things tricky when we want to save our data within an existing dataframe.

Specifying the n parameter limits the number of splits, allowing us to create a specific number of columns:

df[['Domain', 'Folder1', 'Folder2', 'Folder3']] = df['URL'].str.split(pat="/", expand=True, n=3)
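To make the effect of n concrete, here is a small self-contained sketch. The URLs are hypothetical and deliberately have no protocol, since the two slashes in "https://" would otherwise use up two of the three splits.

```python
import pandas as pd

# Hypothetical URLs without a protocol, with differing folder depth.
df = pd.DataFrame({
    "URL": [
        "example.com/shop/shoes/trainers/red",
        "example.com/blog",
    ]
})

# n=3 allows at most three splits, so at most four columns are produced.
# The first URL is deep enough to fill all four; anything beyond the
# third slash stays together in the last column.
parts = df["URL"].str.split(pat="/", n=3, expand=True)

df[["Domain", "Folder1", "Folder2", "Folder3"]] = parts
```

Shorter URLs are padded with missing values, so the second row ends up with empty Folder2 and Folder3 cells.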

Another option is to use pop to remove your column from the dataframe, perform the split, and then re-add it with the join function:

df = df.join(df.pop('Split').str.split(pat="/", expand=True))

Duplicating the URL to a new column before the split allows us to preserve the full URL. We can then rename the new columns:

df['Split'] = df['URL']

df = df.join(df.pop('Split').str.split(pat="/", expand=True))

df.rename(columns = {0:'Domain', 1:'Folder1', 2:'Folder2', 3:'Folder3', 4:'Parameter'}, inplace=True)

Screenshot from VS Code, November 2022

CONCATENATE

The CONCAT function allows users to combine multiple strings of text, such as when generating a list of keywords by adding different modifiers.

In this case, we’re adding “mens” and whitespace to column A’s list of product types:

=CONCAT($F$1," ",A2)

Screenshot from Microsoft Excel, November 2022

Assuming we’re dealing with strings, the same can be achieved in Python using the addition operator (+):

df['Combined'] = 'mens' + ' ' + df['Keyword']

Or specify multiple columns of data:

df['Combined'] = df['Subdomain'] + df['URL']

Screenshot from VS Code, November 2022
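The "assuming we’re dealing with strings" caveat matters in practice: adding a string to a numeric column raises an error. The sketch below, using hypothetical keyword data, shows one way around it by converting the numeric column with astype(str) first.

```python
import pandas as pd

# Hypothetical keyword data with a numeric column.
df = pd.DataFrame({
    "Keyword": ["trainers", "boots"],
    "Search Volume": [1200, 800],
})

# String + string column works directly.
df["Combined"] = "mens" + " " + df["Keyword"]

# Numeric columns need converting to strings before concatenation.
df["Label"] = df["Keyword"] + " (" + df["Search Volume"].astype(str) + ")"
```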

Pandas has a dedicated concat function, but this is more useful when trying to combine multiple dataframes with the same columns.

For instance, if we had multiple exports from our favorite link analysis tool:

df = pd.read_csv('data.csv')
df2 = pd.read_csv('data2.csv')
df3 = pd.read_csv('data3.csv')

dflist = [df, df2, df3]

df = pd.concat(dflist, ignore_index=True)

SEARCH/FIND

The SEARCH and FIND formulas provide a way of locating a substring within a text string.

These commands are commonly combined with ISNUMBER to create a Boolean column that helps filter down a dataset, which can be extremely helpful when performing tasks like log file analysis, as explained in this guide. E.g.:

=ISNUMBER(SEARCH("searchthis",A2))

Screenshot from Microsoft Excel, November 2022

The difference between SEARCH and FIND is that FIND is case-sensitive.

The equivalent Pandas function, str.contains, is case-sensitive by default:

df['Journal'] = df['URL'].str.contains('engine', na=False)

Case insensitivity can be enabled by setting the case parameter to False:

df['Journal'] = df['URL'].str.contains('engine', case=False, na=False)

In either scenario, including na=False will prevent null values from being returned within the Boolean column.
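A short sketch of why na=False matters, using hypothetical URLs that include one missing value. Without it, the missing row produces a missing value rather than True or False, which gets in the way when the column is used as a filter.

```python
import pandas as pd

# Hypothetical data with one missing URL.
df = pd.DataFrame({
    "URL": [
        "https://example.com/search-engine/",
        "https://example.com/about/",
        None,
    ]
})

# Without na=False, the missing URL yields a missing value in the result.
raw = df["URL"].str.contains("engine")

# With na=False, the missing URL is simply treated as "no match".
df["Journal"] = df["URL"].str.contains("engine", na=False)

# A clean Boolean column can be used directly as a filter.
filtered = df[df["Journal"]]
```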

One massive advantage of using Pandas here is that, unlike Excel, regex is natively supported by this function – as it is in Google Sheets via REGEXMATCH.

Chain together multiple substrings by using the pipe character, also known as the OR operator:

df['Journal'] = df['URL'].str.contains('engine|search', na=False)

Find And Replace

Excel’s “Find and Replace” feature provides an easy way to individually or bulk replace one substring with another.

Screenshot from Microsoft Excel, November 2022

When processing data for SEO, we’re most likely to select an entire column and “Replace All.”

The SUBSTITUTE formula provides another option here and is useful if you don’t want to overwrite the existing column.

As an example, we can change the protocol of a URL from HTTP to HTTPS, or remove it by replacing it with nothing.

When working with dataframes in Python, we can use str.replace:

df['URL'] = df['URL'].str.replace('http://', 'https://')

Or:

df['URL'] = df['URL'].str.replace('http://', '') # replace with nothing

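Again, unlike Excel, regex can be used – like with Google Sheets’ REGEXREPLACE. In Pandas 2.0 and later, str.replace treats the pattern as a literal string unless regex=True is set:

df['URL'] = df['URL'].str.replace('http://|https://', '', regex=True)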
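As a self-contained illustration, the sketch below strips either protocol with a slightly tighter pattern than the one above. The URLs and the "No Protocol" column name are hypothetical.

```python
import pandas as pd

# Hypothetical URLs with mixed protocols.
df = pd.DataFrame({
    "URL": [
        "http://example.com/a",
        "https://example.com/b",
        "example.com/c",
    ]
})

# ^ anchors the match to the start of the string; s? makes the "s" optional.
# regex=True is passed explicitly so the pattern is not treated as literal text.
df["No Protocol"] = df["URL"].str.replace(r"^https?://", "", regex=True)
```

Anchoring to the start of the string avoids touching a protocol that appears later in the URL, for example inside a redirect parameter.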

Alternatively, if you want to replace multiple substrings with different values, you can use Pandas’ replace method and provide a list.

This prevents you from having to chain multiple str.replace functions:

df['URL'] = df['URL'].replace(['http://', 'https://'], ['https://www.', 'https://www.'], regex=True)

LEFT/MID/RIGHT

Extracting a substring within Excel requires the usage of the LEFT, MID, or RIGHT functions, depending on where the substring is located within a cell.

Let’s say we want to extract the root domain and subdomain from a URL:

=MID(A2,FIND(":",A2,4)+3,FIND("/",A2,9)-FIND(":",A2,4)-3)

Screenshot from Microsoft Excel, November 2022

Using a combination of MID and multiple FIND functions, this formula is ugly, to say the least – and things get a lot worse for more complex extractions.

Again, Google Sheets does this better than Excel, because it has REGEXEXTRACT.

What a shame that when you feed it larger datasets, it melts faster than a Babybel on a hot radiator.

Thankfully, Pandas offers str.extract, which works in a similar way:

df['Domain'] = df['URL'].str.extract('.*://?([^/]+)')

Screenshot from VS Code, November 2022

Combine with fillna to prevent null values, as you would in Excel with IFERROR:

df['Domain'] = df['URL'].str.extract('.*://?([^/]+)').fillna('-')
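Here is a minimal runnable sketch of the same idea. The URLs are hypothetical, and the pattern is a simplified stand-in for the one above rather than a full URL parser.

```python
import pandas as pd

# Hypothetical URLs, including one value that is not a URL at all.
df = pd.DataFrame({
    "URL": [
        "https://www.example.com/shoes/",
        "http://blog.example.com/post",
        "not a url",
    ]
})

# Capture everything between "://" and the next slash.
# expand=False returns a Series for a single capture group.
df["Domain"] = df["URL"].str.extract(r"://([^/]+)", expand=False).fillna("-")
```

Rows that don’t match the pattern return a missing value, which fillna then swaps for the placeholder.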

IF

IF statements allow you to return different values, depending on whether or not a condition is met.

To illustrate, suppose that we want to create a label for keywords that are ranking within the top three positions.

Screenshot from Microsoft Excel, November 2022

Rather than using Pandas in this instance, we can lean on NumPy and the where function (remember to import NumPy, if you haven’t already):

df['Top 3'] = np.where(df['Position'] <= 3, 'Top 3', 'Not Top 3')

Multiple conditions can be used for the same evaluation by combining them with the & (AND) or | (OR) operators, and enclosing the individual criteria within round brackets:

df['Top 3'] = np.where((df['Position'] <= 3) & (df['Position'] != 0), 'Top 3', 'Not Top 3')

In the above, we’re returning “Top 3” for any keywords with a ranking less than or equal to three, excluding any keywords ranking in position zero.
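The sketch below puts both operators side by side on a small hypothetical dataset. The "Needs Review" column and its thresholds are invented purely to show the OR case.

```python
import numpy as np
import pandas as pd

# Hypothetical ranking data; position 0 stands for "not ranking".
df = pd.DataFrame({
    "Keyword": ["a", "b", "c", "d"],
    "Position": [0, 2, 8, 15],
})

# & means AND: both conditions must be true.
df["Top 3"] = np.where(
    (df["Position"] <= 3) & (df["Position"] != 0), "Top 3", "Not Top 3"
)

# | means OR: either condition is enough.
df["Needs Review"] = np.where(
    (df["Position"] == 0) | (df["Position"] > 10), "Review", "OK"
)
```

The round brackets are required because & and | bind more tightly than the comparison operators.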

IFS

Sometimes, rather than specifying multiple conditions for the same evaluation, you may want multiple conditions that return different values.

In this case, the best solution is using IFS:

=IFS(B2<=3,"Top 3",B2<=10,"Top 10",B2<=20,"Top 20")

Screenshot from Microsoft Excel, November 2022

Again, NumPy provides us with the best solution when working with dataframes, via its select function.

With select, we can create a list of conditions, choices, and an optional value for when all of the conditions are false:

conditions = [df['Position'] <= 3, df['Position'] <= 10, df['Position'] <=20]

choices = ['Top 3', 'Top 10', 'Top 20']

df['Rank'] = np.select(conditions, choices, 'Not Top 20')
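One detail worth knowing is that, like IFS, np.select returns the choice for the first condition that is true, so the order of the list matters. The following sketch uses hypothetical positions to show what happens when the order is reversed.

```python
import numpy as np
import pandas as pd

# Hypothetical ranking positions.
df = pd.DataFrame({"Position": [1, 7, 15, 42]})

conditions = [df["Position"] <= 3, df["Position"] <= 10, df["Position"] <= 20]
choices = ["Top 3", "Top 10", "Top 20"]

# Narrowest condition first: each row gets the tightest label it qualifies for.
df["Rank"] = np.select(conditions, choices, default="Not Top 20")

# Broadest condition first: everything in the top 20 is caught by the
# first condition, so the narrower labels are never reached.
df["Rank Reversed"] = np.select(
    conditions[::-1], choices[::-1], default="Not Top 20"
)
```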

It’s also possible to have multiple conditions for each of the evaluations.

Let’s say we’re working with an ecommerce retailer with product listing pages (PLPs) and product display pages (PDPs), and we want to label the type of branded pages ranking within the top 10 results.

The easiest solution here is to look for specific URL patterns, such as a subfolder or extension, but what if competitors have similar patterns?

In this scenario, we could do something like this:

conditions = [(df['URL'].str.contains('/category/')) & (df['Brand Rank'] > 0),
(df['URL'].str.contains('/product/')) & (df['Brand Rank'] > 0),
(~df['URL'].str.contains('/product/')) & (~df['URL'].str.contains('/category/')) & (df['Brand Rank'] > 0)]

choices = ['PLP', 'PDP', 'Other']

df['Brand Page Type'] = np.select(conditions, choices, None)

Above, we’re using str.contains to evaluate whether or not a URL in the top 10 matches our brand’s pattern, then using the “Brand Rank” column to exclude any competitors.

In this example, the tilde sign (~) indicates a negative match. In other words, we’re saying we want every brand URL that doesn’t match the pattern for a “PDP” or “PLP” to match the criteria for ‘Other.’

Lastly, None is included because we want non-brand results to return a null value.

Screenshot from VS Code, November 2022

VLOOKUP

VLOOKUP is an essential tool for joining together two distinct datasets on a common column.

In this case, adding the URLs within column N to the keyword, position, and search volume data in columns A-C, using the shared “Keyword” column:

=VLOOKUP(A2,M:N,2,FALSE)

Screenshot from Microsoft Excel, November 2022

To do something similar with Pandas, we can use merge.

Replicating the functionality of an SQL join, merge is an incredibly powerful function that supports a variety of different join types.

For our purposes, we want to use a left join, which will maintain our first dataframe and only merge in matching values from our second dataframe:

mergeddf = df.merge(df2, how='left', on='Keyword')

One added advantage of performing a merge over a VLOOKUP is that you don’t have to have the shared data in the first column of the second dataset, as with the newer XLOOKUP.

It will also pull in multiple rows of data rather than just the first match it finds.

One common issue when using the function is for unwanted columns to be duplicated. This occurs when multiple shared columns exist, but you attempt to match using one.

To prevent this – and improve the accuracy of your matches – you can specify a list of columns:

mergeddf = df.merge(df2, how='left', on=['Keyword', 'Search Volume'])
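To see both behaviours in one place, here is a small sketch with two hypothetical dataframes that share "Keyword" and "Search Volume" columns, where one keyword has two matching URLs.

```python
import pandas as pd

# Hypothetical ranking data.
rankings = pd.DataFrame({
    "Keyword": ["red shoes", "blue shoes"],
    "Search Volume": [1000, 500],
    "Position": [3, 12],
})

# Hypothetical URL data; "red shoes" appears twice.
urls = pd.DataFrame({
    "Keyword": ["red shoes", "red shoes", "blue shoes"],
    "Search Volume": [1000, 1000, 500],
    "URL": ["/red/", "/red-sale/", "/blue/"],
})

# Joining on one shared column duplicates the other shared column,
# which Pandas disambiguates with _x and _y suffixes.
single_key = rankings.merge(urls, how="left", on="Keyword")

# Joining on both shared columns keeps a single copy of each.
both_keys = rankings.merge(urls, how="left", on=["Keyword", "Search Volume"])
```

In both results "red shoes" appears on two rows, one per matching URL, which is the multiple-match behaviour a VLOOKUP would not give you.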

In certain scenarios, you may actively want these columns to be included. For instance, when attempting to merge multiple monthly ranking reports:

mergeddf = (
    df.merge(df2, on='Keyword', how='left', suffixes=('', '_october'))
    .merge(df3, on='Keyword', how='left', suffixes=('', '_september'))
)

The above code snippet executes two merges to join together three dataframes with the same columns – which are our rankings for November, October, and September.

By labeling the months within the suffix parameters, we end up with a much cleaner dataframe that clearly displays the month, as opposed to the defaults of _x and _y seen in the earlier example.

Screenshot from VS Code, November 2022

COUNTIF/SUMIF/AVERAGEIF

In Excel, if you want to perform a statistical function based on a condition, you’re likely to use either COUNTIF, SUMIF, or AVERAGEIF.

Commonly, COUNTIF is used to determine how many times a specific string appears within a dataset, such as a URL.

We can accomplish this by declaring the ‘URL’ column as our range, then the URL within an individual cell as our criteria:

=COUNTIF(D:D,D2)

Screenshot from Microsoft Excel, November 2022

In Pandas, we can achieve the same outcome by using the groupby function:

df.groupby('URL')['URL'].count()

Screenshot from VS Code, November 2022

Here, the column declared within the round brackets indicates the individual groups, and the column listed in the square brackets is where the aggregation (i.e., the count) is performed.

The output we’re receiving isn’t perfect for this use case, though, because it’s consolidated the data.

Typically, when using Excel, we’d have the URL count inline within our dataset. Then we can use it to filter to the most frequently listed URLs.

To do this, use transform and store the output in a column:

df['URL Count'] = df.groupby('URL')['URL'].transform('count')

Screenshot from VS Code, November 2022

You can also apply custom functions to groups of data by using a lambda (anonymous) function:

df['Google Count'] = df.groupby(['URL'])['URL'].transform(lambda x: x[x.str.contains('google')].count())

In our examples so far, we’ve been using the same column for our grouping and aggregations, but we don’t have to. Similarly to COUNTIFS/SUMIFS/AVERAGEIFS in Excel, it’s possible to group using one column, then apply our statistical function to another.

Going back to the earlier search engine results page (SERP) example, we may want to count all ranking PDPs on a per-keyword basis and return this number alongside our existing data:

df['PDP Count'] = df.groupby(['Keyword'])['URL'].transform(lambda x: x[x.str.contains('/product/|/prd/|/pd/')].count())

Screenshot from VS Code, November 2022

Which, in Excel parlance, would look something like this:

=SUM(COUNTIFS(A:A,[@Keyword],D:D,{"*/product/*","*/prd/*","*/pd/*"}))
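The examples in this section all count rows, but the same groupby and transform pattern covers SUMIF and AVERAGEIF by swapping the aggregation name. A minimal sketch follows; the "Clicks" column and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical ranking data with a clicks column.
df = pd.DataFrame({
    "URL": ["/a/", "/a/", "/b/"],
    "Position": [2, 6, 10],
    "Clicks": [100, 40, 5],
})

# SUMIF-style: total clicks per URL, repeated on every row for that URL.
df["URL Clicks"] = df.groupby("URL")["Clicks"].transform("sum")

# AVERAGEIF-style: average position per URL, again returned inline.
df["URL Avg Position"] = df.groupby("URL")["Position"].transform("mean")
```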

Pivot Tables

Last, but by no means least, it’s time to talk pivot tables.

In Excel, a pivot table is likely to be our first port of call if we want to summarise a large dataset.

For instance, when working with ranking data, we may want to identify which URLs appear most frequently, and their average ranking position.

Screenshot from Microsoft Excel, November 2022

Again, Pandas has its own pivot tables equivalent – but if all you want is a count of unique values within a column, this can be accomplished using the value_counts function:

count = df['URL'].value_counts()

Using groupby is also an option.

Earlier in the article, performing a groupby that aggregated our data wasn’t what we wanted – but it’s precisely what’s required here:

grouped = df.groupby('URL').agg(
     url_frequency=('Keyword', 'count'),
     avg_position=('Position', 'mean'),
     )

grouped.reset_index(inplace=True)

Screenshot from VS Code, November 2022

Two aggregate functions have been applied in the example above, but this could easily be expanded upon, and 13 different types are available.

There are, of course, times when we do want to use pivot_table, such as when performing multi-dimensional operations.

To illustrate what this means, let’s reuse the ranking groupings we made using conditional statements and attempt to display the number of times a URL ranks within each group.

ranking_groupings = df.groupby(['URL', 'Grouping']).agg(
     url_frequency=('Keyword', 'count'),
     )

Screenshot from VS Code, November 2022

This isn’t the best format to use, as multiple rows have been created for each URL.

Instead, we can use pivot_table, which will display the data in different columns:

pivot = pd.pivot_table(df,
     index=['URL'],
     columns=['Grouping'],
     aggfunc="size",
     fill_value=0,
     )

Screenshot from VS Code, November 2022
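For anyone who wants to try this without the original dataset, here is a self-contained sketch with hypothetical URLs and groupings. It shows the shape of the output rather than reproducing the screenshot above.

```python
import pandas as pd

# Hypothetical ranking data with a pre-computed grouping column.
df = pd.DataFrame({
    "URL": ["/a/", "/a/", "/a/", "/b/"],
    "Keyword": ["k1", "k2", "k3", "k4"],
    "Grouping": ["Top 3", "Top 3", "Top 10", "Top 20"],
})

# One row per URL, one column per grouping, with row counts as the values.
# fill_value=0 replaces the gaps where a URL never appears in a grouping.
pivot = pd.pivot_table(
    df,
    index=["URL"],
    columns=["Grouping"],
    aggfunc="size",
    fill_value=0,
)
```

Individual cells can then be read with loc, for example pivot.loc["/a/", "Top 3"].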

Final Thoughts

Whether you’re looking for inspiration to start learning Python, or are already leveraging it in your SEO workflows, I hope that the above examples help you along on your journey.

As promised, you can find a Google Colab notebook with all of the code snippets here.

In truth, we’ve barely scratched the surface of what’s possible, but understanding the basics of Python data analysis will give you a solid base upon which to build.


Featured Image: mapo_japan/Shutterstock


SEO

Google’s Search Engine Market Share Drops As Competitors’ Grows

Published

1 second ago

May 2, 2024

Max

According to data from GS Statcounter, Google’s search engine market share has fallen to 86.99%, the lowest point since the firm began tracking search engine share in 2009.

The drop represents a more than 4% decrease from the previous month, marking the largest single-month decline on record.


Screenshot from: https://gs.statcounter.com/search-engine-market-share/, May 2024.

U.S. Market Impact

The decline is most significant in Google’s key market, the United States, where its share of searches across all devices fell by nearly 10%, reaching 77.52%.


Screenshot from: https://gs.statcounter.com/search-engine-market-share/, May 2024.

Concurrently, competitors Microsoft Bing and Yahoo Search have seen gains. Bing reached a 13% market share in the U.S. and 5.8% globally, its highest since launching in 2009.

Yahoo Search’s worldwide share nearly tripled to 3.06%, a level not seen since July 2015.


Screenshot from: https://gs.statcounter.com/search-engine-market-share/, May 2024.

Search Quality Concerns

Many industry experts have recently expressed concerns about the declining quality of Google’s search results.

A portion of the SEO community believes that the search giant’s results have worsened following the latest update.

These concerns have begun to extend to average internet users, who are increasingly voicing complaints about the state of their search results.

Alternative Perspectives

Web analytics platform SimilarWeb provided additional context on X (formerly Twitter), stating that its data for the US for March 2024 suggests Google’s decline may not be as severe as initially reported.

From our data (Search Engine website category, US, March 2024) it doesn’t look like we’re there yet: pic.twitter.com/RBUJp4ZLeb
— Similarweb (@Similarweb) May 1, 2024

SimilarWeb also highlighted Yahoo’s strong performance, categorizing it as a News and Media platform rather than a direct competitor to Google in the Search Engine category.

Don’t underestimate Yahoo. They’re doing great. On our platform they’re categorized as News and Media, and hence not a direct competitor to Google in the Search Engine category. But they rank #10 worldwide, #6 in the US, and #1 in their category. Much higher than Bing and OpenAI. pic.twitter.com/O4yJu5QEK6
— Similarweb (@Similarweb) May 2, 2024

At the same time, Google is slightly declining 👀 pic.twitter.com/9i7paeU1QG
— Similarweb (@Similarweb) May 2, 2024

Why It Matters

The shifting search engine market trends can impact businesses, marketers, and regular users.

Google has been on top for a long time, shaping how we find things online and how users behave.

However, as its market share drops and other search engines gain popularity, publishers may need to rethink their online strategies and optimize for multiple search platforms besides Google.

Users are becoming vocal about Google’s declining search quality over time. As people start trying alternate search engines, the various platforms must prioritize keeping users satisfied if they want to maintain or grow their market position.

It will be interesting to see how they respond to this boost in market share.

What It Means for SEO Pros

As Google’s competitors gain ground, SEO strategies may need to adapt by accounting for how each search engine’s algorithms and ranking factors work.

This could involve diversifying SEO efforts across multiple platforms and staying up-to-date on best practices for each one.

The increased focus on high-quality search results emphasizes the need to create valuable, user-focused content that meets the needs of the target audience.

SEO pros must prioritize informative, engaging, trustworthy content that meets search engine algorithms and user expectations.

Remain flexible, adaptable, and proactive to navigate these shifts. Keeping a pulse on industry trends, user behaviors, and competing search engine strategies will be key for successful SEO campaigns.

Featured Image: Tada Images/Shutterstock

SEO

How To Drive Pipeline With A Silo-Free Strategy

Published

20 hours ago

May 1, 2024

Max


When it comes to B2B strategy, a holistic approach is the only approach.

Revenue organizations usually operate with siloed teams, and often expect a one-size-fits-all solution (usually buying clicks with paid media).

However, without cohesive brand, infrastructure, and pipeline generation efforts, they’re pretty much doomed to fail.

It’s just like rowing crew, where each member of the team must synchronize their movements to propel the boat forward – successful B2B marketing requires an integrated strategy.

So if you’re ready to ditch your disjointed marketing efforts and try a holistic approach, we’ve got you covered.

Join us on May 15 for an insightful live session with Digital Reach Agency on how to craft a compelling brand and PMF.

We’ll walk through the critical infrastructure you need, and the dependencies among the core digital marketing disciplines.

Key takeaways from this webinar:

Thinking Beyond Traditional Silos: Learn why traditional marketing silos are no longer viable and how they spell doom for modern revenue organizations.
How To Identify and Fix Silos: Discover actionable strategies for pinpointing and sealing the gaps in your marketing silos.
The Power of Integration: Uncover the secrets to successfully integrating brand strategy, digital infrastructure, and pipeline generation efforts.

Ben Childs, President and Founder of Digital Reach Agency, and Jordan Gibson, Head of Growth at Digital Reach Agency, will show you how to seamlessly integrate various elements of your marketing strategy for optimal results.

Don’t make the common mistake of using traditional marketing silos – sign up now and learn what it takes to transform your B2B go-to-market.

You’ll also get the opportunity to ask Ben and Jordan your most pressing questions, following the presentation.

And if you can’t make it to the live event, register anyway and we’ll send you a recording shortly after the webinar.

SEO

Why Big Companies Make Bad Content

Published

24 hours ago

May 1, 2024

Entireweb News Bot

It’s like death and taxes: inevitable. The bigger a company gets, the worse its content marketing becomes.

HubSpot teaching you how to type the shrug emoji or buy bitcoin stock. Salesforce sharing inspiring business quotes. GoDaddy helping you use Bing AI, or Zendesk sharing catchy sales slogans.

Judged by content marketing best practice, these articles are bad.

They won’t resonate with decision-makers. Nobody will buy a HubSpot license after Googling “how to buy bitcoin stock.” It’s the very definition of vanity traffic: tons of visits with no obvious impact on the business.

So why does this happen?

I did a double-take the first time I discovered this article on the HubSpot blog.

There’s an obvious (but flawed) answer to this question: big companies are inefficient.

As companies grow, they become more complicated, and writing good, relevant content becomes harder. I’ve experienced this firsthand:

extra rounds of legal review and stakeholder approval creeping into processes.
content watered down to serve an ever-more generic “brand voice”.
growing misalignment between search and content teams.
a lack of content leadership within the company as early employees leave.

Similarly, funded companies have to grow, even when they’re already huge. Content has to feed the machine, continually increasing traffic… even if that traffic never contributes to the bottom line.

There’s an element of truth here, but I’ve come to think that both these arguments are naive, and certainly not the whole story.

It is wrong to assume that the same people that grew the company suddenly forgot everything they once knew about content, and wrong to assume that companies willfully target useless keywords just to game their OKRs.

Instead, let’s assume that this strategy is deliberate, and not oversight. I think bad content—and the vanity traffic it generates—is actually good for business.

There are benefits to driving tons of traffic, even if that traffic never directly converts. Or put in meme format:

Programmatic SEO is a good example. Why does Dialpad create landing pages for local phone numbers?


Why does Wise target exchange rate keywords?


Why do we have a list of most popular websites pages?


As this Twitter user points out, these articles will never convert…

The conversion rate for this must be awful though. Like there is no purchase intent behind these searches, and it’s most likely all consumers who aren’t their target market and don’t then want to buy an ‘AI powered collaboration platform’. These are basically vanity visits…
— Aaron Beashel (@aaronbeashel) February 28, 2024

…but they don’t need to.

Every published URL and targeted keyword is a new doorway from the backwaters of the internet into your website. It’s a chance to acquire backlinks that wouldn’t otherwise exist, and an opportunity to get your brand in front of thousands of new, otherwise unfamiliar people.

These benefits might not directly translate into revenue, but over time, in aggregate, they can have a huge indirect impact on revenue. They can:

Strengthen domain authority and the search performance of every other page on the website.
Boost brand awareness, and encourage serendipitous interactions that land your brand in front of the right person at the right time.
Deny your competitors traffic and dilute their share of voice.

These small benefits become more worthwhile when multiplied across many hundreds or thousands of pages. If you can minimize the cost of the content, there is relatively little downside.

What about topical authority?

“But what about topical authority?!” I hear you cry. “If you stray too far from your area of expertise, won’t rankings suffer for it?”

I reply simply with this screenshot of Forbes’ “health” subfolder, generating almost 4 million estimated monthly organic pageviews:


And big companies can minimize cost. For large, established brands, the marginal cost of content creation is relatively low.

Many companies scale their output through networks of freelance writers, avoiding the cost of fully loaded employees. They have established, efficient processes for research, briefing, editorial review, publication and maintenance. The cost of an additional “unit” of content—or ten, or a hundred—is not that great, especially relative to other marketing channels.

There is also relatively little opportunity cost to consider: the fact that energy spent on “vanity” traffic could be better spent elsewhere, on more business-relevant topics.

In reality, many of the companies engaging in this strategy have already plucked the low-hanging fruit and written almost every product-relevant topic. There are a finite number of high traffic, high relevance topics; blog consistently for a decade and you too will reach these limits.

On top of that, the HubSpots and Salesforces of the world have very established, very efficient sales processes. Content gating, lead capture and scoring, and retargeting allow them to put very small conversion rates to relatively good use.

Even HubSpot’s article on Bitcoin stock has its own relevant call-to-action—and for HubSpot, building a database of aspiring investors is more valuable than it sounds, because…

The bigger a company grows, the bigger its audience needs to be to continue sustaining that growth rate.

Companies generally expand their total addressable market (TAM) as they grow, like HubSpot broadening from marketing to sales and customer success, launching new product lines for new, much bigger audiences. This means the target audience for their content marketing grows alongside it.

As Peep Laja put it:

When in early stages, you have to focus.
But as you grow in revenue and want to become an absolute monster of a company… they all seem to become for “everyone”. Salesforce, Hubspot, Zendesk, Freshworks, etc etc.
Any exceptions here? $1B+ B2B SaaS companies that are narrowly…
— Pe:p Laja (@peeplaja) April 4, 2024

But for the biggest companies, this principle is taken to an extreme. When a company gears up to IPO, its target audience expands to… pretty much everyone.

This was something Janessa Lantz (ex-HubSpot and dbt Labs) helped me understand: the target audience for a post-IPO company is not just end users, but institutional investors, market analysts, journalists, even regular Jane investors.

These are people who can influence the company’s worth in ways beyond simply buying a subscription: they can invest or encourage others to invest and dramatically influence the share price. These people are influenced by billboards, OOH advertising and, you guessed it, seemingly “bad” content showing up whenever they Google something.

You can think of this as a second, additional marketing funnel for post-IPO companies:

Illustration: When companies IPO, the traditional marketing funnel is accompanied by a second funnel. Website visitors contribute value through stock appreciation, not just revenue.

These visitors might not purchase a software subscription when they see your article in the SERP, but they will notice your brand, and maybe listen more attentively the next time your stock ticker appears on the news.

They won’t become power users, but they might download your eBook and add an extra unit to the email subscribers reported in your S-1.

They might not contribute revenue now, but they will in the future: in the form of stock appreciation, or becoming the target audience for a future product line.

Vanity traffic does create value, but in a form most content marketers are not used to measuring.

If any of these benefits apply, then it makes sense to acquire them for your company—but also to deny them to your competitors.

SEO is an arms race: there are a finite number of keywords and topics, and leaving a rival to claim hundreds, even thousands of SERPs uncontested could very quickly create a headache for your company.

SEO can quickly create a moat of backlinks and brand awareness that can be virtually impossible to challenge; left unchecked, the gap between your company and your rival can widen at an accelerating pace.

Pumping out “bad” content and chasing vanity traffic is a chance to deny your rivals unchallenged share of voice, and make sure your brand always has a seat at the table.

Final thoughts

These types of articles are miscategorized—instead of thinking of them as bad content, it’s better to think of them as cheap digital billboards with surprisingly great attribution.

Big companies chasing “vanity traffic” isn’t an accident or oversight—there are good reasons to invest energy into content that will never convert. There is benefit, just not in the format most content marketers are used to.

This is not an argument to suggest that every company should invest in hyper-broad, high-traffic keywords. But if you’ve been blogging for a decade, or you’re gearing up for an IPO, then “bad content” and the vanity traffic it creates might not be so bad.