Competitor Backlink Analysis With Python [Complete Script]

In my last article, we analyzed our backlinks using data from Ahrefs.

This time around, we’re including the competitor backlinks in our analysis using the same Ahrefs data source for comparison.

Like last time, we defined the value of a site’s backlinks for SEO as a product of quality and quantity.

Quality is domain authority (or Ahrefs’ equivalent domain rating) and quantity is the number of referring domains.

Again, we’ll evaluate the link quality with the available data before evaluating the quantity.

Time to code.

import os
import re
import time
import random
import datetime as dt  # used later for dt.datetime.now()
from datetime import timedelta
import pandas as pd
import numpy as np
from plotnine import *
import matplotlib.pyplot as plt
from pandas.api.types import is_string_dtype
from pandas.api.types import is_numeric_dtype
import uritools

pd.set_option('display.max_colwidth', None)
%matplotlib inline
root_domain = 'johnsankey.co.uk'
hostdomain = 'www.johnsankey.co.uk'
hostname = 'johnsankey'
full_domain = 'https://www.johnsankey.co.uk'
target_name = 'John Sankey'

Data Import & Cleaning

We set up the file directory so we can read multiple Ahrefs data exports from one folder, which is much faster, less boring, and more efficient than reading each file individually.

Especially when you have more than 10 of them!

ahrefs_path = 'data/'

The listdir() function from the os module allows us to list all files in a subdirectory.

ahrefs_filenames = os.listdir(ahrefs_path)
# remove the macOS metadata file if it's present
if '.DS_Store' in ahrefs_filenames:
    ahrefs_filenames.remove('.DS_Store')
ahrefs_filenames

The file names are now listed below:

['www.davidsonlondon.com--refdomains-subdomain__2022-03-13_23-37-29.csv',
 'www.stephenclasper.co.uk--refdomains-subdoma__2022-03-13_23-47-28.csv',
 'www.touchedinteriors.co.uk--refdomains-subdo__2022-03-13_23-42-05.csv',
 'www.lushinteriors.co--refdomains-subdomains__2022-03-13_23-44-34.csv',
 'www.kassavello.com--refdomains-subdomains__2022-03-13_23-43-19.csv',
 'www.tulipinterior.co.uk--refdomains-subdomai__2022-03-13_23-41-04.csv',
 'www.tgosling.com--refdomains-subdomains__2022-03-13_23-38-44.csv',
 'www.onlybespoke.com--refdomains-subdomains__2022-03-13_23-45-28.csv',
 'www.williamgarvey.co.uk--refdomains-subdomai__2022-03-13_23-43-45.csv',
 'www.hadleyrose.co.uk--refdomains-subdomains__2022-03-13_23-39-31.csv',
 'www.davidlinley.com--refdomains-subdomains__2022-03-13_23-40-25.csv',
 'johnsankey.co.uk-refdomains-subdomains__2022-03-18_15-15-47.csv']

With the files listed, we’ll now read each one in a for loop and collect them into a single dataframe.

While reading in the file we’ll use some string manipulation to create a new column with the site name of the data we’re importing.

ahrefs_df_lst = list()
ahrefs_colnames = list()

for filename in ahrefs_filenames:
    df = pd.read_csv(ahrefs_path + filename)
    df['site'] = filename
    df['site'] = df['site'].str.replace('www.', '', regex = False)    
    df['site'] = df['site'].str.replace('.csv', '', regex = False)
    df['site'] = df['site'].str.replace('-.+', '', regex = True)
    ahrefs_colnames.append(df.columns)
    ahrefs_df_lst.append(df)

ahrefs_df_raw = pd.concat(ahrefs_df_lst)
ahrefs_df_raw
Ahrefs dofollow raw data (Image from Ahrefs, May 2022)

Now we have the raw data from each site in a single dataframe. The next step is to tidy up the column names and make them a bit friendlier to work with.

Although the repetition could be eliminated with a custom function or a list comprehension, it is good practice and easier for beginner SEO Pythonistas to see what’s happening step by step. As they say, “repetition is the mother of mastery,” so get practicing!

competitor_ahrefs_cleancols = ahrefs_df_raw
competitor_ahrefs_cleancols.columns = [col.lower() for col in competitor_ahrefs_cleancols.columns]
competitor_ahrefs_cleancols.columns = [col.replace(' ','_') for col in competitor_ahrefs_cleancols.columns]
competitor_ahrefs_cleancols.columns = [col.replace('.','_') for col in competitor_ahrefs_cleancols.columns]
competitor_ahrefs_cleancols.columns = [col.replace('__','_') for col in competitor_ahrefs_cleancols.columns]
competitor_ahrefs_cleancols.columns = [col.replace('(','') for col in competitor_ahrefs_cleancols.columns]
competitor_ahrefs_cleancols.columns = [col.replace(')','') for col in competitor_ahrefs_cleancols.columns]
competitor_ahrefs_cleancols.columns = [col.replace('%','') for col in competitor_ahrefs_cleancols.columns]
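
For reference, the same cleanup can be collapsed into a short loop; this is just a more compact, functionally equivalent sketch of the replaces above:

# optional compact alternative: lowercase once, then loop over the replacements
competitor_ahrefs_cleancols.columns = [col.lower() for col in competitor_ahrefs_cleancols.columns]
for old, new in [(' ', '_'), ('.', '_'), ('__', '_'), ('(', ''), (')', ''), ('%', '')]:
    competitor_ahrefs_cleancols.columns = [col.replace(old, new) for col in competitor_ahrefs_cleancols.columns]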

Adding a count column and a single-value column (‘project’) gives us convenient fields for later groupby and aggregation operations.

competitor_ahrefs_cleancols['rd_count'] = 1
competitor_ahrefs_cleancols['project'] = target_name

competitor_ahrefs_cleancols
Ahrefs competitor data (Image from Ahrefs, May 2022)

The columns are cleaned up, so now we’ll clean up the row data.

competitor_ahrefs_clean_dtypes = competitor_ahrefs_cleancols

For referring domains, we’re replacing hyphens with zero and setting the data type as an integer (i.e., whole number).

This will be repeated for linked domains, also.

competitor_ahrefs_clean_dtypes['dofollow_ref_domains'] = np.where(competitor_ahrefs_clean_dtypes['dofollow_ref_domains'] == '-',
                                                           0, competitor_ahrefs_clean_dtypes['dofollow_ref_domains'])
competitor_ahrefs_clean_dtypes['dofollow_ref_domains'] = competitor_ahrefs_clean_dtypes['dofollow_ref_domains'].astype(int)



# linked_domains

competitor_ahrefs_clean_dtypes['dofollow_linked_domains'] = np.where(competitor_ahrefs_clean_dtypes['dofollow_linked_domains'] == '-',
                                                           0, competitor_ahrefs_clean_dtypes['dofollow_linked_domains'])
competitor_ahrefs_clean_dtypes['dofollow_linked_domains'] = competitor_ahrefs_clean_dtypes['dofollow_linked_domains'].astype(int)

 

“First seen” gives us the date each link was first found, which we can use for time series plotting and for deriving the link age.

We’ll convert to date format using the to_datetime function.

# first_seen
competitor_ahrefs_clean_dtypes['first_seen'] = pd.to_datetime(competitor_ahrefs_clean_dtypes['first_seen'], 
                                                              format="%d/%m/%Y %H:%M")
competitor_ahrefs_clean_dtypes['first_seen'] = competitor_ahrefs_clean_dtypes['first_seen'].dt.normalize()
competitor_ahrefs_clean_dtypes['month_year'] = competitor_ahrefs_clean_dtypes['first_seen'].dt.to_period('M')


To calculate the link_age, we’ll simply subtract the first seen date from today’s date and convert the difference into a number of days.

# link age in days
competitor_ahrefs_clean_dtypes['link_age'] = (
    dt.datetime.now() - competitor_ahrefs_clean_dtypes['first_seen']
).dt.days

The target column helps us distinguish the “client” site from the competitors, which is useful for visualization later.

competitor_ahrefs_clean_dtypes['target'] = np.where(competitor_ahrefs_clean_dtypes['site'].str.contains('johns'),
                                                                                            1, 0)
competitor_ahrefs_clean_dtypes['target'] = competitor_ahrefs_clean_dtypes['target'].astype('category')

competitor_ahrefs_clean_dtypes
Ahrefs clean data types (Image from Ahrefs, May 2022)

Now that the data is cleaned up, both in terms of column titles and row values, we’re ready to set forth and start analyzing.

Link Quality

We start with link quality, for which we’ll accept Domain Rating (DR) as the measure.

Let’s start by inspecting the distribution of DR by plotting it with the geom_boxplot function.
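
Note that the plotting code below reads from a dataframe called competitor_ahrefs_analysis; here we’ll assume it is simply a copy of the cleaned dataframe from the previous step:

# assumed step: take a working copy of the cleaned data for analysis
competitor_ahrefs_analysis = competitor_ahrefs_clean_dtypes.copy()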

comp_dr_dist_box_plt = (
    ggplot(competitor_ahrefs_analysis.loc[competitor_ahrefs_analysis['dr'] > 0], 
           aes(x = 'reorder(site, dr)', y = 'dr', colour="target")) + 
    geom_boxplot(alpha = 0.6) +
    scale_y_continuous() +   
    theme(legend_position = 'none', 
          axis_text_x=element_text(rotation=90, hjust=1)
         ))

comp_dr_dist_box_plt.save(filename="images/4_comp_dr_dist_box_plt.png", 
                           height=5, width=10, units="in", dpi=1000)
comp_dr_dist_box_plt
Competitor DR distribution boxplots (Image from Ahrefs, May 2022)

The plot compares each site’s statistical properties side by side, most notably the interquartile range, which shows where most of a site’s referring domains fall in terms of domain rating.

We also see that John Sankey has the fourth-highest median domain rating, which compares favorably with the other sites on link quality.

William Garvey has the most diverse range of DR compared with other domains, indicating ever so slightly more relaxed criteria for link acquisition. Who knows.
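
If you want to sanity-check that ranking numerically, here’s a quick sketch of the medians behind the boxplot, using the same analysis dataframe and DR filter as above:

# median DR per site, highest first - a numeric check on the boxplot
median_dr = (
    competitor_ahrefs_analysis
    .loc[competitor_ahrefs_analysis['dr'] > 0]
    .groupby('site')['dr']
    .median()
    .sort_values(ascending=False)
)
median_dr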

Link Volumes

That’s quality. What about the volume of links from referring domains?

To tackle that, we’ll compute a running sum of referring domains using the groupby function.

competitor_count_cumsum_df = competitor_ahrefs_analysis

competitor_count_cumsum_df = competitor_count_cumsum_df.groupby(['site', 'month_year'])['rd_count'].sum().reset_index()

The expanding function allows the calculation window to grow with the number of rows, which is how we achieve our running sum (grouped by site so each competitor’s total accumulates separately).

competitor_count_cumsum_df['count_runsum'] = (competitor_count_cumsum_df
    .groupby('site')['rd_count'].expanding().sum().reset_index(level=0, drop=True))

competitor_count_cumsum_df
Ahrefs cumulative sum data (Image from Ahrefs, May 2022)

The result is a data frame with the site, month_year and count_runsum (the running sum), which is in the perfect format to feed the graph.
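
One practical note before plotting: month_year is a pandas Period at this point, and depending on your plotnine version, scale_x_date() may not accept it directly; converting it to a timestamp first avoids the issue:

# convert the Period month back to a timestamp so scale_x_date() can plot it
competitor_count_cumsum_df['month_year'] = competitor_count_cumsum_df['month_year'].dt.to_timestamp()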

competitor_count_cumsum_plt = (
    ggplot(competitor_count_cumsum_df, aes(x = 'month_year', y = 'count_runsum', 
                                           group = 'site', colour="site")) + 
    geom_line(alpha = 0.6, size = 2) +
    labs(y = 'Running Sum of Referring Domains', x = 'Month Year') + 
    scale_y_continuous() + 
    scale_x_date() +
    theme(legend_position = 'right', 
          axis_text_x=element_text(rotation=90, hjust=1)
         ))
competitor_count_cumsum_plt.save(filename="images/5_count_cumsum_smooth_plt.png", 
                           height=5, width=10, units="in", dpi=1000)

competitor_count_cumsum_plt
Competitor running sum graph (Image from Ahrefs, May 2022)

The plot shows the number of referring domains for each site since 2014.

What I find quite interesting is the different starting position for each site when they begin acquiring links.

For example, William Garvey started with over 5,000 domains. I’d love to know who their PR agency is!

We can also see the rate of growth. For example, although Hadley Rose started link acquisition in 2018, things really took off around mid-2021.

More, More, And More

You can always do more scientific analysis.

For example, one immediate and natural extension of the above would be to combine both the quality (DR) and the quantity (volume) for a more holistic view of how the sites compare in terms of offsite SEO.
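
As a rough sketch of what that combination might look like, one simple (illustrative, not standard) approach is to sum DR across each site’s referring domains, giving a crude quality-weighted volume score alongside the plain counts:

# illustrative only: combine quality (DR) and quantity (referring domain count)
# into one DR-weighted score per site
site_summary = (
    competitor_ahrefs_analysis
    .groupby('site')
    .agg(ref_domains=('rd_count', 'sum'),
         median_dr=('dr', 'median'),
         dr_weighted_score=('dr', 'sum'))
    .sort_values('dr_weighted_score', ascending=False)
)
site_summary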

Other extensions would be to model the qualities of those referring domains for both your own and your competitor sites to see which link features (such as the number of words or relevance of the linking content) could explain the difference in visibility between you and your competitors.

This model extension would be a good application of machine learning techniques.
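
As a very rough, hypothetical sketch of what such a model could look like, here is a logistic regression that tries to separate your referring domains from your competitors’ using link features; note that word_count and relevance are invented columns for illustration only (they are not in the Ahrefs export, so you would need to crawl and score the linking pages to create them):

from sklearn.linear_model import LogisticRegression

# hypothetical feature columns - 'word_count' and 'relevance' do not exist in the
# Ahrefs export and would need to be built by crawling the linking pages
features = ['dr', 'word_count', 'relevance']
X = competitor_ahrefs_analysis[features]
y = competitor_ahrefs_analysis['target'].astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)
print(dict(zip(features, model.coef_[0])))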

Featured Image: F8 studio/Shutterstock


HubSpot Rolls Out AI-Powered Marketing Tools


HubSpot announced a push into AI this week at its annual Inbound marketing conference, launching “Breeze.”

Breeze is an artificial intelligence layer integrated across the company’s marketing, sales, and customer service software.

According to HubSpot, the goal is to provide marketers with easier, faster, and more unified solutions as digital channels become oversaturated.

Karen Ng, VP of Product at HubSpot, tells Search Engine Journal in an interview:

“We’re trying to create really powerful tools for marketers to rise above the noise that’s happening now with a lot of this AI-generated content. We might help you generate titles or a blog content…but we do expect kind of a human there to be a co-assist in that.”

Breeze AI Covers Copilot, Workflow Agents, Data Enrichment

The Breeze layer includes three main components.

Breeze Copilot

An AI assistant that provides personalized recommendations and suggestions based on data in HubSpot’s CRM.

Ng explained:

“It’s a chat-based AI companion that assists with tasks everywhere – in HubSpot, the browser, and mobile.”

Breeze Agents

A set of four agents that can automate entire workflows like content generation, social media campaigns, prospecting, and customer support without human input.

Ng added the following context:

“Agents allow you to automate a lot of those workflows. But it’s still, you know, we might generate for you a content backlog. But taking a look at that content backlog, and knowing what you publish is still a really important key of it right now.”

Breeze Intelligence

Combines HubSpot customer data with third-party sources to build richer profiles.

Ng stated:

“It’s really important that we’re bringing together data that can be trusted. We know your AI is really only as good as the data that it’s actually trained on.”

Addressing AI Content Quality

While prioritizing AI-driven productivity, Ng acknowledged the need for human oversight of AI content:

“We really do need eyes on it still…We think of that content generation as still human-assisted.”

Marketing Hub Updates

Beyond Breeze, HubSpot is updating Marketing Hub with tools like:

  • Content Remix to repurpose videos into clips, audio, blogs, and more
  • AI video creation via integration with HeyGen
  • YouTube and Instagram Reels publishing
  • Improved marketing analytics and attribution

The announcements signal HubSpot’s AI-driven vision for unifying customer data.

But as Ng tells us, “We definitely think a lot about the data sources…and then also understand your business.”

HubSpot’s updates are rolling out now, with some in public beta.


Featured Image: Poetra.RH/Shutterstock


Holistic Marketing Strategies That Drive Revenue [SaaS Case Study]


Brands are seeing success driving quality pipeline and revenue growth. It’s all about building an intentional customer journey, aligning sales and marketing, and measuring ROI.

Check out this executive panel on-demand, as we show you how we do it. 

With Ryann Hogan, senior demand generation manager at CallRail, and our very own Heather Campbell and Jessica Cromwell, we chatted about driving demand, lead gen, revenue, and proper attribution.

This B2B leadership forum provided insights you can use in your strategy tomorrow, like:

  • The importance of the customer journey, and the keys to matching content to your ideal personas.
  • How to align marketing and sales efforts to guide leads through an effective journey to conversion.
  • Methods to measure ROI and determine if your strategies are delivering results.

While the case study is SaaS, these strategies are for any brand.

Watch on-demand and be part of the conversation. 

Join Us For Our Next Webinar!

Navigating SERP Complexity: How to Leverage Search Intent for SEO

Join us live as we break down all of these complexities and reveal how to identify valuable opportunities in your space. We’ll show you how to tap into the searcher’s motivation behind each query (and how Google responds to it in kind).


What Marketers Need to Learn From Hunter S. Thompson


We’ve passed the high-water mark of content marketing—at least, content marketing in its current form.

After thirteen years in content marketing, I think it’s fair to say that most of the content on company blogs was created by people with zero firsthand experience of their subject matter. We have built a profession of armchair commentators, a class of marketers who exist almost entirely in a world of theory and abstraction.

I count myself among their number. I have hundreds of bylines about subfloor moisture management, information security, SaaS pricing models, and agency resource management. I am an expert in none of these topics.

This has been the happy reality of content marketing for over a decade, a natural consequence of the incentives created by early Google Search. Historically, being a great content marketer required precisely no subject matter expertise. It was enough to read widely and write quickly.

Mountains of organic traffic have been built on the backs of armchair commentators like myself. Time spent doing deep, detailed research was, generally speaking, wasted, because 80% of the returns came from simply shuffling other people’s ideas around and slapping a few keyword-targeted H2s in the right places.

But this doesn’t work today.

For all of its flaws, generative AI is an excellent, truly world-class armchair commentator. If the job-to-be-done is reading a dozen articles and how-to’s and turning them into something semi-original and fairly coherent, AI really is the best tool for the job. Humans cannot out-copycat generative AI.

Put another way, the role of the content marketer as a curator has been rendered obsolete. So where do we go from here?

“The only way to write honestly about the scene is to be part of it.”
—Hunter S. Thompson, Hell’s Angels

Hunter S. Thompson popularised the idea of gonzo journalism, “a style of journalism that is written without claims of objectivity, often including the reporter as part of the story using a first-person narrative.”

In other words, Hunter was the story.

When asked to cover the rising phenomenon of the Hell’s Angels, he became a Hell’s Angel. During his coverage of the ‘72 presidential campaign, he openly supported his preferred candidate, George McGovern, and actively disparaged Richard Nixon. His chronicle of the Kentucky Derby focused almost entirely on his own debauchery and chaos-making—a story that has outlasted any factual account of the race itself.

In the same vein, content marketers today need to become their stories.

It’s a content marketing truism that it’s unreasonable to expect writers to become experts. There’s a superficial level of truth to that claim—no content marketer can acquire a decade’s worth of experience in a few days or weeks—but there are great benefits awaiting any company willing to challenge that truism very, very seriously.

As Thompson proved, short, intense periods of firsthand experience can yield incredible insights and stories. So what would happen if you radically reduced your content output and dedicated half of your content team’s time to research and experimentation? If their job was doing things worth writing about, instead of just writing? If skin-in-the-game, no matter how small, was a prerequisite of the role?

We’re already seeing this shift.

“The closest analogy to the ideal would be a film director/producer who writes his own scripts, does his own camera work and somehow manages to film himself in action, as the protagonist or at least a main character.”
—Hunter S. Thompson, The Great Shark Hunt

Every week, I see more companies hiring marketers who are true, bonafide subject matter experts (I include the Ahrefs content team here—for the majority of our team, “writing” is a skill secondary to a decade of hands-on search and marketing experience). They are expensive, hard to find, and in the era of AI, worth every cent.

I see a growing expectation that marketers will document their experiences and experiments on social media, creating meta-content that often outperforms the “real” content. I see more companies willing to share subjective experiences and stories, and avoid competing solely on the sharing of objective, factual information. I see companies spending money to promote the personal brands of in-house creators, actively encouraging parasocial relationships as their corporate brand accounts lay dormant.

These are ideas that made no sense in the old model of content marketing, but they make much more sense today. This level of effort is fast becoming the only way to gain any kind of moat, creating material that doesn’t already exist on a dozen other company blogs.

In the era of information abundance, our need for information is relatively easy to sate; but we have a near-limitless hunger for entertainment, and personal interaction, and weird, pattern-interrupting experiences.

Gonzo content marketing can deliver.

“But what was the story? Nobody had bothered to say. So we would have to drum it up on our own.”
—Hunter S. Thompson, Fear and Loathing in Las Vegas

 
