
Using Python + Streamlit To Find Striking Distance Keyword Opportunities


Python is an excellent tool to automate repetitive tasks as well as gain additional insights into data.

In this article, you’ll learn how to build a tool that checks which keywords are close to ranking in positions one to three and advises whether there’s an opportunity to naturally work those keywords into the page.

It’s perfect for Python beginners and pros alike and is a great introduction to using Python for SEO.

If you’d just like to get stuck in, there’s a handy Streamlit app available for the code. It’s simple to use and requires no coding experience.

There’s also a Google Colaboratory Sheet if you’d like to poke around with the code. If you can crawl a website, you can use this script!

Here’s an example of what we’ll be making today:

An Excel sheet documenting on-page keyword opportunities generated with Python (Screenshot from Microsoft Excel, October 2021).

These keywords are found in the page title and H1, but not in the copy. Adding these keywords naturally to the existing copy would be an easy way to increase relevancy for these keywords.

By taking the hint from search engines and naturally including any missing keywords a site already ranks for, we increase the confidence of search engines to rank those keywords higher in the SERPs.


This report can be created manually, but it’s pretty time-consuming.

So, we’re going to automate the process using a Python SEO script.

Preview Of The Output

This is a sample of what the final output will look like after running the report:

An Excel sheet showing an example of keywords that can be optimized by using the striking distance report (Screenshot from Microsoft Excel, October 2021).

The final output takes the top five opportunities by search volume for each page and neatly lays each one horizontally along with the estimated search volume.

It also shows the total search volume of all keywords a page has within striking distance, as well as the total number of keywords within reach.

The top five keywords by search volume are then checked to see if they are found in the title, H1, or copy, then flagged TRUE or FALSE.

This is great for finding quick wins! Just add the missing keyword naturally into the page copy, title, or H1.
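To make the layout described above concrete, a single row of the final output follows this pattern (the values here are purely hypothetical):

URL: /sample-page/
Striking Dist. Vol: 1150
KWs in Striking Dist.: 8
KW1: sample keyword | KW1 Vol: 450 | KW1 in Title: TRUE | KW1 in H1: TRUE | KW1 in Copy: FALSE

…and so on through KW5, each with its search volume and TRUE/FALSE checks.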

Getting Started

The setup is fairly straightforward. We just need a crawl of the site (ideally with a custom extraction for the copy you’d like to check), and an exported file of all keywords a site ranks for.

This post will walk you through the setup, the code, and will link to a Google Colaboratory sheet if you just want to get stuck in without coding it yourself.


To get started, you will need three things: a crawl of the site, an export of all the keywords it ranks for, and the Google Colaboratory sheet or Streamlit app to process them.

We’ve named this the Striking Distance Report as it flags keywords that are easily within striking distance.

(We have defined striking distance as keywords that rank in positions four to 20, but have made this a configurable option in case you would like to define your own parameters.)

Striking Distance SEO Report: Getting Started

1. Crawl The Target Website

  • Set a custom extractor for the page copy (optional, but recommended).
  • Filter out pagination pages from the crawl.

2. Export All Keywords The Site Ranks For Using Your Favorite Provider

  • Filter out keywords that trigger as a sitelink.
  • Remove keywords that trigger as an image.
  • Filter branded keywords.
  • Use both exports to create an actionable Striking Distance report from the keyword and crawl data with Python.

Crawling The Site

I’ve opted to use Screaming Frog to get the initial crawl. Any crawler will work, so long as the CSV export uses the same column names or they’re renamed to match.

The script expects to find the following columns in the crawl CSV export:

"Address", "Title 1", "H1-1", "Copy 1", "Indexability"

Crawl Settings

The first thing to do is to head over to the main configuration settings within Screaming Frog:

Configuration > Spider > Crawl

The main settings to use are:

Crawl Internal Links, Canonicals, and the Pagination (Rel Next/Prev) setting.


(The script will work with everything else selected, but the crawl will take longer to complete!)

Recommended Screaming Frog crawl settings (Screenshot from Screaming Frog, October 2021).

Next, it’s on to the Extraction tab.

Configuration > Spider > Extraction

Recommended Screaming Frog extraction settings (Screenshot from Screaming Frog, October 2021).

At a bare minimum, we need to extract the page title, H1, and calculate whether the page is indexable as shown below.

Indexability is useful because it’s an easy way for the script to identify which URLs to drop in one go, leaving only keywords that are eligible to rank in the SERPs.

If the script cannot find the indexability column, it’ll still work as normal but won’t differentiate between pages that can and cannot rank.
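If you’re adapting the code yourself, here’s a sketch of how that kind of fallback can work, mirroring the indexability filter shown later in the walkthrough:

# a sketch: only filter on Indexability when the column is present in the crawl
if "Indexability" in df_crawl.columns:
    df_crawl = df_crawl[~df_crawl["Indexability"].isin(["Non-Indexable"])]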

Setting A Custom Extractor For Page Copy

In order to check whether a keyword is found within the page copy, we need to set a custom extractor in Screaming Frog.

Configuration > Custom > Extraction

Name the extractor “Copy” as seen below.

Screaming Frog custom extraction showing the default options for extracting the page copy (Screenshot from Screaming Frog, October 2021).

Important: The script expects the extractor to be named “Copy” as above, so please double check!

Lastly, make sure Extract Text is selected to export the copy as text, rather than HTML.


There are many guides on using custom extractors online if you need help setting one up, so I won’t go over it again here.

Once the extraction has been set, it’s time to crawl the site and export the HTML file in CSV format.

Exporting The CSV File

Exporting the CSV file is as easy as changing the drop-down menu displayed underneath Internal to HTML and pressing the Export button.

Internal > HTML > Export

Screaming Frog export internal HTML settings (Screenshot from Screaming Frog, October 2021).

After clicking Export, it’s important to make sure the type is set to CSV format.

The export screen should look like this:

Screaming Frog internal HTML CSV export settings (Screenshot from Screaming Frog, October 2021).

Tip 1: Filtering Out Pagination Pages

I recommend filtering out pagination pages from your crawl either by selecting Respect Next/Prev under the Advanced settings (or just deleting them from the CSV file, if you prefer).

Screaming Frog settings to respect rel next/prev (Screenshot from Screaming Frog, October 2021).
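If you’d rather prune the export after the fact, a quick pandas pass does the job. This sketch reuses the script’s default pagination_filters pattern (the file name is just an example):

import pandas as pd

# drop paginated URLs from the crawl export by pattern
df = pd.read_csv("internal_html.csv")
df = df[~df["Address"].str.contains("filterby|page|p=", case=False, na=False)]
df.to_csv("internal_html.csv", index=False)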

Tip 2: Saving The Crawl Settings

Once you have set the crawl up, it’s worth just saving the crawl settings (which will also remember the custom extraction).

This will save a lot of time if you want to use the script again in the future.


File > Configuration > Save As

How to save a configuration file in Screaming Frog (Screenshot from Screaming Frog, October 2021).

Exporting Keywords

Once we have the crawl file, the next step is to load your favorite keyword research tool and export all of the keywords a site ranks for.

The goal here is to export all the keywords a site ranks for, filtering out branded keywords and any which triggered as a sitelink or image.

For this example, I’m using the Organic Keyword Report in Ahrefs, but it will work just as well with Semrush if that’s your preferred tool.

In Ahrefs, enter the domain you’d like to check in Site Explorer and choose Organic Keywords.

Ahrefs Site Explorer settings (Screenshot from Ahrefs.com, October 2021).

Site Explorer > Organic Keywords

Ahrefs setting to export organic keywords a site ranks for (Screenshot from Ahrefs.com, October 2021).

This will bring up all keywords the site is ranking for.

Filtering Out Sitelinks And Image Links

The next step is to filter out any keywords triggered as a sitelink or an image pack.

The reason we need to filter out sitelinks is that they have no influence on the parent URL ranking. This is because only the parent page technically ranks for the keyword, not the sitelink URLs displayed under it.

Filtering out sitelinks will ensure that we are optimizing the correct page.

Ahrefs screenshot demonstrating pages ranking for sitelink keywords (Screenshot from Ahrefs.com, October 2021).

Here’s how to do it in Ahrefs.

How to exclude images and sitelinks from a keyword export (Screenshot from Ahrefs.com, October 2021).

Lastly, I recommend filtering out any branded keywords. You can do this by filtering the CSV output directly, or by pre-filtering in the keyword tool of your choice before the export.
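If you go the CSV route, something like the following works. It’s a sketch in which “acme” stands in for your own brand terms:

import pandas as pd

# strip branded keywords from the export before uploading
# "acme" is a hypothetical brand term - swap in your own
df = pd.read_csv("keyword_export.csv")
df = df[~df["Keyword"].str.contains("acme", case=False, na=False)]
df.to_csv("keyword_export_filtered.csv", index=False)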

Finally, when exporting make sure to choose Full Export and the UTF-8 format as shown below.

How to export keywords in UTF-8 format as a CSV file (Screenshot from Ahrefs.com, October 2021).

By default, the script works with Ahrefs (v1/v2) and Semrush keyword exports. It can work with any keyword CSV file as long as the column names the script expects are present.
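As the renaming step later in the walkthrough shows, the script looks for Keyword, Volume (or Search Volume), Position (or Current position), and URL (or Current URL). For any other provider, rename the columns before uploading; here’s a sketch with hypothetical source names:

import pandas as pd

# hypothetical column names from another rank tracker - map them
# to the names the script expects
df = pd.read_csv("keywords.csv")
df = df.rename(
    columns={
        "query": "Keyword",
        "monthly_searches": "Volume",
        "rank": "Position",
        "landing_page": "URL",
    }
)
df.to_csv("keywords_renamed.csv", index=False)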

Processing

The following instructions pertain to running a Google Colaboratory sheet to execute the code.

There is now a simpler option for those who prefer it, in the form of a Streamlit app. Simply follow the instructions provided to upload your crawl and keyword file.

Now that we have our exported files, all that’s left to be done is to upload them to the Google Colaboratory sheet for processing.

Select Runtime > Run all from the top navigation to run all cells in the sheet.

How to run the striking distance Python script from Google Colaboratory (Screenshot from Colab.research.google.com, October 2021).

The script will prompt you to upload the keyword CSV from Ahrefs or Semrush first and the crawl file afterward.

How to upload the CSV files to Google Colaboratory (Screenshot from Colab.research.google.com, October 2021).

That’s it! The script will automatically download an actionable CSV file you can use to optimize your site.

The striking distance final output (Screenshot from Microsoft Excel, October 2021).

Once you’re familiar with the whole process, using the script is really straightforward.

Code Breakdown And Explanation

If you’re learning Python for SEO and interested in what the code is doing to produce the report, stick around for the code walkthrough!

Install The Libraries

Let’s install pandas to get the ball rolling.

!pip install pandas

Import The Modules

Next, we need to import the required modules.

import pandas as pd
from pandas import DataFrame, Series
from typing import Union
from google.colab import files

Set The Variables

Now it’s time to set the variables.


The script considers any keywords between positions four and 20 as within striking distance.

Changing the variables here will let you define your own range if desired. It’s worth experimenting with the settings to get the best possible output for your needs.

# set all variables here
min_volume = 10  # set the minimum search volume
min_position = 4  # set the minimum position  / default = 4
max_position = 20 # set the maximum position  / default = 20
drop_all_true = True  # If all checks (h1/title/copy) are true, remove the recommendation (Nothing to do)
pagination_filters = "filterby|page|p="  # filter patterns used to detect and drop paginated pages

Upload The Keyword Export CSV File

The next step is to read in the list of keywords from the CSV file.

It is set up to accept an Ahrefs report (V1 and V2) as well as a Semrush export.

This code reads in the CSV file into a Pandas DataFrame.

upload = files.upload()
upload = list(upload.keys())[0]
df_keywords = pd.read_csv(
    (upload),
    error_bad_lines=False,
    low_memory=False,
    encoding="utf8",
    dtype={
        "URL": "str",
        "Keyword": "str",
        "Volume": "str",
        "Position": int,
        "Current URL": "str",
        "Search Volume": int,
    },
)
print("Uploaded Keyword CSV File Successfully!")

If everything went to plan, you’ll see a preview of the DataFrame created from the keyword CSV export. 

DataFrame showing successful upload of the keyword export file (Screenshot from Colab.research.google.com, October 2021).

Upload The Crawl Export CSV File

Once the keywords have been imported, it’s time to upload the crawl file.

This fairly simple piece of code reads in the crawl with some error handling options and creates a Pandas DataFrame named df_crawl.

upload = files.upload()
upload = list(upload.keys())[0]
df_crawl = pd.read_csv(
    (upload),
    error_bad_lines=False,
    low_memory=False,
    encoding="utf8",
    dtype="str",
)
print("Uploaded Crawl Dataframe Successfully!")

Once the CSV file has finished uploading, you’ll see a preview of the DataFrame.

DataFrame of the crawl file uploaded successfully (Screenshot from Colab.research.google.com, October 2021).

Clean And Standardize The Keyword Data

The next step is to rename the column names to ensure standardization between the most common types of file exports.

Essentially, we’re getting the keyword DataFrame into a good state and filtering using cutoffs defined by the variables.

df_keywords.rename(
    columns={
        "Current position": "Position",
        "Current URL": "URL",
        "Search Volume": "Volume",
    },
    inplace=True,
)

# keep only the following columns from the keyword dataframe
cols = "URL", "Keyword", "Volume", "Position"
df_keywords = df_keywords.reindex(columns=cols)

try:
    # clean the data. (v1 of the ahrefs keyword export combines strings and ints in the volume column)
    df_keywords["Volume"] = df_keywords["Volume"].str.replace("0-10", "0")
except AttributeError:
    pass

# clean the keyword data
df_keywords = df_keywords[df_keywords["URL"].notna()]  # remove any missing values
df_keywords = df_keywords[df_keywords["Volume"].notna()]  # remove any missing values
df_keywords = df_keywords.astype({"Volume": int})  # change data type to int
df_keywords = df_keywords.sort_values(by="Volume", ascending=False)  # sort by highest vol to keep the top opportunity

# make new dataframe to merge search volume back in later
df_keyword_vol = df_keywords[["Keyword", "Volume"]]

# drop rows if minimum search volume doesn't match specified criteria
df_keywords.loc[df_keywords["Volume"] < min_volume, "Volume_Too_Low"] = "drop"
df_keywords = df_keywords[~df_keywords["Volume_Too_Low"].isin(["drop"])]

# drop rows ranking above the minimum position (below 4 by default), keeping position 4 itself
df_keywords.loc[df_keywords["Position"] < min_position, "Position_Too_High"] = "drop"
df_keywords = df_keywords[~df_keywords["Position_Too_High"].isin(["drop"])]
# drop rows ranking below the maximum position (beyond 20 by default), keeping position 20 itself
df_keywords.loc[df_keywords["Position"] > max_position, "Position_Too_Low"] = "drop"
df_keywords = df_keywords[~df_keywords["Position_Too_Low"].isin(["drop"])]

Clean And Standardize The Crawl Data

Next, we need to clean and standardize the crawl data.

Essentially, we use reindex to only keep the “Address,” “Indexability,” “Title 1,” “H1-1,” and “Copy 1” columns, discarding the rest.

We use the handy “Indexability” column to only keep rows that are indexable. This will drop canonicalized URLs, redirects, and so on. I recommend enabling this option in the crawl.

Lastly, we standardize the column names so they’re a little nicer to work with.

# keep only the following columns from the crawl dataframe
cols = "Address", "Indexability", "Title 1", "H1-1", "Copy 1"
df_crawl = df_crawl.reindex(columns=cols)
# drop non-indexable rows
df_crawl = df_crawl[~df_crawl["Indexability"].isin(["Non-Indexable"])]
# standardise the column names
df_crawl.rename(columns={"Address": "URL", "Title 1": "Title", "H1-1": "H1", "Copy 1": "Copy"}, inplace=True)
df_crawl.head()

Group The Keywords

As we approach the final output, it’s necessary to group our keywords together to calculate the total opportunity for each page.

Here, we’re calculating how many keywords are within striking distance for each page, along with the combined search volume.

# groups the URLs (remove the dupes and combines stats)
# make a copy of the keywords dataframe for grouping - this ensures stats can be merged back in later from the OG df
df_keywords_group = df_keywords.copy()
df_keywords_group["KWs in Striking Dist."] = 1  # used to count the number of keywords in striking distance
df_keywords_group = (
    df_keywords_group.groupby("URL")
    .agg({"Volume": "sum", "KWs in Striking Dist.": "count"})
    .reset_index()
)
df_keywords_group.head()
DataFrame showing how many keywords were found within striking distance (Screenshot from Colab.research.google.com, October 2021).

Once complete, you’ll see a preview of the DataFrame.

Display Keywords In Adjacent Rows

The grouped data forms the basis of the final output. We use pandas’ unstack function to reshape the DataFrame so the keywords are displayed in the style of a GrepWords export.

DataFrame showing a GrepWords-type view of keywords laid out horizontally (Screenshot from Colab.research.google.com, October 2021).
# create a new df, combine the merged data with the original data. display in adjacent rows ala grepwords
df_merged_all_kws = df_keywords_group.merge(
    df_keywords.groupby("URL")["Keyword"]
    .apply(lambda x: x.reset_index(drop=True))
    .unstack()
    .reset_index()
)

# sort by biggest opportunity
df_merged_all_kws = df_merged_all_kws.sort_values(
    by="KWs in Striking Dist.", ascending=False
)

# reindex the columns to keep just the top five keywords
cols = "URL", "Volume", "KWs in Striking Dist.", 0, 1, 2, 3, 4
df_merged_all_kws = df_merged_all_kws.reindex(columns=cols)

# create union and rename the columns
df_striking: Union[Series, DataFrame, None] = df_merged_all_kws.rename(
    columns={
        "Volume": "Striking Dist. Vol",
        0: "KW1",
        1: "KW2",
        2: "KW3",
        3: "KW4",
        4: "KW5",
    }
)

# merges striking distance df with crawl df to merge in the title, h1 and category description
df_striking = pd.merge(df_striking, df_crawl, on="URL", how="inner")

Set The Final Column Order And Insert Placeholder Columns

Lastly, we set the final column order and merge in the original keyword data.

There are a lot of columns to sort and create!

# set the final column order and merge the keyword data in

cols = [
    "URL",
    "Title",
    "H1",
    "Copy",
    "Striking Dist. Vol",
    "KWs in Striking Dist.",
    "KW1",
    "KW1 Vol",
    "KW1 in Title",
    "KW1 in H1",
    "KW1 in Copy",
    "KW2",
    "KW2 Vol",
    "KW2 in Title",
    "KW2 in H1",
    "KW2 in Copy",
    "KW3",
    "KW3 Vol",
    "KW3 in Title",
    "KW3 in H1",
    "KW3 in Copy",
    "KW4",
    "KW4 Vol",
    "KW4 in Title",
    "KW4 in H1",
    "KW4 in Copy",
    "KW5",
    "KW5 Vol",
    "KW5 in Title",
    "KW5 in H1",
    "KW5 in Copy",
]

# re-index the columns to place them in a logical order + inserts new blank columns for kw checks.
df_striking = df_striking.reindex(columns=cols)

Merge In The Keyword Data For Each Column

This code merges the keyword volume data back into the DataFrame. It’s more or less the equivalent of an Excel VLOOKUP function.

# merge in keyword data for each keyword column (KW1 - KW5)
df_striking = pd.merge(df_striking, df_keyword_vol, left_on="KW1", right_on="Keyword", how="left")
df_striking['KW1 Vol'] = df_striking['Volume']
df_striking.drop(['Keyword', 'Volume'], axis=1, inplace=True)
df_striking = pd.merge(df_striking, df_keyword_vol, left_on="KW2", right_on="Keyword", how="left")
df_striking['KW2 Vol'] = df_striking['Volume']
df_striking.drop(['Keyword', 'Volume'], axis=1, inplace=True)
df_striking = pd.merge(df_striking, df_keyword_vol, left_on="KW3", right_on="Keyword", how="left")
df_striking['KW3 Vol'] = df_striking['Volume']
df_striking.drop(['Keyword', 'Volume'], axis=1, inplace=True)
df_striking = pd.merge(df_striking, df_keyword_vol, left_on="KW4", right_on="Keyword", how="left")
df_striking['KW4 Vol'] = df_striking['Volume']
df_striking.drop(['Keyword', 'Volume'], axis=1, inplace=True)
df_striking = pd.merge(df_striking, df_keyword_vol, left_on="KW5", right_on="Keyword", how="left")
df_striking['KW5 Vol'] = df_striking['Volume']
df_striking.drop(['Keyword', 'Volume'], axis=1, inplace=True)
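The five merges differ only in the keyword number, so the same step can be condensed into a loop if you prefer. This is equivalent to the block above:

# equivalent loop form of the five merges above
for i in range(1, 6):
    df_striking = pd.merge(
        df_striking, df_keyword_vol, left_on=f"KW{i}", right_on="Keyword", how="left"
    )
    df_striking[f"KW{i} Vol"] = df_striking["Volume"]
    df_striking.drop(["Keyword", "Volume"], axis=1, inplace=True)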

Clean The Data Some More

The data requires additional cleaning to replace empty values (NaNs) with empty strings. This improves the readability of the final output by creating blank cells instead of cells populated with NaN string values.

Next, we convert the columns to lowercase so that they match when checking whether a target keyword is featured in a specific column.

# replace nan values with empty strings
df_striking = df_striking.fillna("")
# convert the title, h1 and copy to lower case so kws can be matched to them
df_striking["Title"] = df_striking["Title"].str.lower()
df_striking["H1"] = df_striking["H1"].str.lower()
df_striking["Copy"] = df_striking["Copy"].str.lower()

Check Whether The Keyword Appears In The Title/H1/Copy and Return True Or False

This code checks if the target keyword is found in the page title/H1 or copy.

It’ll flag true or false depending on whether a keyword was found within the on-page elements.

df_striking["KW1 in Title"] = df_striking.apply(lambda row: row["KW1"] in row["Title"], axis=1)
df_striking["KW1 in H1"] = df_striking.apply(lambda row: row["KW1"] in row["H1"], axis=1)
df_striking["KW1 in Copy"] = df_striking.apply(lambda row: row["KW1"] in row["Copy"], axis=1)
df_striking["KW2 in Title"] = df_striking.apply(lambda row: row["KW2"] in row["Title"], axis=1)
df_striking["KW2 in H1"] = df_striking.apply(lambda row: row["KW2"] in row["H1"], axis=1)
df_striking["KW2 in Copy"] = df_striking.apply(lambda row: row["KW2"] in row["Copy"], axis=1)
df_striking["KW3 in Title"] = df_striking.apply(lambda row: row["KW3"] in row["Title"], axis=1)
df_striking["KW3 in H1"] = df_striking.apply(lambda row: row["KW3"] in row["H1"], axis=1)
df_striking["KW3 in Copy"] = df_striking.apply(lambda row: row["KW3"] in row["Copy"], axis=1)
df_striking["KW4 in Title"] = df_striking.apply(lambda row: row["KW4"] in row["Title"], axis=1)
df_striking["KW4 in H1"] = df_striking.apply(lambda row: row["KW4"] in row["H1"], axis=1)
df_striking["KW4 in Copy"] = df_striking.apply(lambda row: row["KW4"] in row["Copy"], axis=1)
df_striking["KW5 in Title"] = df_striking.apply(lambda row: row["KW5"] in row["Title"], axis=1)
df_striking["KW5 in H1"] = df_striking.apply(lambda row: row["KW5"] in row["H1"], axis=1)
df_striking["KW5 in Copy"] = df_striking.apply(lambda row: row["KW5"] in row["Copy"], axis=1)

Delete True/False Values If There Is No Keyword

This will delete the true/false values when there is no adjacent keyword.

# delete true / false values if there is no keyword
df_striking.loc[df_striking["KW1"] == "", ["KW1 in Title", "KW1 in H1", "KW1 in Copy"]] = ""
df_striking.loc[df_striking["KW2"] == "", ["KW2 in Title", "KW2 in H1", "KW2 in Copy"]] = ""
df_striking.loc[df_striking["KW3"] == "", ["KW3 in Title", "KW3 in H1", "KW3 in Copy"]] = ""
df_striking.loc[df_striking["KW4"] == "", ["KW4 in Title", "KW4 in H1", "KW4 in Copy"]] = ""
df_striking.loc[df_striking["KW5"] == "", ["KW5 in Title", "KW5 in H1", "KW5 in Copy"]] = ""
df_striking.head()

Drop Rows If All Values == True

This configurable option is really useful for reducing the QA time required on the final output: if a keyword is already found in all three columns (title, H1, and copy), the opportunity is dropped from the report, since there is nothing left to do.

def true_dropper(col1, col2, col3):
    drop = df_striking.drop(
        df_striking[
            (df_striking[col1] == True)
            & (df_striking[col2] == True)
            & (df_striking[col3] == True)
        ].index
    )
    return drop

if drop_all_true == True:
    df_striking = true_dropper("KW1 in Title", "KW1 in H1", "KW1 in Copy")
    df_striking = true_dropper("KW2 in Title", "KW2 in H1", "KW2 in Copy")
    df_striking = true_dropper("KW3 in Title", "KW3 in H1", "KW3 in Copy")
    df_striking = true_dropper("KW4 in Title", "KW4 in H1", "KW4 in Copy")
    df_striking = true_dropper("KW5 in Title", "KW5 in H1", "KW5 in Copy")

Download The CSV File

The last step is to download the CSV file and start the optimization process.

df_striking.to_csv('Keywords in Striking Distance.csv', index=False)
files.download("Keywords in Striking Distance.csv")

Conclusion

If you are looking for quick wins for any website, the striking distance report is a really easy way to find them.

Don’t let the number of steps fool you. It’s not as complex as it seems. It’s as simple as uploading a crawl and keyword export to the supplied Google Colab sheet or using the Streamlit app.

The results are definitely worth it!

Featured Image: aurielaki/Shutterstock



Building a Better Search Engine: Lessons From Neeva’s CEO


After helping to grow Google’s advertising business for over 15 years, Sridhar Ramaswamy began to feel Google’s dependence on ads was limiting the quality of search results.

Determined to prove he could achieve a better search experience without ads, Ramaswamy co-founded and launched Neeva in 2019.

As the CEO of his own search company, Ramaswamy is accountable to the users of his product who pay a monthly subscription to access Neeva.

“No ads” means Neeva doesn’t have an incentive to collect data on its users, making it the only search engine on the market that’s both ad-free and private.

Ramaswamy is currently on the conference circuit raising awareness about Neeva, and we managed to catch up with him at Collision last week in Toronto.

We profiled Neeva once before and welcomed Ramaswamy as a guest on the Search Engine Journal Show in December.

However, each time we only scratched the surface. Now, we want to dig deeper.


So, what makes Neeva different from the other companies — and what makes Neeva a viable alternative to Google and Bing?

What Are Neeva’s Core Values?

Many companies enter the market making lofty claims of how they’ll do right by users. Even Google once had “don’t be evil” written into its code of conduct: a promise to which some critics argue it hasn’t lived up. Google has de-emphasized “Don’t be evil” in its code of conduct, though it was never removed.

In 2021, Google was sued by three former employees over its “Don’t be evil” motto. They allege that failure to live up to the motto is the equivalent of a breach of contract.

To better understand how Neeva will continue delivering a product that puts users’ needs first, I asked Ramaswamy what Neeva’s core values are.

“It’s not something we have published, but this is something I’ve talked about a lot with Vivek [Raghunathan, co-founder of Neeva], and I feel good about saying it,” Ramaswamy began. “At our core, we think that, as a company, we want to make technology serve people.”

“I think many other technology companies, especially in the last 25 years, have turned rather exploitative,” he continued. “I think the ad model exemplifies this. Basically, if I can convince you and get you hooked on my product, I can pretty much do anything.”

“It’s Technology Serving People”

Make no mistake: Neeva is a for-profit organization, though Ramaswamy says its subscription-based revenue model is designed to serve people rather than advertisers.

“Yes, companies are for-profit, but I think if you set up your values to be aligned with your user, to be aligned with your customer, you’ll always serve them,” he said. “To me, that part is important. If you had to say, ‘Hey, what exemplifies what you do?’ It’s technology serving people. This is why we do things like offer a flat price for the search utility you get from us.”


“Technology At Scale Is Quite Inexpensive”

Many companies within the sector lead consumers to believe scaling technology is expensive, which is how some justify charging higher fees, for instance, as they grow.

It doesn’t have to be that way, Ramaswamy says, as he believes the cost of technology at scale is overblown.

“It’s our belief that technology at scale is actually quite inexpensive,” he noted. “That’s the magic of technology, but right now, the way all of these companies are structured — as they scale, they squeeze more money from you.”

“It’s not like you’re getting more value, though obviously there are exceptions,” Ramaswamy continued. “But it’s really back to the basics of how you create products that delight people. And to me, that’s an honorable living.”

From left to right: Brittany Kaiser, Own Your Data; Sridhar Ramaswamy, Neeva; Ashley Gold, Axios.

What Does Neeva Do To ‘Serve People’?

Neeva’s definition of ‘technology serving people’ is exemplified by its feedback system.

Roughly 20% of the Neeva team is tasked solely with listening to customer feedback and using it to shape the product experience.

On the other hand, many criticize Google for not giving users what they want out of a search experience.

I asked Ramaswamy if he could give examples of specific customer feedback that helped shape Neeva into what it is today.

“There’s tons of feedback that comes to us. Sometimes we feel bad about not being able to take care of all of it,” he started. “But to give an example: We did a currency converter because, believe it or not, it was a top request. Initially, I did not understand this feedback. I was like, ‘Really? It’s that hard for you to click on a link and then type in your numbers and get your currency converted?’”


“But then,” Ramaswamy said, “I realized a larger truth about how people think about the internet.”

“People Fear Clicking On Links”

Neeva was initially against going the Google route of delivering content directly in the SERPs, but has had to make some concessions.

Through listening to customer feedback, the Neeva team learned there’s a real apprehension toward clicking on links in search results.

“Clicking on a link has now become an adversarial task. People actually fear clicking on links because they don’t know what’s on the other side,” Ramaswamy said. “Is it going to be a pop-up? Is it going to tell you that your computer has a virus? Is it something else? That’s the reason why we put [a currency converter] right into the search engine. So that’s one example.”

Another perk offered to Neeva subscribers is access to a Slack channel where customers can engage in group discussions with developers.

“A lot of people said, ‘We want to be able to offer feedback to [improve] your search results,’” Ramaswamy said. “So we built a community feedback feature that’s released to some people; it’s not released to everybody.”

The way it works, he explained, is users “can say, ‘Hey, this result is not relevant.’ Or, ‘This result is the top result for this query.’”

“This list sort of goes on and on,” Ramaswamy said. “Customers are really a source of lots of ideas.”

From left to right: Brittany Kaiser, Own Your Data; Sridhar Ramaswamy, Neeva; Ashley Gold, Axios.

Neeva Is A Customer-Guided Product

At Collision, Ramaswamy described what he eventually aims to accomplish with Neeva, and how it differs from the goals of larger search engines like Google.

After speaking with him, I asked if he could clarify what he meant by wanting to “let society figure out” what to do with Neeva.

“I spoke about it more in the spirit of: Google spends a billion, makes a hundred billion. My thing was more: We want to make a couple of billion and let society figure out what it wants to do with the service,” Ramaswamy explained. “It’s more of a general argument around not captive capitalism, but competitive capitalism.”

“The beautiful thing about technology is creating a product for 100 million people is not wildly different from creating a product for a billion people,” he continued. “That’s the magic of scale and technology.”

Being paid for by the people who use it gives Neeva unique flexibility regarding future growth.

Users don’t have as much influence over a product like Google Search, considering they typically don’t pay to use it.

Although even for a free product, Ramaswamy argues that Google could be doing much more to give users value.

“My point was a customer-paid product makes it much easier for us to release the product to the whole world [and] still run a profitable company, but not at the kind of obscene scale that I see Facebook or Google operating,” he said. “People always say …  ‘Well, Google gives me free Gmail. Will they stop giving it?’ And my rough answer is: Well, I’m sure, with 100 billion dollars, a bunch of us are going to make really good decisions about how to use that money.”

Ramaswamy said that users “don’t need a monopolist to make that decision and decide they want to give you free Gmail. We don’t need charity from rich companies in order to do this; we need competition, so more of the money that is being spent on this comes to us.”

From left to right: Brittany Kaiser, Own Your Data; Sridhar Ramaswamy, Neeva; Ashley Gold, Axios.

Will Neeva Keep Its Privacy Promises?

DuckDuckGo, another search engine that touts privacy as its key selling point, was recently a source of controversy after it was discovered to be passing along a minor amount of data to Microsoft.

That stemmed from the deal DuckDuckGo has to use Bing’s search index.

I asked Ramaswamy what measures Neeva has in place to keep its zero data collection promises.

“Not serving ads is the biggest measure we have in place. And, we are building our own index,” he said, adding that the company is actively “writing down human ratings and getting data back.”

“We truly want to create a differentiated product,” Ramaswamy emphasized. “We started with using the Bing API for search [but] in many ways, I think we would have been better off investing in search from day one. We are a product company, and we want to become a much better search engine. That’s the big differentiator.”

“We’re Making Foundational Investments In Search”

In addition to keeping Neeva ad-free, it will be able to maintain its zero-data promise by building its own search index.

DuckDuckGo, for example, ran into trouble because it’s wholly dependent on Microsoft for search results. Ramaswamy says Neeva is the only company outside Google and Bing crawling and indexing the web.

That claim is backed up by an October 2020 report on digital competition by the House Judiciary Committee’s Subcommittee on Antitrust. The report states:

“The high cost of maintaining a fresh index, and the decision by many large webpages to block most crawlers, significantly limits new search engine entrants. Today, the only English-language search engines that maintain their own comprehensive webpage index are Google and Bing.”

He acknowledged that, in response, many ask, “What’s the big deal? What difference does it make?”


“It lets us do things like creating a much better shopping experience,” Ramaswamy explained, noting that, for instance, Neeva “launched Reddit links in search results … because we work with Reddit to get their index. So we have an index of all the web pages they’re serving.”

Ramaswamy said that users can receive better-quality results for such queries as, “What are the most interesting Reddit posts that correspond to this query?”

Neeva can “launch features like that, because we’re making foundational investments in search; pretty much the only company outside of Google and Microsoft to be doing this.”

“We increasingly use Bing as a fallback when we cannot answer queries,” Ramaswamy acknowledged. However, he said, “Over time, our aspiration is to be able to do more and more of the search results ourselves.”

Neeva’s Sole Focus Is Traditional Web Content (For Now)

With people’s search behavior turning more toward short videos, I asked Ramaswamy if Neeva has any plans to index content like Web Stories or TikTok videos.

For now, Neeva’s sole focus is to solve search for text-based web content.

“Solving for search, especially things like spoken search, is enough of a large problem that we have not quite gone there,” Ramaswamy said. “We have working arrangements. We have partnerships with companies like Twitter and companies like Reddit to better surface their content.”

Twitter, he pointed out, “has a lot of real-time information. So we’re focused on things like that right now and less on video. That would be a fun project to do.”


Neeva’s Greatest Challenge Is Awareness

As we wrapped up our conversation, I asked Ramaswamy: What’s the most significant hurdle for Neeva to overcome on its journey toward mass adoption?

Ramaswamy’s answer: “It really is about competition.”

The product, he said, is not the issue.

“We have a great product. Compared to ad-supported options … the free Neeva search engine is infinitely better,” Ramaswamy explained. “The place where we struggle is getting the word out, getting people to know us as an option, and getting people to set us as the default search in Safari, which is impossible.”

“Demand More Choice”

As Ramaswamy explained, there’s no incentive for a company like Google to innovate if it doesn’t have any challengers.

Companies tend to improve their products when faced with more robust competition. But the only way for more competitors to enter the search market is for consumers to demand more options.

“To me, this is the biggest ask that I would have,” Ramaswamy said, “is to demand more choice, because competition produces better products.”

In turn, he said, “That competition creates better products for us. An incumbent that is doing very well has no incentive to innovate [or] to disrupt.”


Conversely, over at Neeva, “We have nothing to lose,” Ramaswamy told me. “We’re going to swing for the fences [and make it] easier for people to switch, for them to try Neeva, for them to decide for themselves if they want it or not.”

What’s Next For Neeva?

Before parting ways, I had to ask what we could expect next from Neeva.

“There’s a lot I’ve learned from Google My Business in terms of local businesses – even in terms of Search Console – that I feel confident we can do better,” Ramaswamy said, adding that “GMB, as you know, is a real problem for lots of people. Especially agencies that want to update information for a bunch of companies that they work with.”

The hope, Ramaswamy said, is that “we’ll have better tools. But not yet.”



