How To Automate Ecommerce Category Page Creation With Python

Clustering product inventory and automatically aligning SKUs to search demand is a great way to find opportunities to create new ecommerce categories.

Niche category pages are a proven way for ecommerce sites to align with organic search demand while simultaneously assisting users in purchasing.

If a site stocks a range of products and there is search demand, creating a dedicated landing page is an easy way to align with the demand.

But how can SEO professionals find this opportunity?

Sure, you can eyeball it, but you’ll usually leave a lot of opportunity on the table.

This problem motivated me to script something in Python, which I’m sharing today as a simple-to-use Streamlit application. (No coding experience required!)

The app linked above created the following output automatically using nothing more than two crawl exports!

Screenshot from Microsoft Excel, May 2022: A CSV file export showing new subcategories generated automatically using Python.

Notice how the suggested categories are automatically tied back to the existing parent category?

Screenshot from Microsoft Excel, May 2022: A CSV export showing that the new subcategories have been tied back to their parent category.

The app even shows how many products are available to populate the category.

Screenshot from Microsoft Excel, May 2022: The number of products available to populate the new subcategories, highlighted.

Benefits And Uses

  • Improve relevancy to high-demand, competitive queries by creating new landing pages.
  • Increase the chance of relevant site links displaying underneath the parent category.
  • Reduce CPCs to the landing page through increased relevancy.
  • Potential to inform merchandising decisions. (If there is high search demand vs. a low product count, there is potential to widen the range.)

Mock-up screenshot from Google Chrome, May 2022: The new categories displayed as sitelinks within Google search.

Creating the suggested subcategories for the parent sofa category would align the site to an additional 3,500 searches per month with relatively little effort.

Features

  • Create subcategory suggestions automatically.
  • Tie subcategories back to the parent category (cuts out a lot of guesswork!).
  • Match to a minimum of X products before recommending a category.
  • Check similarity to an existing category (X % fuzzy match) before recommending a new category.
  • Set minimum search volume/CPC cut-off for category suggestions.
  • Supports search volume and CPC data from multiple countries.

Getting Started/Prepping The Files

To use this app, you need two things.

At a high level, the goal is to crawl the target website with two custom extractions.

The internal_html.csv report is exported, along with an inlinks.csv export.

These exports are then uploaded to the Streamlit app, where the opportunities are processed.

Crawl And Extraction Setup

When crawling the site, you’ll need to set two extractions in Screaming Frog – one to uniquely identify product pages and another to uniquely identify category pages.

The Streamlit app understands the difference between the two types of pages when making recommendations for new pages.

The trick is to find a unique element for each page type.

(For a product page, this is usually the price or the returns policy, and for a category page, it’s usually a filter sort element.)

Extracting The Unique Page Elements

Screaming Frog allows for custom extractions of content or code from a web page when crawled.

This section may be daunting if you are unfamiliar with custom extractions, but it’s essential for getting the correct data into the Streamlit app.

The goal is to end up with something looking like the below image.

(A unique extraction for product and category pages with no overlap.)

Screenshot from Screaming Frog SEO Spider, May 2022: Two custom extractions that uniquely identify product and category pages.

The steps below walk you through manually extracting the price element for a product page.

Then, repeat for a category page afterward.

If you’re stuck or would like to read more about the web scraper tool in Screaming Frog, the official documentation is worth your time.

Manually Extracting Page Elements

Let’s start by extracting a unique element only found on a product page (usually the price).

Highlight the price element on the page with the mouse, right-click and choose Inspect.

Screenshot from Google Chrome, May 2022: Using Chrome’s Inspect feature to extract a CSS selector.

This will open up the elements window with the correct HTML line already selected.

Right-click the pre-selected line and choose Copy > Copy selector. That’s it!

Screenshot from Google Chrome, May 2022: Copying the CSS selector for use in Screaming Frog.

Open Screaming Frog and paste the copied selector into the custom extraction section. (Configuration > Custom > Extraction).

Screenshot from Screaming Frog SEO Spider, May 2022: How to use a custom extractor.

Name the extractor “product,” select the CSSPath dropdown, and choose Extract Text.
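
The pasted CSSPath will vary from site to site. As a purely illustrative example (these selectors are invented and depend entirely on the site’s markup), the two extractions might look something like:

Product pages:  #product-price > span.price
Category pages: ul.category-filters select.sort-by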

Repeat the process to extract a unique element from a category page. It should look like this once completed for both product and category pages.

Screenshot from Screaming Frog SEO Spider, May 2022: The custom extractors correctly populated for both page types.

Finally, start the crawl.

The crawl should look like this when viewing the Custom Extraction tab.

Screenshot from Screaming Frog SEO Spider, May 2022: Unique extractions for product and category pages.

Notice how the extractions are unique to each page type? Perfect.

The script uses the extractor to identify the page type.

Internally, the app will convert the extractions to page-type tags.

(I mention this to stress that the extractors can be anything as long as they uniquely identify both page types.)

Screenshot from Microsoft Excel, May 2022: How the app interprets the custom extractions to tag each page.
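
Under the hood, that tagging step is roughly equivalent to the short pandas sketch below. The column names are assumptions based on Screaming Frog’s default export naming (an extractor named “product” exports as a “product 1” column), so adjust them to match your own file.

import pandas as pd

# Load the Screaming Frog export. Column names are assumptions based on
# default Screaming Frog naming; adjust them to match your crawl file.
df = pd.read_csv("internal_html.csv")

# Whichever custom extraction column is populated tells us the page type.
df["page_type"] = "other"
df.loc[df["product 1"].notna(), "page_type"] = "product"
df.loc[df["category 1"].notna(), "page_type"] = "category"

print(df[["Address", "page_type"]].head())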

Exporting The Files

Once the crawl has been completed, the last step is to export two types of CSV files.

  • internal_html.csv.
  • inlinks.csv (inlinks to product pages).

Go to the Custom Extraction tab in Screaming Frog and highlight all URLs that have an extraction for products.

(You will need to sort the column to group it.)

Screenshot from Screaming Frog SEO Spider, May 2022: Selecting the product URLs ready for exporting the inlinks report.

Lastly, right-click the product URLs, select Export, and then Inlinks.

Screenshot from Screaming Frog SEO Spider, May 2022: Right-clicking the product URLs to export the inlinks report.

You should now have a file called inlinks.csv.

Next, we just need to export the internal_html.csv file.

Click the Internal tab, select HTML from the dropdown menu below it, and click the adjacent Export button.

Finally, choose the option to save the file as a .csv.

Screenshot from Screaming Frog SEO Spider, May 2022: Exporting the internal_html.csv report.

Congratulations! You are now ready to use the Streamlit app!

Using The Streamlit App

Using the Streamlit app is relatively simple.

The various options are set to reasonable defaults, but feel free to adjust the cut-offs to better suit your needs.

I would highly recommend using a Keywords Everywhere API key, although it is not strictly necessary, as search volumes can be looked up manually later with an existing tool if preferred.

(The script pre-qualifies opportunities by checking for search volume. If the key is missing, the final output will contain more irrelevant words.)

If you want to use a key, this is the section on the left to pay attention to.

Screenshot from Streamlit.io, May 2022: The area to paste in the optional Keywords Everywhere API key.

Once you have entered the API key and adjusted the cut-offs to your liking, upload the inlinks.csv crawl export.

Screenshot from Streamlit.io, May 2022: Uploading the inlinks.csv report.

Once complete, a new prompt will appear adjacent to it, asking you to upload the internal_html.csv crawl file.

Screenshot from Streamlit.io, May 2022: Uploading the internal_html.csv report.

Finally, a new box will appear asking you to select the product and category column names from the uploaded crawl file so they can be mapped correctly.

Screenshot from Streamlit.io, May 2022: Mapping the column names from the crawl.

Click submit and the script will run. Once complete, you will see the following screen and can download a handy .csv export.

Screenshot from Streamlit.io, May 2022: The Streamlit app after it has successfully run a report.

How The Script Works

Before we dive into the script’s output, it will help to explain what’s going on under the hood at a high level.

At a glance:

  • Generates thousands of keyword candidates by creating n-grams from product page H1 headings.
  • Qualifies keywords by checking whether each one appears, as an exact or fuzzy match, in a product heading.
  • Further qualifies keywords by checking for search volume using the Keywords Everywhere API (optional but recommended).
  • Checks whether an existing category already exists using a fuzzy match (it can find words out of order, different tenses, etc.).
  • Uses the inlinks report to assign suggestions to a parent category automatically.

N-gram Generation

The script creates hundreds of thousands of n-grams from the product page H1s, most of which are completely nonsensical.

In my example for this article, n-gram generation produced 48,307 words – so this will need to be filtered!

Screenshot from Microsoft Excel, May 2022: The script generating thousands of nonsensical n-gram combinations.
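
As a rough illustration, n-gram generation can be sketched in plain Python like this (the real app’s n-gram range and tokenization may differ):

from collections import Counter

def generate_ngrams(text, n):
    # Return space-joined n-grams from a single heading.
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

# Example H1s; in practice these come from the internal_html.csv export.
h1s = ["grey 3 seater fabric sofa", "brown leather corner sofa"]

candidates = Counter()
for h1 in h1s:
    for n in (2, 3):  # bigrams and trigrams, for brevity
        candidates.update(generate_ngrams(h1, n))

print(candidates.most_common(5))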

The first step in the filtering process is to check whether the keywords generated via n-grams are found at least X times within the product name column.

(This can be in an exact or fuzzy match.)

Anything not found is immediately discarded, which usually removes around 90% of the generated keywords.
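
Conceptually, that first filter looks something like the sketch below, using the rapidfuzz library for the fuzzy case. (The app’s actual matching library and thresholds may differ.)

from rapidfuzz import fuzz

product_names = ["grey 3 seater fabric sofa", "brown leather corner sofa"]
candidates = ["sofa", "fabric sofa", "seater corner gram"]
MIN_MATCHES = 2  # the "minimum products to match" cut-off

def count_matches(keyword, names, threshold=90):
    # Count products whose name contains the keyword exactly, or
    # matches it fuzzily (words out of order can still score highly).
    hits = 0
    for name in names:
        if keyword in name.lower() or fuzz.token_set_ratio(keyword, name) >= threshold:
            hits += 1
    return hits

keywords = [kw for kw in candidates if count_matches(kw, product_names) >= MIN_MATCHES]
print(keywords)  # only "sofa" appears in enough product names to survive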

The second filtering stage is to check whether the remaining keywords have search demand.

Any keywords without search demand are then discarded too.

(This is why I recommend using the Keywords Everywhere API when running the script, which results in a more refined output.)
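
For those curious, the volume check can be sketched as below. The endpoint and payload reflect the public Keywords Everywhere v1 documentation at the time of writing – treat them as assumptions and verify against the current API reference.

import requests

API_KEY = "YOUR_KEYWORDS_EVERYWHERE_KEY"

def get_search_volumes(keywords):
    # Fetch search volume for a batch of keywords from the v1 endpoint.
    response = requests.post(
        "https://api.keywordseverywhere.com/v1/get_keyword_data",
        headers={"Authorization": f"Bearer {API_KEY}"},
        data={"country": "us", "currency": "usd", "dataSource": "gkp", "kw[]": keywords},
    )
    response.raise_for_status()
    return {row["keyword"]: row["vol"] for row in response.json()["data"]}

volumes = get_search_volumes(["grey sofas", "fabric sofas"])
shortlist = [kw for kw, vol in volumes.items() if vol > 0]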

It’s worth noting you can do this manually afterward by pulling volumes from Semrush, Ahrefs, etc., discarding any keywords without search volume, and running a VLOOKUP in Microsoft Excel.

This is cheaper if you have an existing subscription.
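
If you do take the manual route, the same VLOOKUP-style join is only a few lines of pandas. (The file and column names here are assumptions for illustration.)

import pandas as pd

# suggestions.csv: the app's output; volumes.csv: keyword/volume pairs
# exported from Semrush, Ahrefs, or similar.
suggestions = pd.read_csv("suggestions.csv")
volumes = pd.read_csv("volumes.csv")  # columns: keyword, volume

merged = suggestions.merge(volumes, on="keyword", how="left")
merged = merged[merged["volume"].fillna(0) > 0]  # drop zero-volume suggestions
merged.to_csv("shortlisted_suggestions.csv", index=False)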

Recommendations Tied To Specific Landing Pages

Once the keyword list has been filtered, the script uses the inlinks report to tie each suggested subcategory back to its parent landing page.

Earlier versions did not do this, but I realized that leveraging the inlinks.csv report made it possible.

It really helps you understand the context of a suggestion at a glance during QA.

This is the reason the script requires two exports to work correctly.
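
At its simplest, that lookup can be sketched with a pandas merge. The “From”/“To” column names reflect Screaming Frog’s inlinks export at the time of writing, and “matched_product” is a hypothetical column used purely for illustration.

import pandas as pd

# Each row of the inlinks export is a link, with "From" (the linking
# page, e.g. a category) and "To" (the product page being linked to).
inlinks = pd.read_csv("inlinks.csv")

# Map each product URL to the first category page that links to it.
product_to_parent = inlinks.groupby("To")["From"].first()

# Each suggestion carries a product URL it matched against.
suggestions = pd.DataFrame({
    "suggestion": ["grey sofas"],
    "matched_product": ["https://example.com/grey-3-seater-sofa"],
})
suggestions["parent_category"] = suggestions["matched_product"].map(product_to_parent)
print(suggestions)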

Limitations

  • Not checking search volumes will leave more results to QA. (Even if you don’t use the Keywords Everywhere API, I recommend shortlisting by filtering out zero-search-volume keywords afterward.)
  • Some irrelevant keywords will have search volume and appear in the final report, even when keyword volume has been checked.
  • Words will typically appear in the singular for the final output (because products are singular, while categories are pluralized when they sell more than a single product). It’s easy enough to add an “s” to the end of a suggestion, though.

User Configurable Variables

I’ve selected what I consider to be sensible default options.

But here is a rundown if you’d like to tweak and experiment.

  • Minimum products to match to (exact match) – The minimum number of products that must exist before suggesting the new category in an exact match.
  • Minimum products to match to (fuzzy match) – The minimum number of products that must exist before suggesting the new category in a fuzzy match (words can be found in any order).
  • Minimum similarity to an existing category – Checks whether a category already exists in a fuzzy match before making the recommendation. The closer to 100, the stricter the matching (see the sketch after this list).
  • Minimum CPC in $ – The minimum dollar amount of the suggested category keyword. (Requires the Keywords Everywhere API.)
  • Minimum search volume – The minimum search volume of the suggested category keyword. (Requires the Keywords Everywhere API.)
  • Keywords Everywhere API key – Optional, but recommended. Used to pull in CPC/search volume data. (Useful for shortlisting categories.)
  • Set the country to pull search data from – Country-specific search data is available. (Default is the USA.)
  • Set the currency for CPC data – Country-specific CPC data is available. (Default is USD.)
  • Keep the longest word suggestion – With similar word suggestions, this option will keep the longest match.
  • Enable fuzzy product matching – Searches for product names in a fuzzy match (words can be found out of order). Recommended, but slow and CPU-intensive.
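
Here is a minimal sketch of that similarity check, again assuming the rapidfuzz library (the app’s actual scorer may differ):

from rapidfuzz import fuzz, process

existing_categories = ["corner sofas", "fabric sofas"]  # e.g. from the crawl
MIN_SIMILARITY = 85  # the "minimum similarity to an existing category" cut-off

def is_new_category(suggestion):
    # Reject a suggestion if it fuzzily matches an existing category.
    best = process.extractOne(suggestion, existing_categories, scorer=fuzz.token_set_ratio)
    return best is None or best[1] < MIN_SIMILARITY

print(is_new_category("grey sofas"))    # True: no close existing match
print(is_new_category("sofas corner"))  # False: words out of order still match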

Conclusion

With a small amount of preparation, it is possible to tap into a large amount of organic opportunity while improving the user experience.

Although this script was created with an ecommerce focus, according to feedback, it works well for other site types such as job listing sites.

So even if your site isn’t an ecommerce site, it’s still worth a try.

Python enthusiast?

I released the source code for a non-Streamlit version here.

Featured Image: patpitchaya/Shutterstock

Google Cautions On Blocking GoogleOther Bot

Google’s Gary Illyes answered a question about the non-search features that the GoogleOther crawler supports, then added a caution about the consequences of blocking GoogleOther.

What Is GoogleOther?

GoogleOther is a generic crawler created by Google for purposes that fall outside those of the specialized bots for Search, Ads, Video, Images, News, Desktop, and Mobile. It can be used by internal teams at Google for research and development in relation to various products.

The official description of GoogleOther is:

“GoogleOther is the generic crawler that may be used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.”

Something that may be surprising is that there are actually three kinds of GoogleOther crawlers.

Three Kinds Of GoogleOther Crawlers

  1. GoogleOther
    Generic crawler for public URLs
  2. GoogleOther-Image
    Optimized to crawl public image URLs
  3. GoogleOther-Video
    Optimized to crawl public video URLs

All three GoogleOther crawlers can be used for research and development purposes. That is the one purpose Google publicly acknowledges for all three versions of GoogleOther.

What Non-Search Features Does GoogleOther Support?

Google doesn’t say which specific non-search features GoogleOther supports, probably because it doesn’t really “support” a specific feature. It exists for research and development crawling, which could be in support of a new product or an improvement to a current product – it’s a deliberately open and generic purpose.

This is the question Gary read out:

“What non-search features does GoogleOther crawling support?”

Gary Illyes answered:

“This is a very topical question, and I think it is a very good question. Besides what’s in the public I don’t have more to share.

GoogleOther is the generic crawler that may be used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.

Historically Googlebot was used for this, but that kind of makes things murky and less transparent, so we launched GoogleOther so you have better controls over what your site is crawled for.

That said GoogleOther is not tied to a single product, so opting out of GoogleOther crawling might affect a wide range of things across the Google universe; alas, not Search, search is only Googlebot.”

It Might Affect A Wide Range Of Things

Gary is clear that blocking GoogleOther wouldn’t have an effect on Google Search because Googlebot is the crawler used for indexing content. So if blocking any of the three versions of GoogleOther is something a site owner wants to do, it should be okay to do that without a negative effect on search rankings.

But Gary also cautioned about the outcome of blocking GoogleOther, saying that it would have an effect on other products and services across Google. He didn’t state which other products it could affect, nor did he elaborate on the pros or cons of blocking GoogleOther.
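
If a site owner does decide to opt out, a minimal robots.txt sketch might look like the below, using the three user-agent tokens listed earlier. (Verify the tokens against Google’s current crawler documentation before deploying.)

# Block the generic GoogleOther crawlers. Googlebot - and therefore
# Search indexing - is unaffected because it is not listed in this group.
User-agent: GoogleOther
User-agent: GoogleOther-Image
User-agent: GoogleOther-Video
Disallow: /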

Pros And Cons Of Blocking GoogleOther

Whether or not to block GoogleOther doesn’t necessarily have a straightforward answer. There are several considerations in deciding whether doing so makes sense.

Pros

Inclusion in research for a future Google product related to search (Maps, Shopping, Images, or a new search feature) could be useful. It might be helpful to have a site included in that kind of research, as the site could be one of the few chosen to test a feature that ends up increasing earnings.

Another consideration is that blocking GoogleOther to save on server resources is not necessarily a valid reason because GoogleOther doesn’t seem to crawl so often that it makes a noticeable impact.

If blocking Google from using site content for AI is a concern then blocking GoogleOther will have no impact on that at all. GoogleOther has nothing to do with crawling for Google Gemini apps or Vertex AI, including any future products that will be used for training associated language models. The bot for that specific use case is Google-Extended.

Cons

On the other hand it might not be helpful to allow GoogleOther if it’s being used to test something related to fighting spam and there’s something the site has to hide.

It’s possible that a site owner might not want to participate if GoogleOther comes crawling for market research or for training machine learning models (for internal purposes) that are unrelated to public-facing products like Gemini and Vertex.

Allowing GoogleOther to crawl a site for unknown purposes is like giving Google a blank check to use your site data in any way they see fit, outside of training public-facing LLMs or purposes related to named bots like Googlebot.

Takeaway

Should you block GoogleOther? It’s a coin toss. There are potential benefits, but in general, there isn’t enough information to make an informed decision.

Listen to the Google SEO Office Hours podcast at the 1:30 minute mark:

Featured Image by Shutterstock/Cast Of Thousands

AI Search Boosts User Satisfaction

A new study finds that despite concerns about AI in online services, users are more satisfied with search engines and social media platforms than before.

The American Customer Satisfaction Index (ACSI) conducted its annual survey of search and social media users, finding that satisfaction has either held steady or improved.

This comes at a time when major tech companies are heavily investing in AI to enhance their services.

Search Engine Satisfaction Holds Strong

Google, Bing, and other search engines have rapidly integrated AI features into their platforms over the past year. While critics have raised concerns about potential negative impacts, the ACSI study suggests users are responding positively.

Google maintains its position as the most satisfying search engine with an ACSI score of 81, up 1% from last year. Users particularly appreciate its AI-powered features.

Interestingly, Bing and Yahoo! have seen notable improvements in user satisfaction, notching 3% gains to reach scores of 77 and 76, respectively. These are their highest ACSI scores in over a decade, likely due to their AI enhancements launched in 2023.

The study hints at the potential of new AI-enabled search functionality to drive further improvements in the customer experience. Bing has seen its market share improve by small but notable margins, rising from 6.35% in the first quarter of 2023 to 7.87% in Q1 2024.

Customer Experience Improvements

The ACSI study shows improvements across nearly all benchmarks of the customer experience for search engines. Notable areas of improvement include:

  • Ease of navigation
  • Ease of using the site on different devices
  • Loading speed performance and reliability
  • Variety of services and information
  • Freshness of content

These improvements suggest that AI enhancements positively impact various aspects of the search experience.

Social Media Sees Modest Gains

For the third year in a row, user satisfaction with social media platforms is on the rise, increasing 1% to an ACSI score of 74.

TikTok has emerged as the new industry leader among major sites, edging past YouTube with a score of 78. This underscores the platform’s effective use of AI-driven content recommendations.

Meta’s Facebook and Instagram have also seen significant improvements in user satisfaction, showing 3-point gains. While Facebook remains near the bottom of the industry at 69, Instagram’s score of 76 puts it within striking distance of the leaders.

Challenges Remain

Despite improvements, the study highlights ongoing privacy and advertising challenges for search engines and social media platforms. Privacy ratings for search engines remain relatively low but steady at 79, while social media platforms score even lower at 73.

Advertising experiences emerge as a key differentiator between higher- and lower-satisfaction brands, particularly in social media. New ACSI benchmarks reveal user concerns about advertising content’s trustworthiness and personal relevance.

Why This Matters For SEO Professionals

This study provides an independent perspective on how users are responding to the AI push in online services. For SEO professionals, these findings suggest that:

  1. AI-enhanced search features resonate with users, potentially changing search behavior and expectations.
  2. The improving satisfaction with alternative search engines like Bing may lead to a more diverse search landscape.
  3. The continued importance of factors like content freshness and site performance in user satisfaction aligns with long-standing SEO best practices.

As AI becomes more integrated into our online experiences, SEO strategies may need to adapt to changing user preferences.


Featured Image: kate3155/Shutterstock

Google To Upgrade All Retailers To New Merchant Center By September

Google has announced plans to transition all retailers to its updated Merchant Center platform by September.

This move will affect e-commerce businesses globally and comes ahead of the holiday shopping season.

The Merchant Center is a tool for online retailers to manage how their products appear across Google’s shopping services.

Key Changes & Features

The new Merchant Center includes several significant updates.

Product Studio

An AI-powered tool for content creation. Google reports that 80% of current users view it as improving efficiency.

This feature allows retailers to generate tailored product assets, animate still images, and modify existing product images to match brand aesthetics.

It also simplifies tasks like background removal and image resolution enhancement.

Centralized Analytics

A new tab consolidating various business insights, including pricing data and competitive analysis tools.

Retailers can access pricing recommendations, competitive visibility reports, and retail-specific search trends, enabling them to make data-driven decisions and capitalize on popular product categories.

Redesigned Navigation

Google claims the new interface is more intuitive and cites increased setup success rates for new merchants.

The platform now offers simplified website verification processes and can pre-populate product information during setup.

Initial User Response

According to Google, early adopters have shown increased engagement with the platform.

The company reports a 25% increase in omnichannel merchants adding product offers in the new system. However, these figures have yet to be independently verified.

Jeff Harrell, Google’s Senior Director of Merchant Shopping, states in an announcement:

“We’ve seen a significant increase in retention and engagement among existing online merchants who have moved to the new Merchant Center.”

Potential Challenges and Support

While Google emphasizes the upgrade’s benefits, some retailers, particularly those comfortable with the current version, may face challenges adapting to the new system.

The upgrade’s mandatory nature could raise concerns among users who prefer the existing interface or have integrated workflows based on the current system.

To address these concerns, Google has stated that it will provide resources and support to help with the transition. This includes tutorial videos, detailed documentation, and access to customer support teams for troubleshooting.

Industry Context

This update comes as e-commerce platforms evolve, with major players like Amazon and Shopify enhancing their seller tools. Google’s move is part of broader efforts to maintain competitiveness in the e-commerce services sector.

The upgrade could impact consumers by improving product listings and providing more accurate information across Google’s shopping services.

For the e-commerce industry as a whole, it signals a continued push towards AI-driven tools and data-centric decision-making.

Transition Timeline

Google states that retailers who have not yet transitioned will be automatically upgraded by September.

The company advises users to familiarize themselves with the new features before the busy holiday shopping period.


Featured Image: BestForBest/Shutterstock
