How To Do A Sitemap Audit For Better Indexing & Crawling Via Python

Sitemap auditing involves syntax, crawlability, and indexation checks for the URLs and tags in your sitemap files.

A sitemap file contains the URLs to index with further information regarding the last modification date, priority of the URL, images, videos on the URL, and other language alternates of the URL, along with the change frequency.
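A minimal sitemap file illustrating these tags might look like this (the URL is hypothetical):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page</loc>
    <lastmod>2022-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```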

Sitemap index files can reference millions of URLs, even though a single sitemap file is capped at 50,000 URLs.

Auditing these URLs for better indexation and crawling might take time.

But with the help of Python and SEO automation, it is possible to audit millions of URLs within the sitemaps.

What Do You Need To Perform A Sitemap Audit With Python?

To understand the Python Sitemap Audit process, you’ll need:

  • A fundamental understanding of technical SEO and sitemap XML files.
  • Working knowledge of Python and sitemap XML syntax.
  • The ability to work with Python libraries such as Pandas, Advertools, LXML, and Requests, and with XPath selectors.

Which URLs Should Be In The Sitemap?

A healthy XML sitemap file should meet the following criteria:

  • All URLs should have a 200 Status Code.
  • All URLs should be self-canonical.
  • URLs should be open to being indexed and crawled.
  • URLs shouldn’t be duplicated.
  • URLs shouldn’t be soft 404s.
  • The sitemap should have a proper XML syntax.
  • The URLs in the sitemap should have a canonical that aligns with the Open Graph and Twitter Card URLs.
  • Each sitemap file should contain fewer than 50,000 URLs and be smaller than 50 MB.

What Are The Benefits Of A Healthy XML Sitemap File?

Smaller sitemaps are better than larger sitemaps for faster indexation. This is particularly important in news SEO, as smaller sitemaps help increase the overall count of valid indexed URLs.

Differentiating frequently updated URLs from static content URLs provides a better crawl distribution across the site.

Using the “lastmod” date in an honest way that aligns with the actual publication or update date helps a search engine to trust the date of the latest publication.

While performing the sitemap audit for better indexing, crawling, and search engine communication with Python, we follow the criteria above.

An Important Note…

When it comes to a sitemap’s nature and audit, Google and Microsoft Bing don’t use “changefreq” (the change frequency of the URLs) or “priority” (the prominence of a URL). In fact, they call these tags a “bag of noise.”

However, Yandex and Baidu use all these tags to understand the website’s characteristics.

A 16-Step Sitemap Audit For SEO With Python

A sitemap audit can involve content categorization, site-tree, or topicality and content characteristics.

However, a sitemap audit for better indexing and crawlability mainly involves technical SEO rather than content characteristics.

In this step-by-step sitemap audit process, we’ll use Python to tackle the technical aspects of sitemap auditing millions of URLs.

Image created by the author, February 2022

1. Import The Python Libraries For Your Sitemap Audit

The following code block imports the necessary Python libraries for the sitemap XML file audit.

import advertools as adv

import pandas as pd

from lxml import etree

from IPython.core.display import display, HTML

display(HTML("<style>.container { width:100% !important; }</style>"))

Here’s what you need to know about this code block:

  • Advertools is necessary for taking the URLs from the sitemap file and requesting their content or response status codes.
  • Pandas is necessary for aggregating and manipulating the data.
  • Plotly can optionally visualize the sitemap audit output (it is not imported above).
  • LXML is necessary for the syntax audit of the sitemap XML file.
  • IPython is optional, to expand the output cells of Jupyter Notebook to 100% width.

2. Take All Of The URLs From The Sitemap

Millions of URLs can be taken into a Pandas data frame with Advertools, as shown below.

sitemap_url = "https://www.complaintsboard.com/sitemap.xml"
sitemap = adv.sitemap_to_df(sitemap_url)
sitemap.to_csv("sitemap.csv")
sitemap_df = pd.read_csv("sitemap.csv", index_col=False)
sitemap_df.drop(columns=["Unnamed: 0"], inplace=True)
sitemap_df

Above, the Complaintsboard.com sitemap has been taken into a Pandas data frame, and you can see the output below.

A general sitemap URL extraction with sitemap tags, via Python, is shown above.

In total, we have 245,691 URLs in the sitemap index file of Complaintsboard.com.

The website uses “changefreq,” “lastmod,” and “priority” inconsistently.

3. Check Tag Usage Within The Sitemap XML File

To understand which tags are used or not within the Sitemap XML file, use the function below.

def check_sitemap_tag_usage(sitemap):
     lastmod = sitemap["lastmod"].isna().value_counts()
     priority = sitemap["priority"].isna().value_counts()
     changefreq = sitemap["changefreq"].isna().value_counts()
     lastmod_perc = sitemap["lastmod"].isna().value_counts(normalize = True) * 100
     priority_perc = sitemap["priority"].isna().value_counts(normalize = True) * 100
     changefreq_perc = sitemap["changefreq"].isna().value_counts(normalize = True) * 100
     sitemap_tag_usage_df = pd.DataFrame(data={"lastmod":lastmod,
     "priority":priority,
     "changefreq":changefreq,
     "lastmod_perc": lastmod_perc,
     "priority_perc": priority_perc,
     "changefreq_perc": changefreq_perc})
     return sitemap_tag_usage_df.astype(int)

The function check_sitemap_tag_usage is a data frame constructor based on the usage of the sitemap tags.

It takes the “lastmod,” “priority,” and “changefreq” columns by implementing “isna()” and “value_counts()” methods via “pd.DataFrame”.

Below, you can see the output.

Sitemap audit with Python for sitemap tag usage.

The data frame above shows that 96,840 URLs do not have the “lastmod” tag, which equals 39% of the sitemap file’s total URL count.

The corresponding percentage is 19% for the “priority” and “changefreq” tags within the sitemap XML file.

There are three main content freshness signals for a website.

These are the dates on a web page (visible to the user), structured data (invisible to the user), and “lastmod” in the sitemap.

If these dates are inconsistent with each other, search engines may ignore the dates on the website as freshness signals.
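As a sketch of how such a consistency check could start, the “lastmod” values can be parsed and screened for missing or stale dates. The frame, dates, and freshness window below are hypothetical, not from the audited site:

```python
import pandas as pd

# Hypothetical frame mirroring the tutorial's sitemap_df "lastmod" column.
sitemap_df = pd.DataFrame({
    "loc": ["https://example.com/a", "https://example.com/b"],
    "lastmod": ["2022-01-15", None],
})

# Parse lastmod into datetimes; unparseable or missing values become NaT.
lastmod = pd.to_datetime(sitemap_df["lastmod"], errors="coerce")

# Flag URLs whose lastmod is missing or older than a chosen freshness window.
cutoff = pd.Timestamp("2022-02-01") - pd.Timedelta(days=30)
stale_or_missing = sitemap_df[lastmod.isna() | (lastmod < cutoff)]
```

The flagged URLs can then be cross-checked against the visible dates and structured data of the same pages.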

4. Audit The Site-tree And URL Structure Of The Website

Understanding the most important or crowded URL Path is necessary to weigh the website’s SEO efforts or technical SEO Audits.

A single improvement for Technical SEO can benefit thousands of URLs simultaneously, which creates a cost-effective and budget-friendly SEO strategy.

Understanding the URL structure mainly focuses on the website’s more prominent sections and on analyzing its content network.

To create a URL Tree Dataframe from a website’s URLs from the sitemap, use the following code block.

sitemap_url_df = adv.url_to_df(sitemap_df["loc"])
sitemap_url_df

With the help of “urllib” or “advertools,” as above, you can easily parse the URLs within the sitemap into a data frame.

Creating a URL tree with urllib or Advertools is easy.
Checking the URL breakdowns helps to understand the overall information tree of a website.

The data frame above contains the “scheme,” “netloc,” “path,” and every “/” breakdown within the URLs as a “dir” which represents the directory.

Auditing the URL structure of the website serves two objectives.

These are checking whether all URLs have “HTTPS” and understanding the content network of the website.

Content analysis with sitemap files is not directly related to indexing and crawling, so we will only touch on it at the end of the article.

Check the next section to see the SSL Usage on Sitemap URLs.

5. Check The HTTPS Usage On The URLs Within Sitemap

Use the following code block to check the HTTP Usage ratio for the URLs within the Sitemap.

sitemap_url_df["scheme"].value_counts().to_frame()

The code block above counts the values of the “scheme” column, which contains the URLs’ protocol (HTTP/HTTPS) information.

Using “value_counts,” we see that all URLs are on HTTPS.

Checking the HTTP URLs from the sitemaps can help find bigger URL property consistency errors.

6. Check The Robots.txt Disallow Commands For Crawlability

The structure of URLs within the sitemap helps to reveal “submitted but disallowed” situations.

To see whether there is a robots.txt file of the website, use the code block below.

import requests
r = requests.get("https://www.complaintsboard.com/robots.txt")
r.status_code
200

Simply, we send a “get request” to the robots.txt URL.

If the response status code is 200, it means there is a robots.txt file for the user-agent-based crawling control.

After checking the “robots.txt” existence, we can use the “adv.robotstxt_test” method for bulk robots.txt audit for crawlability of the URLs in the sitemap.

sitemap_df_robotstxt_check = adv.robotstxt_test("https://www.complaintsboard.com/robots.txt", urls=sitemap_df["loc"], user_agents=["*"])
sitemap_df_robotstxt_check["can_fetch"].value_counts()

We have created a new variable called “sitemap_df_robotstxt_check”, and assigned the output of the “robotstxt_test” method.

We have used the URLs within the sitemap with the “sitemap_df[“loc”]”.

We have performed the audit for all of the user-agents via the “user_agents = [“*”]” parameter and value pair.

You can see the result below.

True     245690
False         1
Name: can_fetch, dtype: int64

It shows that there is one URL that is disallowed but submitted.

We can filter the specific URL as below.

pd.set_option("display.max_colwidth",255)
sitemap_df_robotstxt_check[sitemap_df_robotstxt_check["can_fetch"] == False]

We have used “set_option” to expand the maximum column width so that the full URLs are visible.

A URL appears as disallowed but submitted via a sitemap, as in Google Search Console coverage reports.
We see that a “profile” page has been disallowed and submitted.

Later, the same control can be done for further examinations such as “disallowed but internally linked”.

But, to do that, we need to crawl at least 3 million URLs from ComplaintsBoard.com, and it can be an entirely new guide.

Some website URLs do not have a proper “directory hierarchy”, which can make the analysis of the URLs, in terms of content network characteristics, harder.

Complaintsboard.com doesn’t use a proper URL structure and taxonomy, so analyzing the website structure is not easy for an SEO or Search Engine.

But the most used words within the URLs, or the content update frequency, can signal which topics the company actually emphasizes.

Since we focus on “technical aspects” in this tutorial, you can read the Sitemap Content Audit here.

7. Check The Status Code Of The Sitemap URLs With Python

Every URL within the sitemap has to have a 200 Status Code.

A crawl has to be performed to check the status codes of the URLs within the sitemap.

But, since it’s costly when you have millions of URLs to audit, we can simply use a new crawling method from Advertools.

Without taking the response body, we can crawl just the response headers of the URLs within the sitemap.

It is useful to decrease the crawl time for auditing possible robots, indexing, and canonical signals from the response headers.

To perform a response header crawl, use the “adv.crawl_headers” method.

adv.crawl_headers(sitemap_df["loc"], output_file="sitemap_df_header.jl")
df_headers = pd.read_json("sitemap_df_header.jl", lines=True)
df_headers["status"].value_counts()

The status code distribution of the URLs within the sitemap XML files can be seen below.

200    207866
404        23
Name: status, dtype: int64

It shows that 23 URLs from the sitemap actually return a 404.

And, they should be removed from the sitemap.

To audit which URLs from the sitemap are 404, use the filtration method below from Pandas.

df_headers[df_headers["status"] == 404]

The result can be seen below.

Finding the 404 URLs from sitemaps is helpful against link rot.
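Once the 404 URLs are identified, they can be filtered out of the sitemap frame before regenerating the file. A minimal sketch, assuming frames shaped like the tutorial’s “df_headers” and “sitemap_df” (the URLs below are hypothetical):

```python
import pandas as pd

# Hypothetical frames mirroring the tutorial's crawl and sitemap outputs.
df_headers = pd.DataFrame({
    "url": ["https://example.com/a", "https://example.com/b", "https://example.com/c"],
    "status": [200, 404, 200],
})
sitemap_df = pd.DataFrame({"loc": ["https://example.com/a",
                                   "https://example.com/b",
                                   "https://example.com/c"]})

# Keep only the URLs that responded with a 200 status code.
ok_urls = df_headers.loc[df_headers["status"] == 200, "url"]
clean_sitemap_df = sitemap_df[sitemap_df["loc"].isin(ok_urls)]
```

The cleaned frame can then be written back out as the basis for a regenerated sitemap.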

8. Check The Canonicalization From Response Headers

From time to time, using canonicalization hints on the response headers is beneficial for crawling and indexing signal consolidation.

In this context, the canonical tag on the HTML and the response header has to be the same.

If there are two different canonicalization signals on a web page, the search engines can ignore both assignments.

For ComplaintsBoard.com, we don’t have a canonical response header.

  • The first step is auditing whether the response header for canonical usage exists.
  • The second step is comparing the response header canonical value to the HTML canonical value if it exists.
  • The third step is checking whether the canonical values are self-referential.

Check the columns of the header crawl output to audit canonicalization from the response headers.

df_headers.columns

Below, you can see the columns.

Python SEO crawl output data frame columns. The “dataframe.columns” method is always useful to check.

If you are not familiar with the response headers, you may not know how to use canonical hints within response headers.

A response header can include the canonical hint with the “Link” value.

Advertools registers it directly as “resp_headers_link”.

Another problem is that the extracted strings appear within the “<URL>;” string pattern.

It means we will use regex to extract it.

df_headers["resp_headers_link"]

You can see the result below.

Screenshot from Pandas, February 2022.

The regex pattern “[^<>][a-z:/0-9-.]*” is good enough to extract the specific canonical value.
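As a quick illustration of that pattern, the extraction can be sketched against a sample “Link” header value in the “&lt;URL&gt;;” form described above (the URL is hypothetical):

```python
import re

# Sample "Link" response header value in the "<URL>; rel=..." pattern.
link_header = '<https://example.com/page>; rel="canonical"'

# The article's pattern: skip the opening "<", then consume URL characters.
match = re.search(r"([^<>][a-z:/0-9-.]*)", link_header)
canonical = match.group(1) if match else None
```

The same pattern is applied column-wide with “str.extract” in the next code block.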

A self-canonicalization check with the response headers is below.

df_headers["response_header_canonical"] = df_headers["resp_headers_link"].str.extract(r"([^<>][a-z:/0-9-.]*)")
(df_headers["response_header_canonical"] == df_headers["url"]).value_counts()

We have used two different boolean checks.

One to check whether the response header canonical hint is equal to the URL itself.

Another to see whether the status code is 200.

Since we have 404 URLs within the sitemap, their canonical value will be “NaN”.

It shows there are specific URLs with canonicalization inconsistencies.
We have 29 outliers for technical SEO. Every wrong signal given to the search engine for indexation or ranking causes a dilution of the ranking signals.

To see these URLs, use the code block below.

df_headers[(df_headers["response_header_canonical"] != df_headers["url"]) & (df_headers["status"] == 200)]

Screenshot from Pandas, February 2022. The canonical values from the response headers can be seen above.

Even a single “/” in the URL can cause a canonicalization conflict, as appears here for the homepage.

ComplaintsBoard.com screenshot for checking the response header canonical value against the actual URL of the web page.
You can check the canonical conflict here.

If you check log files, you will see that the search engine crawls the URLs from the “Link” response headers.

Thus, in technical SEO, this should carry weight.

9. Check The Indexing And Crawling Commands From Response Headers

There are 14 different X-Robots-Tag specifications for the Google search engine crawler.

The latest one is “indexifembedded,” which controls the indexation of a page’s content when it is embedded within another page.

The Indexing and Crawling directives can be in the form of a response header or the HTML meta tag.

This section focuses on the response header version of indexing and crawling directives.

  • The first step is checking whether the X-Robots-Tag property and values exist within the HTTP Header or not.
  • The second step is auditing whether it aligns itself with the HTML Meta Tag properties and values if they exist.

Use the command below to check the “X-Robots-Tag” from the response headers.

def robots_tag_checker(dataframe: pd.DataFrame):
     for column in dataframe.columns:
          if "robots" in column:
               return column
     return "There is no robots tag"
robots_tag_checker(df_headers)
OUTPUT>>>
'There is no robots tag'

We have created a custom function to check for an “X-Robots-Tag” column within the crawl output’s response headers.

It appears that our test subject website doesn’t use the X-Robots-Tag.

If there were an X-Robots-Tag, the code block below could be used.

df_headers["response_header_x_robots_tag"].value_counts()
df_headers[df_headers["response_header_x_robots_tag"] == "noindex"]

Check whether there is a “noindex” directive from the response headers, and filter the URLs with this indexation conflict.

In the Google Search Console Coverage Report, those appear as “Submitted marked as noindex”.

Contradicting indexing and canonicalization hints and signals might make a search engine ignore all of the signals, and make the search algorithms trust user-declared signals less.

10. Check The Self Canonicalization Of Sitemap URLs

Every URL in the sitemap XML files should give a self-canonicalization hint.

Sitemaps should only include the canonical versions of the URLs.

The Python code block in this section is to understand whether the sitemap URLs have self-canonicalization values or not.

To check the canonicalization from the HTML Documents’ “<head>” section, crawl the websites by taking their response body.

Use the code block below.

user_agent = "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

The difference between “crawl_headers” and the “crawl” is that “crawl” takes the entire response body while the “crawl_headers” is only for response headers.

adv.crawl(sitemap_df["loc"],
          output_file="sitemap_crawl_complaintsboard.jl",
          follow_links=False,
          custom_settings={"LOG_FILE": "sitemap_crawl_complaintsboard.log", "USER_AGENT": user_agent})

You can check the file size differences from crawl logs to response header crawl and entire response body crawl.

Python crawl output size comparison.

Going from a 6 GB output to a 387 MB output is quite economical.

If a search engine wants to see only certain response headers and the status code, exposing that information in the headers makes its crawl hits more economical.

How To Deal With Large DataFrames For Reading And Aggregating Data?

This section requires dealing with the large data frames.

A computer can’t read a Pandas DataFrame from a CSV or JL file if the file size is larger than the computer’s RAM.

Thus, the “chunking” method is used.

When a website sitemap XML File contains millions of URLs, the total crawl output will be larger than tens of gigabytes.

An iteration across sitemap crawl output data frame rows is necessary.

For chunking, use the code block below.

df_iterator = pd.read_json(
    'sitemap_crawl_complaintsboard.jl',
    chunksize=10000,
    lines=True)

for i, df_chunk in enumerate(df_iterator):
    output_df = pd.DataFrame(data={"url": df_chunk["url"],
                                   "canonical": df_chunk["canonical"],
                                   "self_canonicalised": df_chunk["url"] == df_chunk["canonical"]})
    mode = "w" if i == 0 else "a"
    header = i == 0
    output_df.to_csv(
        "canonical_check.csv",
        index=False,
        header=header,
        mode=mode
    )

df = pd.read_csv("canonical_check.csv")
df[(df["url"] != df["canonical"]) & (df["self_canonicalised"] == False) & (df["canonical"].isna() != True)]

You can see the result below.

Python SEO canonicalization audit.

We see that the paginated URLs from the “book” subfolder give canonical hints to the first page, which is an incorrect practice according to Google’s guidelines.

11. Check The Sitemap Sizes Within Sitemap Index Files

Every sitemap file should be less than 50 MB. Use the Python code block below to check the sitemap file sizes; recent versions of Advertools include a “sitemap_size_mb” column in the sitemap data frame (if your version lacks it, upgrade Advertools or compute the sizes from the raw files).

pd.pivot_table(sitemap_df, index="sitemap", values="sitemap_size_mb", aggfunc="max").sort_values(by="sitemap_size_mb", ascending=False)

You can see the result below.

Python SEO sitemap size audit.

We see that all sitemap XML files are under 50 MB.

For better and faster indexation, keeping the sitemap URLs valuable and unique while decreasing the size of the sitemap files is beneficial.
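One way to decrease transfer size is to serve the sitemap gzipped, which the sitemap protocol allows (the 50 MB limit applies to the uncompressed file). A minimal sketch with a hypothetical one-URL sitemap body:

```python
import gzip

# Hypothetical minimal sitemap body; real files would hold up to 50,000 URLs.
xml_body = ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            '  <url><loc>https://example.com/a</loc></url>\n'
            '</urlset>')

# Write a gzipped sitemap; compression cuts transfer size for crawlers.
with gzip.open("sitemap.xml.gz", "wt", encoding="utf-8") as f:
    f.write(xml_body)

# Reading it back confirms the content survives the round trip.
with gzip.open("sitemap.xml.gz", "rt", encoding="utf-8") as f:
    round_trip = f.read()
```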

12. Check The URL Count Per Sitemap With Python

Every sitemap file should contain fewer than 50,000 URLs.

Use the Python code block below to check the URL Counts within the sitemap XML files.

(pd.pivot_table(sitemap_df,
                values=["loc"],
                index="sitemap",
                aggfunc="count")
 .sort_values(by="loc", ascending=False))

You can see the result below.

Python SEO sitemap URL count audit.
All sitemaps have fewer than 50,000 URLs. Some sitemaps have only one URL, which wastes the search engine’s attention.

Keeping frequently updated sitemap URLs separate from static and stale content URLs is beneficial.

URL Count and URL Content character differences help a search engine to adjust crawl demand effectively for different website sections.
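If a URL list grows past the per-file limit, it can be split into compliant chunks before serialization. A minimal sketch under the 50,000-URL limit stated above; the helper names and URLs are hypothetical:

```python
MAX_URLS = 50_000  # sitemap protocol limit per file


def split_into_sitemaps(urls, max_urls=MAX_URLS):
    """Split a URL list into chunks that respect the per-sitemap URL limit."""
    return [urls[i:i + max_urls] for i in range(0, len(urls), max_urls)]


def chunk_to_xml(chunk):
    """Serialize one chunk of URLs as a minimal <urlset> document."""
    entries = "\n".join(f"  <url><loc>{u}</loc></url>" for u in chunk)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{entries}\n</urlset>")


# Hypothetical usage with the tutorial's frame:
# chunks = split_into_sitemaps(sitemap_df["loc"].drop_duplicates().tolist())
```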

13. Check The Indexing And Crawling Meta Tags From URLs’ Content With Python

Even if a web page is not disallowed in robots.txt, it can still be blocked from indexing by its HTML meta tags.

Thus, checking the HTML Meta Tags for better indexation and crawling is necessary.

Using “custom selectors” is necessary to perform the HTML meta tag audit for the sitemap URLs.

sitemap = adv.sitemap_to_df("https://www.holisticseo.digital/sitemap.xml")

adv.crawl(url_list=sitemap["loc"][:1000],
          output_file="meta_command_audit.jl",
          follow_links=False,
          xpath_selectors={"meta_command": '//meta[@name="robots"]/@content'},
          custom_settings={"CLOSESPIDER_PAGECOUNT": 1000})

df_meta_check = pd.read_json("meta_command_audit.jl", lines=True)

df_meta_check["meta_command"].str.contains("nofollow|noindex", regex=True).value_counts()

The '//meta[@name="robots"]/@content' XPath selector extracts all the robots commands from the sitemap’s URLs.

We have used only the first 1000 URLs in the sitemap.

And the “CLOSESPIDER_PAGECOUNT” setting stops the crawl after the initial 1,000 responses.

I have used another website to check the Crawling Meta Tags since ComplaintsBoard.com doesn’t have it on the source code.

You can see the result below.

Python SEO meta robots audit.
None of the URLs from the sitemap have “nofollow” or “noindex” within the “Robots” commands.

To check their values, use the code below.

df_meta_check[df_meta_check["meta_command"].str.contains("nofollow|noindex", regex=True) == False][["url", "meta_command"]]

You can see the result below.

Meta tag audit from the websites.

14. Validate The Sitemap XML File Syntax With Python

Validating the sitemap XML file’s syntax ensures that search engines can parse the file as intended.

Even if there are certain syntax errors, a search engine can still recognize the sitemap file during XML normalization.

But every syntax error can decrease the efficiency to a certain degree.

Use the code block below to validate the Sitemap XML File Syntax.

def validate_sitemap_syntax(xml_path: str, xsd_path: str):
    xmlschema_doc = etree.parse(xsd_path)
    xmlschema = etree.XMLSchema(xmlschema_doc)
    xml_doc = etree.parse(xml_path)
    result = xmlschema.validate(xml_doc)
    return result
validate_sitemap_syntax("sej_sitemap.xml", "sitemap.xsd")

For this example, I have used “https://www.searchenginejournal.com/sitemap_index.xml”. The XSD file describes the XML file’s context and tree structure.

It is referenced in the opening “urlset” tag of the sitemap file, as below.

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
For further information, you can also check DTD documentation.

15. Check The Open Graph URL And Canonical URL Matching

It is not a secret that search engines also use the Open Graph and RSS Feed URLs from the source code for further canonicalization and exploration.

The Open Graph URLs should be the same as the canonical URL submission.

From time to time, even in Google Discover, Google chooses to use the image from the Open Graph.

To check the Open Graph URL and Canonical URL consistency, use the code block below.

df_iterator = pd.read_json(
    'sitemap_crawl_complaintsboard.jl',
    chunksize=10000,
    lines=True)

for i, df_chunk in enumerate(df_iterator):
    if "og:url" in df_chunk.columns:
        output_df = pd.DataFrame(data={
            "canonical": df_chunk["canonical"],
            "og:url": df_chunk["og:url"],
            "open_graph_canonical_consistency": df_chunk["canonical"] == df_chunk["og:url"]})
        mode = "w" if i == 0 else "a"
        header = i == 0
        output_df.to_csv(
            "open_graph_canonical_consistency.csv",
            index=False,
            header=header,
            mode=mode
        )
    else:
        print("There is no Open Graph URL Property")
There is no Open Graph URL Property

If there is an Open Graph URL Property on the website, it will give a CSV file to check whether the canonical URL and the Open Graph URL are the same or not.

But for this website, we don’t have an Open Graph URL.

Thus, I have used another website for the audit.

if "og:url" in df_meta_check.columns:

     output_df = pd.DataFrame(data={

     "canonical":df_meta_check["canonical"],

     "og:url":df_meta_check["og:url"],

     "open_graph_canonical_consistency":df_meta_check["canonical"] == df_meta_check["og:url"]})

     mode="w" if i == 0 else 'a'

     #header = i == 0

     output_df.to_csv(

            "df_og_url_canonical_audit.csv",

            index=False,

            #header=header,

            mode=mode
     )

else:

     print("There is no Open Graph URL Property")

df = pd.read_csv("df_og_url_canonical_audit.csv")

df

You can see the result below.

Python SEO Open Graph URL audit.

We see that all canonical URLs and the Open Graph URLs are the same.

Python SEO canonicalization audit.

16. Check The Duplicate URLs Within Sitemap Submissions

A sitemap index file shouldn’t have duplicated URLs across different sitemap files or within the same sitemap XML file.

The duplication of the URLs within the sitemap files can make a search engine download the sitemap files less since a certain percentage of the sitemap file is bloated with unnecessary submissions.

In certain situations, it can appear as an attempt to spam and manipulate the crawling schemes of the search engine crawlers.

Use the code block below to check for duplicate URLs within the sitemap submissions.

sitemap_df["loc"].duplicated().value_counts()

You can see that 49,574 URLs from the sitemap are duplicated.

Python SEO duplicated URL audit from the sitemap XML files.

To see which sitemaps have more duplicated URLs, use the code block below.

pd.pivot_table(sitemap_df[sitemap_df["loc"].duplicated()==True], index="sitemap", values="loc", aggfunc="count").sort_values(by="loc", ascending=False)

You can see the result.

Python SEO sitemap audit for duplicated URLs.

Chunking the sitemaps can help with site-tree and technical SEO analysis.

To see the duplicated URLs within the Sitemap, use the code block below.

sitemap_df[sitemap_df["loc"].duplicated() == True]

You can see the result below.

Duplicated sitemap URL audit output.
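Once the duplicates are identified, dropping them before regenerating the sitemaps is straightforward. A minimal sketch with a hypothetical frame mirroring “sitemap_df”:

```python
import pandas as pd

# Hypothetical frame mirroring sitemap_df, with one URL submitted twice.
sitemap_df = pd.DataFrame({
    "loc": ["https://example.com/a", "https://example.com/b", "https://example.com/a"],
    "sitemap": ["s1.xml", "s1.xml", "s2.xml"],
})

# Keep the first submission of each URL and drop the repeats.
deduped = sitemap_df.drop_duplicates(subset="loc", keep="first")
```

The deduplicated frame can then feed the sitemap-splitting and serialization steps discussed earlier.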

Conclusion

I wanted to show how to validate a sitemap file for better and healthier indexation and crawling for Technical SEO.

Python is vastly used for data science, machine learning, and natural language processing.

But, you can also use it for Technical SEO Audits to support the other SEO Verticals with a Holistic SEO Approach.

In a future article, we can expand these Technical SEO Audits further with different details and methods.

But, in general, this is one of the most comprehensive technical SEO guides to sitemaps and sitemap auditing with Python.


Featured Image: elenasavchina2/Shutterstock





SEO

The 9 Best Landing Page Builders For 2024

Published

on

By

The 9 Best Landing Page Builders For 2024

Generating leads is crucial to boosting your sales – but if your landing pages aren’t effective, you’re going to struggle to turn visitors into customers.

Landing pages play a key role in elevating the effectiveness of your marketing efforts and differentiating, refining, and enhancing the user experience.

In this article, we’ll explore how to choose the right landing page builder for your business before highlighting the nine top landing page builders and what makes them stand out.

Let’s get into it.

Choosing The Right Landing Page Builder For Your Business

With so many options to choose from, how can you decide which landing page builder is right for you?

Advertisement

Here are a few things to consider when making your decision.

  • Marketing objectives: Start by deciding what you want to achieve with your landing pages. Are you looking to collect email subscribers, drive sales, generate leads, or promote an event? Each platform caters to different use cases, so leading with your goals can help you refine your search.
  • Ease of use: Different builders have different learning curves. If you’re a novice, you should probably opt for a tool that caters to beginners. If you have more technical experience, like coding knowledge, you might want to look to more advanced builders to get the most power.
  • Integration needs: Do you want your landing pages to integrate with other software or tools you’re already using?
  • Mobile optimization: It’s essential that your landing pages cater to mobile users, so make sure your builder considers that with features like responsive design.
  • Budget constraints: Unfortunately, budget matters. Landing page builders come with various price tags depending on their capabilities and features. Make sure you’re working within the boundaries of what you can afford.

While this is, by no means, an exhaustive list of considerations, it’s a starting point to help you choose a landing page builder that makes sense for your business needs.

Now, let’s look at nine of the best landing page builders to choose from.

1. Carrd

Screenshot from Carrd.com, April 2024
  • Best for: Simple projects, personal use, and small budgets.

Looking for a great landing page builder that won’t break the bank? Look no further than Carrd.

Carrd is a streamlined landing page builder that focuses on creating single-page websites quickly and easily. It’s designed for simplicity, making it ideal for anyone who wants to quickly create a webpage without needing to build a multi-page site.

Think portfolios, personal profiles, project presentations, and small business showcases.

Carrd’s user-friendly interface and selection of themes allow users to create sleek pages in the blink of an eye without even needing an account – you can just visit the website, pick a theme, and get started. However, you will need to sign up to save or publish your site.

It balances simplicity and functionality to help you craft pages that are clean, focused, and responsive across all devices. If you're just testing the waters or working with slim budgets, this is the right tool for you.


Pros:

  • Extremely affordable, with a free tier available.
  • Intuitive and user-friendly interface.
  • Responsive design.
  • Fast and lightweight, making it ideal for quick and simple sites.

Cons:

  • Limited to single-page websites.
  • Restrictive layouts/themes, which limit creative freedom.
  • Lacks advanced features and integrations found in more comprehensive builders.

Pricing: 

  • Carrd’s free basic plan allows you to launch three sites with Carrd branding to .carrd.co domains.
  • Paid plans range from $9 to $49 per year, and offer additional features like no Carrd branding, custom domains, and Google Analytics support (depending on your membership tier).

2. ConvertKit 

Screenshot from ConvertKit.com, April 2024
  • Best for: Content creators, bloggers, and marketers focused on growing their audience through email.

Billing itself as “the creator marketing platform,” ConvertKit’s landing page builder is targeted at creators, bloggers, and marketers who want to expand their email subscriber base.

If your goal is to create a landing page to help you build an email list, ConvertKit might be the right option for you.

ConvertKit’s landing page builder offers a range of high-quality, customizable templates and integrates with third-party tools to help you get the most out of your site.

It also integrates with the stock photo platform Unsplash to offer access to 100,000+ free images for your landing page.

While ConvertKit’s analytics and customization options might not be as extensive as those of some of its competitors, its ease of use and focused approach make it a standout for email-driven campaigns.

Pros:

  • Free plan available to get started.
  • Access to thousands of free images.
  • Robust third-party integration capabilities.

Cons:

  • Limited template customization and flexibility.
  • A/B testing and analytics features are less advanced.
  • Priced higher than some basic landing page builders.

Pricing:

  • Free plan for up to 10,000 subscribers.
  • The Creator plan starts at $25/month and includes benefits like automation features and app integrations. Pricing scales with subscriber count.

3. Unbounce

Screenshot from Unbounce.com, April 2024
  • Best for: Marketers focused on conversion rate optimization.

Unbounce is a leading landing page builder renowned for its focus on conversion rate optimization (CRO). Its website promises to help you “build high-converting landing pages with ease.”

It offers a suite of advanced tools, such as A/B testing, dynamic text replacement, and the Smart Traffic system, which uses AI to optimize visitor flow to the highest-converting page variant based on user behavior and characteristics.
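AI-driven traffic allocation like this is conceptually similar to a multi-armed bandit. As a rough, hypothetical illustration (not Unbounce's actual implementation), here is a Thompson-sampling sketch in Python that routes each visitor to the landing page variant most likely to convert, given the conversions observed so far:

```python
import random

def pick_variant(stats):
    """Choose a landing page variant via Thompson sampling.

    stats maps variant name -> (conversions, visitors).
    Each variant's conversion rate is modeled with a Beta
    distribution; we sample a plausible rate for each variant
    and route the visitor to the one with the highest draw.
    """
    best, best_draw = None, -1.0
    for variant, (conversions, visitors) in stats.items():
        # Beta(successes + 1, failures + 1) posterior over the rate
        draw = random.betavariate(conversions + 1, visitors - conversions + 1)
        if draw > best_draw:
            best, best_draw = variant, draw
    return best

# Hypothetical observed data: variant_b converts at 5.5% vs. 3.0%
stats = {"variant_a": (30, 1000), "variant_b": (55, 1000)}
random.seed(7)
choices = [pick_variant(stats) for _ in range(1000)]
print(choices.count("variant_b") > choices.count("variant_a"))  # prints True
```

Over time, a sampler like this naturally shifts traffic toward the better-performing variant while still occasionally exploring the weaker one.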

It also focuses on features that can help you boost your lead gen efforts, such as opt-in email popups and sticky banners.

With 100+ responsive templates, Unbounce makes it easy to create landing pages that are both engaging and effective.

Compared to some other options on this list, Unbounce is a particularly robust platform with tons of customization and integrations – and the price point reflects that.

As a premium offering with a steeper learning curve, it might not be the best for beginners – but its AI-powered features and conversion-focused tools make it a formidable tool for achieving your goals.

Pros:

  • Advanced A/B testing and AI-driven optimization.
  • Large selection of responsive templates.
  • Integrated features for enhancing lead capture.

Cons:

  • Higher price point than some other builders, which might not work for those with limited budgets.
  • Complex setup and steeper learning curve for new users.
  • Some customization limitations.

Pricing:

  • The Build plan starts at $74/month and covers unlimited conversions, one root domain, and up to 20,000 monthly unique visitors.
  • Other paid plans range from $112/month up to $649/month.

4. Leadpages

Screenshot from Leadpages.com, April 2024
  • Best for: Small businesses and entrepreneurs looking to generate sales.

Need a landing page that will help you generate sales? Consider taking a look at Leadpages.

Its strength lies in its user-friendly, drag-and-drop editor and an extensive collection of templates that streamline the page-building process. Plus, according to the Leadpages website, it’s a platform that converts five times better than the industry average.

Leadpages offers CRO tools, real-time analytics, and A/B testing capabilities, enabling users to enhance their page performance effectively.

Its various widgets allow you to add videos, images, forms, and even payment integrations directly onto your landing pages, making it a versatile tool for businesses that want to combine content with sales functionality.

On top of all this, Leadpages now includes an AI Engine for creating headlines and images and an AI writing assistant at some membership tiers, which can help you write better content.

Pros:

  • Intuitive no-code editor and easy payment integration.
  • Comprehensive A/B testing and real-time analytics.
  • Extensive template library with over 250 options.

Cons:

  • Higher cost compared to some alternatives.
  • Limited ecommerce features.
  • Some users report mobile responsiveness issues.

Pricing:

  • The standard plan starts at $37/month for one custom domain, unlimited traffic and leads, and 10,000 monthly AI Engine credits.
  • More advanced features are available in higher-tier plans, which start at $74/month.

5. Landingi

Screenshot from Landingi.com, April 2024
  • Best for: Businesses seeking a versatile landing page solution with a wide range of features.

If you’re in the market for versatility, Landingi is worth investigating.

Landingi offers a flexible, comprehensive landing page builder with a robust set of features, including an advanced editor, popups, A/B testing, and a substantial library of 300+ templates.


Its unique Smart Sections feature allows you to reuse and easily update specific page elements across multiple designs, saving time and headaches.

Designed to serve businesses of all sizes, Landingi’s simple, drag-and-drop builder can help you create and optimize various types of landing pages – and if you have any HTML and CSS knowledge, it can be a pretty impressive editor.

Landingi is a particularly strong choice for small businesses looking to target different customer segments with unique landing pages. The integration capabilities with numerous apps, including payment gateways like Stripe, make it a great choice for companies looking to sell products.

While its rich feature set can be overwhelming for newcomers, and creating pages might take a bit longer compared to other platforms, the level of customization and control it offers makes Landingi one of the best landing page builders out there.

Pros:

  • Extensive template library with 300+ customizable options.
  • Powerful editing capabilities with Smart Sections for efficient design.
  • Broad integration with various apps, including payment systems.

Cons:

  • Steeper learning curve for beginners.
  • Potentially longer time to create landing pages compared to simpler platforms.

Pricing:

  • The Lite plan starts at $35/month and gives you 10 digital assets, unlimited conversions, 5,000 visits per month, and one custom domain.
  • Landingi also offers Professional and Unlimited tiers with more advanced features and capabilities.

6. Instapage

Screenshot from Instapage.com, April 2024
  • Best for: Large businesses, marketing teams, or agencies that require collaboration and advanced optimization features.

If you’re seeking a more high-end landing page platform, Instapage might be the one for you. It offers advanced features tailored for professional marketing teams and agencies with a need to create optimized landing pages at scale.

In addition to a drag-and-drop builder and plenty of high-quality templates, Instapage offers a bevy of features, including advanced cloud-based team collaboration tools, heatmaps for user engagement analysis, robust A/B testing capabilities, AI-generated content creation, and more.


One of its standout features is Instablocks, which allows users to create custom page components that can be easily reused across different projects.

Instapage supports advanced marketing goals with features like AdMap to align ads with page content. Plus, its mobile-friendly design ensures a fast, seamless user experience.

While Instapage offers a premium experience with its comprehensive set of tools and features, its higher price point and complex functionalities may be a barrier for smaller businesses or those new to landing page optimization.

Pros:

  • Extensive customization with a library of professional templates.
  • Instablocks for efficient design and asset reuse.
  • Effective team collaboration features.

Cons:

  • Premium pricing will be a barrier for many businesses.
  • Steep learning curve for utilizing advanced features.
  • Limitations in reporting and visitor tracking for lower-tier plans.

Pricing:

  • The Build plan starts at $199/month, with a 14-day free trial.
  • Customers will need to upgrade to a customized Convert plan to access some of the more complex features, such as AdMap, heatmaps, and more.

7. Wix 

Screenshot from Wix.com, April 2024
  • Best for: Individuals and small businesses seeking creative control without advanced coding.

Now for something much more accessible: Wix is renowned for its user-friendly platform, which is ideal for creating attractive landing pages with minimal effort.

Like other options on this list, Wix offers an accessible drag-and-drop editor and a range of existing templates to help users craft aesthetically pleasing and functional landing pages.

Wix’s platform has a reputation for being particularly beginner-friendly, with a low learning curve and a free plan to help new users get started without any upfront investment.


For those focused on ecommerce, Wix provides specific features to build landing pages that showcase products and promotions, supported by over 50 payment solutions and tools like heatmaps to enhance user engagement and conversion rates.

While it offers a free starter plan, accessing more advanced functionalities and removing Wix ads requires upgrading to a paid subscription.

Wix’s balance of user-friendly design tools, ecommerce support, and cost-effective pricing makes it a favorable option for those new to web design or businesses needing straightforward, visually appealing landing pages.

Pros:

  • User-friendly with an intuitive drag-and-drop interface.
  • Free plan available, making it accessible for beginners.
  • Ecommerce capabilities with extensive payment integration.

Cons:

  • Advanced features and ad-free experience require a paid plan.
  • Potential limitations in customization for complex requirements.
  • Site speed may decrease with more intricate designs.

Pricing:

  • A free plan is available, but it includes Wix branding and lacks more advanced features like payments.
  • Paid plans start at $17/month, offering additional features.

8. Elementor

Screenshot from Elementor.com, April 2024
  • Best for: WordPress users looking for a powerful and intuitive landing page builder.

If you’re a WordPress user, you’ll want to know about Elementor.

It’s a WordPress page builder that has gained popularity for its flexibility, comprehensive customization capabilities, and user-friendly interface.

Elementor allows users to design dynamic and detailed landing pages within WordPress. This feature makes it the perfect choice for WordPress users who want to extend the functionality of their website with sleek landing pages that maintain a consistent look and feel with their existing content.


Its real-time editing features allow for immediate feedback on design changes without any coding.

It also offers dozens of designer-made templates to choose from. You can add custom forms and popups to your landing page, save page components for reuse, and seamlessly integrate with your customer relationship management (CRM) tools to create a powerful customer experience.

While Elementor offers a ton in terms of design flexibility and integration, it’s important to note that it’s exclusively for WordPress users and can be resource-intensive – so it might impact site performance, especially on more complex websites.

Pros:

  • Advanced customization and design flexibility.
  • Real-time editing and instant feedback.
  • Seamless WordPress integration.

Cons:

  • Exclusively for WordPress users.
  • Potentially impacts site performance due to resource intensity.

Pricing:

  • Free version with limited functionality.
  • Paid versions start at $59/year, providing advanced features and support.

9. Taplink

Screenshot from Taplink.at, April 2024
  • Best for: Social media influencers and businesses looking to direct traffic from social platforms to other content or actions.

This one’s a little different than some of the other examples here, but it deserves highlighting.

Taplink is a specialized micro-landing page builder optimized for social media profiles. It’s perfect for businesses and influencers that want to drive traffic from social media to other content or actions. You just use Taplink to create landing pages and share them on your social profiles.

Taplink stands out for its simplicity and mobile optimization, which is crucial when targeting social media audiences.


The focus on quick, effective page creation allows users to engage with their audience without the complexities of traditional website development.

For those aiming to convert social media interest into tangible outcomes – such as lead generation, sales, or content promotion – Taplink is a winner.

While its feature set is more limited than some of the more comprehensive builders featured here, its affordability and user-friendly design make it a great tool for those looking to maximize their social engagement with minimal effort and investment.

Pros:

  • Simple and quick setup.
  • Mobile-optimized for social media engagement.
  • Cost-effective for targeted campaigns.

Cons:

  • Designed primarily for micro-landing pages, limiting the scope.
  • Fewer features and customization options than extensive landing page builders.

Pricing:

  • Taplink offers a free basic plan, with premium features available on paid plans starting from $3/month. The most expensive tier, the Business tier, is $6/month.

There’s A Landing Page Platform To Help You Convert Visitors

Choosing the right landing page builder for your business can significantly impact your marketing success – but the decision will depend on your specific goals and needs.

As we’ve explored, each tool has unique strengths and caters to different aspects of the landing page creation and optimization process.

Whether you’re looking for advanced design capabilities, a user-friendly interface, or specific functionalities like CRO, there’s a platform that can help you not just streamline your landing page design process, but start converting visitors into loyal customers.


Featured Image: Griboedov/Shutterstock



SEO

Google’s Search Engine Market Share Drops As Competitors’ Grows

Published

on

By

Assorted search engine apps including Google, You.com and Bing are seen on an iPhone. Microsoft plans to use ChatGPT in Bing, and You.com has launched an AI chatbot.

According to data from Statcounter Global Stats, Google's search engine market share has fallen to 86.99%, the lowest point since the firm began tracking search engine share in 2009.

The drop represents a more than 4% decrease from the previous month, marking the largest single-month decline on record.

Screenshot from: https://gs.statcounter.com/search-engine-market-share/, May 2024.

U.S. Market Impact

The decline is most significant in Google’s key market, the United States, where its share of searches across all devices fell by nearly 10%, reaching 77.52%.

Screenshot from: https://gs.statcounter.com/search-engine-market-share/, May 2024.

Concurrently, competitors Microsoft Bing and Yahoo Search have seen gains. Bing reached a 13% market share in the U.S. and 5.8% globally, its highest since launching in 2009.

Yahoo Search’s worldwide share nearly tripled to 3.06%, a level not seen since July 2015.

Screenshot from: https://gs.statcounter.com/search-engine-market-share/, May 2024.

Search Quality Concerns

Many industry experts have recently expressed concerns about the declining quality of Google’s search results.

A portion of the SEO community believes that the search giant’s results have worsened following the latest update.


These concerns have begun to extend to average internet users, who are increasingly voicing complaints about the state of their search results.

Alternative Perspectives

Web analytics platform SimilarWeb provided additional context on X (formerly Twitter), stating that its data for the US for March 2024 suggests Google’s decline may not be as severe as initially reported.

SimilarWeb also highlighted Yahoo’s strong performance, categorizing it as a News and Media platform rather than a direct competitor to Google in the Search Engine category.

Why It Matters

The shifting search engine market trends can impact businesses, marketers, and regular users.

Google has been on top for a long time, shaping how we find things online and how users behave.

However, as its market share drops and other search engines gain popularity, publishers may need to rethink their online strategies and optimize for multiple search platforms besides Google.

Users are becoming vocal about Google’s declining search quality over time. As people start trying alternate search engines, the various platforms must prioritize keeping users satisfied if they want to maintain or grow their market position.

It will be interesting to see how Google's competitors build on this boost in market share.

What It Means for SEO Pros

As Google’s competitors gain ground, SEO strategies may need to adapt by accounting for how each search engine’s algorithms and ranking factors work.


This could involve diversifying SEO efforts across multiple platforms and staying up-to-date on best practices for each one.

The increased focus on high-quality search results emphasizes the need to create valuable, user-focused content that meets the needs of the target audience.

SEO pros must prioritize informative, engaging, trustworthy content that satisfies both search engine algorithms and user expectations.

Remain flexible, adaptable, and proactive to navigate these shifts. Keeping a pulse on industry trends, user behaviors, and competing search engine strategies will be key for successful SEO campaigns.


Featured Image: Tada Images/Shutterstock





SEO

How To Drive Pipeline With A Silo-Free Strategy

Published

on

By

How To Drive Pipeline With A Silo-Free Strategy

When it comes to B2B strategy, a holistic approach is the only approach. 

Revenue organizations usually operate with siloed teams, and often expect a one-size-fits-all solution (typically buying clicks with paid media). 

However, without cohesive brand, infrastructure, and pipeline generation efforts, they’re pretty much doomed to fail. 

It’s just like rowing crew, where each member of the team must synchronize their movements to propel the boat forward – successful B2B marketing requires an integrated strategy. 

So if you’re ready to ditch your disjointed marketing efforts and try a holistic approach, we’ve got you covered.


Join us on May 15 for an insightful live session with Digital Reach Agency on how to craft a compelling brand and product-market fit (PMF). 

We’ll walk through the critical infrastructure you need, and the dependencies among the core digital marketing disciplines.

Key takeaways from this webinar:

  • Thinking Beyond Traditional Silos: Learn why traditional marketing silos are no longer viable and how they spell doom for modern revenue organizations.
  • How To Identify and Fix Silos: Discover actionable strategies for pinpointing and sealing the gaps in your marketing silos. 
  • The Power of Integration: Uncover the secrets to successfully integrating brand strategy, digital infrastructure, and pipeline generation efforts.

Ben Childs, President and Founder of Digital Reach Agency, and Jordan Gibson, Head of Growth at Digital Reach Agency, will show you how to seamlessly integrate various elements of your marketing strategy for optimal results.

Don’t make the common mistake of using traditional marketing silos – sign up now and learn what it takes to transform your B2B go-to-market.

You’ll also get the opportunity to ask Ben and Jordan your most pressing questions, following the presentation.

And if you can’t make it to the live event, register anyway and we’ll send you a recording shortly after the webinar. 



