How To Do A Sitemap Audit For Better Indexing & Crawling Via Python

Sitemap auditing involves syntax, crawlability, and indexation checks for the URLs and tags in your sitemap files.

A sitemap file contains the URLs to index with further information regarding the last modification date, priority of the URL, images, videos on the URL, and other language alternates of the URL, along with the change frequency.

A sitemap index file can reference millions of URLs, even though a single sitemap file can contain at most 50,000 URLs.

Auditing these URLs for better indexation and crawling might take time.

But with the help of Python and SEO automation, it is possible to audit millions of URLs within the sitemaps.

What Do You Need To Perform A Sitemap Audit With Python?

To understand the Python Sitemap Audit process, you’ll need:

  • A fundamental understanding of technical SEO and sitemap XML files.
  • Working knowledge of Python and sitemap XML syntax.
  • The ability to work with the Python libraries Pandas, Advertools, LXML, and Requests, as well as XPath selectors.

Which URLs Should Be In The Sitemap?

A healthy XML sitemap file should meet the following criteria:

  • All URLs should have a 200 Status Code.
  • All URLs should be self-canonical.
  • URLs should be open to being indexed and crawled.
  • URLs shouldn’t be duplicated.
  • URLs shouldn’t be soft 404s.
  • The sitemap should have a proper XML syntax.
  • The URLs in the sitemap should have an aligning canonical with Open Graph and Twitter Card URLs.
  • The sitemap should contain no more than 50,000 URLs and be no larger than 50 MB uncompressed.

What Are The Benefits Of A Healthy XML Sitemap File?

Smaller sitemaps are better than larger sitemaps for faster indexation. This is particularly important in News SEO, as smaller sitemaps help to increase the overall count of valid indexed URLs.

Differentiate frequently updated and static content URLs from each other to provide a better crawling distribution among the URLs.

Using the “lastmod” date in an honest way that aligns with the actual publication or update date helps a search engine to trust the date of the latest publication.

The sitemap audit with Python in this guide follows the criteria above for better indexing, crawling, and search engine communication.

An Important Note…

When it comes to a sitemap’s nature and audit, Google and Microsoft Bing don’t use “changefreq” to learn how often a URL changes, or “priority” to understand the prominence of a URL. In fact, they call it a “bag of noise.”

However, Yandex and Baidu use all these tags to understand the website’s characteristics.

A 16-Step Sitemap Audit For SEO With Python

A sitemap audit can involve content categorization, site-tree, or topicality and content characteristics.

However, a sitemap audit for better indexing and crawlability mainly involves technical SEO rather than content characteristics.

In this step-by-step sitemap audit process, we’ll use Python to tackle the technical aspects of sitemap auditing millions of URLs.

Image created by the author, February 2022

1. Import The Python Libraries For Your Sitemap Audit

The following code block imports the necessary Python libraries for the sitemap XML file audit.

import advertools as adv
import pandas as pd
from lxml import etree
from IPython.display import display, HTML

# Optional: expand the Jupyter Notebook output cells to 100% width.
display(HTML("<style>.container { width:100% !important; }</style>"))

Here’s what you need to know about this code block:

  • Advertools is necessary for extracting the URLs from the sitemap file and for requesting their content or response status codes.
  • Pandas is necessary for aggregating and manipulating the data.
  • Plotly can be used to visualize the sitemap audit output; it is not imported in the code block above.
  • LXML is necessary for the syntax audit of the sitemap XML file.
  • IPython is optional; it expands the output cells of Jupyter Notebook to 100% width.

2. Take All Of The URLs From The Sitemap

Millions of URLs can be taken into a Pandas data frame with Advertools, as shown below.

sitemap_url = "https://www.complaintsboard.com/sitemap.xml"
sitemap = adv.sitemap_to_df(sitemap_url)
# Save the result so the sitemap doesn't need to be downloaded again.
sitemap.to_csv("sitemap.csv", index=False)
sitemap_df = pd.read_csv("sitemap.csv")
sitemap_df

Above, the Complaintsboard.com sitemap has been taken into a Pandas data frame, and you can see the output below.

[Screenshot: A general sitemap URL extraction with sitemap tags in Python.]

In total, we have 245,691 URLs in the sitemap index file of Complaintsboard.com.

The website uses “changefreq,” “lastmod,” and “priority” inconsistently.
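To make the extraction step more concrete, here is a minimal standard-library sketch of what turning a sitemap into rows involves. It is not how Advertools implements `sitemap_to_df`; the sample XML, the `sitemap_rows` helper, and the example.com URLs are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample; a real audit would download the sitemap file first.
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2022-02-01</lastmod>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
</urlset>"""


def sitemap_rows(xml_text):
    """Return one dict per <url> entry; tags that are absent become None."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    rows = []
    for url in root.findall("sm:url", NS):
        rows.append({
            tag: url.findtext(f"sm:{tag}", default=None, namespaces=NS)
            for tag in ("loc", "lastmod", "changefreq", "priority")
        })
    return rows


rows = sitemap_rows(SAMPLE_SITEMAP)
```

Each dict corresponds to one row of the data frame, and the `None` values are what later show up as missing tags in the tag usage check.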

3. Check Tag Usage Within The Sitemap XML File

To understand which tags are used or not within the Sitemap XML file, use the function below.

def check_sitemap_tag_usage(sitemap):
    # In the result, the index value True means "tag is missing" and False means "tag is present".
    lastmod = sitemap["lastmod"].isna().value_counts()
    priority = sitemap["priority"].isna().value_counts()
    changefreq = sitemap["changefreq"].isna().value_counts()
    lastmod_perc = sitemap["lastmod"].isna().value_counts(normalize=True) * 100
    priority_perc = sitemap["priority"].isna().value_counts(normalize=True) * 100
    changefreq_perc = sitemap["changefreq"].isna().value_counts(normalize=True) * 100
    sitemap_tag_usage_df = pd.DataFrame(data={"lastmod": lastmod,
                                              "priority": priority,
                                              "changefreq": changefreq,
                                              "lastmod_perc": lastmod_perc,
                                              "priority_perc": priority_perc,
                                              "changefreq_perc": changefreq_perc})
    # fillna(0) guards against a tag that is always present or always missing.
    return sitemap_tag_usage_df.fillna(0).astype(int)

The function check_sitemap_tag_usage builds a data frame that summarizes the usage of the sitemap tags.

It applies the “isna()” and “value_counts()” methods to the “lastmod,” “priority,” and “changefreq” columns and combines the results via “pd.DataFrame”.

Below, you can see the output.

[Screenshot: Sitemap audit with Python for the sitemap tags’ usage.]

The data frame above shows that 96,840 of the URLs do not have the “lastmod” tag, which is equal to 39% of the total URL count of the sitemap file.

The share of URLs missing the tag is 19% for both “priority” and “changefreq” within the sitemap XML file.

A website sends three main content freshness signals.

These are the dates on the web page (visible to the user), the dates in the structured data (invisible to the user), and the “lastmod” in the sitemap.

If these dates are not consistent with each other, search engines can ignore the dates on the website when evaluating its freshness signals.
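As a rough illustration of that consistency check, the sketch below compares the three dates for a single URL. The `freshness_dates_agree` helper and the sample values are hypothetical, and it assumes the dates have already been extracted as ISO 8601 strings.

```python
from datetime import date


def freshness_dates_agree(visible, structured_data, lastmod):
    """Return True when every date that is present falls on the same day."""
    days = {
        date.fromisoformat(value[:10])  # keep only the YYYY-MM-DD part
        for value in (visible, structured_data, lastmod)
        if value
    }
    return len(days) <= 1


consistent = freshness_dates_agree(
    "2022-02-01", "2022-02-01T08:30:00+00:00", "2022-02-01")
inconsistent = freshness_dates_agree(
    "2022-02-01", "2022-01-15T08:30:00+00:00", "2022-02-01")
```

Comparing at day granularity is a design choice for this sketch; a stricter audit could compare full timestamps.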

4. Audit The Site-tree And URL Structure Of The Website

Understanding the most important or most crowded URL paths is necessary for prioritizing the website’s SEO efforts and technical SEO audits.

A single technical SEO improvement can benefit thousands of URLs simultaneously, which makes for a cost-effective and budget-friendly SEO strategy.

Understanding the URL structure mainly helps to identify the website’s more prominent sections and to analyze its content network.

To create a URL Tree Dataframe from a website’s URLs from the sitemap, use the following code block.

sitemap_url_df = adv.url_to_df(sitemap_df["loc"])
sitemap_url_df

With the help of “urllib” or the “advertools” as above, you can easily parse the URLs within the sitemap into a data frame.

[Screenshot: Creating a URL tree with URLLib or Advertools is easy.]
Checking the URL breakdowns helps to understand the overall information tree of a website.

The data frame above contains the “scheme,” “netloc,” “path,” and every “/” breakdown within the URLs as a “dir” which represents the directory.
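For readers who prefer the “urllib” route mentioned above, a minimal sketch of the same breakdown for a single URL could look like this. The `url_breakdown` helper, the `dir_1`, `dir_2` key names, and the example URL are assumptions for illustration.

```python
from urllib.parse import urlsplit


def url_breakdown(url):
    """Split a URL into scheme, netloc, path, and one entry per directory."""
    parts = urlsplit(url)
    row = {"scheme": parts.scheme, "netloc": parts.netloc, "path": parts.path}
    # Every non-empty path segment becomes dir_1, dir_2, and so on.
    segments = [segment for segment in parts.path.split("/") if segment]
    for number, segment in enumerate(segments, start=1):
        row[f"dir_{number}"] = segment
    return row


row = url_breakdown("https://example.com/reviews/banks/acme-bank")
```

Applying the helper to every “loc” value and collecting the dicts gives a table comparable to the one above.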

Auditing the URL structure of the website serves two objectives.

These are checking whether all URLs use “HTTPS” and understanding the content network of the website.

Content analysis with sitemap files is not directly part of the “Indexing and Crawling” topic, so we will only touch on it briefly in this article.

Check the next section to see the SSL Usage on Sitemap URLs.

5. Check The HTTPS Usage On The URLs Within Sitemap

Use the following code block to check the HTTPS usage ratio for the URLs within the sitemap.

sitemap_url_df["scheme"].value_counts().to_frame()

The code block above counts the values of the “scheme” column, which holds the protocol of each URL.

Using “value_counts”, we see that all URLs are on HTTPS.

[Screenshot: Checking the HTTP URLs from the sitemaps can help to find bigger URL property consistency errors.]
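The same scheme check can be sketched without a data frame. The URL list below is a hypothetical sample that includes one plain HTTP URL, so that the filter has something to find.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical sample of sitemap URLs.
urls = [
    "https://example.com/",
    "https://example.com/reviews",
    "http://example.com/old-page",
]

# Count the URLs per scheme, then list the ones that are not on HTTPS.
scheme_counts = Counter(urlsplit(url).scheme for url in urls)
non_https = [url for url in urls if urlsplit(url).scheme != "https"]
```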

6. Check The Robots.txt Disallow Commands For Crawlability

The structure of the URLs within the sitemap helps to reveal “submitted but disallowed” situations, where a URL is listed in the sitemap but blocked by robots.txt.

To see whether there is a robots.txt file of the website, use the code block below.

import requests
r = requests.get("https://www.complaintsboard.com/robots.txt")
r.status_code
OUTPUT>>>
200

Simply, we send a “get request” to the robots.txt URL.

If the response status code is 200, it means there is a robots.txt file for the user-agent-based crawling control.

After checking the “robots.txt” existence, we can use the “adv.robotstxt_test” method for bulk robots.txt audit for crawlability of the URLs in the sitemap.

sitemap_df_robotstxt_check = adv.robotstxt_test("https://www.complaintsboard.com/robots.txt", urls=sitemap_df["loc"], user_agents=["*"])
sitemap_df_robotstxt_check["can_fetch"].value_counts()

We have created a new variable called “sitemap_df_robotstxt_check”, and assigned the output of the “robotstxt_test” method.

We have used the URLs within the sitemap via sitemap_df["loc"].

We have performed the audit for all of the user-agents via the user_agents=["*"] parameter and value pair.

You can see the result below.

True     245690
False         1
Name: can_fetch, dtype: int64

It shows that there is one URL that is disallowed but submitted.

We can filter the specific URL as below.

pd.set_option("display.max_colwidth",255)
sitemap_df_robotstxt_check[sitemap_df_robotstxt_check["can_fetch"] == False]

We have used “set_option” to expand all of the values within the “url_path” section.

[Screenshot: A URL appears as disallowed but submitted via a sitemap, as in the Google Search Console Coverage Reports.]
We see that a “profile” page has been disallowed and submitted.

Later, the same control can be done for further examinations such as “disallowed but internally linked”.
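The “submitted but disallowed” logic can also be sketched offline with Python’s built-in robots.txt parser. The robots.txt rules and URLs below are hypothetical, and the standard-library parser may interpret wildcard rules differently from search engine crawlers, so treat the result as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content that blocks profile pages for all user agents.
ROBOTS_TXT = """\
User-agent: *
Disallow: /profile/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs taken from a sitemap.
sitemap_urls = [
    "https://example.com/reviews/acme",
    "https://example.com/profile/123",
]

# URLs that are submitted in the sitemap but blocked by robots.txt.
submitted_but_disallowed = [
    url for url in sitemap_urls if not parser.can_fetch("*", url)
]
```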

But, to do that, we need to crawl at least 3 million URLs from ComplaintsBoard.com, and it can be an entirely new guide.

Some website URLs do not have a proper “directory hierarchy”, which can make the analysis of the URLs, in terms of content network characteristics, harder.

Complaintsboard.com doesn’t use a proper URL structure and taxonomy, so analyzing the website structure is not easy for an SEO or Search Engine.

But the most used words within the URLs or the content update frequency can signal which topics the company actually focuses on.

Since we focus on “technical aspects” in this tutorial, you can read the Sitemap Content Audit here.

7. Check The Status Code Of The Sitemap URLs With Python

Every URL within the sitemap has to have a 200 Status Code.

A crawl has to be performed to check the status codes of the URLs within the sitemap.

But, since it’s costly when you have millions of URLs to audit, we can simply use a new crawling method from Advertools.

Without taking the response body, we can crawl just the response headers of the URLs within the sitemap.

It is useful to decrease the crawl time for auditing possible robots, indexing, and canonical signals from the response headers.

To perform a response header crawl, use the “adv.crawl_headers” method.

adv.crawl_headers(sitemap_df["loc"], output_file="sitemap_df_header.jl")
df_headers = pd.read_json("sitemap_df_header.jl", lines=True)
df_headers["status"].value_counts()

The output of the status code check for the URLs within the sitemap XML files can be seen below.

200    207866
404        23
Name: status, dtype: int64

It shows that 23 URLs from the sitemap actually return a 404 status code.

They should be removed from the sitemap.

To audit which URLs from the sitemap are 404, use the filtration method below from Pandas.

df_headers[df_headers["status"] == 404]

The result can be seen below.

[Screenshot: Finding the 404 URLs from sitemaps is helpful against link rot.]
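When the header crawl output is too large to load at once, the same filter can be sketched by streaming the JSON lines file record by record. The in-memory sample and the `non_200_urls` helper are assumptions; only the “url” and “status” keys are taken from the output shown above.

```python
import io
import json

# Hypothetical stand-in for a header crawl output file (one JSON object per line).
SAMPLE_JL = io.StringIO(
    '{"url": "https://example.com/", "status": 200}\n'
    '{"url": "https://example.com/gone", "status": 404}\n'
)


def non_200_urls(lines):
    """Yield (url, status) for every record whose status is not 200."""
    for line in lines:
        if not line.strip():
            continue  # skip empty lines
        record = json.loads(line)
        if record.get("status") != 200:
            yield record.get("url"), record.get("status")


broken = list(non_200_urls(SAMPLE_JL))
```

With a real file, the function would receive an open file handle instead of the `io.StringIO` sample.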

8. Check The Canonicalization From Response Headers

From time to time, using canonicalization hints on the response headers is beneficial for crawling and indexing signal consolidation.

In this context, the canonical tag on the HTML and the response header has to be the same.

If there are two different canonicalization signals on a web page, the search engines can ignore both assignments.

For ComplaintsBoard.com, we don’t have a canonical response header.

  • The first step is auditing whether the response header for canonical usage exists.
  • The second step is comparing the response header canonical value to the HTML canonical value if it exists.
  • The third step is checking whether the canonical values are self-referential.

Inspect the columns of the header crawl output to check the canonicalization from the response headers.

df_headers.columns

Below, you can see the columns.

[Screenshot: Python SEO crawl output data frame columns. The “dataframe.columns” attribute is always useful to check.]

If you are not familiar with the response headers, you may not know how to use canonical hints within response headers.

A response header can include the canonical hint with the “Link” value.

Advertools registers it directly as the “resp_headers_link” column.

Another problem is that the values appear within the “<URL>;” string pattern.

It means we will use regex to extract the URL.

df_headers["resp_headers_link"]

You can see the result below.

[Screenshot from Pandas, February 2022: Sitemap URL response header.]

The regex pattern “[^<>][a-z:/0-9-.]*” is good enough to extract the specific canonical value.
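For “Link” headers that carry several entries, uppercase characters, or query strings, a stricter pattern may be needed. The sketch below is one possible alternative, not the pattern used in this audit; the `canonical_from_link_header` helper and the sample header values are assumptions.

```python
import re

# Matches <URL> followed by parameters that include rel="canonical" (quotes optional).
LINK_CANONICAL = re.compile(
    r'<([^<>]+)>\s*;[^,]*\brel\s*=\s*"?canonical"?', re.IGNORECASE)


def canonical_from_link_header(value):
    """Return the canonical URL from a Link header value, or None."""
    if not isinstance(value, str):
        return None  # covers missing values such as None or NaN
    match = LINK_CANONICAL.search(value)
    return match.group(1) if match else None


single = canonical_from_link_header(
    '<https://example.com/page?id=1>; rel="canonical"')
multiple = canonical_from_link_header(
    '<https://example.com/style.css>; rel=preload; as=style, '
    '<https://example.com/Page_A>; rel="canonical"')
```

The helper could be applied to the “resp_headers_link” column with the Pandas `apply` method.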

A self-canonicalization check with the response headers is below.

df_headers["response_header_canonical"] = df_headers["resp_headers_link"].str.extract(r"([^<>][a-z:/0-9-.]*)")
(df_headers["response_header_canonical"] == df_headers["url"]).value_counts()

Two different boolean checks are involved.

One checks whether the response header canonical hint is equal to the URL itself, as above.

The other, added when filtering the mismatches, checks whether the status code is 200.

It is needed because we have 404 URLs within the sitemap, and their canonical value will be “NaN”.

[Screenshot: The check shows there are specific URLs with canonicalization inconsistencies.]
We have 29 outliers for technical SEO. Every wrong signal given to the search engine for indexation or ranking dilutes the ranking signals.

To see these URLs, use the code block below.

df_headers[(df_headers["response_header_canonical"] != df_headers["url"]) & (df_headers["status"] == 200)]

[Screenshot from Pandas, February 2022: Response header Python SEO audit.]

The canonical values from the response headers can be seen above.

Even a single “/” in the URL can cause canonicalization conflict as appears here for the homepage.

[Screenshot: ComplaintsBoard.com, checking the response header canonical value against the actual URL of the web page.]
You can check the canonical conflict here.

If you check log files, you will see that the search engine crawls the URLs from the “Link” response headers.

Thus, in technical SEO, this signal should be given weight.

9. Check The Indexing And Crawling Commands From Response Headers

There are 14 different X-Robots-Tag specifications for the Google search engine crawler.

The latest one is “indexifembedded,” which controls whether a page’s content can be indexed when it is embedded in another page.

The Indexing and Crawling directives can be in the form of a response header or the HTML meta tag.

This section focuses on the response header version of indexing and crawling directives.

  • The first step is checking whether the X-Robots-Tag property and values exist within the HTTP Header or not.
  • The second step is auditing whether it aligns itself with the HTML Meta Tag properties and values if they exist.

Use the command below to check the “X-Robots-Tag” from the response headers.

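def robots_tag_checker(dataframe: pd.DataFrame):
    # Return the first column whose name contains "robots", if there is one.
    for column in dataframe.columns:
        if "robots" in column.lower():
            return column
    # Only reached after every column has been checked.
    return "There is no robots tag"
robots_tag_checker(df_headers)
OUTPUT>>>
'There is no robots tag'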

We have created a custom function to check for “X-Robots-Tag” response headers among the columns of the header crawl output.

It appears that our test subject website doesn’t use the X-Robots-Tag.

If there were an X-Robots-Tag, the code block below could be used, with the column name adjusted to the one in your crawl output.

df_headers["response_header_x_robots_tag"].value_counts()
df_headers[df_headers["response_header_x_robots_tag"] == "noindex"]

Check whether there is a “noindex” directive from the response headers, and filter the URLs with this indexation conflict.

In the Google Search Console Coverage Report, those appear as “Submitted URL marked ‘noindex’”.

Contradicting indexing and canonicalization hints and signals might make a search engine ignore all of the signals, and make its algorithms place less trust in the user-declared signals.
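An exact match on “noindex” misses values such as “noindex, nofollow” or “googlebot: noindex”. The sketch below shows one way to parse the header value first; the `blocks_indexing` helper is hypothetical and deliberately simplified, for example it does not handle dates that contain commas.

```python
# Directives that take a value after a colon, so the colon is not a user agent prefix.
VALUED_DIRECTIVES = (
    "max-snippet", "max-image-preview", "max-video-preview", "unavailable_after")


def blocks_indexing(header_value):
    """Return True if an X-Robots-Tag value contains noindex or none."""
    for part in header_value.lower().split(","):
        name, _, rest = part.strip().partition(":")
        if name.strip() in VALUED_DIRECTIVES:
            continue  # e.g. "max-image-preview: none" does not block indexing
        # Without a colon, `name` is the directive itself; with one, `name` is
        # a user agent prefix and the directive follows it.
        directive = rest.strip() if rest else name.strip()
        if directive in ("noindex", "none"):
            return True
    return False


results = {
    value: blocks_indexing(value)
    for value in ("noindex, nofollow", "googlebot: noindex",
                  "max-image-preview: none", "noarchive")
}
```

“none” is treated as blocking because it is equivalent to “noindex, nofollow”.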

10. Check The Self Canonicalization Of Sitemap URLs

Every URL in the sitemap XML files should give a self-canonicalization hint.

Sitemaps should only include the canonical versions of the URLs.

The Python code block in this section is to understand whether the sitemap URLs have self-canonicalization values or not.

To check the canonicalization from the HTML Documents’ “<head>” section, crawl the websites by taking their response body.

Use the code block below.

user_agent = "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

The difference between “crawl_headers” and the “crawl” is that “crawl” takes the entire response body while the “crawl_headers” is only for response headers.

adv.crawl(sitemap_df["loc"],
          output_file="sitemap_crawl_complaintsboard.jl",
          follow_links=False,
          custom_settings={"LOG_FILE": "sitemap_crawl_complaintsboard.log",
                           "USER_AGENT": user_agent})

You can check the file size differences from crawl logs to response header crawl and entire response body crawl.

[Screenshot: Python crawl output size comparison.]

Going from a 6 GB output to a 387 MB output is quite economical.

If a search engine just wants to see certain response headers and the status code, creating information on the headers would make their crawl hits more economical.

How To Deal With Large DataFrames For Reading And Aggregating Data?

This section requires dealing with the large data frames.

A computer can’t read a Pandas DataFrame from a CSV or JL file if the file size is larger than the computer’s RAM.

Thus, the “chunking” method is used.

When a website’s sitemap XML file contains millions of URLs, the total crawl output can exceed tens of gigabytes.

An iteration across sitemap crawl output data frame rows is necessary.

For chunking, use the code block below.

df_iterator = pd.read_json(
    "sitemap_crawl_complaintsboard.jl",
    chunksize=10000,
    lines=True)

for i, df_chunk in enumerate(df_iterator):
    output_df = pd.DataFrame(data={
        "url": df_chunk["url"],
        "canonical": df_chunk["canonical"],
        "self_canonicalised": df_chunk["url"] == df_chunk["canonical"]})
    # Create the file and write the header only for the first chunk, then append.
    mode = "w" if i == 0 else "a"
    header = i == 0
    output_df.to_csv("canonical_check.csv", index=False, header=header, mode=mode)

# Read the much smaller result file and keep the URLs with a non-self canonical.
df = pd.read_csv("canonical_check.csv")
df[(df["self_canonicalised"] == False) & (df["canonical"].notna())]

You can see the result below.

[Screenshot: Python SEO canonicalization audit.]

We see that the paginated URLs from the “book” subfolder give canonical hints to the first page, which is not a correct practice according to Google’s guidelines.
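The chunking idea itself does not depend on Pandas. As a minimal sketch, the generator below reads a JSON lines source a fixed number of records at a time, so memory use stays bounded by the chunk size. The `iter_chunks` helper and the in-memory sample are assumptions for illustration.

```python
import io
import json
from itertools import islice


def iter_chunks(lines, chunksize):
    """Yield lists of parsed JSON records, at most `chunksize` at a time."""
    iterator = iter(lines)
    while True:
        chunk = [json.loads(line) for line in islice(iterator, chunksize)]
        if not chunk:
            return
        yield chunk


# Hypothetical crawl output: the third URL points its canonical elsewhere.
SAMPLE_JL = io.StringIO(
    '{"url": "https://example.com/1", "canonical": "https://example.com/1"}\n'
    '{"url": "https://example.com/2", "canonical": "https://example.com/2"}\n'
    '{"url": "https://example.com/3", "canonical": "https://example.com/1"}\n'
)

non_self_canonical = []
for chunk in iter_chunks(SAMPLE_JL, chunksize=2):
    non_self_canonical += [
        record["url"] for record in chunk
        if record["canonical"] != record["url"]
    ]
```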

11. Check The Sitemap Sizes Within Sitemap Index Files

Every sitemap file should be no larger than 50 MB. Use the Python code block below to check the sitemap file sizes.

# sitemap_size_mb is the per-file size column that adv.sitemap_to_df adds to its output.
pd.pivot_table(sitemap_df, index="sitemap", values="sitemap_size_mb")

You can see the result below.

[Screenshot: Python SEO sitemap size audit.]

We see that all sitemap XML files are under 50MB.

For better and faster indexation, keeping the sitemap URLs valuable and unique while decreasing the size of the sitemap files is beneficial.
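If the raw sitemap files are available, the size limit can also be checked directly on the downloaded bytes. The sketch below assumes a hypothetical dict that maps sitemap names to their uncompressed content.

```python
MAX_SITEMAP_BYTES = 50 * 1024 * 1024  # 50 MB, i.e. 52,428,800 bytes


def oversized_sitemaps(sitemap_bodies):
    """Map each sitemap that exceeds the limit to its size in MB."""
    return {
        name: round(len(body) / (1024 * 1024), 2)
        for name, body in sitemap_bodies.items()
        if len(body) > MAX_SITEMAP_BYTES
    }


# Hypothetical uncompressed sitemap contents.
bodies = {
    "sitemap-small.xml": b"<urlset></urlset>",
    "sitemap-large.xml": b"x" * (MAX_SITEMAP_BYTES + 1),
}
oversized = oversized_sitemaps(bodies)
```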

12. Check The URL Count Per Sitemap With Python

Every sitemap should contain no more than 50,000 URLs.

Use the Python code block below to check the URL Counts within the sitemap XML files.

(pd.pivot_table(sitemap_df,
                values=["loc"],
                index="sitemap",
                aggfunc="count")
 .sort_values(by="loc", ascending=False))

You can see the result below.

[Screenshot: Python SEO sitemap URL count audit.]
All sitemaps have fewer than 50,000 URLs. Some sitemaps have only one URL, which wastes the search engine’s attention.

Keeping sitemap URLs that are frequently updated different from the static and stale content URLs is beneficial.

URL Count and URL Content character differences help a search engine to adjust crawl demand effectively for different website sections.
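Both findings, sitemaps over the limit and sitemaps with a single URL, can be sketched with a plain counter. The (sitemap, URL) pairs below are hypothetical sample data.

```python
from collections import Counter

MAX_URLS_PER_SITEMAP = 50_000

# Hypothetical (sitemap, URL) pairs, as found in the "sitemap" and "loc" columns.
rows = [
    ("sitemap-1.xml", "https://example.com/a"),
    ("sitemap-1.xml", "https://example.com/b"),
    ("sitemap-2.xml", "https://example.com/c"),
]

url_counts = Counter(sitemap for sitemap, _ in rows)
over_limit = [name for name, count in url_counts.items()
              if count > MAX_URLS_PER_SITEMAP]
single_url = [name for name, count in url_counts.items() if count == 1]
```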

13. Check The Indexing And Crawling Meta Tags From URLs’ Content With Python

Even if a web page is not disallowed in robots.txt, it can still be excluded from indexing, or have its links left unfollowed, via the HTML meta tags.

Thus, checking the HTML Meta Tags for better indexation and crawling is necessary.

Using the “custom selectors” is necessary to perform the HTML Meta Tag audit for the sitemap URLs.

sitemap = adv.sitemap_to_df("https://www.holisticseo.digital/sitemap.xml")

adv.crawl(url_list=sitemap["loc"][:1000],
          output_file="meta_command_audit.jl",
          follow_links=False,
          # Single quotes outside, so the XPath can use double quotes inside.
          xpath_selectors={"meta_command": '//meta[@name="robots"]/@content'},
          custom_settings={"CLOSESPIDER_PAGECOUNT": 1000})

df_meta_check = pd.read_json("meta_command_audit.jl", lines=True)
df_meta_check["meta_command"].str.contains("nofollow|noindex", regex=True).value_counts()

The //meta[@name="robots"]/@content XPath selector extracts all the robots commands from the URLs in the sitemap.

We have used only the first 1000 URLs in the sitemap.

And, I stop crawling after the initial 1000 responses.

I have used another website to check the crawling meta tags since ComplaintsBoard.com doesn’t have them in its source code.

You can see the result below.

[Screenshot: Python SEO meta robots audit.]
None of the URLs from the sitemap have “nofollow” or “noindex” within the “robots” commands.

To check their values, use the code below.

df_meta_check[df_meta_check["meta_command"].str.contains("nofollow|noindex", regex=True) == False][["url", "meta_command"]]

You can see the result below.

[Screenshot: Meta tag audit from the websites.]
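To show what the XPath selector is extracting, here is a minimal standard-library sketch that pulls the robots meta directives out of an HTML string. The `MetaRobotsParser` class, the `meta_robots` helper, and the sample HTML are assumptions for illustration.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives += [
                item.strip().lower() for item in content.split(",") if item.strip()
            ]


def meta_robots(html):
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives


directives = meta_robots(
    '<html><head><meta name="robots" content="noindex, follow"></head></html>')
```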

14. Validate The Sitemap XML File Syntax With Python

Sitemap XML file syntax validation is necessary to make sure that search engines can parse the sitemap file as intended.

Even if there are certain syntax errors, a search engine can recognize the sitemap file during XML normalization.

But every syntax error can decrease the efficiency to a certain degree.

Use the code block below to validate the Sitemap XML File Syntax.

def validate_sitemap_syntax(xml_path: str, xsd_path: str):
    # Build the schema from the XSD file, then validate the sitemap against it.
    xmlschema_doc = etree.parse(xsd_path)
    xmlschema = etree.XMLSchema(xmlschema_doc)
    xml_doc = etree.parse(xml_path)
    result = xmlschema.validate(xml_doc)
    return result

validate_sitemap_syntax("sej_sitemap.xml", "sitemap.xsd")

For this example, I have used “https://www.searchenginejournal.com/sitemap_index.xml”. The XSD file defines the XML file’s context and tree structure.

It is stated in the first lines of the sitemap file.

For further information, you can also check DTD documentation.
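Before running a full schema validation, a lighter pre-check can catch the most basic problems. The sketch below only tests that the XML is well-formed and that the root element uses the sitemaps.org namespace; it does not validate against the XSD. The `basic_sitemap_check` helper and the sample documents are assumptions.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def basic_sitemap_check(xml_bytes):
    """Return (is_ok, message) for a well-formedness and root element check."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as error:
        return False, f"not well-formed: {error}"
    expected_roots = {f"{{{SITEMAP_NS}}}urlset", f"{{{SITEMAP_NS}}}sitemapindex"}
    if root.tag not in expected_roots:
        return False, f"unexpected root element: {root.tag}"
    return True, "ok"


good = basic_sitemap_check(
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>https://example.com/</loc></url></urlset>")
# The closing </url> tag is missing here.
broken = basic_sitemap_check(
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>https://example.com/</loc></urlset>")
```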

15. Check The Open Graph URL And Canonical URL Matching

It is not a secret that search engines also use the Open Graph and RSS Feed URLs from the source code for further canonicalization and exploration.

The Open Graph URLs should be the same as the canonical URL submission.

From time to time, even in Google Discover, Google chooses to use the image from the Open Graph.

To check the Open Graph URL and Canonical URL consistency, use the code block below.

# Re-create the iterator; the one used for the canonical check is already exhausted.
df_iterator = pd.read_json(
    "sitemap_crawl_complaintsboard.jl",
    chunksize=10000,
    lines=True)

for i, df_chunk in enumerate(df_iterator):
    if "og:url" in df_chunk.columns:
        output_df = pd.DataFrame(data={
            "canonical": df_chunk["canonical"],
            "og:url": df_chunk["og:url"],
            "open_graph_canonical_consistency": df_chunk["canonical"] == df_chunk["og:url"]})
        mode = "w" if i == 0 else "a"
        header = i == 0
        output_df.to_csv(
            "open_graph_canonical_consistency.csv",
            index=False,
            header=header,
            mode=mode)
    else:
        print("There is no Open Graph URL Property")
OUTPUT>>>
There is no Open Graph URL Property

If there is an Open Graph URL Property on the website, it will give a CSV file to check whether the canonical URL and the Open Graph URL are the same or not.

But for this website, we don’t have an Open Graph URL.

Thus, I have used another website for the audit.

if "og:url" in df_meta_check.columns:
    output_df = pd.DataFrame(data={
        "canonical": df_meta_check["canonical"],
        "og:url": df_meta_check["og:url"],
        "open_graph_canonical_consistency": df_meta_check["canonical"] == df_meta_check["og:url"]})
    # There is no chunking here, so the file is written once, with its header.
    output_df.to_csv("df_og_url_canonical_audit.csv", index=False)
else:
    print("There is no Open Graph URL Property")

df = pd.read_csv("df_og_url_canonical_audit.csv")
df

You can see the result below.

[Screenshot: Python SEO Open Graph URL audit.]

We see that all canonical URLs and the Open Graph URLs are the same.

[Screenshot: Python SEO canonicalization audit.]
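The comparison can also be sketched for a single HTML document with the standard library. The `CanonicalOgParser` class, the `og_matches_canonical` helper, and the sample markup are assumptions for illustration.

```python
from html.parser import HTMLParser


class CanonicalOgParser(HTMLParser):
    """Collect the canonical link and the og:url meta value of a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.og_url = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and (attributes.get("rel") or "").lower() == "canonical":
            self.canonical = attributes.get("href")
        elif tag == "meta" and (attributes.get("property") or "").lower() == "og:url":
            self.og_url = attributes.get("content")


def og_matches_canonical(html):
    """Return True only when both values exist and are identical."""
    parser = CanonicalOgParser()
    parser.feed(html)
    return parser.canonical is not None and parser.canonical == parser.og_url


matching = og_matches_canonical(
    '<head><link rel="canonical" href="https://example.com/a">'
    '<meta property="og:url" content="https://example.com/a"></head>')
# A trailing slash is enough to make the strict comparison fail.
conflicting = og_matches_canonical(
    '<head><link rel="canonical" href="https://example.com/a">'
    '<meta property="og:url" content="https://example.com/a/"></head>')
```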

16. Check The Duplicate URLs Within Sitemap Submissions

A sitemap index file shouldn’t have duplicated URLs across different sitemap files or within the same sitemap XML file.

The duplication of the URLs within the sitemap files can make a search engine download the sitemap files less since a certain percentage of the sitemap file is bloated with unnecessary submissions.

For certain situations, it can appear as a spamming attempt to control the crawling schemes of the search engine crawlers.

Use the code block below to check the duplicate URLs within the sitemap submissions.

sitemap_df["loc"].duplicated().value_counts()

You can see that 49,574 URLs from the sitemap are duplicated.

[Screenshot: Python SEO duplicated URL audit from the sitemap XML files.]

To see which sitemaps have more duplicated URLs, use the code block below.

pd.pivot_table(sitemap_df[sitemap_df["loc"].duplicated()==True], index="sitemap", values="loc", aggfunc="count").sort_values(by="loc", ascending=False)

You can see the result.

[Screenshot: Python SEO sitemap audit for duplicated URLs.]

Chunking the sitemaps can help with site-tree and technical SEO analysis.

To see the duplicated URLs within the Sitemap, use the code block below.

sitemap_df[sitemap_df["loc"].duplicated() == True]

You can see the result below.

[Screenshot: Duplicated sitemap URL audit output.]
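To see which sitemap files share a duplicated URL, the submissions can be grouped by URL. The sketch below uses hypothetical (sitemap, URL) pairs in place of the “sitemap” and “loc” columns.

```python
from collections import defaultdict

# Hypothetical (sitemap, URL) pairs.
rows = [
    ("sitemap-1.xml", "https://example.com/a"),
    ("sitemap-1.xml", "https://example.com/b"),
    ("sitemap-2.xml", "https://example.com/a"),
]

# Group the sitemap names by URL, then keep the URLs submitted more than once.
sitemaps_by_url = defaultdict(list)
for sitemap, loc in rows:
    sitemaps_by_url[loc].append(sitemap)

duplicates = {
    loc: names for loc, names in sitemaps_by_url.items() if len(names) > 1
}
```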

Conclusion

I wanted to show how to validate a sitemap file for better and healthier indexation and crawling for Technical SEO.

Python is widely used for data science, machine learning, and natural language processing.

But, you can also use it for Technical SEO Audits to support the other SEO Verticals with a Holistic SEO Approach.

In a future article, we can expand these Technical SEO Audits further with different details and methods.

But, in general, this is one of the most comprehensive Technical SEO guides for Sitemaps and Sitemap Audit Tutorial with Python.

Featured Image: elenasavchina2/Shutterstock

Google Answers Whether Having Two Sites Affects Rankings

Google John Mueller answers if publishing multiple sites could affect rankings

Google’s John Mueller answered whether having two sites could negatively affect search rankings. His answer is surprisingly applicable to different ways of interpreting the question.

Can Having Two Sites Affect Rankings?

A person submitted a question to Google where they wanted to know if having two sites could negatively affect their rankings. The question as reproduced by Google is concise and also a little vague, which means that it can be answered in a way that’s different from the way that Mueller answered it.

This is the question:

“My rankings have dropped due to Google finding out I have two websites. Is this correct?”

Google’s John Mueller answered:

“No. That’s not likely. Many people have several websites. Separate websites are not a problem.

The issue is often more indirect: if you work on a lot of websites, you’re not going to have a lot of time to make truly awesome websites everywhere. And, if you’re making websites that aren’t awesome, then that can be something which our algorithms pick up on when it comes to recommending your site to others.”

A Different Way To Answer The Question

John Mueller answered the question under the assumption that the two sites in the question are on different topics. He prefaced his answer by saying that many “people have several websites” which is true.

But many people don’t have multiple websites on the same topic. The person asking the question was vague about whether the sites were about different topics, too.

It’s very possible that the sites are on the same topic, in which case it makes sense why they may be concerned that Google found out about the two sites because it could be seen as trying to game Google. After all, who worries about having multiple sites on different topics?

If the sites were on the same topic then the answer to the question is somewhat different.

One of the important considerations when one person controls multiple sites on the same topic is that they’re doing it for ranking purposes which is not a good starting point for any website.

I’m not saying there’s something corrupt about the practice but I am saying that it’s not really the best starting point for creating signals of quality. It’s not a matter of someone thinking that they’re going to create multiple high quality sites for users, right?

Another reason why people create multiple sites for ranking (and not quality) is because people feel if they split up a topic into subsidiary subtopics they can create stronger sites about those related subtopics as opposed to one site with multiple related subtopics.

But what almost inevitably happens is that they wind up running multiple related sites that could be stronger together as one authoritative website.

I asked Bill Hartzer of Hartzer Consulting (Facebook profile) if he thought multiple sites on the same topic could affect rankings.

Bill agreed with me and shared:

“A lot of people, after building a website that ranks well, will think that they can simply create another website on the same topic and “make double the money” or get “double the traffic” and it’s simply not true.

Companies will also have one main website, but they’ll create a separate website on a separate domain name for each of their products or services. Over the past 10 years or so, that hasn’t been a good strategy. While it’s good to register the domain names of your products or services, it’s better to combine all those websites into one main, more authoritative website.

Typically if they’re on the same topic, one website, the original site, will continue to rank well. But the second website doesn’t rank as well. In most cases, it’s always better to combine the websites into one website.”

Multiple Sites And Rankings

John Mueller is right that publishing multiple sites (on different topics) could compromise a person’s ability to focus on one site to make it outstanding, remarking that there’s an indirect negative effect on rankings. He is also correct in saying that it’s unlikely to have a direct negative effect on rankings.

If the question is changed to whether there’s an effect on rankings when the multiple sites are on the same topic, the answer becomes more nuanced but follows a similar trajectory to Mueller’s original answer: it detracts from being able to create one outstanding site and can lead to a person creating multiple middling sites.

But that’s not necessarily a foregone conclusion when a person is creating multiple sites on different topics. It’s absolutely possible to create multiple sites on different topics and to be successful at it. It might be hard for one person alone to pull it off, but it’s not difficult to do when multiple people are working on the websites, creating content and focusing on promotion.

Watch/listen to the Google SEO Office hours at the 33 second mark:

Featured Image by Shutterstock/Asier Romero

SEO

25 WordPress Alternatives Best For SEO

WordPress powers hundreds of millions of websites, but it is not the only content management system (CMS) option.

There’s a diverse marketplace of publishing platforms for those seeking alternatives. This review provides an overview of 25 leading alternatives to WordPress across key website categories.

We explore user-friendly website builders like Wix, Squarespace, and Weebly, which offer drag-and-drop simplicity. We look at flexible open-source options for developers and tech-savvy users, including Joomla, Drupal, and Hugo.

Ecommerce merchants can choose between hosted platforms like Shopify or open-source solutions like Magento. We also cover blogging-focused options like Ghost and Tumblr, web hosting providers like Bluehost, and community management tools like vBulletin.

For each alternative, we summarize the key features, benefits, and drawbacks to consider. Factors like budget, technical abilities, and website goals are examined to help identify the best fit based on individual needs.

While WordPress powers a large share of sites, there’s no shortage of quality options for creating the perfect online presence for those seeking alternatives.

Why Consider A WordPress Alternative?

There are several reasons why someone might consider a WordPress alternative for their website:

  • Specific needs: While WordPress is versatile, some websites may have particular requirements for which other platforms are better suited.
  • Ease of use: Some users may find WordPress challenging, especially if they lack technical skills.
  • Maintenance and security: As an open-source platform, WordPress requires users to handle updates, backups, and security measures themselves.
  • Built-in features: Some alternatives come with built-in features that WordPress requires plugins for.
  • Customization: While WordPress offers many customization options, some users may prefer platforms that allow more granular control over the website’s appearance and functionality.
  • Simplicity: Other publishing platforms might be a better fit for users who want a simple platform to publish content without dealing with the complexities of managing a website.

How To Choose An Alternative To WordPress

Choosing the right WordPress alternative depends on your specific needs and goals. To help you make an informed decision, consider the following factors:

  • Purpose of your website: Determine its primary purpose. Is it a blog, an online store, a portfolio, or a complex business website?
  • Budget: Consider your budget for building and maintaining your website. Some alternatives are free, while others require a subscription or a one-time payment.
  • Technical skills: Assess your technical skills and those of your team. Some alternatives are designed for users with little coding experience, while others may require more technical knowledge.
  • Customization and flexibility: Evaluate how much control you want over your website’s appearance and functionality.
  • Scalability: Consider your website’s potential for growth. If you anticipate a significant increase in traffic or content, choose a platform that can scale with your needs.
  • Support and community: Look into the level of support and the size of the community surrounding each alternative.
  • Hosting: Decide whether you prefer a self-hosted solution or a hosted platform.
  • Features: List the features your website requires. Ensure that your alternative offers these features natively or through extensions.

Once you’ve considered these factors, research various WordPress alternatives and compare them based on your requirements.

Read reviews, explore user communities, and, if possible, test out the platforms through free trials or demos.

This will help you better understand how each alternative works and whether it aligns with your needs and expectations.

25 Best WordPress Alternatives

1. Wix

Screenshot from: Wix.com, March 2024.

Wix is best suited for individuals, small businesses, and entrepreneurs who want to create a professional-looking website without extensive technical skills or a large budget.

The platform’s user-friendly drag-and-drop interface makes it easy for users to design and customize their websites, offering various templates and design elements suitable for multiple purposes.

As a hosted platform, Wix takes care of technical aspects like server maintenance and security updates, making it ideal for those who don’t want to deal with these issues.

Wix also offers a free plan, allowing users to create a website without cost, although with limitations such as Wix branding and a non-custom domain.

One nuance to remember when using Wix is that once you’ve chosen a template and started building your site, it can be challenging to switch to a different template without redesigning your content.

Key Features:

  • Full hosting solution.
  • No software to self-install.
  • Drag-and-drop visual builder.
  • Extensions are available.
  • Optimized for mobile.
  • Blog module.
  • Ecommerce store.
  • Social media tools.

Pros:

  • All-in-one website builder and platform.
  • Free version available.
  • Premade designs and templates.
  • Free subdomain available for all plans.
  • Free custom domain available with paid plans.
  • Customer support.

Cons:

  • There is no option to retain complete control of your site as you can with WordPress.
  • No access to source code.

2. Squarespace

Screenshot from: squarespace.com, March 2024.

Squarespace is best suited for creatives, bloggers, and small business owners who want to create visually stunning websites with minimal technical knowledge.

The platform is known for its sleek, modern templates, which showcase content beautifully and are optimized for mobile devices.

As an all-in-one platform, Squarespace includes hosting, domain registration, and various tools for managing your website.

While Squarespace offers a high degree of customization, it may not be as flexible as some alternatives. The platform has a specific structure and set of features, which can be limiting for those who require advanced functionality or integrations.

Nonetheless, for most users, Squarespace’s built-in features and integrations are more than sufficient.

One tip for getting the most out of Squarespace is to use its built-in SEO and marketing tools, such as custom meta descriptions, alt tags, and automatic sitemaps.

Key Features:

  • Complete hosting solution (including video).
  • No software to self-install.
  • Drag-and-drop visual builder.
  • Extensions are available.
  • Optimized for mobile.
  • Blog module.
  • Ecommerce store.
  • Social media tools.

Pros:

  • All-in-one website builder and platform.
  • Free version available.
  • Premade designs and templates.
  • Free subdomain available for all plans.
  • Free custom domain available with an annual subscription.
  • Customer support.

Cons:

  • There is no option to retain complete control of your site as you can with WordPress.
  • No custom coding.
  • No access to source code.
  • No third-party extensions.

3. Weebly

Screenshot from: weebly.com, March 2024.

Weebly is best suited for individuals, small businesses, and entrepreneurs who want to create a simple, functional website without investing much time or money.

The platform’s drag-and-drop interface and intuitive editor make it beginner-friendly and easy to use, even for those without website-building experience.

However, it may not be as powerful or flexible as other alternatives, with a limited set of features and integrations that can be restrictive for those requiring advanced functionality or custom solutions.

One tip for getting the most out of Weebly is to explore its app center, which offers a range of third-party apps and integrations to extend your website’s functionality, including tools for marketing, social media, and ecommerce.

Key Features:

  • Full hosting solution.
  • No software to self-install.
  • Drag-and-drop visual builder.
  • Extensions are available.
  • Optimized for mobile.
  • Blog module.
  • Ecommerce store.
  • Social media tools.

Pros:

  • All-in-one website builder and platform.
  • Free version available.
  • Premade designs and templates.
  • Free subdomain available.
  • Inexpensive premium plans are as low as $6.00 per month.
  • Free custom domain available with premium plans.
  • Customer support.

Cons:

  • No option to retain complete control of your site as you can with WordPress.
  • No access to source code.
  • The free version restricts you to a maximum of five pages.

4. Google Sites

Screenshot from: workspace.google.com/intl/en_ph/lp/sites/, March 2024.

Google Sites is best suited for individuals, educators, and small businesses who need a simple, easy-to-use platform for creating basic websites or intranets. Its seamless integration with other Google tools, like Google Docs, Sheets, and Drive, makes it an excellent choice for those who are familiar with these tools and use them heavily.

Google Sites also offers collaboration features, allowing multiple users to work on the same website simultaneously, making it ideal for team projects or class websites.

However, it’s a relatively basic website builder compared to other alternatives, with limited features and customization options. It may not be the best choice for those needing advanced functionality or design flexibility.

Additionally, it lacks built-in ecommerce features, making it less suitable for online stores.

One tip for getting the most out of Google Sites is to leverage its integration with other Google tools, such as embedding Google Docs, Sheets, or Slides into your web pages or using Google Forms to collect visitor data.

Key Features:

  • The creator has complete control over page access and permissions.
  • Tools can be accessed anywhere.
  • It can be used as a basic project management program.
  • Plenty of web development and deployment options.
  • Real-time editing.
  • Uses website speed optimization tools to minimize loading times.

Pros:

  • Fast to get started and easy to use.
  • Free to use.
  • Integrated with other Google products.

Cons:

  • Limited functionality compared to other website builders.
  • It may not work with non-Google apps.
  • Limited customization options.
  • No SEO tools, and you can’t edit metadata.
  • It cannot integrate Facebook pixels.

5. Jekyll

Screenshot from: jekyllrb.com, March 2024.

Jekyll is best suited for developers, bloggers, and tech-savvy individuals who prefer a lightweight, flexible website creation platform. It’s particularly popular among the GitHub community, as it can be easily integrated with GitHub Pages for free hosting.

Jekyll requires specific technical knowledge, as users must be comfortable working with the command line and writing code. While Jekyll offers plugins and themes to extend its functionality, users may need to rely on their coding skills to customize their website fully.

One tip for getting the most out of Jekyll is to utilize its built-in blogging features, which offer a simple, intuitive way to create and manage blog posts using Markdown.

Another nuance to remember is that Jekyll generates static pages that may not be the best choice for websites requiring frequent updates or complex functionality.

Key Features:

  • No programming is involved.
  • SEO is built-in.
  • GitHub manages redirects.
  • Easy setup of custom domains.

Pros:

  • No server maintenance.
  • Very fast.
  • Secure.
  • Free hosting.
  • Free SSL certificate.
  • Works with GitHub as CMS.

Cons:

  • It can’t create contact forms.
  • No dynamic content options.
  • Posts cannot be scheduled.
  • Does not include image manipulation functionality.

6. Hugo

Screenshot from: gohugo.io, March 2024.

Hugo is best suited for developers, bloggers, and content creators who value speed, flexibility, and simplicity. Its lightning-fast build times and static page generation make it ideal for those who frequently update their site or publish new content regularly.

While Hugo offers themes and templates to help users get started, creating a unique design may require coding skills.

One tip for getting the most out of Hugo is to leverage its built-in shortcodes, which allow users to easily add complex functionality to their web pages without writing extensive code.

Another nuance to remember is that, as a static site generator, Hugo may not be the best choice for websites that require dynamic features like user authentication or real-time data updates.

Key Features:

  • Can build most websites in seconds.
  • Cross-platform with easy installation.
  • Allows you to host your site anywhere.
  • Customizable URLs.
  • “Minutes to Read” and “WordCount” functionality.
  • Integrated Google Analytics and Disqus comment support.

Pros:

  • It easily integrates with Google Calendar and other apps.
  • Easy to use with responsive customer service.
  • Multilingual capabilities are built-in.
  • Extendable as needed.

Cons:

  • It can’t create one-off tasks.
  • It can be confusing upon initial use, particularly in templating syntax.
  • No plugins are available.
  • Limited text formatting features.

7. Webflow

Screenshot from: webflow.com, March 2024.

Webflow is best suited for freelance designers and small agencies who want complete control over their website’s design without worrying about hosting, security, or performance.

One nuance of Webflow is that extending a site’s functionality is not as straightforward as installing a plugin, as you would in WordPress.

Users must either set up integrations between their Webflow site and other platforms using third-party tools like Zapier, or they can embed custom code blocks on pages to add features.

A key aspect to note about Webflow is its pricing structure. Building a site is completely free, and users only need to purchase a site plan and custom domain when they are ready to launch.

This makes it an attractive option for freelancers and small teams who want to design and prototype sites without upfront costs, paying only when they are ready to go live.

Key Features:

  • More than 100 templates to choose from.
  • Design is prioritized, with animation, interaction, and parallax scrolling options.
  • Offers automatically generated sitemaps and customizable 301 redirects.
  • Multiple payment options for ecommerce sites and automatic tax calculation.

Pros:

  • Affordable plans range from free to $235 for top-tier ecommerce plans.
  • Free starter plan.
  • Numerous learning and help resources.
  • Good range of templates.
  • Good security.

Cons:

  • Steep learning curve.
  • Integration with social media can be frustrating.
  • Advanced capabilities aren’t built-in and require integration.

8. Joomla

Screenshot from: joomla.org, March 2024.

Joomla is best suited for creating social networking, community, and membership sites. With its built-in multilingual support and advanced user and content management options, Joomla enables site owners to manage hundreds of users, create custom post types, and publish content in multiple languages.

One nuance of Joomla is that it has a steeper learning curve compared to more beginner-friendly CMSs like WordPress.

While Joomla aims to combine the power and flexibility of Drupal with the user-friendliness of WordPress, users with some web development experience will be better equipped to understand and take full advantage of Joomla’s built-in features and settings.

Users can choose from over 6,000 extensions available in the official directory to extend a Joomla site’s functionality. However, unlike WordPress plugins that can be installed with just a few clicks, Joomla extensions must be installed via the backend.

This process requires more technical know-how and may be challenging for beginners.

Key Features:

  • Almost 6,000 extensions are available.
  • Traditional content editing (no drag-and-drop visual editor).
  • Optimized for mobile (depending on the template).
  • Blog module.
  • Ecommerce store.
  • Social media tools.

Pros:

  • Free, open-source software.
  • Premade designs and templates.
  • Access to source code.

Cons:

  • No free subdomains or custom domains are available.
  • No customer support.
  • Requires a PHP-enabled server to run.
  • Fewer templates and extensions than WordPress.

9. Drupal

Screenshot from: drupal.org, March 2024.

Drupal is best suited for large corporations, government agencies, and universities with dedicated teams of developers. With its extensive customization options and ability to handle large amounts of data and heavy traffic, Drupal is ideal for complex, high-performance websites.

One key nuance of Drupal is its steep learning curve. Drupal is designed for developers or users with a strong understanding of HTML, CSS, and PHP.

Customizing a Drupal site involves working with numerous modules and themes, which can be highly configurable but require technical expertise to set up and maintain.

For organizations with the necessary technical resources, Drupal’s flexibility and robustness make it a top choice for building highly secure and customized websites.

Key Features:

  • Content management system (CMS).
  • Over 47,000 modules are available.
  • Traditional content editing (no drag-and-drop visual editor).
  • Optimized for mobile (depending on the theme you choose).
  • Blog module.
  • Ecommerce store.
  • Social media tools.

Pros:

  • Free, open-source software.
  • Premade designs and templates.
  • Access to source code.
  • Strong security and data encryption.

Cons:

  • No free subdomains.
  • No customer support.
  • Requires a PHP-enabled server to run.

10. DataLife Engine

Screenshot from: dle-news.com, March 2024.

DataLife Engine (DLE) is best suited for media companies, news websites, and blogs that prioritize SEO, security, and performance. The platform’s focus on handling high traffic levels with minimal server load makes it an attractive choice for websites that expect significant visitor numbers and must ensure a smooth user experience.

DLE’s user-friendly interface and content management features suit organizations with multiple users involved in creating and publishing content.

The platform’s ability to track statistics and automatically filter words in comments can benefit media websites and blogs that need to moderate user-generated content and analyze audience engagement.

However, there are some nuances to consider when using DLE. The limited number of plugins and themes may restrict how much websites can customize their appearance and functionality compared to other CMSs like WordPress.

It’s also important to note that while DLE supports English users, they are considered a secondary focus.

Key Features:

  • Content management system (CMS).
  • Designed for multiple users.
  • SEO-focused.
  • Tracks statistics.
  • Automatically filters words in comments.
  • It supports an unlimited number of categories.
  • Low server load.
  • Allows plugins.

Pros:

  • Stores data using MySQL.
  • Excellent user experience.
  • Websites load quickly, even on low-end servers.
  • Excellent for publishing news and blog posts.

Cons:

  • No free version; licenses range from $79 for basic to $199 for unlimited.
  • English users are a secondary focus.
  • A limited number of plugins and themes.
  • The lowest license doesn’t include customer support.

11. Sitefinity

Screenshot from: progress.com/sitefinity-cms/, March 2024.

Progress’ Sitefinity is best suited for organizations that manage multiple websites, brands, or marketing channels from a single platform.

The CMS’s ability to sync assets across pages and sites makes it an attractive choice for companies with a diverse online presence. It streamlines content management and ensures consistency across various touchpoints.

One notable advantage of Sitefinity is its low-cost license compared to other CMS options, which may make it an attractive choice for budget-conscious organizations.

Additionally, the minimal coding required for integration and the flexible deployment time can help businesses reduce development costs and bring their websites to market faster.

However, Sitefinity setup and administration can be challenging. Organizations may need to invest time and resources into training their teams or hiring experienced professionals to ensure a smooth implementation and ongoing management of the platform.

Key Features:

  • Manage multiple sites from one location.
  • Sync assets across pages and sites.
  • It makes personalization simpler.
  • Integrated analytics and optimization.
  • Four versions: basic, marketing-focused, PaaS, and ecommerce.
  • Multilingual capabilities.

Pros:

  • Low-cost license compared to other CMS.
  • No setup fee.
  • Minimal coding is required for integration.
  • Flexible deployment time shortens time to market.
  • Options for marketing automation.

Cons:

  • Free trial, but no free version.
  • Setup and administration can be challenging.
  • No mobile interface.

12. CMS Hub

Screenshot from: hubspot.com/products/cms, March 2024.

CMS Hub, previously known as HubSpot CMS, is best suited for businesses that already use HubSpot’s marketing, sales, or service tools and want to integrate their website with their existing HubSpot ecosystem.

It combines the ease of use of a drag-and-drop website builder with the flexibility and performance of a CMS.

CMS Hub seamlessly integrates with HubSpot’s CRM platform, allowing businesses to create personalized content experiences, optimize marketing efforts, and streamline sales processes.

It also focuses on security and performance, with HubSpot handling website hosting, SSL certification, and CDN configuration.

However, while CMS Hub offers customization options, it may not be as flexible as other CMSs like WordPress or Drupal, potentially limiting businesses with particular design or functionality requirements.

Additionally, CMS Hub’s pricing model can be expensive compared to other CMS options, so companies must carefully consider their budget and weigh the benefits of its all-in-one approach.

Key Features:

  • Cloud-based.
  • Includes SEO recommendations.
  • Includes numerous themes and responsive templates.
  • Fully integrated CRM.
  • Drag-and-drop webpage editor.
  • Built-in security.

Pros:

  • Adaptive A/B testing helps you identify the best page layout.
  • All-in-one publishing tools.
  • Built-in SEO tools.
  • Supports smart content with personalized rules.
  • Mobile pages supported with Google AMP.

Cons:

  • Does not support ecommerce.
  • No automatic backup and recovery.

13. Contentful

Screenshot from: contentful.com, March 2024.

Contentful is a headless CMS best suited for businesses and developers requiring a flexible, scalable, and customizable content management solution.

It’s particularly well-suited for organizations delivering content across multiple channels, such as websites, mobile apps, and IoT devices.

One of Contentful’s key advantages is its content modeling capabilities. The platform allows users to create custom content models that can be easily adapted to their needs.

When using Contentful, it’s important to remember that it’s a headless CMS that focuses on content management and delivery rather than providing a built-in front end or presentation layer.

Developers must build a front end using their preferred tools and frameworks and then integrate with Contentful’s API to retrieve and display the content.

Another aspect to consider is the learning curve associated with Contentful. While the platform is designed to be user-friendly, it may take some time for content editors and managers to become familiar with its interface and content modeling concepts.

Key Features:

  • RESTful API gives you complete control over assets, translations, and versions.
  • Customizable interface and framework that works across third-party component providers.
  • It provides regional autonomy, so pieces in multiple languages and time zones can be published globally.
  • Content modeling allows you to structure content by channel.
  • Single sign-on and secure access.

Pros:

  • Focus on integration simplifies the technology stack.
  • User-friendly with a clean interface.
  • Free version for up to five users.
  • Good scalability.

Cons:

  • Expensive for an upgraded version ($489/month).
  • Poor internal search tools.
  • Modeling content can be tricky.

14. Adobe Experience Manager

Screenshot from: business.adobe.com/products/experience-manager/adobe-experience-manager.html, March 2024.

Adobe Experience Manager (AEM) is an enterprise-level CMS best suited for large organizations with complex content management needs and a significant budget. AEM’s target audience includes global brands, financial institutions, healthcare providers, and government agencies.

One of AEM’s key strengths is its tight integration with other Adobe products, such as Analytics, Target, and Creative Cloud.

This integration allows organizations to leverage the full power of Adobe’s digital marketing suite, enabling them to create, manage, and optimize content and experiences across the entire customer journey.

AEM also offers advanced features like intelligent content delivery, AI-powered content tagging, and multi-site management, making it a comprehensive solution for enterprise content management.

The platform’s complexity and extensive feature set can overwhelm smaller organizations or teams with limited resources. Additionally, AEM’s licensing and implementation costs are among the highest in the market, making it a significant investment for any organization.

Key Features:

  • Comprehensive marketing platform.
  • End-to-end digital document solution.
  • Enterprise-level security.
  • Analytics included.
  • Intelligent search.
  • Scalable to your needs.

Pros:

  • Streamlines workflows by keeping everything on one platform.
  • Individual marketers can handle authoring and publishing.
  • Easy authorization of workflow.
  • Can handle massive content loads.
  • Can manage multiple sites at once.

Cons:

  • Steep learning curve.
  • Requires different sign-ins to access other areas.
  • Doesn’t integrate well with external DAMs.
  • Not ideal for communities and forums.

Ecommerce Platforms

15. BigCommerce

Screenshot from: bigcommerce.com, March 2024.

BigCommerce is a hosted ecommerce platform best suited for businesses of all sizes looking to create and manage an online store. It caters to many users, from small and medium-sized businesses to large enterprises.

One of BigCommerce’s key advantages is its scalability. The platform accommodates businesses as they grow, offering features like unlimited products, file storage, and bandwidth.

BigCommerce also provides a range of advanced ecommerce functionalities, such as multi-channel selling, abandoned cart recovery, and built-in SEO tools, which can help businesses optimize their online sales performance.

When considering BigCommerce, it’s important to remember that while the platform offers a wide range of features, some of the more advanced functionalities may require a higher-tier plan or additional costs.

BigCommerce also enforces certain design limitations on its themes, which may restrict the level of customization available without diving into custom coding.

Key Features:

  • Full hosting solution.
  • No software to self-install.
  • Drag-and-drop visual builder.
  • Extensions are available.
  • Optimized for mobile.
  • Blog module.
  • Ecommerce store.
  • Social media tools.

Pros:

  • High level of customization options.
  • Over 100 themes to choose from (including some free).
  • No platform commission fees.
  • Free subdomain available.
  • Customer support.

Cons:

  • No free version is available.
  • No access to source code.
  • Pricing is based on revenue, which isn’t great if you have tight margins.

16. Shopify

Screenshot from: shopify.com, March 2024.

Shopify is an ecommerce platform well-suited for entrepreneurs and small business owners who want to establish an online presence quickly without extensive technical expertise.

Shopify’s extensive app marketplace allows businesses to extend its functionality and customize their online store to meet specific needs.

The platform also provides built-in features like inventory management, payment processing, and abandoned cart recovery to help streamline operations and optimize sales performance.

When using Shopify, consider the platform’s transaction fees, which vary depending on the payment gateway. Some advanced design changes may require HTML, CSS, and Liquid knowledge.

Despite these considerations, Shopify remains a top choice for businesses seeking a reliable, scalable, and feature-rich ecommerce platform.

Key Features:

  • Full hosting solution.
  • No software to self-install.
  • Drag-and-drop visual builder.
  • Extensions are available.
  • Optimized for mobile.
  • Blog module.
  • Ecommerce store.
  • Social media tools.

Pros:

  • All-in-one website builder and platform.
  • Premade designs and templates.
  • Free subdomain available.
  • Customer support.

Cons:

  • No free version is available.
  • No access to source code.
  • Platform commission fees.

17. Magento

Magento website homepage. Screenshot from: business.adobe.com/products/magento/magento-commerce.html, March 2024.

Magento is an open-source ecommerce platform best suited for medium to large enterprises with complex online selling needs.

Magento’s flexibility and scalability come with a steeper learning curve than other ecommerce platforms. It requires more technical expertise to set up, customize, and maintain, making it less suitable for small businesses or users without web development knowledge.

When using Magento, remember that its powerful features and customization options can impact website performance if not optimized properly.

Choosing a reliable hosting provider and working with experienced Magento developers is crucial for ensuring optimal store performance and security.

Key Features:

  • Option to pay for Magento Commerce for a complete hosting platform or download the free, open-source software to install on your web server.
  • Drag-and-drop visual builder.
  • Extensions are available.
  • Optimized for mobile.
  • Ecommerce store.
  • Social media tools.

Pros:

  • All-in-one ecommerce platform or open-source ecommerce software package.
  • Free version available.
  • Designed for large-scale ecommerce.
  • Premade designs and templates.
  • Free subdomain available (mainly for setup and testing purposes).
  • Customer support (paid version only).
  • Access to source code with the downloadable version.

Cons:

  • No blog module, although you can add it as an extension.
  • Not optimized for web projects or website purposes outside of ecommerce.
  • Steep learning curve for inexperienced users.
  • A large investment for small-scale ecommerce.

18. PrestaShop

PrestaShop website homepage. Screenshot from: prestashop.com, March 2024.

PrestaShop is an open-source ecommerce platform best suited for small to medium-sized businesses looking for a cost-effective and customizable solution.

PrestaShop offers a wide range of themes and modules, allowing businesses to customize their online store’s appearance and functionality. The platform also has a strong community of developers and users, providing support and resources for store owners.

While PrestaShop is generally easy to use, some advanced customizations may require coding knowledge. Additionally, the number of installed modules can impact the platform’s performance.

Key Features:

  • Customizable to your needs, including themes and features.
  • Includes backend tools like payments, shipping and data.
  • Community of translators for multilanguage digital stores.
  • Secure payment modules.
  • Scalable.
  • Includes demographic assistance.

Pros:

  • Free version available.
  • Open source so that you can customize your site to your needs.
  • 5,000+ themes, modules, and services are available with the premium plan.
  • Excellent user experience.

Cons:

  • Limited scalability.
  • No support team.
  • Initial setup requires some programming knowledge.

19. OpenCart

OpenCart website homepage. Screenshot from: opencart.com, March 2024.

OpenCart is a free, open-source ecommerce platform best suited for small to medium-sized businesses with limited budgets.

OpenCart offers a decent range of features and extensions, allowing businesses to customize their online store. However, its marketplace is smaller than other platforms, which may limit advanced functionality options.

When using OpenCart, be mindful of its performance limitations as the store grows. Optimizing and carefully selecting extensions may be required to maintain a smooth user experience.

Additionally, its simplicity may not be suitable for businesses with complex ecommerce needs.

Features:

  • The administrator dashboard gives you information at a glance.
  • User management allows you to assign permissions and separate access.
  • Allows you to run multiple stores from one dashboard.
  • Customizable variables let you include options for sizes, colors, or anything else.

Pros:

  • The platform is entirely free, as are many add-ons.
  • Extensive metrics and reports are provided.
  • Works with your current payment gateway.
  • Comes with dedicated technical support.
  • Flexible.

Cons:

  • Often creates duplicate pages, which can cause SEO problems.
  • Not all extensions, modules, plugins, and add-ons work well together.
  • Checkout can be slow, particularly if you have numerous plugins.
  • Can be difficult to import a list of inventory.
  • Requires some degree of technical ability for optimal use.

Blogging Platforms

20. Medium

Medium website homepage. Screenshot from: medium.com, March 2024.

Medium is a publishing platform best suited for individual bloggers, writers, and content creators who want to share their ideas and stories with a built-in audience.

Medium’s clean and minimalistic interface allows readers to concentrate on the content. The platform also offers a built-in social network, making it easy for writers to connect with readers and other creators.

However, this simplicity comes with limited customization options for branding and design.

When using Medium, it’s important to understand that the platform controls the distribution and monetization of content. While this can lead to increased exposure, it also means less control over the presentation and ownership of your content compared to self-hosted solutions.

Key Features:

  • Full hosting solution.
  • No software to self-install.
  • Optimized for mobile.
  • Blog module.
  • Limited social media tools.

Pros:

  • A community site for blogs.
  • Free version available.
  • Medium Partner Program to earn revenue.
  • Customer support.

Cons:

  • No extensions.
  • No ecommerce stores.
  • No premade designs or themes.
  • No free subdomains.
  • No third-party extensions.
  • No access to source code.

21. Ghost

Ghost website homepage. Screenshot from: ghost.org, March 2024.

Ghost is a lightweight, open-source publishing platform best suited for bloggers, writers, and small publications who value simplicity and performance. It’s designed for users who want a clean, focused writing experience without the complexity of more feature-rich CMS platforms.

Ghost offers a simple, intuitive editor and a minimalistic default theme, allowing users to create and publish content quickly.

The platform also provides built-in SEO features and supports memberships and subscriptions, making it a good choice for content creators looking to monetize their work.

As Ghost primarily focuses on publishing, it may not be the best fit for users who require extensive customization options or advanced functionality beyond blogging.

Key Features:

  • You can subscribe through Ghost’s hosting platform or download the free, open-source software to install on your web server.
  • Basic drag-and-drop visual builder.
  • Extensions are available through integrations with other tools.
  • Optimized for mobile.
  • Blog module.
  • Ecommerce store (subscription only).
  • Social media tools.

Pros:

  • All-in-one website builder and platform.
  • Free version available.
  • Premade designs and templates.
  • Free subdomain available with the paid version.
  • Customer support.
  • Access to source code.

Cons:

  • Not compatible with all third-party web hosts.
  • Highly specialized with limited capabilities beyond blogging.
  • Not built to scale up into a business site or complex website.

22. Tumblr

Tumblr website homepage. Screenshot from: tumblr.com, March 2024.

Tumblr is a microblogging and social networking platform best suited for younger audiences who enjoy sharing short-form multimedia content.

Tumblr’s emphasis on community and content discovery makes it easy for users to connect with others who share similar interests. The platform’s reblogging feature spreads content quickly, increasing visibility and engagement.

When using Tumblr, it’s important to understand the platform’s unique culture and demographics. Tumblr is known for its diverse, often niche communities, which can be both a strength and a challenge for brands and marketers.

Additionally, while Tumblr offers some customization options, it may not be the best choice for users who require a highly professional or branded online presence.

Key Features:

  • Features strong social media functionality.
  • Customizable.
  • Google Analytics Integration.
  • Unlimited storage.
  • Ad-free blog themes.
  • Free SSL certification.

Pros:

  • Free to use; no upgrades are required to access all features.
  • Free web hosting.
  • User-friendly and easy to set up.
  • No storage limits.
  • Can post audio, video, images, gifs, and more.

Cons:

  • Daily posting limit (250/day).
  • Files must be under 10 MB.
  • No plugins.
  • Safety and security leave something to be desired.
  • Unsuited to long-form content.

23. Bluehost

Bluehost website homepage. Screenshot from: bluehost.com, March 2024.

Bluehost is a web hosting provider best suited for beginners and small businesses looking for an affordable, easy-to-use hosting solution.

Bluehost’s advantages are its user-friendly interface and one-click installations for popular CMS platforms like WordPress.

This makes it easy for users with limited technical knowledge to set up and manage their websites. Bluehost also provides 24/7 customer support and a free SSL certificate with each hosting plan.

While Bluehost is known for its reliability and performance, it may not be the best choice for websites with high traffic or complex requirements. Some users have reported issues with slow loading speeds and limited storage space on shared hosting plans.

Key Features:

  • Domain names can be purchased through Bluehost.
  • Versatile hosting options let you choose what works best for you.
  • Dedicated servers and virtual private servers are available.
  • A variety of plans are available based on your needs.
  • Comes with customer service chat options.

Pros:

  • The first term is inexpensive.
  • Lots of storage and unlimited bandwidth.
  • Good uptime.
  • Free SSL certificates.

Cons:

  • Extra features come with added costs, which can get pricey.
  • High renewal rates.
  • Speed could be better.
  • All servers are U.S.-based.

24. Blogger

Blogger website homepage. Screenshot from: blogger.com, March 2024.

Blogger is a free, beginner-friendly blogging platform best suited for hobbyists, casual bloggers, and those who want to start a blog without investing in a self-hosted solution. It’s ideal for users who prioritize simplicity and ease of use over advanced customization options.

Blogger offers a straightforward, intuitive interface that makes it easy for users to create and publish blog posts.

The platform provides a selection of customizable templates and allows users to add gadgets and widgets to enhance their blog’s functionality. However, compared with other blogging platforms, Blogger’s design and customization options are relatively limited.

Blogger’s simplicity and lack of advanced features may make it unsuitable for professional bloggers or those looking to create a more sophisticated online presence.

Features:

  • Clear analytics.
  • Included layout/themes.
  • Monetization options, including Google Adsense integration.
  • Uses Google security.
  • Unlimited storage.

Pros:

  • Free to use.
  • Extremely user-friendly.
  • Free SSL security.
  • Good uptime.

Cons:

  • You don’t own your website.
  • Fewer options and control over design.
  • Limited support.
  • Hard to port to a different platform.

Community Management

25. vBulletin

vBulletin is a proprietary forum software best suited for businesses, organizations, and communities looking to create and manage an online discussion platform.

vBulletin offers many features, including private messaging, user groups, and content management tools, making it a powerful solution for managing large, active communities.

The platform also provides a high level of customization, allowing administrators to tailor the look and feel of their forum to match their brand or website.

One of the primary considerations when using vBulletin is its licensing cost, which can be a significant investment for some users.

Additionally, while vBulletin offers a range of customization options, some technical knowledge may be required to fully optimize and maintain the platform.

Key Features:

  • Built-in SEO and security.
  • Includes a chat app.
  • Easy to get started.
  • Built-in applications.
  • Optimized for mobile users.
  • Blogging functionality.
  • Fully customizable.

Pros:

  • Frequent patches and bug fixes.
  • Customer support.
  • Easy to install and get started.
  • Designed to host forums.
  • Includes templates.

Cons:

  • No free option.
  • Limited features compared to some other platforms.
  • Requires some tech skills to take full advantage of the functionality.
  • You can’t customize code in the cloud-based version.

Which One Is Right For You?

With so many options, determining the right alternative to WordPress depends on your specific needs and goals.

For individuals and small businesses seeking an easy-to-use website builder, Wix, Squarespace, or Weebly offer intuitive drag-and-drop interfaces. Those prioritizing simplicity and speed may prefer static site generators like Jekyll or Hugo.

Developers and tech-savvy users who value flexibility and customization can explore headless CMS options like Contentful or more robust open-source platforms like Joomla and Drupal.

Ecommerce merchants must evaluate features like inventory management, payment processing, and scalability when choosing between Shopify, BigCommerce, WooCommerce, and others.

No matter your requirements, there is likely a WordPress alternative that is well-suited to your needs. Thoroughly assessing your website goals, budget, and technical abilities will help you select the right platform to build your ideal online presence.

With some research and planning, you can find the perfect alternative to take your website beyond what WordPress offers.

Featured Image: GaudiLab/Shutterstock

SEO

2024 WordPress Vulnerability Report Shows Errors Sites Keep Making

2024 Annual WordPress security report by WPScan

WordPress security scanner WPScan’s 2024 WordPress vulnerability report calls attention to WordPress vulnerability trends and suggests the kinds of things website publishers (and SEOs) should be looking out for.

Key findings from the report include that just over 20% of vulnerabilities were rated as high or critical level threats, while medium severity threats made up the majority at 67% of reported vulnerabilities. Many regard medium level vulnerabilities as if they were low-level threats. That’s a mistake: they are not low level and deserve attention.

The WPScan report advised:

“While severity doesn’t translate directly to the risk of exploitation, it’s an important guideline for website owners to make an educated decision about when to disable or update the extension.”

WordPress Vulnerability Severity Distribution

Critical level vulnerabilities, the highest level of threat, represented only 2.38% of vulnerabilities, which is essentially good news for WordPress publishers. Yet as mentioned earlier, when combined with the percentage of high level threats (17.68%), the share of concerning vulnerabilities rises to just over 20%.

Here are the percentages by severity ratings:

  • Critical 2.38%
  • Low 12.83%
  • High 17.68%
  • Medium 67.12%
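
The arithmetic behind the combined figure can be checked with a minimal Python sketch. The percentages are the ones listed above; the variable names are illustrative only and are not part of the WPScan report.

```python
# Severity shares (percent of reported vulnerabilities), as listed above.
severity_share = {
    "Critical": 2.38,
    "Low": 12.83,
    "High": 17.68,
    "Medium": 67.12,
}

# High and critical combined: the share the article calls "just over 20%".
high_or_critical = round(severity_share["Critical"] + severity_share["High"], 2)

# Sanity check: the four shares should add up to roughly 100% (rounding aside).
total = round(sum(severity_share.values()), 2)

print(f"High or critical: {high_or_critical:.2f}%")
print(f"All severities:   {total:.2f}%")
```

The total lands slightly above 100 because each published share is itself rounded.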

Authenticated Versus Unauthenticated

Authenticated vulnerabilities are those that require an attacker to first attain user credentials and their accompanying permission levels in order to exploit a particular vulnerability. Exploits that require subscriber-level authentication are the most exploitable of the authenticated exploits and those that require administrator level access present the least risk (although not always a low risk for a variety of reasons).

Unauthenticated attacks are generally the easiest to exploit because anyone can launch an attack without having to first acquire a user credential.

The WPScan vulnerability report found that about 22% of reported vulnerabilities required subscriber level or no authentication at all, representing the most exploitable vulnerabilities. At the other end of the exploitability scale are vulnerabilities requiring admin permission levels, which represent 30.71% of reported vulnerabilities.

Permission Levels Required For Exploits

Vulnerabilities requiring administrator level credentials represented the highest percentage of exploits, followed by Cross Site Request Forgery (CSRF) with 24.74% of vulnerabilities. This is interesting because CSRF is an attack that uses social engineering to get a victim to click a link, through which the attacker acquires the user’s permission levels. WordPress publishers should be aware of this risk because all it takes is for an admin level user to follow a link, which then enables the hacker to assume admin level privileges on the WordPress website.

The following are the percentages of exploits, ordered by the role necessary to launch an attack.

Ascending Order Of User Roles For Vulnerabilities

  • Author 2.19%
  • Subscriber 10.4%
  • Unauthenticated 12.35%
  • Contributor 19.62%
  • CSRF 24.74%
  • Admin 30.71%
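
As a rough check on the "easiest to exploit" group, the short Python sketch below adds the unauthenticated and subscriber shares from the list above. The dictionary and variable names are assumptions made for illustration.

```python
# Share of reported vulnerabilities by the role an attacker needs (percent),
# copied from the list above.
share_by_role = {
    "Author": 2.19,
    "Subscriber": 10.4,
    "Unauthenticated": 12.35,
    "Contributor": 19.62,
    "CSRF": 24.74,
    "Admin": 30.71,
}

# Lowest-barrier group: no login at all, or only a subscriber account.
low_barrier = round(
    share_by_role["Unauthenticated"] + share_by_role["Subscriber"], 2
)

# The single largest category in the list.
largest_role = max(share_by_role, key=share_by_role.get)

print(f"Unauthenticated + subscriber: {low_barrier:.2f}%")
print(f"Largest category: {largest_role}")
```

The two listed values sum to 22.75%, which is the figure the article describes as about 22%.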

Most Common Vulnerability Types Requiring Minimal Authentication

Broken Access Control in the context of WordPress refers to a security failure that can allow an attacker without the necessary permission credentials to gain access to higher permission levels.

In the section of the report that examines unauthenticated or subscriber level vulnerabilities (Occurrence vs Vulnerability on Unauthenticated or Subscriber+ reports), WPScan breaks down the percentages for the most common vulnerability types among the exploits that are easiest to launch (because they require minimal to no user credential authentication).

The WPScan threat report noted that Broken Access Control represents a whopping 84.99%, followed by SQL injection (20.64%).

The Open Worldwide Application Security Project (OWASP) defines Broken Access Control as:

“Access control, sometimes called authorization, is how a web application grants access to content and functions to some users and not others. These checks are performed after authentication, and govern what ‘authorized’ users are allowed to do.

Access control sounds like a simple problem but is insidiously difficult to implement correctly. A web application’s access control model is closely tied to the content and functions that the site provides. In addition, the users may fall into a number of groups or roles with different abilities or privileges.”
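
To make the definition concrete, here is a minimal Python sketch of the pattern. It is not WordPress code. The role names mirror the default WordPress roles, but the numeric levels and the function names are hypothetical, chosen only to contrast a handler that checks authentication with one that also checks authorization.

```python
# Hypothetical privilege levels for the default WordPress role names.
ROLE_LEVEL = {
    "subscriber": 1,
    "contributor": 2,
    "author": 3,
    "editor": 4,
    "administrator": 5,
}


def can_perform(user_role, required_role):
    """True only if user_role is known and at least as privileged as required_role."""
    return ROLE_LEVEL.get(user_role, 0) >= ROLE_LEVEL[required_role]


def delete_post_broken(user_role, post_id):
    # Broken access control: only confirms the caller is logged in,
    # never whether that role is allowed to delete posts.
    if user_role is None:
        raise PermissionError("login required")
    return f"post {post_id} deleted"


def delete_post_checked(user_role, post_id):
    # Authorization check performed after authentication, as OWASP describes.
    if not can_perform(user_role, "editor"):
        raise PermissionError("insufficient role")
    return f"post {post_id} deleted"


print(delete_post_broken("subscriber", 7))  # succeeds, which is the bug
try:
    delete_post_checked("subscriber", 7)
except PermissionError as err:
    print("blocked:", err)
```

In an actual plugin or theme, the equivalent check is usually a capability test such as WordPress’s `current_user_can()` rather than a hand-rolled role table.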

SQL injection, at 20.64%, represents the second most prevalent type of vulnerability. WPScan referred to it as both “high severity and risk” in the context of vulnerabilities requiring minimal authentication levels, because attackers can access and/or tamper with the database, which is the heart of every WordPress website.
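
The mechanism is easy to demonstrate outside WordPress. The sketch below uses Python’s built-in `sqlite3` module and an in-memory table invented for this example. It compares a query built by string concatenation with a parameterized query.

```python
import sqlite3

# Throwaway in-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users (login, role) VALUES (?, ?)",
    [("alice", "administrator"), ("bob", "subscriber")],
)


def find_user_unsafe(login):
    # Vulnerable: user input is pasted straight into the SQL text.
    query = f"SELECT login, role FROM users WHERE login = '{login}'"
    return conn.execute(query).fetchall()


def find_user_safe(login):
    # Parameterized: the driver treats the input strictly as a value.
    return conn.execute(
        "SELECT login, role FROM users WHERE login = ?", (login,)
    ).fetchall()


payload = "nobody' OR '1'='1"
leaked = find_user_unsafe(payload)   # the injected OR clause matches every row
blocked = find_user_safe(payload)    # no user has this literal login

print(len(leaked), "rows leaked by the unsafe query")
print(len(blocked), "rows returned by the parameterized query")
```

WordPress plugins typically achieve the same separation of query and data with `$wpdb->prepare()`.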

These are the percentages:

  • Broken Access Control 84.99%
  • SQL Injection 20.64%
  • Cross-Site Scripting 9.4%
  • Unauthenticated Arbitrary File Upload 5.28%
  • Sensitive Data Disclosure 4.59%
  • Insecure Direct Object Reference (IDOR) 3.67%
  • Remote Code Execution 2.52%
  • Other 14.45%

Vulnerabilities In The WordPress Core Itself

The overwhelming majority of vulnerability issues were reported in third-party plugins and themes. However, a total of 13 vulnerabilities were reported in the WordPress core itself in 2023. Only one of the thirteen was rated as a high severity threat, the second-highest level (Critical is the highest) under the Common Vulnerability Scoring System (CVSS).

The WordPress core platform itself is held to the highest standards and benefits from a worldwide community that is vigilant in discovering and patching vulnerabilities.

Website Security Should Be Considered As Technical SEO

Site audits don’t normally cover website security, but in my opinion every responsible audit should at least talk about security headers. As I’ve been saying for years, website security quickly becomes an SEO issue once a website’s rankings start disappearing from the search engine results pages (SERPs) due to being compromised by a vulnerability. That’s why it’s critical to be proactive about website security.

According to the WPScan report, the main points of entry for hacked websites were leaked credentials and weak passwords. Ensuring strong password standards plus two-factor authentication is an important part of every website’s security stance.

Using security headers is another way to help protect against Cross-Site Scripting and other kinds of vulnerabilities.
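
For an audit, a first pass can be as simple as listing which widely used security headers a response lacks. The Python sketch below works on a plain dictionary of response headers. The list of headers to check and the function name are assumptions for illustration, not a recommendation from the WPScan report.

```python
# Commonly recommended HTTP security headers (an illustrative, not exhaustive, list).
RECOMMENDED_HEADERS = (
    "Content-Security-Policy",
    "Strict-Transport-Security",
    "X-Content-Type-Options",
    "X-Frame-Options",
    "Referrer-Policy",
)


def missing_security_headers(response_headers):
    """Return the recommended headers absent from a response (case-insensitive)."""
    present = {name.lower() for name in response_headers}
    return [h for h in RECOMMENDED_HEADERS if h.lower() not in present]


# Hypothetical response headers, e.g. copied from a browser's network panel.
sample_headers = {
    "Content-Type": "text/html; charset=UTF-8",
    "x-frame-options": "SAMEORIGIN",
    "Strict-Transport-Security": "max-age=31536000",
}

missing = missing_security_headers(sample_headers)
print("Missing:", ", ".join(missing))
```

The same function can be fed the headers of a live response from any HTTP client. A header being present says nothing about whether its value is well configured, so treat the result as a starting point.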

Lastly, a WordPress firewall and website hardening are also useful proactive approaches to website security. I once added a forum to a brand new website I created and it was under attack within minutes. Believe it or not, virtually every website worldwide is under attack 24 hours a day by bots scanning for vulnerabilities.

Read the WPScan Report:

WPScan 2024 Website Threat Report

Featured Image by Shutterstock/Ljupco Smokovski
