Visualizing Hot Topics Using Python To Analyze News Sitemaps

News sitemaps extend the standard sitemap protocol with news-specific tags that give news search engines more information about each article.

A news sitemap contains the news published in the last 48 hours.

News sitemap tags include the article’s title, the publication’s name and language, genre, publication date, keywords, and even stock tickers.
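As a rough illustration of what those tags look like, the sketch below parses a made-up news sitemap entry with Python's standard library. The sample XML, the publication name, and the dictionary keys are assumptions for demonstration only; Advertools does this work for you in the steps that follow.

```python
import xml.etree.ElementTree as ET

# XML namespaces of the sitemap protocol and the Google News sitemap extension.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "news": "http://www.google.com/schemas/sitemap-news/0.9",
}

# Made-up entry, for illustration only.
SAMPLE_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>https://www.example.com/world/sample-article</loc>
    <news:news>
      <news:publication>
        <news:name>Example Times</news:name>
        <news:language>en</news:language>
      </news:publication>
      <news:publication_date>2022-03-01T08:00:00+00:00</news:publication_date>
      <news:title>Sample headline about a news event</news:title>
      <news:keywords>World news, Europe</news:keywords>
    </news:news>
  </url>
</urlset>"""


def parse_news_sitemap(xml_text):
    """Return one dict per <url> entry with a few news-specific fields."""
    root = ET.fromstring(xml_text)
    rows = []
    for url in root.findall("sm:url", NS):
        rows.append({
            "loc": url.findtext("sm:loc", namespaces=NS),
            "publication_name": url.findtext(
                "news:news/news:publication/news:name", namespaces=NS),
            "publication_language": url.findtext(
                "news:news/news:publication/news:language", namespaces=NS),
            "news_publication_date": url.findtext(
                "news:news/news:publication_date", namespaces=NS),
            "news_title": url.findtext("news:news/news:title", namespaces=NS),
            "news_keywords": url.findtext("news:news/news:keywords", namespaces=NS),
        })
    return rows


rows = parse_news_sitemap(SAMPLE_XML)
print(rows[0]["news_title"], "|", rows[0]["publication_language"])
```

Tags that are missing from an entry simply come back as `None`, which is roughly how unused columns show up as empty values in a data frame.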

How can you use these sitemaps to your advantage for content research and competitive analysis?

In this Python tutorial, you’ll learn a 10-step process for analyzing news sitemaps and visualizing topical trends discovered therein.

Housekeeping Notes To Get Us Started

This tutorial was written during Russia’s invasion of Ukraine.

Using machine learning, we could even label news sources and articles as “objective” or “sarcastic.”

But to keep things simple, we will focus on topics with frequency analysis.

We will use seven major news sources from the U.S. and U.K.

Note: We would have liked to include Russian news sources, but they do not have a proper news sitemap. Even if they did, they block external requests.

Comparing how often “invasion” and “liberation” occur in Western and Eastern news sources would show the benefit of frequency-based text analysis methods.
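To make that idea concrete, here is a minimal sketch of such a comparison using only the standard library. The headlines are invented and the function name `term_share` is hypothetical; it is not part of the tutorial's own code.

```python
import re
from collections import Counter


def term_share(titles, terms):
    """Share of all words in `titles` taken up by each term (case-insensitive)."""
    words = Counter(re.findall(r"\w+", " ".join(titles).lower()))
    total = sum(words.values())
    return {term: (words[term] / total if total else 0.0) for term in terms}


# Invented headlines, for illustration only.
source_a = ["Invasion enters second week", "Talks after invasion"]
source_b = ["Officials describe liberation of the region"]

for name, titles in (("source_a", source_a), ("source_b", source_b)):
    print(name, term_share(titles, ["invasion", "liberation"]))
```

Using a share rather than a raw count keeps sources with very different publishing volumes comparable.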

What You Need To Analyze News Content With Python

You will need the following Python libraries to audit a news sitemap and understand a news source’s content strategy:

  • Advertools.
  • Pandas.
  • Plotly Express, Subplots, and Graph Objects.
  • Re (Regex).
  • String.
  • NLTK (Corpus, Stopwords, Ngrams).
  • Unicodedata.
  • Matplotlib.
  • A basic understanding of Python syntax.

10 Steps For News Sitemap Analysis With Python

All set up? Let’s get to it.

1. Take The News URLs From News Sitemap

We chose The Guardian, The New York Times, The Washington Post, the Daily Mail, Sky News, the BBC, and CNN, and pulled the news URLs from their news sitemaps.

import advertools as adv

# Each call downloads a sitemap and returns it as a pandas data frame.
df_guardian = adv.sitemap_to_df("http://www.theguardian.com/sitemaps/news.xml")
df_nyt = adv.sitemap_to_df("https://www.nytimes.com/sitemaps/new/news.xml.gz")
df_wp = adv.sitemap_to_df("https://www.washingtonpost.com/arcio/news-sitemap/")
df_bbc = adv.sitemap_to_df("https://www.bbc.com/sitemaps/https-index-com-news.xml")
df_dailymail = adv.sitemap_to_df("https://www.dailymail.co.uk/google-news-sitemap.xml")
df_skynews = adv.sitemap_to_df("https://news.sky.com/sitemap-index.xml")
df_cnn = adv.sitemap_to_df("https://edition.cnn.com/sitemaps/cnn/news.xml")

2. Examine An Example News Sitemap With Python

I have used BBC as an example to demonstrate what we just extracted from these news sitemaps.

df_bbc
Figure: News sitemap data frame view.

The BBC Sitemap has the columns below.

df_bbc.columns
Figure: News sitemap tags as data frame columns.

The general data structures of these columns are below.

df_bbc.info()
Figure: News sitemap columns and data types.

The BBC doesn’t populate the “news_publication” column, among others.

3. Find The Most Used Words In URLs From News Publications

To see the most used words in the news sites’ URLs, we need to use the “str,” “split,” and “explode” methods.

df_dailymail["loc"].str.split("/").str[5].str.split("-").explode().value_counts().to_frame()
              loc
article       176
Russian        50
Ukraine        50
says           38
reveals        38
...           ...
readers         1
Red             1
Cross           1
provide         1
weekend.html    1

5445 rows × 1 column

For the Daily Mail, Russia and Ukraine are the main topics.
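The one-liner above relies on the slug sitting at a fixed position in the URL path (index 5 after splitting on "/"). As a hedged alternative for sites with other URL layouts, the sketch below takes the last path segment instead and counts its words with the standard library. The URLs and the function name `slug_word_counts` are made up for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def slug_word_counts(urls):
    """Count hyphen-separated words in the last path segment of each URL."""
    counts = Counter()
    for url in urls:
        path = urlparse(url).path.rstrip("/")
        slug = path.rsplit("/", 1)[-1]
        # Drop a trailing file extension such as ".html".
        if "." in slug:
            slug = slug.rsplit(".", 1)[0]
        counts.update(word.lower() for word in slug.split("-") if word)
    return counts


# Invented URLs, for illustration only.
sample_urls = [
    "https://www.example.com/news/article-1/Russian-forces-enter-city.html",
    "https://www.example.com/news/article-2/Ukraine-says-talks-continue.html",
    "https://www.example.com/news/article-3/Ukraine-aid-arrives",
]
print(slug_word_counts(sample_urls).most_common(3))
```

Lowercasing and stripping the extension avoids separate entries such as "weekend.html" in the frequency table.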

4. Find The Most Used Language In News Publications

Either the URL structure or the “language” tag of the news sitemap can be used to see which languages a news publication uses most.

In this sample, we used the BBC to see how it prioritizes languages.

# Count all rows first, then keep the 20 most common languages.
df_bbc["publication_language"].value_counts().head(20).to_frame()
     publication_language
en                    698
fa                     52
sr                     52
ar                     47
mr                     43
hi                     43
gu                     41
ur                     35
pt                     33
te                     31
ta                     31
cy                     30
ha                     29
tr                     28
es                     25
sw                     22
cpe                    22
ne                     21
pa                     21
yo                     20

20 rows × 1 column

To reach the Russian population via Google News, Western news sources need to publish in Russian.

Some international news organizations have started to do this.

If you are a news SEO, it’s helpful to watch your competitors’ Russian-language publications, both to distribute objective news to Russia and to compete within the news industry.

5. Audit The News Titles For Frequency Of Words

We used the BBC’s news titles to see which words occur most often.

df_bbc["news_title"].str.split(" ").explode().value_counts().to_frame()
       news_title
to            232
in            181
-             141
of            140
for           138
...           ...
ፊልም             1
ብላክ             1
ባንኪ             1
ጕሒላ             1
niile           1

11916 rows × 1 columns

The problem here is that the news titles contain every type of word, including stop words that carry no context.

We need to remove these non-topical terms to understand the titles’ focus better.

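from nltk.corpus import stopwords  # first run only: nltk.download("stopwords")

stop = stopwords.words('english')
# Count every space-separated token; words stay in the index, counts go in "news_title".
df_bbc_news_title_most_used_words = (df_bbc["news_title"].str.split(" ").explode()
    .value_counts().rename("news_title").rename_axis(None).to_frame())
df_bbc_news_title_most_used_words["words"] = df_bbc_news_title_most_used_words.index
# \b word boundaries make the pattern match whole stop words only.
pat = r'\b(?:{})\b'.format('|'.join(stop))
df_bbc_news_title_most_used_words["without_stop_words"] = df_bbc_news_title_most_used_words["words"].str.replace(pat, "", regex=True)
# Drop the rows that are empty after stop-word removal.
df_bbc_news_title_most_used_words.drop(df_bbc_news_title_most_used_words.loc[df_bbc_news_title_most_used_words["without_stop_words"]==""].index, inplace=True)
df_bbc_news_title_most_used_words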
Figure: The “without_stop_words” column contains the cleaned text values.

We have removed most of the stop words with the help of a regex and the “replace” method of Pandas.

The second concern is removing punctuation.

For that, we will use the “string” module of Python.

import re
import string

# Escape the punctuation characters so they are safe inside a regex character class.
punctuation_pat = '[{}]'.format(re.escape(string.punctuation))
df_bbc_news_title_most_used_words["without_stop_word_and_punctation"] = df_bbc_news_title_most_used_words['without_stop_words'].str.replace(punctuation_pat, '', regex=True)
df_bbc_news_title_most_used_words.drop(df_bbc_news_title_most_used_words.loc[df_bbc_news_title_most_used_words["without_stop_word_and_punctation"]==""].index, inplace=True)
df_bbc_news_title_most_used_words.drop(["without_stop_words", "words"], axis=1, inplace=True)
df_bbc_news_title_most_used_words
          news_title  without_stop_word_and_punctation
Ukraine          110                           Ukraine
v                 83                                 v
de                61                                de
Ukraine:          60                           Ukraine
da                51                                da
...              ...                               ...
ፊልም                1                               ፊልም
ብላክ                1                               ብላክ
ባንኪ                1                               ባንኪ
ጕሒላ                1                               ጕሒላ
niile              1                             niile

11767 rows × 2 columns

Or, use df_bbc_news_title_most_used_words["news_title"].to_frame() to get a clearer picture of the data.

          news_title
Ukraine          110
v                 83
de                61
Ukraine:          60
da                51
...              ...
ፊልም                1
ብላክ                1
ባንኪ                1
ጕሒላ                1
niile              1

11767 rows × 1 columns

We see 11,767 unique words in the news titles of the BBC, and “Ukraine” is the most popular, with 110 occurrences.

The data frame also contains other Ukraine-related variants, such as “Ukraine:”.

A tokenizer such as the one in NLTK’s “tokenize” module can be used to unite these variations.
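As a lightweight stand-in for a full tokenizer, the sketch below uses a plain regular expression (not NLTK) to split titles into word tokens, so that “Ukraine:” and “Ukraine,” are counted together with “Ukraine.” The titles and the function name `title_word_counts` are invented for illustration.

```python
import re
from collections import Counter


def title_word_counts(titles):
    """Count word tokens; \\w+ keeps letters and digits in any script and drops punctuation."""
    counts = Counter()
    for title in titles:
        counts.update(re.findall(r"\w+", title))
    return counts


# Invented titles, for illustration only.
titles = ["Ukraine: talks resume", "War in Ukraine", "Ukraine, Russia hold talks"]
counts = title_word_counts(titles)
print(counts["Ukraine"], counts["talks"])
```

A dedicated tokenizer handles more edge cases (contractions, hyphenated words), so treat this as a quick approximation.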

The next section will use a different method to unite them.

Note: If you want to make things easier, use Advertools as below.

# adv.stopwords is a dictionary keyed by language name; pass the English set.
adv.word_frequency(df_bbc["news_title"], phrase_len=2, rm_words=adv.stopwords["english"])

The result is below.

Figure: Text analysis with Advertools.

“adv.word_frequency” has the parameters “phrase_len” and “rm_words” to set the length of the phrases to count and to remove the stop words.

You may ask why I didn’t use it in the first place.

I wanted to show you an educational example with “regex, NLTK, and the string” so that you can understand what’s happening behind the scenes.

6. Visualize The Most Used Words In News Titles

To visualize the most used words in the news titles, you can use the code block below.

df_bbc_news_title_most_used_words["news_title"] = df_bbc_news_title_most_used_words["news_title"].astype(int)
df_bbc_news_title_most_used_words["without_stop_word_and_punctation"] = df_bbc_news_title_most_used_words["without_stop_word_and_punctation"].astype(str)
df_bbc_news_title_most_used_words.index = df_bbc_news_title_most_used_words["without_stop_word_and_punctation"]
df_bbc_news_title_most_used_words["news_title"].head(20).plot(title="The Most Used Words in BBC News Titles")
Figure: News n-grams visualization.

You will notice a “broken line” in the chart.

Do you remember the “Ukraine” and “Ukraine:” in the data frame?

When we remove the punctuation, the two values become the same.

That’s why the line graph shows Ukraine twice, once with 110 occurrences and once with 60.

To prevent such a data discrepancy, use the code block below.

# Merge rows that share the same cleaned word and add up their counts.
df_bbc_news_title_most_used_words_1 = df_bbc_news_title_most_used_words.reset_index(drop=True).groupby('without_stop_word_and_punctation', sort=False, as_index=True).sum()
df_bbc_news_title_most_used_words_1
                                  news_title
without_stop_word_and_punctation
Ukraine                                  175
v                                         83
de                                        61
da                                        51
и                                         41
...                                      ...
ፊልም                                        1
ብላክ                                        1
ባንኪ                                        1
ጕሒላ                                        1
niile                                      1

11109 rows × 1 columns

Rows that share the same cleaned word are merged, and their counts are summed together.

Now, let’s visualize it again.

7. Extract Most Popular N-Grams From News Titles

Extracting n-grams from the news titles, or normalizing the URL words and forming n-grams from them, helps you understand overall topicality and see how each news publication approaches a topic. Here’s how.

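import nltk
import unicodedata
import re

# First run only: nltk.download("stopwords"); nltk.download("wordnet")

def text_clean(content):
  lemmatizer = nltk.stem.WordNetLemmatizer()
  stopwords = nltk.corpus.stopwords.words('english')
  # Decompose the characters, drop everything that is not ASCII, and lowercase the rest.
  content = (unicodedata.normalize('NFKD', content)
    .encode('ascii', 'ignore')
    .decode('utf-8', 'ignore')
    .lower())
  # Remove anything that is neither a word character nor whitespace.
  words = re.sub(r'[^\w\s]', '', content).split()
  return [lemmatizer.lemmatize(word) for word in words if word not in stopwords]

raw_words = text_clean(' '.join(df_bbc['news_title'].dropna().astype(str)))
raw_words[:10]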
raw_words[:10]
OUTPUT>>>
['oneminute', 'world', 'news', 'best', 'generation', 'make', 'agyarkos', 'dream', 'fight', 'card']

The output shows we have “lemmatized” all the words in the news titles and put them in a list.

The list comprehension provides a quick shortcut for filtering every stop word easily.

Calling nltk.corpus.stopwords.words("english") returns all the stop words in English.

But you can add extra stop words to the list to expand the exclusion of words.

The “unicodedata” module is used to canonicalize the characters.

Characters that look the same on screen can be different Unicode code points. For example, “U+2160 ROMAN NUMERAL ONE” and “U+0049 LATIN CAPITAL LETTER I” look identical but are stored differently.

“unicodedata.normalize” with the “NFKD” form maps such compatibility characters to a common form, so the lemmatizer does not treat the same word as two different words. Note that the ASCII encoding step that follows also discards every character that has no ASCII equivalent, so titles written in non-Latin scripts are removed.
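A short, self-contained check of that behavior, using only the standard library. The helper name `to_ascii` is hypothetical; it mirrors the normalize, encode, and decode chain inside “text_clean.”

```python
import unicodedata


def to_ascii(text):
    """NFKD-normalize the text, then drop every character that is not ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


roman_one = "\u2160"  # ROMAN NUMERAL ONE, looks like a capital "I"
print(roman_one == "I")                                  # False: different code points
print(unicodedata.normalize("NFKD", roman_one) == "I")   # True after normalization
print(to_ascii("café"))                                  # accent is stripped: cafe
print(repr(to_ascii("Україна")))                         # Cyrillic text is removed entirely: ''
```

The last line is worth keeping in mind for a multilingual sitemap such as the BBC's: after this cleaning step, only titles in Latin-based scripts contribute to the n-grams.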

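import pandas as pd
from nltk.util import ngrams

pd.set_option("display.max_colwidth",90)
# Build the n-grams from the cleaned word list and keep the 15 most frequent ones.
bbc_bigrams = (pd.Series(list(ngrams(raw_words, n = 2))).value_counts())[:15].sort_values(ascending=False).to_frame()
bbc_trigrams = (pd.Series(list(ngrams(raw_words, n = 3))).value_counts())[:15].sort_values(ascending=False).to_frame()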

Below, you will see the most popular “n-grams” from BBC News.

Figure: N-grams data frame from the BBC.

To simply visualize the most popular n-grams of a news source, use the code block below.

bbc_bigrams.plot.barh(color="red", width=.8,figsize=(10 , 7))

The bigram “Ukraine, war” is the trending topic.

You can also filter the n-grams for “Ukraine” and create an “entity-attribute” pair.

Figure: News sitemap n-grams from the BBC.
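Filtering n-grams for a single entity can be sketched with the standard library alone. The word list below is invented, and the helper names `bigram_counts` and `bigrams_with` are hypothetical; in the tutorial's flow, the input would be the cleaned “raw_words” list.

```python
from collections import Counter


def bigram_counts(words):
    """Count adjacent word pairs in one flat list of cleaned words."""
    return Counter(zip(words, words[1:]))


def bigrams_with(counts, entity):
    """Return (bigram, count) pairs that contain the entity, most frequent first."""
    return [(pair, n) for pair, n in counts.most_common() if entity in pair]


# Invented word list, for illustration only.
words = "ukraine war latest ukraine war refugees ukraine crisis talks".split()
pairs = bigrams_with(bigram_counts(words), "ukraine")
print(pairs[0])
```

The word paired with the entity ("war," "crisis," and so on) acts as the attribute in a simple entity-attribute view.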

Crawling these URLs and recognizing the “person type entities” can give you an idea about how BBC approaches newsworthy situations.

But it is beyond “news sitemaps.” Thus, it is for another day.

To visualize the popular n-grams from a news source’s sitemap, you can create a custom Python function as below.

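def ngram_visualize(dataframe:pd.DataFrame, color:str="blue"):
     # Horizontal bar chart of the n-gram counts; returns the Matplotlib Axes.
     return dataframe.plot.barh(color=color, width=.8,figsize=(10 ,7))

# ngram_extractor is defined in step 8 below.
ngram_visualize(ngram_extractor(df_dailymail))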

The result is below.

Figure: News sitemap trigram visualization.

To make it interactive, add an extra parameter as below.

def ngram_visualize(dataframe:pd.DataFrame, backend:str="matplotlib", color:str="blue"):
     # Note: this switches the pandas plotting backend globally.
     pd.options.plotting.backend = backend
     if backend=="plotly":
          return dataframe.plot.bar()
     return dataframe.plot.barh(color=color, width=.8,figsize=(10 ,7))

ngram_visualize(ngram_extractor(df_dailymail), backend="plotly")

8. Create Your Own Custom Functions To Analyze The News Source Sitemaps

When you audit news sitemaps repeatedly, there will be a need for a small Python package.

Below, you can find four short Python functions that build on one another.

To clean a textual content item, use the function below.

def text_clean(content):
  lemmatizer = nltk.stem.WordNetLemmatizer()
  stopwords = nltk.corpus.stopwords.words('english')
  # Decompose the characters, drop everything that is not ASCII, and lowercase the rest.
  content = (unicodedata.normalize('NFKD', content)
    .encode('ascii', 'ignore')
    .decode('utf-8', 'ignore')
    .lower())
  # Remove anything that is neither a word character nor whitespace.
  words = re.sub(r'[^\w\s]', '', content).split()
  return [lemmatizer.lemmatize(word) for word in words if word not in stopwords]

To extract the n-grams from a specific news website’s sitemap’s news titles, use the function below.

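def ngram_extractor(dataframe:pd.DataFrame, ngram:int=3, first:int=10):
     # Only data frames built from a news sitemap have a "news_title" column.
     if "news_title" not in dataframe.columns:
          raise KeyError("The data frame has no 'news_title' column.")
     return dataframe_ngram_extractor(dataframe, ngram=ngram, first=first)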

Use the function below to turn the extracted n-grams into a data frame.

from nltk.util import ngrams

def dataframe_ngram_extractor(dataframe:pd.DataFrame, ngram:int, first:int):
     # Join all titles into one string, clean it, and count the n-grams.
     raw_words = text_clean(' '.join(dataframe['news_title'].dropna().astype(str)))
     return (pd.Series(list(ngrams(raw_words, n = ngram))).value_counts())[:first].sort_values(ascending=False).to_frame()

To compare the n-grams of two news websites’ sitemaps side by side, use the function below.

def ngram_df_constructor(df_1:pd.DataFrame, df_2:pd.DataFrame):
  def site_name(df):
    # "www.bbc.com" -> "bbc"; hosts without "www." keep their first label.
    netloc = adv.url_to_df(df["loc"])["netloc"][1]
    return netloc.split("www.")[-1].split(".")[0]

  columns = {}
  for df in (df_1, df_2):
    name = site_name(df)
    # Keep only the n-grams themselves (the index), ranked by frequency.
    columns[name + "_bigrams"] = pd.Series(dataframe_ngram_extractor(df, ngram=2, first=500).index)
    columns[name + "_trigrams"] = pd.Series(dataframe_ngram_extractor(df, ngram=3, first=500).index)
  return pd.DataFrame(columns).reset_index(drop=True)

Below, you can see an example use case.

ngram_df_constructor(df_bbc, df_guardian)
Figure: Popular n-gram comparison to see the news websites’ focus.

With just these four chained custom Python functions, you can do the following:

  • Visualize these n-grams and check the counts per news website.
  • See each news website’s focus for the same topic or for different topics.
  • Compare their wording or vocabulary for the same topics.
  • See, side by side, how many different sub-topics of the same topic or entity each website covers.

I didn’t include the frequencies of the n-grams.

But the n-grams are ranked, so the first ones are the most popular for that specific news source.
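One simple way to compare two sources' vocabulary is to look at which top n-grams they share. The sketch below assumes you already have two lists of n-gram tuples (for example, two columns of the comparison data frame above); the sample tuples and the function name `compare_ngrams` are made up.

```python
def compare_ngrams(ngrams_a, ngrams_b):
    """Split two n-gram lists into shared n-grams and those unique to each source."""
    set_a, set_b = set(ngrams_a), set(ngrams_b)
    return {
        "shared": sorted(set_a & set_b),
        "only_a": sorted(set_a - set_b),
        "only_b": sorted(set_b - set_a),
    }


# Invented bigrams, for illustration only.
source_a = [("ukraine", "war"), ("boris", "johnson"), ("cost", "living")]
source_b = [("ukraine", "war"), ("joe", "biden")]

result = compare_ngrams(source_a, source_b)
print(result["shared"])
```

Shared n-grams point to topics both publishers cover, while the unique ones hint at each publisher's own angle.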

9. Extract The Most Used News Keywords From News Sitemaps

News keywords are, perhaps surprisingly, still active on Google.

Microsoft Bing and Google no longer consider “meta keywords” a useful signal, unlike Yandex.

But news keywords from news sitemaps are still used.

Among all these news sources, only The Guardian uses the news keywords.

And understanding how they use news keywords to provide relevance is useful.

# Naming the counts explicitly keeps the column name stable across pandas versions.
df_guardian["news_keywords"].str.split().explode().value_counts().rename("news_keyword_occurence").to_frame()

You can see the most used words in the news keywords for The Guardian.

              news_keyword_occurence
news,                            250
World                            142
and                              142
Ukraine,                         127
UK                               116
...                              ...
Cumberbatch,                       1
Dune                               1
Saracens                           1
Pearson,                         1
Thailand                           1

1409 rows × 1 column

The visualization is below.

(df_guardian["news_keywords"].str.split().explode().value_counts()
.rename("news_keyword_occurence").to_frame()
.head(25).plot.barh(figsize=(10,8),
title="The Guardian Most Used Words in News Keywords", xlabel="News Keywords",
legend=False, ylabel="Count of News Keyword"))

Figure: Most popular words in news keywords.

A “,” at the end of a word marks the end of a keyword, so it shows whether the word is a separate keyword or part of a longer phrase.

I suggest you not remove the punctuation or stop words from news keywords so that you can see the publisher’s news keyword usage style better.

For a different analysis, you can use “,” as a separator.

df_guardian["news_keywords"].str.split(",").explode().value_counts().rename("news_keyword_occurence").to_frame()

The result difference is below.

               news_keyword_occurence
World news                        134
Europe                            116
UK news                           111
Sport                             109
Russia                             90
...                               ...
Women's shoes                       1
Men's shoes                         1
Body image                          1
Kae Tempest                         1
Thailand                            1

1080 rows × 1 column
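One detail to watch when splitting on “,”: if the keywords are separated by a comma followed by a space, every keyword after the first keeps a leading space, so “Europe” and “ Europe” would be counted separately. The sketch below shows the idea with the standard library; the sample strings and the function name `keyword_counts` are assumptions, not the Guardian's actual data.

```python
from collections import Counter


def keyword_counts(keyword_strings):
    """Count comma-separated keywords, ignoring surrounding whitespace and missing values."""
    counts = Counter()
    for raw in keyword_strings:
        if not isinstance(raw, str):
            continue  # skip missing values such as None or NaN
        counts.update(k.strip() for k in raw.split(",") if k.strip())
    return counts


# Invented keyword strings, for illustration only.
sample = ["World news, Europe, Russia", "UK news,Europe", None]
print(keyword_counts(sample).most_common(2))
```

In pandas, the same effect can be had by calling `.str.strip()` on the exploded Series before `value_counts()`.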

Note the split(",") call.

(df_guardian["news_keywords"].str.split(",").explode().value_counts()
.rename("news_keyword_occurence").to_frame()
.head(25).plot.barh(figsize=(10,8),
title="The Guardian Most Used Words in News Keywords", xlabel="News Keywords",
legend=False, ylabel="Count of News Keyword"))

You can see the result difference for visualization below.

Figure: Most popular keywords from news sitemaps.

From “Chelsea” to “Vladimir Putin” or “Ukraine War” and “Roman Abramovich,” most of these phrases align with the early days of Russia’s invasion of Ukraine.

Use the code block below to visualize two different news website sitemaps’ news keywords interactively.

import plotly.graph_objects as go
from plotly.subplots import make_subplots

df_1 = df_guardian["news_keywords"].str.split(",").explode().value_counts().rename("news_keyword_occurence").to_frame()
df_2 = df_nyt["news_keywords"].str.split(",").explode().value_counts().rename("news_keyword_occurence").to_frame()

fig = make_subplots(rows = 1, cols = 2)
# Plot the six most frequent keywords per source; x and y must cover the same rows.
fig.add_trace(
     go.Bar(y = df_1["news_keyword_occurence"][:6].index, x = df_1["news_keyword_occurence"][:6], orientation="h", name="The Guardian News Keywords"), row=1, col=2
)
fig.add_trace(
     go.Bar(y = df_2["news_keyword_occurence"][:6].index, x = df_2["news_keyword_occurence"][:6], orientation="h", name="New York Times News Keywords"), row=1, col=1
)
fig.update_layout(height = 800, width = 1200, title_text="Side by Side Popular News Keywords")
fig.show()
# Save the interactive chart as a standalone HTML file.
fig.write_html("news_keywords.html")

In the next section, you will find two different subplot samples to compare the n-grams of the news websites.

10. Create Subplots For Comparing News Sources

Use the code block below to put the news sources’ most popular n-grams from the news titles into subplots.

import matplotlib.pyplot as plt
import pandas as pd

# Extract the top trigrams for each news source.
df_list = [ngram_extractor(df) for df in (df_bbc, df_skynews, df_dailymail, df_guardian, df_nyt, df_cnn)]
titles = ["BBC News Trigrams", "Skynews Trigrams", "Dailymail Trigrams", "The Guardian Trigrams", "New York Times Trigrams", "CNN Trigrams"]

nrow, ncol = 3, 2
fig, axes = plt.subplots(nrow, ncol, figsize=(40, 28))

# Draw one horizontal bar chart per news source, filling the grid row by row.
for ax, df, title in zip(axes.flat, df_list, titles):
    df.plot.barh(ax=ax, title=title, fontsize=10, legend=False)
    ax.set_xlabel("Count")
    ax.set_ylabel("Trigrams")

You can see the result below.

Figure: Most popular n-grams from news sources.

The example data visualization above is entirely static and doesn’t provide any interactivity.

For a better, more detailed, and interactive data dashboard, see the script recently shared by Elias Dabbas, creator of Advertools. It demonstrates how to take the total article count, the most frequent words, and the n-grams with their counts from news websites in an interactive way.

Final Thoughts On News Sitemap Analysis With Python

This tutorial was designed as an educational Python coding session for extracting keywords, n-grams, phrase patterns, languages, and other kinds of SEO-related information from news websites.

News SEO relies heavily on quick reflexes and always-on article creation.

Tracking your competitors’ angles and methods for covering a topic shows how quickly they react to search trends.

A good next step would be to create a Google Trends dashboard and a news source n-gram tracker for a comparative and complementary news SEO analysis.

Throughout this article, I have sometimes used custom functions or more advanced for loops and sometimes kept things simple.

Python practitioners from beginner to advanced can use it to improve their tracking, reporting, and analysis methodologies for news SEO and beyond.

Featured Image: BestForBest/Shutterstock

2024 WordPress Vulnerability Report Shows Errors Sites Keep Making

2024 Annual WordPress security report by WPScan

WordPress security scanner WPScan’s 2024 WordPress vulnerability report calls attention to WordPress vulnerability trends and suggests the kinds of things website publishers (and SEOs) should be looking out for.

Among the key findings: just over 20% of vulnerabilities were rated as high or critical level threats, while medium severity threats made up the majority, at 67% of reported vulnerabilities. Many regard medium level vulnerabilities as if they were low-level threats. That’s a mistake, because they are not low level and deserve attention.

The WPScan report advised:

“While severity doesn’t translate directly to the risk of exploitation, it’s an important guideline for website owners to make an educated decision about when to disable or update the extension.”

WordPress Vulnerability Severity Distribution

Critical level vulnerabilities, the highest level of threat, represented only 2.38% of vulnerabilities, which is essentially good news for WordPress publishers. Yet as mentioned earlier, when combined with the percentage of high level threats (17.68%), the number of concerning vulnerabilities rises to just over 20%.

Here are the percentages by severity ratings:

  • Critical 2.38%
  • Low 12.83%
  • High 17.68%
  • Medium 67.12%
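As a quick check of the arithmetic behind the "just over 20%" figure, the snippet below adds up the severity shares listed above. The variable names are arbitrary; the percentages are the ones quoted from the report.

```python
# Severity shares (in percent) as listed above.
severity_share = {"Critical": 2.38, "High": 17.68, "Medium": 67.12, "Low": 12.83}

# Combined share of the two most serious ratings.
high_or_critical = severity_share["Critical"] + severity_share["High"]
total = sum(severity_share.values())

print(f"High or critical: {high_or_critical:.2f}%")
print(f"All ratings: {total:.2f}%")
```

The combined high and critical share comes to 20.06%, and the four ratings add up to 100.01%, the small excess being a rounding effect in the published figures.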

Authenticated Versus Unauthenticated

Authenticated vulnerabilities are those that require an attacker to first attain user credentials and their accompanying permission levels in order to exploit a particular vulnerability. Exploits that require subscriber-level authentication are the most exploitable of the authenticated exploits and those that require administrator level access present the least risk (although not always a low risk for a variety of reasons).

Unauthenticated attacks are generally the easiest to exploit because anyone can launch an attack without having to first acquire a user credential.

The WPScan vulnerability report found that about 22% of reported vulnerabilities required subscriber level or no authentication at all, representing the most exploitable vulnerabilities. At the other end of the exploitability scale are vulnerabilities requiring admin permission levels, which represent 30.71% of reported vulnerabilities.

Permission Levels Required For Exploits

Vulnerabilities requiring administrator level credentials represented the highest percentage of exploits, followed by Cross Site Request Forgery (CSRF) with 24.74% of vulnerabilities. This is interesting because CSRF is an attack that uses social engineering to get a victim to click a link, through which the user’s permission levels are acquired. WordPress publishers should be aware of this mistake, because all it takes is for an admin level user to follow a link, which then enables the hacker to assume admin level privileges on the WordPress website.

The following are the percentages of exploits, ordered by the roles necessary to launch an attack.

Ascending Order Of User Roles For Vulnerabilities

  • Author 2.19%
  • Subscriber 10.4%
  • Unauthenticated 12.35%
  • Contributor 19.62%
  • CSRF 24.74%
  • Admin 30.71%

Most Common Vulnerability Types Requiring Minimal Authentication

Broken Access Control in the context of WordPress refers to a security failure that can allow an attacker without necessary permission credentials to gain access to higher credential permissions.

In the section of the report on unauthenticated or subscriber level vulnerabilities (Occurrence vs Vulnerability on Unauthenticated or Subscriber+ reports), WPScan breaks down the percentages for each vulnerability type among the exploits that are easiest to launch, because they require minimal to no user credential authentication.

The WPScan threat report noted that Broken Access Control represents a whopping 84.99%, followed by SQL injection (20.64%).

The Open Worldwide Application Security Project (OWASP) defines Broken Access Control as:

“Access control, sometimes called authorization, is how a web application grants access to content and functions to some users and not others. These checks are performed after authentication, and govern what ‘authorized’ users are allowed to do.

Access control sounds like a simple problem but is insidiously difficult to implement correctly. A web application’s access control model is closely tied to the content and functions that the site provides. In addition, the users may fall into a number of groups or roles with different abilities or privileges.”

SQL injection, at 20.64%, is the second most prevalent type of vulnerability. WPScan referred to it as both “high severity and risk” in the context of vulnerabilities requiring minimal authentication levels, because attackers can access and/or tamper with the database, which is the heart of every WordPress website.

These are the percentages:

  • Broken Access Control 84.99%
  • SQL Injection 20.64%
  • Cross-Site Scripting 9.4%
  • Unauthenticated Arbitrary File Upload 5.28%
  • Sensitive Data Disclosure 4.59%
  • Insecure Direct Object Reference (IDOR) 3.67%
  • Remote Code Execution 2.52%
  • Other 14.45%

Vulnerabilities In The WordPress Core Itself

The overwhelming majority of vulnerability issues were reported in third-party plugins and themes. However, a total of 13 vulnerabilities were reported in the WordPress core itself in 2023. Of those thirteen, only one was rated as a high severity threat. High is the second highest rating, with Critical being the highest, under the Common Vulnerability Scoring System (CVSS).

The WordPress core platform itself is held to the highest standards and benefits from a worldwide community that is vigilant in discovering and patching vulnerabilities.

Website Security Should Be Considered As Technical SEO

Site audits don’t normally cover website security, but in my opinion every responsible audit should at least talk about security headers. As I’ve been saying for years, website security quickly becomes an SEO issue once a website’s rankings start disappearing from the search engine results pages (SERPs) due to being compromised by a vulnerability. That’s why it’s critical to be proactive about website security.

According to the WPScan report, the main points of entry for hacked websites were leaked credentials and weak passwords. Ensuring strong password standards plus two-factor authentication is an important part of every website’s security stance.

Using security headers is another way to help protect against Cross-Site Scripting and other kinds of vulnerabilities.
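A site audit could include a simple check for missing security headers. The sketch below is one possible approach, not something taken from the WPScan report: the list of headers is an assumption based on commonly recommended ones, and the function name `missing_security_headers` is hypothetical. It works on any mapping of response header names to values.

```python
# Commonly recommended security headers (an assumed list; adjust to your own policy).
RECOMMENDED_HEADERS = (
    "Content-Security-Policy",
    "Strict-Transport-Security",
    "X-Content-Type-Options",
    "X-Frame-Options",
    "Referrer-Policy",
)


def missing_security_headers(response_headers):
    """Return the recommended headers absent from a response (names are case-insensitive)."""
    present = {name.lower() for name in response_headers}
    return [h for h in RECOMMENDED_HEADERS if h.lower() not in present]


# Invented response headers, for illustration only.
sample_headers = {
    "content-type": "text/html; charset=UTF-8",
    "x-content-type-options": "nosniff",
    "Strict-Transport-Security": "max-age=31536000",
}
print(missing_security_headers(sample_headers))
```

The presence of a header says nothing about whether its value is well configured, so this is only a first-pass check.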

Lastly, a WordPress firewall and website hardening are also useful proactive approaches to website security. I once added a forum to a brand new website I created and it was immediately under attack within minutes. Believe it or not, virtually every website worldwide is under attack 24 hours a day by bots scanning for vulnerabilities.

Read the WPScan Report:

WPScan 2024 Website Threat Report

Featured Image by Shutterstock/Ljupco Smokovski

An In-Depth Guide And Best Practices For Mobile SEO

Mobile SEO: An In-Depth Guide And Best Practices

Over the years, search engines have encouraged businesses to improve the mobile experience on their websites. Roughly 60% of web traffic comes from mobile, and in some industries, mobile traffic can reach up to 90%.

Since Google has completed its switch to mobile-first indexing, the question is no longer “if” your website should be optimized for mobile, but how well it is adapted to meet these criteria. A new challenge has emerged for SEO professionals with the introduction of Interaction to Next Paint (INP), which replaced First Input Delay (FID) on March 12, 2024.

Thus, understanding mobile SEO’s latest advancements, especially with the shift to INP, is crucial. This guide offers practical steps to optimize your site effectively for today’s mobile-focused SEO requirements.

What Is Mobile SEO And Why Is It Important?

The goal of mobile SEO is to optimize your website to attain better visibility in search engine results specifically tailored for mobile devices.

This form of SEO not only aims to boost search engine rankings, but also prioritizes enhancing mobile user experience through both content and technology.

While, in many ways, mobile SEO and traditional SEO share similar practices, additional steps related to site rendering and content are required to meet the needs of mobile users and the speed requirements of mobile devices.

Does this need to be a priority for your website? How urgent is it?

Consider this: 58% of the world’s web traffic comes from mobile devices.

If you aren’t focused on mobile users, there is a good chance you’re missing out on a tremendous amount of traffic.

Mobile-First Indexing

Additionally, as of 2023, Google has switched its crawlers to a mobile-first indexing priority.

This means that the mobile experience of your site is critical to maintaining efficient indexing, which is the step before ranking algorithms come into play.

Read more: Where We Are Today With Google’s Mobile-First Index

How Much Of Your Traffic Is From Mobile?

How much traffic potential you have with mobile users can depend on various factors, including your industry (B2B sites might attract primarily desktop users, for example) and the search intent your content addresses (users might prefer desktop for larger purchases, for example).

Regardless of where your industry and the search intent of your users might be, the future will demand that you optimize your site experience for mobile devices.

How can you assess your current mix of mobile vs. desktop users?

An easy way to see what percentage of your users is on mobile is to go into Google Analytics 4.

  • Click Reports in the left column.
  • Click on the Insights icon on the right side of the screen.
  • Scroll down to Suggested Questions and click on it.
  • Click on Technology.
  • Click on Top Device model by Users.
  • Then click on Top Device category by Users under Related Results.
  • The breakdown of Top Device category will match the date range selected at the top of GA4.
Screenshot from GA4, March 2024

You can also set up a report in Looker Studio.

  • Add your site to the Data source.
  • Add Device category to the Dimension field.
  • Add 30-day active users to the Metric field.
  • Click on Chart to select the view that works best for you.
A screen capture from Looker Studio showing a pie chart with a breakdown of mobile, desktop, tablet, and Smart TV users for a site. Screenshot from Looker Studio, March 2024

You can add more Dimensions to really dig into the data to see which pages attract which type of users, what the mobile-to-desktop mix is by country, which search engines send the most mobile users, and so much more.

Read more: Why Mobile And Desktop Rankings Are Different

How To Check If Your Site Is Mobile-Friendly

Now that you know how to build a report on mobile and desktop usage, you need to figure out if your site is optimized for mobile traffic.

While Google removed the mobile-friendly testing tool from Google Search Console in December 2023, there are still a number of useful tools for evaluating your site for mobile users.

Bing still has a mobile-friendly testing tool that will tell you the following:

  • Viewport is configured correctly.
  • Page content fits device width.
  • Text on the page is readable.
  • Links and tap targets are sufficiently large and touch-friendly.
  • Any other issues detected.

Google’s Lighthouse Chrome extension provides you with an evaluation of your site’s performance across several factors, including load times, accessibility, and SEO.

To use it, install the Lighthouse Chrome extension, then:

  • Go to your website in your browser.
  • Click on the orange lighthouse icon in your browser’s address bar.
  • Click Generate Report.
  • A new tab will open and display your scores once the evaluation is complete.
An image showing the Lighthouse scores for a website. Screenshot from Lighthouse, March 2024

You can also use the Lighthouse report in Developer Tools in Chrome.

  • Simply click on the three dots next to the address bar.
  • Select “More Tools.”
  • Select Developer Tools.
  • Click on the Lighthouse tab.
  • Choose “Mobile” and click the “Analyze page load” button.
An image showing how to get to Lighthouse within Google Chrome Developer Tools. Screenshot from Lighthouse, March 2024

Another option that Google offers is the PageSpeed Insights (PSI) tool. Simply add your URL into the field and click Analyze.

PSI will integrate any Core Web Vitals scores into the resulting view so you can see what your users are experiencing when they come to your site.
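If you would rather check pages from a script than from the web form, PageSpeed Insights also has a public API. The sketch below only builds the request URL and does not send it. It assumes the v5 `runPagespeed` endpoint and a lowercase `strategy` value; `build_psi_url` and the example.com address are hypothetical names used for illustration.

```python
from urllib.parse import urlencode

# Public PageSpeed Insights v5 endpoint (assumed here; check Google's API docs).
PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def build_psi_url(page_url: str, strategy: str = "mobile") -> str:
    """Return a PSI API request URL for the given page and device strategy."""
    if strategy not in ("mobile", "desktop"):
        raise ValueError("strategy must be 'mobile' or 'desktop'")
    # urlencode percent-encodes the page URL so it is safe as a query value.
    query = urlencode({"url": page_url, "strategy": strategy})
    return f"{PSI_ENDPOINT}?{query}"


# Hypothetical page; replace with your own URL.
request_url = build_psi_url("https://www.example.com/")
print(request_url)
```

Fetching that URL with any HTTP client should return a JSON report. An API key may be needed for regular or high-volume use.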

An image showing the PageSpeed Insights scores for a website. Screenshot from PageSpeed Insights, March 2024

Other tools, like WebPageTest.org, will graphically display the processes and load times for everything it takes to display your webpages.

With this information, you can see which processes block the loading of your pages, which ones take the longest to load, and how this affects your overall page load times.

You can also emulate the mobile experience by using Developer Tools in Chrome, which allows you to switch back and forth between a desktop and mobile experience.

An image showing how to change the device emulation for a site within Google Chrome Developer Tools. Screenshot from Google Chrome Developer Tools, March 2024

Lastly, use your own mobile device to load and navigate your website:

  • Does it take forever to load?
  • Are you able to navigate your site to find the most important information?
  • Is it easy to add something to cart?
  • Can you read the text?

Read more: Google PageSpeed Insights Reports: A Technical Guide

How To Optimize Your Site Mobile-First

With all these tools, keep an eye on the Performance and Accessibility scores, as these directly affect mobile users.

Expand each section within the PageSpeed Insights report to see what elements are affecting your score.

These sections can give your developers their marching orders for optimizing the mobile experience.

While mobile speeds for cellular networks have steadily improved around the world (the average speed in the U.S. has jumped to 27.06 Mbps from 11.14 Mbps in just eight years), speed and usability for mobile users are at a premium.

Read more: Top 7 SEO Benefits Of Responsive Web Design

Best Practices For Mobile Optimization

Unlike traditional SEO, which can focus heavily on ensuring that you are using the language of your users as it relates to the intersection of your products/services and their needs, mobile SEO can seem heavy on technical SEO.

While you still need to be focused on matching your content with the needs of the user, mobile search optimization will require the aid of your developers and designers to be fully effective.

Below are several key factors in mobile SEO to keep in mind as you’re optimizing your site.

Site Rendering

How your site responds to different devices is one of the most important elements in mobile SEO.

The two most common approaches to this are responsive design and dynamic serving.

Responsive design is the most common of the two options.

Using your site’s cascading style sheets (CSS) and flexible layouts, as well as responsive content delivery networks (CDN) and modern image file types, responsive design allows your site to adjust to a variety of screen sizes, orientations, and resolutions.

With responsive design, elements on the page adjust in size and location based on the size of the screen.

You can simply resize the window of your desktop browser and see how this works.

An image showing the difference between web.dev in a full desktop display vs. a mobile display using responsive design. Screenshot from web.dev, March 2024

This is the approach that Google recommends.

Adaptive design, also known as dynamic serving, consists of multiple fixed layouts that are dynamically served to the user based on their device.

Sites can have a separate layout for desktop, smartphone, and tablet users. Each design can be modified to remove functionality that may not make sense for certain device types.

This is a less efficient approach, but it does give sites more control over what each device sees.

Two other options exist, although they are not covered here:

  • Progressive Web Apps (PWA), which can seamlessly integrate into a mobile app.
  • Separate mobile site/URL (which is no longer recommended).

Read more: An Introduction To Rendering For SEO

Interaction to Next Paint (INP)

Google has introduced Interaction to Next Paint (INP) as a more comprehensive measure of user experience, succeeding First Input Delay (FID). FID measures the time from when a user first interacts with your page (e.g., clicking a link, tapping a button) to the time when the browser is actually able to begin processing event handlers in response to that interaction. INP, on the other hand, broadens the scope by measuring the responsiveness of a website throughout the entire lifespan of a page, not just the first interaction.

Note that actions such as hovering and scrolling do not influence INP. However, keyboard-driven scrolling or navigation counts as keystrokes, which may trigger events that INP measures. The scrolling that results from such an interaction is not itself measured.

Scrolling may indirectly affect INP, for example, in scenarios where users scroll through content and additional content is lazy-loaded from an API. While the act of scrolling itself isn’t included in the INP calculation, the processing necessary for loading additional content can create contention on the main thread, thereby increasing interaction latency and adversely affecting the INP score.

What qualifies as an optimal INP score?

  • An INP under 200ms indicates good responsiveness.
  • Between 200ms and 500ms needs improvement.
  • Over 500ms means the page has poor responsiveness.
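The thresholds above can be sketched as a small helper for labeling INP values exported from your monitoring tools. This is a minimal sketch: `classify_inp` is a hypothetical name, and treating exactly 200ms as “good” and exactly 500ms as “needs improvement” is an assumption about how the boundaries are handled.

```python
def classify_inp(inp_ms: float) -> str:
    """Label an INP value (in milliseconds) using the thresholds listed above."""
    if inp_ms < 0:
        raise ValueError("INP cannot be negative")
    if inp_ms <= 200:  # assumption: the 200ms boundary counts as good
        return "good"
    if inp_ms <= 500:  # assumption: the 500ms boundary counts as needs improvement
        return "needs improvement"
    return "poor"


# Hypothetical INP values for three page templates.
for template, inp in {"home": 150, "category": 320, "checkout": 640}.items():
    print(f"{template}: {inp}ms is {classify_inp(inp)}")
```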

These are common issues that cause poor INP scores:

  1. Long JavaScript Tasks: Heavy JavaScript execution can block the main thread, delaying the browser’s ability to respond to user interactions. Break long JavaScript tasks into smaller chunks, for example by using the Scheduler API.
  2. Large DOM (HTML) Size: A large DOM (starting from 1,500 elements) can severely impact a website’s interactive performance. Every additional DOM element increases the work required to render pages and respond to user interactions.
  3. Inefficient Event Callbacks: Event handlers that execute lengthy or complex operations can significantly affect INP scores. Poorly optimized callbacks attached to user interactions, like clicks, keypresses, or taps, can block the main thread, delaying the browser’s ability to render visual feedback promptly. This happens, for example, when click handlers perform heavy computations or initiate synchronous network requests.
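For the DOM size issue, a rough first check is to count the elements in a page’s HTML source. The sketch below uses Python’s standard `html.parser` module. `ElementCounter` and `count_elements` are hypothetical names, and the count covers only the static HTML you pass in, so it can understate the size of the rendered DOM on pages that add elements with JavaScript.

```python
from html.parser import HTMLParser

DOM_SIZE_THRESHOLD = 1500  # element count mentioned in the list above


class ElementCounter(HTMLParser):
    """Counts every opening (or self-closing) tag in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <br/>.
        self.count += 1


def count_elements(html: str) -> int:
    parser = ElementCounter()
    parser.feed(html)
    parser.close()
    return parser.count


# Tiny sample document: html, body, ul, and five li elements.
sample = "<html><body><ul>" + "<li>item</li>" * 5 + "</ul></body></html>"
total = count_elements(sample)
print(total, "elements;", "large DOM" if total >= DOM_SIZE_THRESHOLD else "within threshold")
```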

You can troubleshoot INP issues using free and paid tools.

As a good starting point, I recommend checking your INP scores by geo via treo.sh, which gives you a high-level view of where you struggle most.

INP scores by geos

Read more: How To Improve Interaction To Next Paint (INP)

Image Optimization

Images add a lot of value to the content on your site and can greatly affect the user experience.

From page speeds to image quality, you could adversely affect the user experience if you haven’t optimized your images.

This is especially true for the mobile experience. Images need to adjust to smaller screens, varying resolutions, and screen orientation.

  • Use responsive images
  • Implement lazy loading
  • Compress your images (use WebP)
  • Add your images to your sitemap

Optimizing images is an entire science, and I advise you to read our comprehensive guide on image SEO to learn how to implement these recommendations.

Avoid Intrusive Interstitials

Google rarely uses concrete language to state that something is a ranking factor or will result in a penalty, so you know it means business about intrusive interstitials in the mobile experience.

Intrusive interstitials are basically pop-ups on a page that prevent the user from seeing content on the page.

John Mueller, Google’s Senior Search Analyst, stated that they are specifically interested in the first interaction a user has after clicking on a search result.

Examples of intrusive interstitial pop-ups on a mobile site according to Google.

Not all pop-ups are considered bad. Interstitial types that are considered “intrusive” by Google include:

  • Pop-ups that cover most or all of the page content.
  • Non-responsive interstitials or pop-ups that are impossible for mobile users to close.
  • Pop-ups that are not triggered by a user action, such as a scroll or a click.

Read more: 7 Tips To Keep Pop-Ups From Harming Your SEO

Structured Data

Most of the tips provided in this guide so far are focused on usability and speed and have an additive effect, but there are changes that can directly influence how your site appears in mobile search results.

Search engine results pages (SERPs) haven’t been the “10 blue links” in a very long time.

They now reflect the diversity of search intent, showing a variety of different sections to meet the needs of users. Local Pack, shopping listing ads, video content, and more dominate the mobile search experience.

As a result, it’s more important than ever to provide structured data markup to the search engines, so they can display rich results for users.

In this example, you can see that both Zojirushi and Amazon have included structured data for their rice cookers, and Google is displaying rich results for both.

An image of a search result for Japanese rice cookers that shows rich results for Zojirushi and Amazon. Screenshot from search for [Japanese rice cookers], Google, March 2024

Adding structured data markup to your site can influence how well your site shows up for local searches and product-related searches.

Using JSON-LD, you can mark up the business, product, and services data on your pages in Schema markup.
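As a rough illustration of what such markup looks like, the sketch below builds a schema.org `Product` block in JSON-LD and wraps it in the script tag that goes into the page’s HTML. The function name `product_json_ld` and the product details are hypothetical, and the properties shown are a small subset. Check the search engines’ structured data documentation for what each rich result requires.

```python
import json


def product_json_ld(name: str, description: str, price: str, currency: str) -> str:
    """Return a JSON-LD script tag describing a product with a single offer."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "description": description,
        "offers": {
            "@type": "Offer",
            "price": price,
            "priceCurrency": currency,
        },
    }
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical product used only for illustration.
snippet = product_json_ld(
    "Example Rice Cooker", "A hypothetical 5-cup rice cooker.", "129.99", "USD"
)
print(snippet)
```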

If you use WordPress as the content management system for your site, there are several plugins available that will automatically mark up your content with structured data.

Read more: What Structured Data To Use And Where To Use It?

Content Style

When you think about your mobile users and the screens on their devices, this can greatly influence how you write your content.

Mobile users prefer concise writing over long, detailed paragraphs.

Each key point in your content should be a single line of text that easily fits on a mobile screen.

Your font sizes should adjust to the screen’s resolution to avoid eye strain for your users.

If possible, allow for a dark or dim mode for your site to further reduce eye strain.

Headers should be concise and address the searcher’s intent. Rather than lengthy section headers, keep it simple.

Finally, make sure that your text renders in a font size that’s readable.

Read more: 10 Tips For Creating Mobile-Friendly Content

Tap Targets

Just as important as text size, the tap targets on your pages should be sized and laid out appropriately.

Tap targets include navigation elements, links, form fields, and buttons like “Add to Cart” buttons.

Targets smaller than 48 pixels by 48 pixels and targets that overlap or are overlapped by other page elements will be called out in the Lighthouse report.
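The two checks described above, minimum size and overlap, can be sketched in a few lines. This is a simplified illustration and not how Lighthouse itself is implemented. The `TapTarget` class, the `audit` function, and the sample coordinates are hypothetical, and each target is treated as a plain rectangle in CSS pixels.

```python
from dataclasses import dataclass
from itertools import combinations

MIN_SIZE_PX = 48  # minimum width and height mentioned above


@dataclass(frozen=True)
class TapTarget:
    name: str
    x: float
    y: float
    width: float
    height: float


def is_too_small(target: TapTarget) -> bool:
    return target.width < MIN_SIZE_PX or target.height < MIN_SIZE_PX


def overlaps(a: TapTarget, b: TapTarget) -> bool:
    # Two rectangles overlap when they intersect on both axes.
    return (
        a.x < b.x + b.width
        and b.x < a.x + a.width
        and a.y < b.y + b.height
        and b.y < a.y + a.height
    )


def audit(targets):
    """Return (names of undersized targets, pairs of overlapping target names)."""
    too_small = [t.name for t in targets if is_too_small(t)]
    overlapping = [
        (a.name, b.name) for a, b in combinations(targets, 2) if overlaps(a, b)
    ]
    return too_small, overlapping


# Hypothetical layout: the search button overlaps the menu, the footer link is tiny.
targets = [
    TapTarget("menu", 0, 0, 48, 48),
    TapTarget("search", 40, 0, 48, 48),
    TapTarget("footer_link", 200, 100, 30, 20),
]
too_small, overlapping = audit(targets)
print("Too small:", too_small)
print("Overlapping:", overlapping)
```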

Tap targets are essential to the mobile user experience, especially for ecommerce websites, so optimizing them is vital to the health of your online business.

Read more: Google’s Lighthouse SEO Audit Tool Now Measures Tap Target Spacing

Prioritizing These Tips

If you have delayed making your site mobile-friendly until now, this guide may feel overwhelming. As a result, you may not know what to prioritize first.

As with so many other optimizations in SEO, it’s important to understand which changes will have the greatest impact, and this is just as true for mobile SEO.

Think of SEO as a framework in which your site’s technical aspects are the foundation of your content. Without a solid foundation, even the best content may struggle to rank.

  • Responsive or Dynamic Rendering: If your site requires the user to zoom and scroll right or left to read the content on your pages, no number of other optimizations can help you. This should be first on your list.
  • Content Style: Rethink how your users will consume your content online. Avoid very long paragraphs. “Brevity is the soul of wit,” to quote Shakespeare.
  • Image Optimization: Begin migrating your images to next-gen image formats and optimize your content delivery network for speed and responsiveness.
  • Tap Targets: A site that prevents users from navigating or converting into sales won’t be in business long. Make navigation, links, and buttons usable for them.
  • Structured Data: While this element ranks last in priority on this list, rich results can improve your chances of receiving traffic from a search engine, so add this to your to-do list once you’ve completed the other optimizations.

Summary

From How Search Works, “Google’s mission is to organize the world’s information and make it universally accessible and useful.”

If Google’s primary mission is focused on making all the world’s information accessible and useful, then you know they will prefer surfacing sites that align with that vision.

Since a growing percentage of users are on mobile devices, you may want to read the word “everywhere” as implied at the end of the mission statement.

Are you missing out on traffic from mobile devices because of a poor mobile experience?

If you hope to remain relevant, make mobile SEO a priority now.


Featured Image: Paulo Bobita/Search Engine Journal

HARO Has Been Dead for a While

Every SEO’s favorite link-building collaboration tool, HARO, was officially killed off for good last week by Cision. It’s now been wrapped into a new product: Connectively.

I know nothing about the new tool. I haven’t tried it. But after trying to use HARO recently, I can’t say I’m surprised or saddened by its death. It’s been a walking corpse for a while. 

I used HARO way back in the day to build links. It worked. But a couple of months ago, I experienced the platform from the other side when I decided to try to source some “expert” insights for our posts. 

After just a few minutes of work, I got hundreds of pitches: 

So, I grabbed a cup of coffee and began to work through them. It didn’t take long before I lost the will to live. Every other pitch seemed like nothing more than lazy AI-generated nonsense from someone who definitely wasn’t an expert. 

Here’s one of them: 

Example of an AI-generated pitch in HARO

Seriously. Who writes like that? I’m a self-confessed dullard (any fellow Dull Men’s Club members here?), and even I’m not that dull… 

I don’t think I looked through more than 30-40 of the responses. I just couldn’t bring myself to do it. It felt like having a conversation with ChatGPT… and not a very good one! 

Despite only reviewing a few dozen of the many pitches I received, one stood out to me: 

Example HARO pitch that caught my attention

Believe it or not, this response came from a past client of mine who runs an SEO agency in the UK. Given how knowledgeable and experienced he is (he actually taught me a lot about SEO back in the day when I used to hassle him with questions on Skype), this pitch rang alarm bells for two reasons: 

  1. I truly doubt he spends his time replying to HARO queries
  2. I know for a fact he’s no fan of Neil Patel (sorry, Neil, but I’m sure you’re aware of your reputation at this point!)

So… I decided to confront him 😉 

Here’s what he said: 

Hunch, confirmed ;)

Shocker. 

I pressed him for more details: 

I’m getting a really good deal and paying per link rather than the typical £xxxx per month for X number of pitches. […] The responses as you’ve seen are not ideal but that’s a risk I’m prepared to take as realistically I dont have the time to do it myself. He’s not native english, but I have had to have a word with him a few times about clearly using AI. On the low cost ones I don’t care but on authority sites it needs to be more refined.

I think this pretty much sums up the state of HARO before its death. Most “pitches” were just AI answers from SEOs trying to build links for their clients. 

Don’t get me wrong. I’m not throwing shade here. I know that good links are hard to come by, so you have to do what works. And the reality is that HARO did work. Just look at the example below. You can tell from the anchor and surrounding text in Ahrefs that these links were almost certainly built with HARO: 

Example of links built with HARO, via Ahrefs' Site Explorer

But this was the problem. HARO worked so well back in the day that it was only a matter of time before spammers and the #scale crew ruined it for everyone. That’s what happened, and now HARO is no more. So… 

If you’re a link builder, I think it’s time to admit that HARO link building is dead and move on. 

No tactic works well forever. It’s the law of sh**ty clickthroughs. This is why you don’t see SEOs having huge success with tactics like broken link building anymore. They’ve moved on to more innovative tactics or, dare I say it, are just buying links.

Sidenote.

Talking of buying links, here’s something to ponder: if Connectively charges for pitches, are links built through those pitches technically paid? If so, do they violate Google’s spam policies? It’s a murky old world this SEO lark, eh?

If you’re a journalist, Connectively might be worth a shot. But with experts being charged for pitches, you probably won’t get as many responses. That might be a good thing. You might get less spam. Or you might just get spammed by SEOs with deep pockets. The jury’s out for now. 

My advice? Look for alternative methods like finding and reaching out to experts directly. You can easily use tools like Content Explorer to find folks who’ve written lots of content about the topic and are likely to be experts. 

For example, if you look for content with “backlinks” in the title and go to the Authors tab, you might see a familiar name. 😉 

Finding people to request insights from in Ahrefs' Content Explorer

I don’t know if I’d call myself an expert, but I’d be happy to give you a quote if you reached out on social media or emailed me (here’s how to find my email address).

Alternatively, you can bait your audience into giving you their insights on social media. I did this recently with a poll on X and included many of the responses in my guide to toxic backlinks.

Me, indirectly sourcing insights on social media

Either of these options is quicker than using HARO because you don’t have to sift through hundreds of responses looking for a needle in a haystack. If you disagree with me and still love HARO, feel free to tell me why on X 😉


