
SEO

How Compression Can Be Used To Detect Low Quality Pages

Compression can be used by search engines to detect low-quality pages. Although not widely known, it's useful foundational knowledge for SEO.

Compressibility as a quality signal is not widely known, but SEOs should be aware of it. Search engines can use a web page’s compressibility to identify duplicate pages, doorway pages with similar content, and pages with repetitive keywords.

Although the following research paper demonstrates a successful use of on-page features for detecting spam, the deliberate lack of transparency by search engines makes it difficult to say with certainty whether they are applying this or similar techniques.

What Is Compressibility?

In computing, compressibility refers to how much a file (data) can be reduced in size while retaining essential information, typically to maximize storage space or to allow more data to be transmitted over the Internet.

TL;DR Of Compression

Compression replaces repeated words and phrases with shorter references, reducing the file size by significant margins. Search engines typically compress indexed web pages to maximize storage space, reduce bandwidth, and improve retrieval speed, among other reasons.

This is a simplified explanation of how compression works:

  • Identify Patterns:
    A compression algorithm scans the text to find repeated words, patterns, and phrases.
  • Shorter Codes Take Up Less Space:
    The codes and symbols use less storage space than the original words and phrases, which results in a smaller file size.
  • Shorter References Use Fewer Bits:
    The “code” that essentially symbolizes the replaced words and phrases uses less data than the originals.
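
The steps above can be illustrated with a toy word-level encoder in Python. This is only a sketch of the idea of replacing repeated words with shorter references. Real algorithms such as the DEFLATE method behind GZIP work on byte sequences and are considerably more sophisticated. The function names and the sample text are made up for illustration.

```python
def encode(text: str) -> tuple[list[str], list[int]]:
    """Store each distinct word once and replace the text with small integer codes."""
    table: list[str] = []          # each distinct word, stored once
    index_of: dict[str, int] = {}  # word -> position in the table
    codes: list[int] = []          # the text, expressed as references into the table
    for word in text.split():
        if word not in index_of:
            index_of[word] = len(table)
            table.append(word)
        codes.append(index_of[word])
    return table, codes


def decode(table: list[str], codes: list[int]) -> str:
    """Rebuild the original text from the table and the codes."""
    return " ".join(table[code] for code in codes)


text = "cheap hotels in paris cheap hotels in rome cheap hotels in berlin"
table, codes = encode(text)
print(table)  # ['cheap', 'hotels', 'in', 'paris', 'rome', 'berlin']
print(codes)  # [0, 1, 2, 3, 0, 1, 2, 4, 0, 1, 2, 5]
```

The twelve words of the sample shrink to a table of six words plus twelve small numbers. The more often a phrase repeats, the larger the saving, which is why repetitive pages compress so well.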

A bonus effect of using compression is that it can also be used to identify duplicate pages, doorway pages with similar content, and pages with repetitive keywords.

Research Paper About Detecting Spam

This research paper is significant because it was authored by distinguished computer scientists known for breakthroughs in AI, distributed computing, information retrieval, and other fields.

Marc Najork

One of the co-authors of the research paper is Marc Najork, a prominent research scientist who currently holds the title of Distinguished Research Scientist at Google DeepMind. He’s a co-author of the papers for TW-BERT, has contributed research for increasing the accuracy of using implicit user feedback like clicks, and worked on creating improved AI-based information retrieval (DSI++: Updating Transformer Memory with New Documents), among many other major breakthroughs in information retrieval.

Dennis Fetterly

Another of the co-authors is Dennis Fetterly, currently a software engineer at Google. He is listed as a co-inventor in a patent for a ranking algorithm that uses links, and is known for his research in distributed computing and information retrieval.

Those are just two of the distinguished researchers listed as co-authors of the 2006 Microsoft research paper about identifying spam through on-page content features. Among the several on-page content features the research paper analyzes is compressibility, which they discovered can be used as a classifier for indicating that a web page is spammy.

Detecting Spam Web Pages Through Content Analysis

Although the research paper was authored in 2006, its findings remain relevant today.

Then, as now, people attempted to rank hundreds or thousands of location-based web pages that were essentially duplicate content aside from city, region, or state names. Then, as now, SEOs often created web pages for search engines by excessively repeating keywords within titles, meta descriptions, headings, internal anchor text, and within the content to improve rankings.

Section 4.6 of the research paper explains:

“Some search engines give higher weight to pages containing the query keywords several times. For example, for a given query term, a page that contains it ten times may be higher ranked than a page that contains it only once. To take advantage of such engines, some spam pages replicate their content several times in an attempt to rank higher.”

The research paper explains that search engines compress web pages and use the compressed version to reference the original web page. The authors note that an excessive amount of redundant words results in a higher level of compressibility, so they set about testing whether there’s a correlation between a high level of compressibility and spam.

They write:

“Our approach in this section to locating redundant content within a page is to compress the page; to save space and disk time, search engines often compress web pages after indexing them, but before adding them to a page cache.

…We measure the redundancy of web pages by the compression ratio, the size of the uncompressed page divided by the size of the compressed page. We used GZIP …to compress pages, a fast and effective compression algorithm.”
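
The compression ratio defined in the quote can be sketched with Python’s standard gzip module. This is a minimal illustration rather than the researchers’ code. The function name, the sample text, and the choice to return 0.0 for an empty page are assumptions.

```python
import gzip


def compression_ratio(page_text: str) -> float:
    """Size of the uncompressed page divided by the size of the gzip-compressed page."""
    raw = page_text.encode("utf-8")
    if not raw:
        return 0.0  # assumption: an empty page gets a ratio of 0
    return len(raw) / len(gzip.compress(raw))


# Hypothetical keyword-stuffed page: the same phrase repeated 200 times.
stuffed = "best plumber in Springfield call now " * 200
print(round(compression_ratio(stuffed), 1))
```

Very short texts can score below 1.0 because the gzip container adds a fixed overhead, so the ratio is mainly meaningful for pages of ordinary length.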

High Compressibility Correlates To Spam

The results of the research showed that web pages with a compression ratio of at least 4.0 tended to be low-quality web pages, or spam. However, the results at the highest compression ratios were less consistent because there were fewer data points, making them harder to interpret.

Figure 9: Prevalence of spam relative to compressibility of page.

The researchers concluded:

“70% of all sampled pages with a compression ratio of at least 4.0 were judged to be spam.”
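
A threshold check on the ratio might look like the following Python sketch. The 4.0 cutoff comes from the paper’s finding and is not a known search engine setting. The function names and both sample pages are hypothetical. The second sample shows that a legitimate but repetitive page can also cross the line, which fits with the fact that not every page above 4.0 was judged to be spam.

```python
import gzip

SPAM_RATIO_THRESHOLD = 4.0  # ratio reported in the paper; an assumption for any real system


def compression_ratio(page_text: str) -> float:
    """Uncompressed size divided by gzip-compressed size."""
    raw = page_text.encode("utf-8")
    return len(raw) / len(gzip.compress(raw)) if raw else 0.0


def exceeds_threshold(page_text: str, threshold: float = SPAM_RATIO_THRESHOLD) -> bool:
    """True when the page is at least as compressible as the cutoff."""
    return compression_ratio(page_text) >= threshold


# Hypothetical keyword-stuffed page.
keyword_stuffed = "cheap flights cheap tickets cheap airfare " * 150

# Hypothetical legitimate page: a long, uniform stock list.
price_list = "\n".join(
    f"Item {i}: in stock, ships within 2 business days, free returns"
    for i in range(300)
)

print(exceeds_threshold(keyword_stuffed))
print(exceeds_threshold(price_list))
```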

But they also discovered that using the compression ratio by itself still resulted in false positives, where non-spam pages were incorrectly identified as spam:

“The compression ratio heuristic described in Section 4.6 fared best, correctly identifying 660 (27.9%) of the spam pages in our collection, while misidentifying 2,068 (12.0%) of all judged pages.

Using all of the aforementioned features, the classification accuracy after the ten-fold cross validation process is encouraging:

95.4% of our judged pages were classified correctly, while 4.6% were classified incorrectly.

More specifically, for the spam class 1,940 out of the 2,364 pages, were classified correctly. For the non-spam class, 14,440 out of the 14,804 pages were classified correctly. Consequently, 788 pages were classified incorrectly.”
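
The figures in the quote can be checked with a few lines of Python arithmetic. The counts are taken from the quote. The variable names, and the recall and precision figures derived from those counts, are additions for illustration.

```python
# Counts quoted from the paper.
spam_total, spam_correct = 2364, 1940
nonspam_total, nonspam_correct = 14804, 14440

total = spam_total + nonspam_total        # 17168 judged pages
correct = spam_correct + nonspam_correct  # 16380 classified correctly
misclassified = total - correct           # 788, matching the quote

accuracy = correct / total
false_positives = nonspam_total - nonspam_correct  # 364 non-spam pages flagged as spam
spam_recall = spam_correct / spam_total            # share of spam that was caught
spam_precision = spam_correct / (spam_correct + false_positives)

# The compression ratio heuristic on its own found 660 of the spam pages.
ratio_only_recall = 660 / spam_total

print(f"accuracy:          {accuracy:.1%}")           # 95.4%
print(f"spam recall:       {spam_recall:.1%}")        # 82.1%
print(f"spam precision:    {spam_precision:.1%}")     # 84.2%
print(f"ratio-only recall: {ratio_only_recall:.1%}")  # 27.9%
```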

The next section describes an interesting discovery about how to increase the accuracy of using on-page signals for identifying spam.

Insight Into Quality Rankings

The research paper examined multiple on-page signals, including compressibility. The researchers discovered that each individual signal (classifier) was able to find some spam, but that relying on any one signal on its own resulted in flagging non-spam pages as spam, which are commonly referred to as false positives.

The researchers made an important discovery that everyone interested in SEO should know, which is that using multiple classifiers increased the accuracy of detecting spam and decreased the likelihood of false positives. Just as important, the compressibility signal only identifies one kind of spam but not the full range of spam.

The takeaway is that compressibility is a good way to identify one kind of spam, but there are other kinds of spam that aren’t caught with this one signal.

This is the part that every SEO and publisher should be aware of:

“In the previous section, we presented a number of heuristics for assaying spam web pages. That is, we measured several characteristics of web pages, and found ranges of those characteristics which correlated with a page being spam. Nevertheless, when used individually, no technique uncovers most of the spam in our data set without flagging many non-spam pages as spam.

For example, considering the compression ratio heuristic described in Section 4.6, one of our most promising methods, the average probability of spam for ratios of 4.2 and higher is 72%. But only about 1.5% of all pages fall in this range. This number is far below the 13.8% of spam pages that we identified in our data set.”
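
A rough back-of-the-envelope calculation in Python shows why those numbers leave most spam undetected. The three inputs are the rounded figures from the quote, so the result is only approximate. The variable names are illustrative.

```python
share_of_pages_in_range = 0.015   # about 1.5% of all pages have a ratio of 4.2 or higher
spam_probability_in_range = 0.72  # average probability of spam within that range
spam_share_overall = 0.138        # 13.8% of pages in the data set were spam

# Spam pages caught by the ratio alone, as a share of all pages.
spam_caught = share_of_pages_in_range * spam_probability_in_range

# The same quantity expressed as a share of all spam in the data set.
coverage = spam_caught / spam_share_overall

print(f"{spam_caught:.2%} of all pages")  # 1.08% of all pages
print(f"{coverage:.1%} of all spam")      # 7.8% of all spam
```

Under these rounded inputs, the 4.2 cutoff on its own would surface less than a tenth of the spam in the data set.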

So, even though compressibility was one of the better signals for identifying spam, it still was unable to uncover the full range of spam within the dataset the researchers used to test the signals.

Combining Multiple Signals

The above results indicated that individual signals of low quality are less accurate, so the researchers tested using multiple signals. What they discovered was that combining multiple on-page signals for detecting spam resulted in a better accuracy rate with fewer pages misclassified as spam.

The researchers explained that they tested the use of multiple signals:

“One way of combining our heuristic methods is to view the spam detection problem as a classification problem. In this case, we want to create a classification model (or classifier) which, given a web page, will use the page’s features jointly in order to (correctly, we hope) classify it in one of two classes: spam and non-spam.”

These are their conclusions about using multiple signals:

“We have studied various aspects of content-based spam on the web using a real-world data set from the MSNSearch crawler. We have presented a number of heuristic methods for detecting content based spam. Some of our spam detection methods are more effective than others, however when used in isolation our methods may not identify all of the spam pages. For this reason, we combined our spam-detection methods to create a highly accurate C4.5 classifier. Our classifier can correctly identify 86.2% of all spam pages, while flagging very few legitimate pages as spam.”
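
The idea of using several features jointly can be sketched in Python with a simple voting rule. This is not the C4.5 decision tree the researchers trained. It is a hand-written stand-in to show the principle. Only the 4.0 compression cutoff comes from the paper. The other two features, their thresholds, the two-vote rule, and the sample pages are all hypothetical.

```python
import gzip
from collections import Counter


def compression_ratio(text: str) -> float:
    """Uncompressed size divided by gzip-compressed size."""
    raw = text.encode("utf-8")
    return len(raw) / len(gzip.compress(raw)) if raw else 0.0


def top_word_share(text: str) -> float:
    """Fraction of all words taken up by the single most frequent word."""
    words = text.lower().split()
    if not words:
        return 0.0
    return Counter(words).most_common(1)[0][1] / len(words)


def distinct_word_share(text: str) -> float:
    """Number of distinct words divided by the total number of words."""
    words = text.lower().split()
    return len(set(words)) / len(words) if words else 1.0


def spam_votes(text: str) -> int:
    """Count how many of the three signals fire for this page."""
    signals = [
        compression_ratio(text) >= 4.0,     # cutoff reported in the paper
        top_word_share(text) >= 0.20,       # hypothetical threshold
        distinct_word_share(text) <= 0.05,  # hypothetical threshold
    ]
    return sum(signals)


def looks_like_spam(text: str, min_votes: int = 2) -> bool:
    """Flag a page only when several signals agree (hypothetical rule)."""
    return spam_votes(text) >= min_votes


keyword_stuffed = "cheap flights cheap tickets cheap airfare " * 150
price_list = "\n".join(
    f"Item {i}: in stock, ships within 2 business days, free returns"
    for i in range(300)
)

print(spam_votes(keyword_stuffed), looks_like_spam(keyword_stuffed))
print(spam_votes(price_list), looks_like_spam(price_list))
```

In this toy setup the keyword-stuffed page trips all three signals, while the repetitive but legitimate stock list trips only the compression signal and is not flagged. A trained classifier would learn such combinations from labeled data instead of relying on hand-picked thresholds.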

Key Insight:

Misidentifying “very few legitimate pages as spam” was a significant breakthrough. The important insight that everyone involved with SEO should take away from this is that one signal by itself can result in false positives. Using multiple signals increases the accuracy.

What this means is that SEO tests of isolated ranking or quality signals will not yield reliable results that can be trusted for making strategy or business decisions.

Takeaways

We don’t know for certain whether search engines use compressibility, but it’s an easy-to-use signal that, combined with others, could catch simple kinds of spam such as thousands of city-name doorway pages with similar content. Even if search engines don’t use this signal, it shows how easy it is to catch that kind of manipulation and that search engines are well able to handle it today.

Here are the key points of this article to keep in mind:

  • Doorway pages with duplicate content are easy to catch because they compress at a higher ratio than normal web pages.
  • Groups of web pages with a compression ratio of at least 4.0 were predominantly spam.
  • Negative quality signals used by themselves to catch spam can lead to false positives.
  • In this particular test, the researchers discovered that on-page negative quality signals only catch specific types of spam.
  • When used alone, the compressibility signal only catches redundancy-type spam, fails to detect other forms of spam, and leads to false positives.
  • Combining quality signals improves spam detection accuracy and reduces false positives.
  • Search engines today detect spam more accurately with the use of AI like SpamBrain.

Read the research paper, which is linked from the Google Scholar page of Marc Najork:

Detecting spam web pages through content analysis

Featured Image by Shutterstock/pathdoc

SEO

New Google Trends SEO Documentation

Google publishes new documentation for how to use Google Trends for search marketing

Google Search Central published new documentation on Google Trends, explaining how to use it for search marketing. The guide serves as an easy-to-understand introduction for newcomers and a helpful refresher for experienced search marketers and publishers.

The new guide has six sections:

  1. About Google Trends
  2. Tutorial on monitoring trends
  3. How to do keyword research with the tool
  4. How to prioritize content with Trends data
  5. How to use Google Trends for competitor research
  6. How to use Google Trends for analyzing brand awareness and sentiment

The section about monitoring trends explains that there are two kinds of rising trends, general and specific, both of which can be useful for developing content to publish on a site.

Using the Explore tool, you can leave the search box empty and view the current rising trends worldwide or use a drop down menu to focus on trends in a specific country. Users can further filter rising trends by time periods, categories and the type of search. The results show rising trends by topic and by keywords.

To search for specific trends, users just need to enter the specific queries and then filter them by country, time, categories, and type of search.

The section called Content Calendar describes how to use Google Trends to understand which content topics to prioritize.

Google explains:

“Google Trends can be helpful not only to get ideas on what to write, but also to prioritize when to publish it. To help you better prioritize which topics to focus on, try to find seasonal trends in the data. With that information, you can plan ahead to have high quality content available on your site a little before people are searching for it, so that when they do, your content is ready for them.”

Read the new Google Trends documentation:

Get started with Google Trends

Featured Image by Shutterstock/Luis Molinero

SEO

All the best things about Ahrefs Evolve 2024

Hey all, I’m Rebekah and I am your Chosen One to “do a blog post for Ahrefs Evolve 2024”.

What does that entail exactly? I don’t know. In fact, Sam Oh asked me yesterday what the title of this post would be. “Is it like…Ahrefs Evolve 2024: Recap of day 1 and day 2…?” 

Even as I nodded, I couldn’t get over how absolutely boring that sounded. So I’m going to do THIS instead: a curation of all the best things YOU loved about Ahrefs’ first conference, lifted directly from X.

Let’s go!

OUR HUGE SCREEN

CONFERENCE VENUE ITSELF

It was recently named the best new skyscraper in the world, by the way.

 

OUR AMAZING SPEAKER LINEUP – SUPER INFORMATIVE, USEFUL TALKS!

 

GREAT MUSIC

 

AMAZING GOODIES

 

SELFIE BATTLE

Some background: Tim and Sam have a challenge going on to see who can take the most selfies with all of you. Last I heard, Sam was winning – but there is room for a comeback yet!

 

THAT BELL

Everybody’s just waiting for this one.

 

STICKER WALL

AND, OF COURSE…ALL OF YOU!

 

There’s a TON more content on LinkedIn – click here – but I have limited time to get this post up and can’t quite figure out how to embed LinkedIn posts so…let’s stop here for now. I’ll keep updating as we go along!



SEO

11 Tips For How To Find Great Writers

Great content is the backbone of any successful SEO strategy.

Content provides information to users, facilitates ranking in the organic search results, and can be a significant driver in attracting backlinks to your website.

But how and where one sources such amazing content depends on a few factors. For one, you can write your own content, if you have the skills and time to do so.

On the other hand, you could hire a professional writer to craft content for you, but you need to know where to look!

Need an excellent writer? Consider these top tips on how and where to find experienced content writers.

1. Assess Your Content Needs

The first step to finding a great writer is to determine what type of writer you need. Believe it or not, there are many different kinds of copywriters and content writers (yes, they’re different), and they bring different specialties to the table.

Is your goal to craft SEO-friendly content that ranks in search engines? You’ll need a writer who understands on-page SEO best practices and the nuances of keyword usage.

Is your goal to drive conversions from a landing page on your website? You’ll need a direct-response copywriter skilled in sales copywriting and buyer psychology.

Also, these writers may advertise their services on different platforms, so it’s important to consider your needs early on so you know where to look!

Content Writers Vs. Copywriters

If you are looking for a writer who specializes in long-form, SEO-friendly content, you’ll want to find a content writer. Some examples of content writers include:

  • Blog writers – when your goal is to drive organic traffic, build brand awareness, and engage readers.
  • Article writers – when you need in-depth articles (for websites, magazines, or online publications) that educate readers on specific topics.
  • SEO writers – if you want to improve your website visibility and organic traffic to webpages.
  • Technical writers – for writing manuals, how-to guides, software documentation, and white papers.
  • Social media content writers – when you need short-form content for social media platforms like Instagram, X (Twitter), Facebook, or LinkedIn.

Now, if you are looking for a writer specializing in persuasive writing that compels people to take action (like buy a product or sign up for a service), you’ll want a copywriter.

Some examples of copywriters include:

  • Direct response copywriters – who specialize in writing sales letters, email campaigns, landing pages, and ads that inspire action
  • Sales copywriters – when you need product descriptions, sales pages, or promotional materials
  • Email copywriters – who write email sequences for marketing campaigns, newsletters, and product launches
  • Brand copywriters – who specialize in writing content that conveys your brand’s voice, tone, and values to build your brand identity (may include website copy, slogans, or ads)

Some content writers and copywriters offer several services. For example, it’s common to find a content writer who does blog writing, article writing, and SEO content.

However, copywriters and content writers are notably different in what they aim to achieve – sales vs. traffic, respectively.

Consider what you are trying to accomplish with your content and search for a writer with that skill set.

2. Browse Reputable Writer Directories And Platforms

Now, it’s time to find a writer. Easier said than done, right? Business owners are spoiled for choice when it comes to the number of freelancer websites available, but not all are created equal.

Ask Your Professional Network

Before venturing to a freelancer website, I suggest asking your professional network whether they know of any writers they might recommend.

Not only will you then get a referral from someone who can vouch for the writer’s services, but you’ll save a ton of time in your search.

Reach Out To Your Network

I highly recommend reaching out to your existing network to find writers who have a track record of proven results.

A referral from someone within your industry is even better. Ask them about their experience working with the writer and what results they generated.

Niche Facebook Groups

Facebook is a great source of freelance writers, especially within niche-specific Facebook Groups.

For example, if you’re looking for a travel writer, look for groups like the Association of Travel Writers or Travel Writers Exchange.

Many Facebook Groups also allow you to post jobs to find writers for hire.

LinkedIn Search

LinkedIn is a popular professional networking site that allows you to search for consultants, brands, and freelancers.

Simply use the LinkedIn search bar to find a “writer,” “copywriter,” “SEO writer,” etc.

You’ll see individuals who rank at the top for these keywords. Be sure to check out their portfolio and recommendations.

College Job Boards

Many university students are looking for part-time jobs and contract opportunities.

Check out your local university or college websites to see if they have a job board, then post the requirements of the role.

Content Agencies

Content marketing agencies specialize in content strategy and content writing, often for a variety of platforms.

While their rates may be more expensive than working with a freelance writer, you can often trust that there’s a higher degree of quality control.

You may also be able to source content for social media, email, and your website – all in one place.

Writer Directories

Writer directories like Compose.ly and blcklst.com allow writers to publish their portfolios, post their rates, and apply for jobs.

Some sites allow you to post an open role, while others allow you to contact the writers directly. Again, look for writers with an active portfolio and, ideally, client testimonials.

3. Request Content Examples

Once you’ve found a writer (or several) that you’d like to work with, it’s time to request more information.

Hiring a writer is a financial commitment, so do your due diligence to assess their portfolio and skills.

Always ask for examples of their work – particularly work related to your niche.

Unfortunately, stealing content examples is common practice online, so you don’t always know what you are getting; if they can send you an example with their name in the byline, that’s a safer bet.

Human Writers Vs. AI Content

AI-generated content has been on the rise. With tools like ChatGPT and contents.ai, it’s easy for businesses to turn to this faster, cheaper form of content.

But there is a lot of personality, uniqueness, and quality lost in AI content.

For one, AI content lacks the history of lived human experience to tell stories, provide relatable examples, and solve modern problems in your content.

Human writers are able to empathize with your readers and buyers, incorporating this sentiment and psychology into the content.

Also, with AI content, you’re at risk of generating material that’s identical to other pieces of content that are on the web.

This can hurt your brand and your SEO. Human writers are able to craft a unique story that’s specific to your brand voice and audience.

AI content has its place – such as in content planning and drafts – but should not be the basis of your entire content strategy.

While cheap, AI content can end up costing you in terms of brand visibility, user trust, and conversions.

4. Interview The Candidates

When “chatting” with a writer, a lot can be lost in translation via email or messenger. It’s always best to get on a live call to assess whether the candidate is a good fit for your brand and needs.

Just as much as you are looking for a writer with the right skills, you want to be sure they are a good character fit. Communication is important throughout the entire content planning and writing process.

Here are some questions to ask during your writer interview:

  • What types of writing do you specialize in?
  • Do you have experience in our industry?
  • How do you approach research for a topic you’re unfamiliar with?
  • How do you incorporate SEO best practices into your content writing (if applicable)?
  • Do you have experience working with content calendars, marketing teams, or campaign strategies?
  • What is your preferred workflow (e.g., strategy provided by client, first draft approval, round one revision, final approval)?
  • What’s your average turnaround time for a [type of content]?

These questions will give you a better understanding of the writer’s skills, style, and approach to writing, helping you find the right fit for your needs.

5. Look For Case Studies And Reviews

Whether you’re using your referral network, social media, or writer directories to find writers, look for their case studies or client reviews.

Many professional writers will have a website where they showcase their work and/or recommendations on LinkedIn or social media.

This “social proof” will make it evident what kind of results they have been able to generate for their clients.

6. Assess Their SEO Knowledge

If your goal is to grow your traffic, you’ll want a writer who understands SEO and how to incorporate it into their content.

They may not be an SEO expert, but they should know on-page best practices, such as keyword usage in the page title, heading structure and hierarchy, and the importance of internal linking.

It’s appropriate to ask them a few questions about their expertise and to request examples of SEO content. If they have case studies that showcase measurable results, even better.

7. Ask How They Measure Success

On the topic of results, you should ask writer candidates how they measure the success of their content.

Though many factors go into content performance – not all of which they will have control over – it’s still a fair question to assess their approach to content writing.

For example, if they are an SEO writer, do they measure success by organic traffic and reduced bounce rate? Do they tend to look at the number and position of keyword rankings? A great SEO writer will pay attention to these metrics.

Similarly, if they are a sales copywriter, do they track conversions? How do they determine what makes their copy successful? Do they make updates to the copy to improve performance?

Not only will this consideration get you thinking about how you quantify results, but it will also help you identify a writer who is results-driven.

8. Understand Their Pricing Structure

There are many different types of pricing structures writers may use to charge for content.

The most common is price-per-word, where the writer charges a set cost for each word of content written.

Freelance writers can charge anywhere from $0.05 to $2.00 per word, depending on their experience.

Another common approach is cost per page/post. This is where the writer typically determines an approximate content length and set cost.

For example, a short blog post may cost around $150, whereas a long blog post may cost $300+. This option is great if you want the costs to be predictable.

Be sure to discuss the writer’s preferred pricing structure and rates before you start on a project. Ideally, get your agreement set in writing so there is no confusion over the terms.

9. Know What’s Provided In Their Services

Some SEO writers only include the content and the H1 and H2 tags. Others include all on-page SEO.

Even further, some provide keyword research or content planning. For any writer, ask what their services include and what needs to be provided by you.

Do they need you to do the keyword research and create the blog strategy? Get clear about that from the beginning.

You should also ask whether edits and/or rewrites are included. Complete rewrites are rare; don’t expect most writers to write an entirely new piece without compensation.

Typically, writers offer one to two rounds of edits, or a refund if they miss the mark.

10. Discuss Your Expectations

Hiring a writer is like any other professional relationship in that you need to discuss your expectations at the start.

Know what’s expected of you, make sure they know what’s expected of them, and outline a clear process when it comes to creating content together.

Note that some writers offer refunds, while others do not. Discuss this at the beginning (and get it in writing) before you find yourself in a pickle.

11. Know That Great Content Is An Investment

With all this talk about pricing and payment terms, you may be wondering, “How much does great content cost?”

Unfortunately, the answer isn’t simple. Writers’ rates vary based on their industry expertise, years of experience, the results they have generated for clients, their location, and a range of other factors.

But what remains true is that you get what you pay for. Don’t expect high-quality sales copy from a “cheap” AI content service. Don’t expect high conversions on sales pages written by a novice versus an expert.

When it comes to driving results, you’ll want a content writer or copywriter who understands the nuances of SEO and buyer psychology.

They likely have years of experience and a proven track record of delivering results for clients. And they likely aren’t cheap.

Consider what it’s worth to your business to have interesting, original, high-converting content. Do you want to pay pennies for basic copy? Or do you want content that will bring a return on investment (ROI)?

Final Thoughts

While there are mixed opinions on what constitutes “great” content and how much great content costs, it remains true that human writers are the source of the best content around.

Able to empathize with buyers’ experiences and craft unique stories, human writers are more equipped than AI to create content that resonates with an audience.

Finding the best writer for your brand depends on the type of content you need and the return you aim to generate from your content.

Your content “budget” should, then, be based on your willingness to invest in content that will achieve the results you want.

I recommend researching your options and outlining clear expectations with your writer from the beginning. That is the path to a positive writer-client relationship and great content for your brand.

Featured Image: Roman Samborskyi/Shutterstock
