

AI Content Is Short-Term Arbitrage, Not Long-Term Strategy




For a few hundred bucks, you can hit the big red “publish” button and use generative AI to write every article you’ve ever wanted to write. It’s sorely tempting.

But beyond the short-term dopamine hit of publishing a thousand articles at once, for most businesses, the negatives of AI content will very quickly outweigh the positives.

First up—there is precedent for getting a Google manual action for publishing AI content at scale.

Back in November, the founder of an AI content tool tweeted about their “SEO heist.” They exported a competitor’s sitemap, turned every URL into an article title, and used AI to publish 1,800 articles.

In some ways, this is part of the cat-and-mouse game of SEO: a website identifies a traffic opportunity, and its competitors follow suit. But in the month following the tweet, the site’s traffic tanked to virtually zero.

Most of the site’s rankings plummeted into non-existence, courtesy of a manual action:

List of lost keyword rankings and traffic.

Crucially, I don’t think that publishing AI content means an automatic penalty. AI content detectors don’t work, and even if they did, Google is apparently agnostic to AI use—but it is not agnostic to bad content or bad actors.

And AI makes it very easy to make bad content:

Annotated screenshot of low-quality AI-generated content.

I think the penalty happened because:

  • They published 1,800 pages of low-quality content, with no images, virtually no formatting, and many errors, and
  • They tweeted about it and caught Google’s attention.

Even if you don’t tweet about your AI content efforts, the precedent matters: publishing tons of AI content with no oversight is penalty-worthy. For any business building its traffic and audience for the long term, even a small risk of a catastrophic outcome (like a penalty) should give pause for thought.

AI content is, by its nature, mediocre. Mediocrity should not be the end goal of your content strategy.

LLMs, like ChatGPT, work through a kind of averaging. Each word is chosen based on how likely it is to follow the preceding text, a probability the model learned from its training data, so the “new” content it generates is based largely on what everyone else has already said. As Britney Muller explains in her guide to LLMs:


“Instead of randomly drawing a word out of a hat, an LLM will focus only on the most probable next words… It’s like a musician reading sheet music, moving through the notes one at a time. The goal is to figure out what the next word is likely to be as the model processes each word in the sentence.”

Britney Muller

To borrow a phrase from Britney, AI-generated content represents the literal “average of everything online.” That’s useful for topics where there’s a single, objective answer (“when was Abraham Lincoln born?”), but less useful for any topic that benefits from nuance, or differing perspectives, or firsthand experience.
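To make the “most probable next word” idea concrete, here is a toy sketch. It counts which word follows which in a tiny, made-up corpus and then always picks the most frequent continuation. This is a bigram model, which is far simpler than a real LLM (those learn probabilities over tokens with a neural network), and the corpus and function names are illustrative assumptions. It does show the averaging effect, though: the model reproduces the majority phrasing it has seen and never the minority one.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus: list[str]) -> dict[str, Counter]:
    """Count how often each word follows each other word."""
    follows: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def generate_greedy(follows: dict[str, Counter], start: str, max_words: int = 8) -> str:
    """Repeatedly append the single most frequent next word (greedy decoding)."""
    words = [start]
    for _ in range(max_words):
        options = follows.get(words[-1])
        if not options:
            break  # no known continuation for this word
        words.append(options.most_common(1)[0][0])
    return " ".join(words)


# Hypothetical mini-corpus: "traffic" follows "about" twice, "trust" only once.
corpus = [
    "content marketing is about traffic",
    "content marketing is about traffic",
    "content marketing is about trust",
]
model = train_bigrams(corpus)
print(generate_greedy(model, "content"))  # content marketing is about traffic
```

Real LLMs usually sample from the probability distribution (often with a temperature setting) instead of always taking the top word. Even so, the pull toward the most common phrasing remains.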

You can play with different prompting strategies to alter and shape the structure and style of AI content. But even assuming you go to that length (many AI content tools don’t offer that freedom), you can’t escape a few realities of AI content:

  • It contains no information gain: it can’t conduct research, or share personal experience, or vocalize a defensible opinion.
  • It gets things wrong: it suffers from hallucinations and regurgitates common mistakes and errors.
  • It doesn’t understand you or your business: try getting AI content to tactfully showcase your product in your content (like we do at Ahrefs).

…and this is before we worry about leaking sensitive information, accidental copyright infringement, or the million ways in which unsupervised content could perpetuate bias and misinformation.

It’s easy to look at traffic graphs for AI content and think that “mediocre” content is good enough. But returning to the example of the “SEO heist”, most of their (now lost) rankings were limited to very low competition keywords (as measured by Keyword Difficulty in Ahrefs):

List of keyword rankings and their low keyword difficulty.

Mediocre content might perform well in uncontested SERPs, but it isn’t enough to compete in SERPs where companies have invested actual effort and resources into their content.

And crucially, it leaves a bad impression on the living, breathing people who read it.

Let’s assume your AI content works. You publish hundreds of articles and generate thousands and thousands of visits. Is that really the boon it sounds like?

For most companies that pursue SEO, blog posts quickly become the primary source of website visitors. For an extreme example, look at the pages that generate the most organic traffic for Zapier—they are almost entirely blog posts:

List of Zapier's top pages by organic traffic.

This is estimated organic traffic (and doesn’t include traffic from other sources), but the point is clear: most of the interactions people have with your company are mediated by content.

Many visitors won’t ever see your carefully crafted homepage or product landing pages. Their entire perception of your company—its ethos, beliefs, quality standards, helpfulness—will be shaped by the blog posts they read.

Are you happy with AI content making that first impression?

Think of the time and effort that went into your core website pages: endless variations of copy and messaging, illustrations and visual design, tone of voice, rounds of review and finessing… and compare it to the effort that goes into AI content, published en masse, unread, unedited.

It’s easy to think of content as “just an acquisition channel,” but in reality, your 800 AI-generated SEO posts will have a bigger impact on the public perception of your brand than your latest product landing page.

The point of content marketing is to help sales. Everything you create should, in some way, help people to buy your product or service.

The types of keywords AI content is good at ranking for are typically low commercial value and unlikely to lead to a sale. By way of example, here’s the estimated traffic value for the “SEO heist” site’s organic traffic, at its peak:

Graph of traffic value: $117k from 590k pageviews.


Traffic value measures the equivalent monthly cost if a site’s traffic from all keyword rankings was paid for through PPC advertising—so it acts as a good proxy for the commercial value of a keyword (a high traffic value means companies think the keyword is lucrative enough to bid on).

And here’s the Ahrefs blog, with a similar amount of estimated organic traffic… and a traffic value six times higher:

Graph of traffic value: $721k from 570k pageviews.
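The traffic value definition above can be sketched as arithmetic: for each ranking keyword, multiply its estimated monthly visits by its cost per click, then sum the results. The sketch below assumes that simplified formula (Ahrefs’ exact model may differ), and the keyword mixes are hypothetical. The last lines use the peak figures from the two graphs to compare value per visit.

```python
def traffic_value(keywords: list[tuple[int, float]]) -> float:
    """Sum of (estimated monthly visits x cost per click) over ranking keywords.

    A simplified assumption of how a traffic-value metric is computed.
    """
    return sum(visits * cpc for visits, cpc in keywords)


def value_per_visit(total_value: float, visits: int) -> float:
    return total_value / visits


# Hypothetical keyword mixes: (monthly visits, CPC in USD)
informational = [(50_000, 0.10), (30_000, 0.05)]
commercial = [(5_000, 8.00), (2_000, 12.50)]
print(traffic_value(informational))  # 6500.0 from 80,000 visits
print(traffic_value(commercial))     # 65000.0 from 7,000 visits

# Peak figures from the graphs above
heist = value_per_visit(117_000, 590_000)   # about $0.20 per visit
ahrefs = value_per_visit(721_000, 570_000)  # about $1.26 per visit
print(f"SEO heist: ${heist:.2f}/visit, Ahrefs blog: ${ahrefs:.2f}/visit, ratio {ahrefs / heist:.1f}x")
```

A small amount of commercial traffic can outweigh a much larger amount of informational traffic. That is the same pattern the two graphs show.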

Most of the benefit of AI content boils down to lots of traffic, fast—the quality and purchase intent of that traffic is a distant second.

Great, if your entire business model is monetizing mountains of traffic through affiliate links or ad networks. But for every other type of business, traffic is only half the battle. In order to help sales and grow the business, content also needs to:

  • Leave a lasting impression and help readers remember your company.
  • Encourage people to visit again and again (and not bounce forever on the first post).
  • Build trust in and affinity for the real people behind the brand.

Here’s another AI content example. How well does this guide to “Removing Dashes from ISBN Numbers in Excel” tick those boxes?

Example of AI-generated content

AI content is good for generating traffic but bad at building trust. There’s no recognisable voice, no firsthand experience or narrative, and no real person behind the writing (unless you take the Sports Illustrated route and also create AI-generated authors for your content).

At best, it’s like reading a Wikipedia page: even if you help the reader solve a problem, they won’t remember you for it. While traffic is great (and more traffic is usually better than less), it can’t come at the expense of trust.

Here’s the most important problem with AI content: there is no barrier to entry. Anyone can do it, virtually for free. If it’s easy for you to publish 1,000 articles, it’s easy for your competitors to do the same, and their competitors, and their competitors…

So even assuming you get good results from AI content—how long will those results last?

At best, AI content is a form of short-term arbitrage, a small window of opportunity to build tons of traffic before a competitor, or a dozen competitors, decide to do the same. With most AI-generated content being pretty similar, there will be no “loyalty” from readers—they will read whatever ranks highest, and it will only be a matter of time before your content is challenged by a bigger fish, a company with a bigger budget and better SEO team.

Over time, you will be outcompeted by companies able to put more effort into their articles. So just skip right to the end of the cycle and create content that has a defensible moat:

  • Interview real people and share new information that other publications haven’t covered,
  • Collect original data in the form of industry surveys, polls, and data analysis,
  • Tell personal stories and share the unique, firsthand experiences of the topic that nobody else can.

Or put another way:

Final thoughts

There are plenty of good use cases for LLMs in SEO and content marketing. You can brainstorm keywords and titles, generate metadata and alt text at scale, write regex queries and code snippets, and generally use LLMs as useful inputs into your creative process.

But for most businesses, hitting the big red “publish” button and publishing thousands of AI-generated articles is a pretty bad use of LLMs, and a pretty bad idea overall. And even if AI content gets good enough to render most of these objections irrelevant, we will still have the problem of zero barrier to entry; if it’s easy for you to do, it’s easy for your competitors.

AI content is short-term arbitrage, not a long-term strategy.



Technical SEO Checklist for 2024: A Comprehensive Guide




With Google getting a whopping total of six algorithmic updates and four core updates in 2023, you can bet the search landscape is more complicated (and competitive) to navigate nowadays.

To succeed in SEO this year, you will need to figure out which items to check and optimize to keep your website visible. And if your goal is not just to make your website searchable but to have it rank at the top of search engine results, this technical SEO checklist for 2024 is essential.

Webmaster’s Note: This is part one of our three-part SEO checklist for 2024. I also have a longer guide on advanced technical SEO, which covers best practices and how to troubleshoot and solve common technical issues with your websites.

Technical SEO Essentials for 2024

Technical SEO refers to optimizations that are primarily focused on helping search engines access, crawl, interpret, and index your website without any issues. It lays the foundation for your site to be properly understood and served up by search engines to users.

1. Website Speed Optimization

A site’s loading speed is a ranking factor for search engines like Google, which prioritize user experience. Faster websites generally provide a more pleasant user experience, leading to increased engagement and improved conversion rates.

Server Optimization

Often, a website loads slowly because of the server it’s hosted on. Choosing a high-quality server that delivers quick loading times from the get-go spares you the headache of server optimization later.

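Google recommends keeping your server response time under 200ms. For a quick first check, you need your website’s IP address (or its domain name). Open a command prompt or terminal, type ping followed by the address, and press Enter. The window shows how long each request took to make the round trip to your server.

Keep in mind that ping only measures network latency to the machine. It does not include the time your web server spends generating a page, so treat it as a rough indicator rather than a true server response time.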
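Ping measures network round-trip time, not how long your web server takes to answer an HTTP request. A closer proxy for server response time is time to first byte (TTFB). The sketch below uses only Python’s standard library. It opens a connection, then times a GET request from the moment it is sent until the status line and headers arrive, and it takes the median of a few samples. The function names, the sample count, and the 200 ms comparison are illustrative choices, not a standard tool.

```python
import http.client
import statistics
import sys
import time
from urllib.parse import urlsplit


def measure_ttfb_ms(url: str, timeout: float = 10.0) -> float:
    """Milliseconds from sending a GET request until the status line and headers arrive.

    The TCP (and TLS) connection is opened before the timer starts, so the result
    covers request transfer, server processing, and one network round trip.
    """
    parts = urlsplit(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    path = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    try:
        conn.connect()
        start = time.perf_counter()
        conn.request("GET", path)
        response = conn.getresponse()  # returns once the status line and headers are read
        elapsed_ms = (time.perf_counter() - start) * 1000
        response.read()  # drain the body before closing
        return elapsed_ms
    finally:
        conn.close()


def median_ttfb_ms(url: str, samples: int = 5) -> float:
    """Median of several measurements, which smooths out one-off network hiccups."""
    return statistics.median(measure_ttfb_ms(url) for _ in range(samples))


if __name__ == "__main__" and len(sys.argv) > 1:
    target = sys.argv[1]
    ms = median_ttfb_ms(target)
    verdict = "within" if ms < 200 else "above"
    print(f"{target}: median TTFB {ms:.0f} ms ({verdict} the 200 ms target)")
```

For example, you could run it with your homepage URL as the only argument. Running it before and after an optimization gives you the benchmark and regression check described in the steps that follow.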

If you find that your server response time goes above the recommended 200ms, here’s what you need to check:

  1. Collect the data from your server and identify what is causing your response time to increase. 
  2. Based on what is causing the problem, you will need to implement server-side optimizations. This guide on how to reduce initial server response times can help you here.
  3. Measure your server response times after optimization to use as a benchmark. 
  4. Monitor any regressions after optimization.

If you work with a hosting service, contact them when you need to improve server response times. A good hosting provider should have the right infrastructure, network connections, server hardware, and support services to accommodate these optimizations. They may also offer upgraded hosting plans if your website needs more server resources to run smoothly.

Website Optimization

Aside from your server, there are a few other reasons that your website might be loading slowly. 

Here are some practices that can help:

  1. Compressing images to decrease file sizes without sacrificing quality.
  2. Minifying code by removing unnecessary spaces, comments, and indentation.
  3. Using caching to store some data locally in a user’s browser, allowing quicker loading on subsequent visits.
  4. Implementing Content Delivery Networks (CDNs) to distribute the load, speeding up access for users situated far from the server.
  5. Lazy loading images and other resources so the browser loads only what users need, when they need it.
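To illustrate item 3 (caching), the sketch below chooses a Cache-Control header for a response based on the file type. The policy is an assumption. It follows a common pattern but is not a universal rule:

- Static assets whose filenames change when their content changes (for example app.3f9a1c.js) get long-lived caching.
- Other static files are cached briefly.
- HTML must be revalidated, so visitors always get the latest page.

The extensions, the fingerprint naming convention, and the max-age values are illustrative.

```python
import re
from pathlib import PurePosixPath

ONE_YEAR = 60 * 60 * 24 * 365  # seconds

STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".jpeg", ".webp", ".avif", ".svg", ".woff2"}
# Hypothetical naming convention: a hex content hash before the extension, e.g. app.3f9a1c.js
FINGERPRINT = re.compile(r"\.[0-9a-f]{6,}\.[a-z0-9]+$")


def cache_control_for(path: str) -> str:
    """Return a Cache-Control header value for a request path."""
    lowered = path.lower()
    suffix = PurePosixPath(lowered).suffix
    if suffix in STATIC_EXTENSIONS and FINGERPRINT.search(lowered):
        # Content-addressed file: a change produces a new URL, so cache it for a long time.
        return f"public, max-age={ONE_YEAR}, immutable"
    if suffix in STATIC_EXTENSIONS:
        # Static but not fingerprinted: cache briefly so updates still reach visitors.
        return "public, max-age=3600"
    # HTML and everything else: browsers must revalidate before reusing a cached copy.
    return "no-cache"


for p in ["/index.html", "/assets/app.3f9a1c.js", "/images/logo.png"]:
    print(p, "->", cache_control_for(p))
```

Most web servers and CDNs let you set equivalent rules in their own configuration. The sketch only shows the decision logic.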

Common tools for evaluating website speed are Google’s PageSpeed Insights and Google Lighthouse. Both can analyze your pages and generate suggestions to improve their overall loading speed, all for free. Third-party tools like GTmetrix are another option.
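PageSpeed Insights can also be queried programmatically. This helps when you want to benchmark many URLs or track scores over time. The sketch below assumes the public PageSpeed Insights API v5 endpoint (runPagespeed) with its url and strategy parameters. It reads the Lighthouse performance score, a value from 0 to 1, out of the JSON response. Treat the field path as an assumption to verify against Google’s current API documentation; heavier use also needs an API key. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def build_psi_url(page_url: str, strategy: str = "mobile", api_key: Optional[str] = None) -> str:
    """Build a PageSpeed Insights API v5 request URL for the "mobile" or "desktop" strategy."""
    params = {"url": page_url, "strategy": strategy}
    if api_key:
        params["key"] = api_key  # optional; needed for higher request volumes
    return f"{PSI_ENDPOINT}?{urllib.parse.urlencode(params)}"


def performance_score(response: dict) -> int:
    """Read the Lighthouse performance score (0-1) from a response and scale it to 0-100."""
    score = response["lighthouseResult"]["categories"]["performance"]["score"]
    return round(score * 100)


def fetch_performance_score(page_url: str, strategy: str = "mobile",
                            api_key: Optional[str] = None) -> int:
    """Call the API (requires network access) and return the 0-100 performance score."""
    with urllib.request.urlopen(build_psi_url(page_url, strategy, api_key), timeout=120) as resp:
        return performance_score(json.load(resp))
```

For example, `fetch_performance_score("https://www.example.com/", strategy="desktop")` would return a 0 to 100 score like the one shown in the web interface. Scores vary somewhat from run to run, so compare several runs rather than a single one.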

Here’s the speed score of one of our websites before optimization. It’s one of the worst I’ve seen, and it was hurting our SEO.

Slow site speed score from GTmetrix

So we followed our technical SEO checklist. After working on the images, removing render-blocking page elements, and minifying code, the score greatly improved — and we saw near-immediate improvements in our page rankings. 

Site speed optimization results from GTmetrix

That said, tinkering with your server settings, code, and other parts of your website’s backend can break things if you don’t know what you’re doing. For that reason, I suggest backing up all your files and your database before you start working on your website speed.

2. Mobile-First Indexing

Mobile-first indexing means Google primarily uses the mobile version of a site’s content for indexing and ranking.

It’s no secret that Google prioritizes the experience of mobile users; mobile-first indexing is proof of that. Beyond that, optimizing your website for mobile just makes sense, given that a majority of people now use their phones to search online.

This shift calls for a mobile-first approach to website development and design, and it belongs on your technical SEO checklist:

  1. Ensure the mobile version of your site contains the same high-quality, rich content as the desktop version.
  2. Make sure metadata is present on both versions of your site.
  3. Verify that structured data is present on both versions of your site.
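One way to catch the parity problems listed above is to compare the two versions of a page side by side. The sketch below uses Python’s built-in html.parser. From two HTML strings, it pulls out the title, the meta description, and the schema.org types declared in JSON-LD blocks, then reports what differs. How you fetch each version is out of scope here; you might use a mobile user agent or a separate m. URL. The class and function names and the sample markup are made up for illustration.

```python
import json
from html.parser import HTMLParser


class SeoSignals(HTMLParser):
    """Collect the <title>, meta description, and JSON-LD @type values from a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self.jsonld_types = set()
        self._in_title = False
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            data = json.loads("".join(self._buffer))
            items = data if isinstance(data, list) else [data]
            self.jsonld_types.update(str(item.get("@type")) for item in items)

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_jsonld:
            self._buffer.append(data)


def extract(html: str) -> SeoSignals:
    parser = SeoSignals()
    parser.feed(html)
    parser.close()
    return parser


def parity_issues(desktop_html: str, mobile_html: str) -> list[str]:
    """List the SEO signals present on desktop but different or missing on mobile."""
    d, m = extract(desktop_html), extract(mobile_html)
    issues = []
    if d.title.strip() != m.title.strip():
        issues.append("title differs")
    if d.description != m.description:
        issues.append("meta description differs or is missing")
    missing = d.jsonld_types - m.jsonld_types
    if missing:
        issues.append(f"structured data missing on mobile: {sorted(missing)}")
    return issues


desktop = """<html><head><title>Blue Widget</title>
<meta name="description" content="A sturdy blue widget.">
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "Product", "name": "Blue Widget"}</script>
</head><body></body></html>"""
mobile = "<html><head><title>Blue Widget</title></head><body></body></html>"
print(parity_issues(desktop, mobile))
```

In this made-up example, the mobile page keeps the title but drops the meta description and the Product markup. Under mobile-first indexing, that is exactly the kind of gap that matters.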

Tools like Lighthouse, which runs in Chrome DevTools and powers PageSpeed Insights, can audit how your pages perform on mobile devices. Note that Google retired its standalone Mobile-Friendly Test in December 2023.

3. Crawlability & Indexing Check

Always remember that crawlability and indexing are the cornerstones of SEO. Crawlability refers to a search engine’s ability to access and crawl through a website’s content. Indexing is how search engines organize information after a crawl and before presenting results. To support both:

  1. Use a well-structured robots.txt file to tell web crawlers which of your pages should not be crawled.
  2. Use XML sitemaps to guide search engines through your site’s content and ensure that all valuable content is found and indexed. Several CMS plugins can generate your sitemap for you.
  3. Give your website a logical structure with a clear hierarchy, which helps both users and bots navigate to your most important pages easily.
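Both files can be generated and checked with Python’s standard library. The sketch below builds a minimal XML sitemap in the sitemaps.org format using xml.etree.ElementTree. It then uses urllib.robotparser to check whether any URL you planned to list is blocked by your robots.txt rules, because listing blocked URLs in a sitemap sends search engines conflicting signals. The domain, paths, and rules are made up. Python’s robotparser also handles only basic robots.txt rules and may not interpret every extension (such as wildcards in paths) the way Googlebot does.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Return a minimal sitemap.xml document listing the given absolute URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes characters like &
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


def blocked_urls(robots_txt: str, urls: list[str], user_agent: str = "Googlebot") -> list[str]:
    """Return the URLs that the robots.txt rules disallow for the given crawler."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Made-up rules and pages for illustration
robots = """User-agent: *
Disallow: /cart/
Disallow: /search
"""
pages = [
    "https://www.example.com/",
    "https://www.example.com/blog/technical-seo/",
    "https://www.example.com/search?q=shoes",
]

blocked = blocked_urls(robots, pages)
print("Blocked:", blocked)
print(build_sitemap([url for url in pages if url not in blocked]))
```

The search results page is blocked by the Disallow: /search rule, so it is left out of the generated sitemap.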

Google Search Console is the tool you need to use to ensure your pages are crawled and indexed by Google. It also provides reports that identify any problems that prevent crawlers from indexing your pages. 

4. Structured Data Markup

Structured data markup is a standardized format, typically written as JSON-LD using the schema.org vocabulary, that describes a page’s content to search engines in an organized, machine-readable way. It plays a strategic role in how search engines interpret and display your content, enabling enhanced search results through “rich snippets” such as stars for reviews, prices for products, or images for recipes.

In short, it helps search engines understand your content and display extra information directly in the search results.
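Structured data is usually added as a JSON-LD script in the page’s HTML. The sketch below generates a schema.org Product snippet with review stars (AggregateRating) and a price (Offer), the kinds of details mentioned above. The product, the values, and the helper function are hypothetical. Check the required properties for each rich result type in Google’s structured data documentation. Valid markup makes a page eligible for rich results; it does not guarantee them.

```python
import json


def product_jsonld(name: str, price: float, currency: str,
                   rating: float, review_count: int) -> str:
    """Return a <script> tag containing schema.org Product markup as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
        },
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating,
            "reviewCount": review_count,
        },
    }
    return ('<script type="application/ld+json">\n'
            + json.dumps(data, indent=2)
            + "\n</script>")


print(product_jsonld("Blue Widget", 19.99, "USD", 4.6, 128))
```

Generating the markup from the same data that renders the visible page (price, rating count) helps keep the two in sync.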

Key Takeaway

With all the algorithm changes made in 2023, websites need to stay adaptable and strategic to stay at the top of the search results page. Luckily for you, this technical SEO checklist for 2024 can help you do just that. Use this as a guide to site speed optimization, indexing, and ensuring the best experience for mobile and desktop users.



Why Google Seems To Favor Big Brands & Low-Quality Content





Many people are convinced that Google favors big brands and ranks low-quality content, a problem many feel has become progressively worse. This may not be just a matter of perception. Something is going on, and nearly everyone has an anecdote about poor-quality search results. The possible reasons for it are actually quite surprising.

Google Has Shown Favoritism In The Past

This isn’t the first time that Google’s search engine results pages (SERPs) have shown a bias that favored big brand websites. During the early years of Google’s algorithm, it was obvious that sites with a lot of PageRank ranked for virtually anything they wanted.

For example, I remember a web design company that built a lot of websites, creating a network of backlinks that raised its PageRank to a level normally seen only on big corporate sites like IBM. As a consequence, it ranked for the two-word phrase “Web Design” and virtually every variant, like “Web Design + [any state in the USA].”

Everyone knew that websites with a PageRank of 10, the highest level shown on Google’s toolbar, practically had a free pass in the SERPs, resulting in big brand sites outranking more relevant webpages. It didn’t go unnoticed when Google eventually adjusted their algorithm to fix this issue.

This anecdote illustrates an instance where Google’s algorithm unintentionally created a bias that favored big brands.

Here are other algorithm biases that publishers exploited:

  • Top 10 posts
  • Longtail “how-to” articles
  • Misspellings
  • Free Widgets in footer that contained links (always free to universities!)

Big Brands And Low Quality Content

There are two things that have been a constant for all of Google’s history:

  • Low quality content
  • Big brands crowding out small independent publishers

Anyone who’s ever searched for a recipe knows that the more general the recipe, the lower the quality of the recipes that get ranked. Search for something like cream of chicken soup, and the main ingredient in nearly every recipe is two cans of chicken soup.

A search for Authentic Mexican Tacos results in recipes with these ingredients:

  • Soy sauce
  • Ground beef
  • “Cooked chicken”
  • Taco shells (from the store!)
  • Beer

Not all recipe SERPs are bad. But some of the more general recipes Google ranks are so basic that a hobo can cook them on a hotplate.

Robin Donovan (Instagram), a cookbook author and online recipe blogger, observed:

“I think the problem with google search rankings for recipes these days (post HCU) are much bigger than them being too simple.

The biggest problem is that you get a bunch of Reddit threads or sites with untested user-generated recipes, or scraper sites that are stealing recipes from hardworking bloggers.

In other words, content that is anything but “helpful” if what you want is a tested and well written recipe that you can use to make something delicious.”

Explanations For Why Google’s SERPs Are Broken

It’s hard to get away from the perception that Google’s rankings for a variety of topics always seem to default to big brand websites and low-quality webpages.

Small sites do grow into big brands that dominate the SERPs; it happens. But that’s the thing: even when a small site gets big, it’s just another big brand dominating the SERPs.

Typical explanations for poor SERPs:

  • It’s a conspiracy to increase ad clicks
  • Content itself these days is low quality across the board
  • Google doesn’t have anything else to rank
  • It’s the fault of SEOs
  • Affiliates
  • Poor SERPs are Google’s scheme to drive more ad clicks
  • Google promotes big brands because [insert your conspiracy]

So what’s going on?

People Love Big Brands & Garbage Content

The recent Google antitrust lawsuit exposed the importance of Navboost signals as a major ranking factor. Navboost is an algorithm that interprets user engagement signals to understand what topics a webpage is relevant for, among other things.

The idea of using engagement signals as an indicator of what users expect to see makes sense. After all, Google is user-centric and who better to decide what’s best for users than the users themselves, right?

Well, consider that arguably the biggest and most important song of 1991, “Smells Like Teen Spirit” by Nirvana, didn’t make the Billboard top 100 for that year. Michael Bolton and Rod Stewart made the list twice, with Rod Stewart top ranked for a song called “The Motown Song” (anyone remember that one?).

Nirvana didn’t make the charts until the next year…

Given that we know user interactions are a strong ranking signal, my opinion is that Google’s search rankings follow a similar pattern driven by users’ biases.

People tend to choose what they know. It’s called familiarity bias.

Consumers have a habit of choosing things that are familiar over those that are unfamiliar. This preference shows up, for example, when shoppers pick well-known brands over unknown ones.

Behavioral scientist Jason Hreha defines familiarity bias like this:

“The familiarity bias is a phenomenon in which people tend to prefer familiar options over unfamiliar ones, even when the unfamiliar options may be better. This bias is often explained in terms of cognitive ease, which is the feeling of fluency or ease that people experience when they are processing familiar information. When people encounter familiar options, they are more likely to experience cognitive ease, which can make those options seem more appealing.”

Except for certain queries (like those related to health), I don’t think Google makes an editorial decision to favor certain kinds of websites, like brands.

Google uses many signals for ranking. But Google is strongly user focused.

I believe it’s possible that strong user preferences carry more weight than Reviews System signals. How else to explain why big brand websites with fake reviews seemingly rank better than honest, independent review sites?

It’s not like Google’s algorithms haven’t created poor search results in the past.

  • Google’s Panda algorithm was designed to get rid of a bias for cookie-cutter content.
  • The Reviews System is a patch to fix Google’s bias for content that’s about reviews but isn’t necessarily a review.

If Google has systems for catching low quality sites that their core algorithm would otherwise rank, why do big brands and poor quality content still rank?

I believe the answer is that users prefer to see those sites, as indicated by user interaction signals.

The big question is whether Google will keep ranking whatever users’ biases and inexperience reward with satisfaction signals. In other words, will Google continue serving the sugar-frosted bon-bons that users crave?

Should Google make the choice to rank quality content at the risk that users find it too hard to understand?

Or should publishers give up and focus on creating for the lowest common denominator like the biggest popstars do?



Google Announces Gemma: Laptop-Friendly Open Source AI





Google released an open source large language model, built with the same technology used to create Gemini, that is powerful yet lightweight. It is optimized for environments with limited resources, such as a laptop, and can also run on cloud infrastructure.

Gemma can be used to create a chatbot, content generation tool and pretty much anything else that a language model can do. This is the tool that SEOs have been waiting for.

It is released in two versions, one with two billion parameters (2B) and another one with seven billion parameters (7B). The number of parameters indicates the model’s complexity and potential capability. Models with more parameters can achieve a better understanding of language and generate more sophisticated responses, but they also require more resources to train and run.
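A rough rule of thumb makes the resource difference concrete. The memory needed just to hold a model’s weights is about the number of parameters multiplied by the bytes used per parameter. The sketch below applies that rule to the nominal 2B and 7B sizes at a few common numeric precisions. It is an estimate under stated assumptions: it uses the rounded counts from the model names (the exact parameter counts differ somewhat), and it ignores activations, the attention cache, and framework overhead, all of which add to real memory use.

```python
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate memory (GB, 10**9 bytes) needed to store the weights alone."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, params in [("2B", 2e9), ("7B", 7e9)]:
    row = ", ".join(f"{dtype}: {weight_memory_gb(params, dtype):.1f} GB" for dtype in BYTES_PER_PARAM)
    print(f"{name} -> {row}")
# 2B -> float32: 8.0 GB, bfloat16: 4.0 GB, int8: 2.0 GB, int4: 1.0 GB
# 7B -> float32: 28.0 GB, bfloat16: 14.0 GB, int8: 7.0 GB, int4: 3.5 GB
```

This helps explain why the smaller model is easier to run on a laptop, and why lower-precision (quantized) weights are popular for local use.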

The purpose of releasing Gemma is to democratize access to state-of-the-art artificial intelligence that is trained to be safe and responsible out of the box, with a toolkit to further optimize it for safety.

Gemma By DeepMind

The model is designed to be lightweight and efficient, which makes it well suited to getting into the hands of more end users.

Google’s official announcement noted the following key points:

  • “We’re releasing model weights in two sizes: Gemma 2B and Gemma 7B. Each size is released with pre-trained and instruction-tuned variants.
  • A new Responsible Generative AI Toolkit provides guidance and essential tools for creating safer AI applications with Gemma.
  • We’re providing toolchains for inference and supervised fine-tuning (SFT) across all major frameworks: JAX, PyTorch, and TensorFlow through native Keras 3.0.
  • Ready-to-use Colab and Kaggle notebooks, alongside integration with popular tools such as Hugging Face, MaxText, NVIDIA NeMo and TensorRT-LLM, make it easy to get started with Gemma.
  • Pre-trained and instruction-tuned Gemma models can run on your laptop, workstation, or Google Cloud with easy deployment on Vertex AI and Google Kubernetes Engine (GKE).
  • Optimization across multiple AI hardware platforms ensures industry-leading performance, including NVIDIA GPUs and Google Cloud TPUs.
  • Terms of use permit responsible commercial usage and distribution for all organizations, regardless of size.”

Analysis Of Gemma

According to an analysis by Awni Hannun, a machine learning research scientist at Apple, Gemma is optimized to be highly efficient in a way that makes it suitable for use in low-resource environments.

Hannun observed that Gemma has a vocabulary of 250,000 (250k) tokens versus 32k for comparable models. This matters because Gemma can recognize and process a wider variety of words, allowing it to handle tasks with complex language. His analysis suggests that this extensive vocabulary enhances the model’s versatility across different types of content. He also believes that it may help with math, code, and other modalities.

Hannun further noted that the embedding weights are massive (about 750 million parameters). Embedding weights are the parameters that map each token to a numerical representation of its meaning and its relationships to other tokens.

An important feature he called out is that the embedding weights are used not just to process the input but also to generate the model’s output (the “output head” in his tweet). This sharing, often called weight tying, improves the efficiency of the model by allowing it to better leverage its understanding of language when producing text.

For end users, this may mean more accurate, relevant, and contextually appropriate responses from the model, which would improve its usefulness for content generation as well as for chatbots and translations.

He tweeted:

“The vocab is massive compared to other open source models: 250K vs 32k for Mistral 7B

Maybe helps a lot with math / code / other modalities with a heavy tail of symbols.

Also the embedding weights are big (~750M params), so they get shared with the output head.”
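The arithmetic behind the tweet is simple. An embedding table has one row per vocabulary token and one column per hidden dimension, so its size is vocabulary size times hidden size. The sketch below shows that calculation. It also shows a toy version of the sharing he describes: the same matrix is used to look up input embeddings and, transposed, to score every token at the output. The hidden size of 3072 is an assumption for illustration; the article does not state it. The tiny 4 x 3 matrix is made up.

```python
def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """One vector of length hidden_size per token in the vocabulary."""
    return vocab_size * hidden_size


# 250k vocabulary from the tweet; 3072 is an assumed hidden size for illustration.
table = embedding_params(250_000, 3072)
print(f"shared embedding table: {table:,} params")         # 768,000,000
print(f"separate input + output tables: {2 * table:,} params")  # 1,536,000,000

# Toy weight tying: 4-token vocabulary, 3-dimensional hidden vectors.
E = [
    [0.1, 0.0, 0.2],  # token 0
    [0.0, 0.5, 0.1],  # token 1
    [0.3, 0.3, 0.0],  # token 2
    [0.0, 0.1, 0.9],  # token 3
]


def embed(token_id: int) -> list[float]:
    """Input side: look up the token's row of E."""
    return E[token_id]


def output_logits(hidden: list[float]) -> list[float]:
    """Output side: score each token by its dot product with the same rows of E (hidden @ E^T)."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in E]


logits = output_logits(embed(3))
print(logits.index(max(logits)))  # 3
```

In a real model, the hidden vector comes out of many transformer layers rather than straight from the embedding lookup. The toy feeds it through directly only to show that both ends read from the same matrix E, which is why sharing saves one full copy of a very large table.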

In a follow-up tweet he also noted an optimization in training that translates into potentially more accurate and refined model responses, as it enables the model to learn and adapt more effectively during the training phase.

He tweeted:

“The RMS norm weight has a unit offset.

Instead of “x * weight” they do “x * (1 + weight)”.

I assume this is a training optimization. Usually the weight is initialized to 1 but likely they initialize close to 0. Similar to every other parameter.”
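RMS normalization divides a vector by its root mean square, then multiplies each dimension by a learned weight. The sketch below implements that formula with the unit offset from the tweet, x * (1 + weight). It also shows why the offset pairs with initializing the weight near zero: at that starting point the layer matches a conventional RMS norm whose weight starts at 1. The epsilon value and the function name are assumptions for illustration, not Gemma’s actual code. Whether this helps training is the tweet author’s assumption; the sketch only shows that the two parameterizations agree at their starting points.

```python
import math


def rms_norm(x: list[float], weight: list[float], eps: float = 1e-6,
             unit_offset: bool = True) -> list[float]:
    """RMSNorm: x / sqrt(mean(x^2) + eps), then scaled per dimension.

    With unit_offset=True the scale is (1 + weight), so weight = 0 means "no rescaling".
    With unit_offset=False the scale is weight itself, conventionally initialized to 1.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    scale = [(1.0 + w) if unit_offset else w for w in weight]
    return [v / rms * s for v, s in zip(x, scale)]


x = [3.0, 4.0]
zero_init = rms_norm(x, [0.0, 0.0])                    # offset form, weight starts at 0
one_init = rms_norm(x, [1.0, 1.0], unit_offset=False)  # conventional form, weight starts at 1
print([round(v, 4) for v in zero_init])  # [0.8485, 1.1314]
print(zero_init == one_init)             # True
```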

He followed up that there are more optimizations in data and training but that those two factors are what especially stood out.

Designed To Be Safe And Responsible

A key feature is that Gemma is designed from the ground up to be safe, which makes it well suited for deployment. Training data was filtered to remove personal and sensitive information. Google also used reinforcement learning from human feedback (RLHF) to train the model for responsible behavior.

It was further tested with manual red-teaming and automated adversarial testing, and evaluated for capabilities that could enable unwanted or dangerous activities.

Google also released a toolkit for helping end-users further improve safety:

“We’re also releasing a new Responsible Generative AI Toolkit together with Gemma to help developers and researchers prioritize building safe and responsible AI applications. The toolkit includes:

  • Safety classification: We provide a novel methodology for building robust safety classifiers with minimal examples.
  • Debugging: A model debugging tool helps you investigate Gemma’s behavior and address potential issues.
  • Guidance: You can access best practices for model builders based on Google’s experience in developing and deploying large language models.”

Read Google’s official announcement:

Gemma: Introducing new state-of-the-art open models

Featured Image by Shutterstock/Photo For Everything
