Are ChatGPT, Bard and Dolly 2.0 Trained On Pirated Content?

Large Language Models (LLMs) like ChatGPT, Bard and even open source versions are trained on public Internet content. But there are also indications that popular AIs might be trained on datasets created from pirated books.

Is Dolly 2.0 Trained on Pirated Content?

Dolly 2.0 is an open source AI that was recently released. The intent behind Dolly is to democratize AI by making it available to everyone who wants to build something with it, including commercial products.

But there’s also a privacy issue with concentrating AI technology in the hands of three major corporations and trusting them with private data.

Given a choice, many businesses would prefer to not hand off private data to third parties like Google, OpenAI and Meta.

Even Mozilla, the open source browser and app company, is investing in growing the open source AI ecosystem.

The intent behind open source AI is unquestionably good.

But there is an issue with the data that is used to train these large language models because some of it consists of pirated content.

The open source ChatGPT clone, Dolly 2.0, was created by a company called Databricks.

Dolly 2.0 is based on an open source Large Language Model (LLM) called Pythia, which was created by an open source group called EleutherAI.

EleutherAI created eight versions of LLMs of different sizes within the Pythia family of LLMs.

Databricks used a 12 billion parameter version of Pythia to create Dolly 2.0, together with a dataset that Databricks created itself: a set of questions and answers used to train the Dolly 2.0 AI to follow instructions.

Notably, the EleutherAI Pythia LLM was trained using a dataset called the Pile.

The Pile dataset comprises multiple sets of English-language texts, one of which is a dataset called Books3. The Books3 dataset contains the text of books that were pirated and hosted at a pirate site called Bibliotik.

This is what the DataBricks announcement says:

“Dolly 2.0 is a 12B parameter language model based on the EleutherAI pythia model family and fine-tuned exclusively on a new, high-quality human generated instruction following dataset, crowdsourced among Databricks employees.”

Pythia LLM Was Created With the Pile Dataset

The Pythia research paper by EleutherAI mentions that Pythia was trained using the Pile dataset.

This is a quote from the Pythia research paper:

“We train 8 model sizes each on both the Pile …and the Pile after deduplication, providing 2 copies of the suite which can be compared.”

Deduplication means that they removed redundant data; it's a process for creating a cleaner dataset.
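For illustration, here is a minimal Python sketch of exact-match deduplication by hashing normalized text. Real training pipelines typically use fuzzier techniques (such as MinHash), so treat this only as a conceptual example, not the method EleutherAI used.

```python
import hashlib

def deduplicate(documents):
    """Keep the first occurrence of each document, comparing on normalized text."""
    seen = set()
    unique_docs = []
    for doc in documents:
        # Normalize whitespace and case so trivially different copies collapse together.
        fingerprint = hashlib.sha256(" ".join(doc.lower().split()).encode("utf-8")).hexdigest()
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique_docs.append(doc)
    return unique_docs

corpus = ["The quick brown fox.", "the  quick brown fox.", "A different sentence."]
print(deduplicate(corpus))  # only two documents remain
```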

So what’s in Pile? There’s a Pile research paper that explains what’s in that dataset.

Here’s a quote from the research paper for Pile where it says that they use the Books3 dataset:

“In addition we incorporate several existing high-quality datasets: Books3 (Presser, 2020)…”

The Pile dataset research paper links to a tweet by Shawn Presser that describes what is in the Books3 dataset:

“Suppose you wanted to train a world-class GPT model, just like OpenAI. How? You have no data.

Now you do. Now everyone does.

Presenting “books3”, aka “all of bibliotik”

– 196,640 books
– in plain .txt
– reliable, direct download, for years: https://the-eye.eu/public/AI/pile_preliminary_components/books3.tar.gz”

So the chain is clear: the Books3 dataset of pirated books is part of the Pile, the Pile was used to train the Pythia LLM, and Pythia in turn served as the foundation for the Dolly 2.0 open source AI.

Is Google Bard Trained on Pirated Content?

The Washington Post recently published a review of Google’s Colossal Clean Crawled Corpus dataset (also known as C4), in which they discovered that Google’s dataset also contains pirated content.

The C4 dataset is important because it’s one of the datasets used to train Google’s LaMDA LLM, a version of which is what Bard is based on.

The actual dataset is called Infiniset, and the C4 dataset makes up about 12.5% of the total text used to train LaMDA.

The Washington Post article reported:

“The three biggest sites were patents.google.com No. 1, which contains text from patents issued around the world; wikipedia.org No. 2, the free online encyclopedia; and scribd.com No. 3, a subscription-only digital library.

Also high on the list: b-ok.org No. 190, a notorious market for pirated e-books that has since been seized by the U.S. Justice Department.

At least 27 other sites identified by the U.S. government as markets for piracy and counterfeits were present in the data set.”

The flaw in the Washington Post analysis is that they’re looking at a version of C4, but not necessarily the one that LaMDA was trained on.

The research paper for the C4 dataset was published in July 2020. Within a year of publication, another research paper found that the C4 dataset was biased against people of color and the LGBT community.

The research paper is titled Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus.

The researchers discovered that the dataset contained negative sentiment against people of Arab identities and excluded documents associated with Black and Hispanic authors, as well as documents that mention sexual orientation.

The researchers wrote:

“Our examination of the excluded data suggests that documents associated with Black and Hispanic authors and documents mentioning sexual orientations are significantly more likely to be excluded by C4.EN’s blocklist filtering, and that many excluded documents contained non-offensive or non-sexual content (e.g., legislative discussions of same-sex marriage, scientific and medical content).

This exclusion is a form of allocational harms …and exacerbates existing (language-based) racial inequality as well as stigmatization of LGBTQ+ identities…

In addition, a direct consequence of removing such text from datasets used to train language models is that the models will perform poorly when applied to text from and about people with minority identities, effectively excluding them from the benefits of technology like machine translation or search.”

The researchers concluded that filtering out “bad words” and other attempts to “clean” the dataset were too simplistic and warranted a more nuanced approach.

Those conclusions are important because they show that it was well known that the C4 dataset was flawed.

LaMDA was developed in 2022 (two years after the C4 dataset) and the associated LaMDA research paper says that it was trained with C4.

But that’s just a research paper. What happens in real-life on a production model can be vastly different from what’s in the research paper.

When discussing a research paper it’s important to remember that Google consistently says that what’s in a patent or research paper isn’t necessarily what’s in use in Google’s algorithm.

Google is highly likely to be aware of those conclusions and it’s not unreasonable to assume that Google developed a new version of C4 for the production model, not just to address inequities in the dataset but to bring it up to date.

Google doesn’t say what’s in its algorithm; it’s a black box. So we can’t say with certainty that the technology underlying Google Bard was trained on pirated content.

To make it even clearer, Bard was released in 2023, using a lightweight version of LaMDA. Google has not defined what a lightweight version of LaMDA is.

So there’s no way to know what content was contained within the datasets used to train the lightweight version of LaMDA that powers Bard.

One can only speculate as to what content was used to train Bard.

Does GPT-4 Use Pirated Content?

OpenAI is extremely private about the datasets used to train GPT-4. The last time OpenAI discussed its training datasets was in the research paper for GPT-3, published in 2020, and even there it is somewhat vague and imprecise about what’s in the datasets.

In 2021, the Towards Data Science website published an interesting review of the available information, concluding that pirated content was indeed used to train early versions of GPT.

They write:

“…we find evidence that BookCorpus directly violated copyright restrictions for hundreds of books that should not have been redistributed through a free dataset.

For example, over 200 books in BookCorpus explicitly state that they “may not be reproduced, copied and distributed for commercial or non-commercial purposes.””

It’s difficult to conclude whether GPT-4 used any pirated content.

Is There A Problem With Using Pirated Content?

One would think that it may be unethical to use pirated content to train a large language model and profit from the use of that content.

But the laws may actually allow this kind of use.

I asked Kenton J. Hutcherson, Internet Attorney at Hutcherson Law what he thought about the use of pirated content in the context of training large language models.

Specifically, I asked if someone uses Dolly 2.0, which may be partially created with pirated books, would commercial entities who create applications with Dolly 2.0 be exposed to copyright infringement claims?

Kenton answered:

“A claim for copyright infringement from the copyright holders of the pirated books would likely fail because of fair use.

Fair use protects transformative uses of copyrighted works.

Here, the pirated books are not being used as books for people to read, but as inputs to an artificial intelligence training dataset.

A similar example came into play with the use of thumbnails on search results pages. The thumbnails are not there to replace the webpages they preview. They serve a completely different function—they preview the page.

That is transformative use.”

Karen J. Bernstein of Bernstein IP offered a similar opinion.

“Is the use of the pirated content a fair use? Fair use is a commonly used defense in these instances.

The concept of the fair use defense only exists under US copyright law.

Fair use is analyzed under a multi-factor analysis that the Supreme Court set forth in a 1994 landmark case.

Under this scenario, there will be questions of how much of the pirated content was taken from the books and what was done to the content (was it “transformative”), and whether such content is taking the market away from the copyright creator.”

AI technology is bounding forward at an unprecedented pace, seemingly evolving on a week to week basis. Perhaps in a reflection of the competition and the financial windfall to be gained from success, Google and OpenAI are becoming increasingly private about how their AI models are trained.

Should they be more open about such information? Can they be trusted that their datasets are fair and non-biased?

The use of pirated content to create these AI models may be legally protected as fair use, but just because one can, does that mean one should?

Featured image by Shutterstock/Roman Samborskyi




10 Tips on How to Rock a Small PPC Budget

Many advertisers have a tight budget for pay-per-click (PPC) advertising, making it challenging to maximize results.

One of the first questions that often looms large is, “How much should we spend?” It’s a pivotal question, one that sets the stage for the entire PPC strategy.

Read on for tips to get started or further optimize budgets for your PPC program to maximize every dollar spent.

1. Set Expectations For The Account

With a smaller budget, managing expectations for the size and scope of the account will allow you to keep focus.

A very common question is: How much should our company spend on PPC?

To start, you must balance your company’s PPC budget with the cost, volume, and competition of keyword searches in your industry.

You’ll also want to implement a well-balanced PPC strategy with display and video formats to engage consumers.

First, determine your daily budget. For example, if the monthly budget is $2,000, the daily budget would be set at approximately $66 per day for the entire account (assuming a 30-day month).

The daily budget will also determine how many campaigns you can run at the same time in the account because that $66 will be divided up among all campaigns.

Be aware that Google Ads and Microsoft Ads may occasionally exceed the daily budget to maximize results. The overall monthly budget, however, should not exceed the daily budget multiplied by the number of days in the month.
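As a rough illustration, here is a minimal Python sketch of that arithmetic; the campaign names and the 70/30 split are hypothetical placeholders for your own plan.

```python
import calendar

def plan_budget(monthly_budget, year, month, campaign_weights):
    """Split a monthly PPC budget into a daily budget and per-campaign daily budgets."""
    days = calendar.monthrange(year, month)[1]
    daily_budget = monthly_budget / days
    per_campaign = {name: round(daily_budget * weight, 2)
                    for name, weight in campaign_weights.items()}
    # The platforms may overspend on any single day, but the monthly total
    # should stay at roughly daily_budget * days.
    return round(daily_budget, 2), per_campaign

daily, split = plan_budget(2000, 2024, 6, {"Search - Leads": 0.7, "Display - Awareness": 0.3})
print(daily, split)  # e.g. 66.67 {'Search - Leads': 46.67, 'Display - Awareness': 20.0}
```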

Now that we know our daily budget, we can focus on prioritizing our goals.

2. Prioritize Goals

Advertisers often have multiple goals per account. A limited budget will also limit the number of campaigns – and the number of goals – you should focus on.

Some common goals include:

  • Brand awareness.
  • Leads.
  • Sales.
  • Repeat sales.

In the example below, the advertiser uses a small budget to promote a scholarship program.

They are using a combination of leads (search campaign) and awareness (display campaign) to divide up a daily budget of $82.

Screenshot from author, May 2024

The next several features can help you laser-focus campaigns to allocate your budget to where you need it most.

Remember, these settings will restrict traffic to the campaign. If you aren’t getting enough traffic, loosen up/expand the settings.

3. Location Targeting

Location targeting is a core consideration in reaching the right audience and helps manage a small ad budget.

To maximize a limited budget, you should focus on only the essential target locations where your customers are located.

While that seems obvious, you should also consider how to refine that to direct the limited budget to core locations. For example:

  • You can refine location targeting by states, cities, ZIP codes, or even a radius around your business.
  • Choosing locations to target should be focused on results.
  • The smaller the geographic area, the less traffic you will get, so balance relevance with budget.
  • Consider adding negative locations where you do not do business to prevent irrelevant clicks that use up precious budget.

If the reporting reveals targeted locations where campaigns are ineffective, consider removing targeting to those areas. You can also try a location bid modifier to reduce ad serving in those areas.
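If you export a geographic performance report, a quick script can flag the locations worth excluding or bidding down. This is only a sketch; the file name, column names (Location, Cost, Conversions), and thresholds are assumptions you would adjust to your own export and targets.

```python
import pandas as pd

# Assumed column names from a geographic performance export; adjust to your report.
df = pd.read_csv("location_report.csv")  # columns: Location, Cost, Conversions, Clicks

# Cost per acquisition; locations with zero conversions get no CPA value.
df["CPA"] = df["Cost"] / df["Conversions"].where(df["Conversions"] > 0)

target_cpa = 50.0
# Locations that spend meaningfully but convert poorly are candidates for
# exclusion or a negative location bid adjustment.
flagged = df[(df["Cost"] > 100) & ((df["Conversions"] == 0) | (df["CPA"] > 2 * target_cpa))]
print(flagged.sort_values("Cost", ascending=False)[["Location", "Cost", "Conversions", "CPA"]])
```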

Managing PPC budget by location: screenshot by author from Google Ads, May 2024

4. Ad Scheduling

Ad scheduling also helps to control budget by only running ads on certain days and at certain hours of the day.

With a smaller budget, it can help to limit ads to serve only during hours of business operation. You can choose to expand that a bit to accommodate time zones and for searchers doing research outside of business hours.

If you sell online, you are always open, but review reporting for hourly results over time to determine if there are hours of the day with a negative return on investment (ROI).

Limit running PPC ads if the reporting reveals hours of the day when campaigns are ineffective.
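A similar approach works for the hour-of-day report. Again, the file name, column names, and the ROAS threshold below are assumptions to adapt to your own export.

```python
import pandas as pd

# Assumed columns from an hour-of-day report export; adjust to your account.
df = pd.read_csv("hour_of_day_report.csv")  # columns: Hour, Cost, ConversionValue

hourly = df.groupby("Hour", as_index=False)[["Cost", "ConversionValue"]].sum()
hourly["ROAS"] = hourly["ConversionValue"] / hourly["Cost"]

# Hours returning less than you spend are candidates for an ad-schedule
# exclusion or a negative bid adjustment.
losing_hours = hourly[hourly["ROAS"] < 1.0]
print(losing_hours.sort_values("ROAS"))
```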

Managing a small PPC budget by hour of day: screenshot by author from Google Ads, May 2024

5. Set Negative Keywords

A well-planned negative keyword list is a golden tactic for controlling budgets.

The purpose is to prevent your ad from showing on keyword searches and websites that are not a good match for your business.

  • Generate negative keywords proactively by brainstorming keyword concepts that may trigger ads erroneously.
  • Review query reports to find irrelevant searches that have already led to clicks (see the sketch after this list).
  • Create lists and apply to the campaign.
  • Repeat on a regular basis because ad trends are always evolving!
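As a starting point for the review step above, here is a minimal sketch that scans an exported search terms report for irrelevant patterns. The file name, column names, and patterns are placeholders; substitute your own concepts and always review before uploading negatives.

```python
import re
import pandas as pd

# Assumed columns from a search terms report export; adjust to your account.
terms = pd.read_csv("search_terms_report.csv")  # columns: SearchTerm, Clicks, Cost, Conversions

# Concepts you never want to pay for, brainstormed proactively (hypothetical examples).
irrelevant_patterns = [r"\bfree\b", r"\bjobs?\b", r"\bdiy\b", r"\btemplate\b"]

mask = terms["SearchTerm"].str.contains("|".join(irrelevant_patterns),
                                        flags=re.IGNORECASE, regex=True)
wasted = terms[mask & (terms["Conversions"] == 0)]

print("Wasted spend:", round(wasted["Cost"].sum(), 2))
# Review this list, then apply the terms as negatives in a shared negative list.
print(wasted["SearchTerm"].unique())
```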

6. Smart Bidding

Smart Bidding is a game-changer for efficient ad campaigns. Powered by Google AI, it automatically adjusts bids to serve ads to the right audience within budget.

The AI optimizes the bid for each auction, ideally maximizing conversions while staying within your budget constraints.

Smart bidding strategies available include:

  • Maximize Conversions: Automatically adjust bids to generate as many conversions as possible for the budget.
  • Target Return on Ad Spend (ROAS): This method predicts the value of potential conversions and adjusts bids in real time to maximize return.
  • Target Cost Per Action (CPA): Advertisers set a target CPA, and Google optimizes bids to get the most conversions within budget at or near the desired cost per action.

7. Try Display Only Campaigns

Display ads for small PPC budgets: screenshot by author from Google Ads, May 2024

For branding and awareness, a display campaign can expand your reach to a wider audience affordably.

Audience targeting is an art in itself, so review the best options for your budget, including topics, placements, demographics, and more.

Remarketing to your website visitors is a smart targeting strategy to include in your display campaigns to re-engage your audience based on their behavior on your website.

Let your ad performance reporting by placements, audiences, and more guide your optimizations toward the best fit for your business.

Audience targeting options for a small PPC budget: screenshot by Lisa Raehsler from Google Ads, May 2024

8. Performance Max Campaigns

Performance Max (PMax) campaigns are available in Google Ads and Microsoft Ads.

In short, automation is used to maximize conversion results by serving ads across channels and with automated ad formats.

This campaign type can be useful for limited budgets in that it uses AI to create assets and select channels and audiences in a single campaign, rather than requiring you to divide the budget among multiple campaign types.

Since the success of the PMax campaign depends on the use of conversion data, that data will need to be available and reliable.

9. Target Less Competitive Keywords

Some keywords can have very high cost-per-click (CPC) in a competitive market. Research keywords to compete effectively on a smaller budget.

Use your analytics account to discover the organic searches leading to your website, and use Google autocomplete and tools like Google Keyword Planner in the Google Ads account to compare keywords and get estimates.

In this example, a keyword such as “business accounting software” potentially has a lower CPC but also lower volume.

Ideally, you would test both keywords to see how they perform in a live campaign scenario.
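Before testing live, you can run a quick back-of-the-envelope comparison. The search volumes and CPCs below are hypothetical placeholders for your own Keyword Planner estimates, not real data.

```python
# Hypothetical Keyword Planner estimates; plug in your own numbers.
keywords = {
    "accounting software":          {"monthly_searches": 33_000, "top_of_page_cpc": 14.00},
    "business accounting software": {"monthly_searches": 5_400,  "top_of_page_cpc": 9.00},
}

monthly_budget = 2_000
for kw, est in keywords.items():
    max_clicks = monthly_budget / est["top_of_page_cpc"]
    # Share of the available searches the budget could plausibly cover.
    coverage = min(max_clicks / est["monthly_searches"], 1.0)
    print(f"{kw}: ~{max_clicks:.0f} clicks possible, ~{coverage:.1%} of search volume")
```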

Comparing keywords for small PPC budgets: screenshot by author from Google Ads, May 2024

10. Manage Costly Keywords

High volume and competitive keywords can get expensive and put a real dent in the budget.

In addition to the tip above, if a keyword is high volume and high cost, consider restructuring it into its own campaign to monitor it and possibly set more restrictive targeting and budget.

Levers that can impact costs here include experimenting with match types and any of the tips in this article. Also explore the opportunity to write more relevant ad copy for these costly keywords to improve quality.

Every Click Counts

As you navigate these strategies, you will see that managing a PPC account with a limited budget isn’t just about monetary constraints.

Rocking your small PPC budgets involves strategic campaign management, data-driven decisions, and ongoing optimizations.

In the dynamic landscape of paid search advertising, every click counts, and with the right approach, every click can translate into meaningful results.



Featured Image: bluefish_ds/Shutterstock


What Are They Really Costing You?

This post was sponsored by Adpulse. The opinions expressed in this article are the sponsor’s own.

As managers of paid media, one question drives us all: “How do I improve paid ad performance?”

Given that our study found close variant search terms perform poorly, yet more than half of the average budget on Google & Microsoft Ads is being spent on them, managing their impact effectively could well be one of your largest optimization levers toward driving significant improvements in ROI. 

“Close variants help you connect with people who are looking for your business, despite slight variations in the way they search.” support.google.com

Promising idea…but what about the execution?

We analyzed over 4.5 million clicks and 400,000 conversions to answer this question: With the rise in close variants (intent matching) search terms, what impact are they having on budgets and account performance? Spoiler alert, the impact is substantial. 


True Match Vs. Close Variants: How Do They Perform?

To understand close variant (CV) performance, we must first define the difference between a true match and a close variant. 

 

What Is a True Match? 

We still remember the good old days when keyword match types gave you control over the search terms they triggered, so for this study we used the literal match types to define ‘close variant’ vs ‘true match’.

  • Exact match keyword => search term matches the keyword exactly. 
  • Phrase match keyword => search term must contain the keyword (same word order).
  • Broad match keyword => search term must contain every individual word in the keyword, but the word order does not matter (the way modified broad match keywords used to work).   

 

What Is a Close Variant? 

If you’re not familiar with close variants (intent matching) search terms, think of them as search terms that are ‘fuzzy matched’ to the keywords you are actually bidding on. 

Some of these close variants are highly relevant and represent a real opportunity to expand your keywords in a positive way. 

Some are close-ish, but the conversions are expensive. 

And (no shocks here) some are truly wasteful. 

Both Google and Microsoft Ads do this, and you can’t opt out.

To give an example: if you were a music therapist, you might bid on the phrase match keyword “music therapist”. An example of a true match search term would be ‘music therapist near me’ because it contains the keyword in its true form (phrase match in this case) and a CV might be ‘music and art therapy’.
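Using the study's definitions above, the classification can be expressed as a small function. This is a simplified sketch (it ignores punctuation, plurals, and misspellings), not Adpulse's actual implementation.

```python
def is_true_match(search_term, keyword, match_type):
    """Classify a search term against a keyword using the study's literal definitions."""
    term_words = search_term.lower().split()
    kw_words = keyword.lower().split()

    if match_type == "exact":
        return term_words == kw_words
    if match_type == "phrase":
        # The keyword must appear in the search term as a contiguous, ordered sequence.
        return any(term_words[i:i + len(kw_words)] == kw_words
                   for i in range(len(term_words) - len(kw_words) + 1))
    if match_type == "broad":
        # Every individual keyword word must appear somewhere in the search term.
        return all(w in term_words for w in kw_words)
    raise ValueError(f"unknown match type: {match_type}")

print(is_true_match("music therapist near me", "music therapist", "phrase"))  # True
print(is_true_match("music and art therapy", "music therapist", "phrase"))    # False -> close variant
```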


How Do Close Variants Compare to True Match?

Short answer: poorly, on both Google and Microsoft Ads. Interestingly, however, Google showed the worse performance on both metrics assessed, CPA and ROAS.

Image created by Adpulse, May 2024


Image created by Adpulse, May 2024

Want to see the data? Jump to the Underlying Data section below.

Both platforms have embraced CVs: as stated earlier, on average more than half of your budget is spent on close variant matches. That’s a lot of expansion to reach searches you’re not directly bidding for, so it’s clearly a major driver of performance in your account and, therefore, deserving of your attention.

We anticipated a difference in metrics between CVs and true match search terms, since the true match search terms directly align with the keywords you’re bidding on, derived from your intimate knowledge of the business offering. 

True match conversions should therefore be the low-hanging fruit, leaving the rest for the platforms to find via CVs. Depending on the cost and ROI, this isn’t inherently bad, but logically we would assume CVs would perform worse than true matches, which is exactly what we observed. 


How Can You Limit Wastage on Close Variants?

You can’t opt out of them. However, if your goal is to manage their impact on performance, you can use these three steps to move the needle in the right direction. And of course, if you’re relying on CVs to boost volume, you’ll need to take more of a ‘quality-screening’ rather than a hard-line ‘everything-must-go’ approach to your CV clean out!

 

Step 1: Diagnose Your CV Problem 

We’re a helpful bunch at Adpulse so while we were scoping our in-app solution, we built a simple spreadsheet that you can use to diagnose how healthy your CVs are. Just make a copy, paste in your keyword and search term data then run the analysis for yourself. Then you can start to clean up any wayward CVs identified. Of course, by virtue of technology, it’s both faster and more advanced in the Adpulse Close Variant Manager 😉.

 

Step 2: Suggested Campaign Structures for Easier CV Management  

Brand Campaigns

If you don’t want competitors or general searches being matched to your brand keywords, this strategy will solve for that. 

Set up one ad group with your exact brand keyword/s, and another ad group with phrase brand keyword/s, then employ the negative keyword strategies in Step 3 below. You might be surprised at how many CVs have nothing to do with your brand, and identifying variants (and adding negative keywords) becomes easy with this structure.

Don’t forget to add your phrase match brand negatives to non-brand campaigns (we love negative lists for this).

Non-Brand Campaigns with Larger Budgets

We suggest a campaign structure with one ad group per match type:

Example Ad Groups:

    • General Plumbers – Exact
    • General Plumbers – Phrase
    • General Plumbers – Broad
    • Emergency Plumbers – Exact
    • Emergency Plumbers – Phrase
    • Emergency Plumbers – Broad

This allows you to more easily identify variants so you can eliminate them quickly. This also allows you to find new keyword themes based on good quality CVs, and add them easily to the campaign. 

Non-Brand Campaigns with Smaller Budgets

Smaller budgets mean the upside of having more data per ad group outweighs the upside of making it easier to trim unwanted CVs, so go for a simpler theme-based ad group structure:

Example Ad Groups:

    • General Plumbers
    • Emergency Plumbers

 

Step 3: Ongoing Actions to Tame Close Variants

Adding great CVs as keywords and poor CVs as negatives on a regular basis is the only way to control their impact.

For exact match ad groups, we suggest adding mainly root negative keywords. For example, if you were bidding on [buy mens walking shoes] and a CV appeared for ‘mens joggers’, you could add the single word “joggers” as a phrase/broad match negative keyword, which would prevent your ads from showing on any future search that contains joggers. If you instead added “mens joggers” as a negative keyword, other searches that contain the word joggers would still be eligible to trigger your ads.

In ad groups that contain phrase or broad match keywords, you shouldn’t use root negatives unless you’re REALLY sure that the root word should never appear in any search term. You’ll probably find that you add the whole search term as an exact match negative much more often than you use root negatives.
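To make the difference concrete, here is a small sketch showing which searches a root negative blocks versus a whole-term negative; the jogger examples mirror the ones above and the logic deliberately ignores plurals and misspellings (negatives do not expand to close variants).

```python
def contains_phrase(term, phrase):
    """True if `phrase` appears in `term` as a contiguous, ordered word sequence."""
    t, p = term.lower().split(), phrase.lower().split()
    return any(t[i:i + len(p)] == p for i in range(len(t) - len(p) + 1))

search_terms = ["mens joggers on sale", "womens joggers", "buy mens walking shoes online"]

root_negative = "joggers"        # root word added as a phrase/broad negative
full_negative = "mens joggers"   # whole search term added as a negative

print([s for s in search_terms if contains_phrase(s, root_negative)])
# ['mens joggers on sale', 'womens joggers']  -> the root negative blocks every jogger search

print([s for s in search_terms if contains_phrase(s, full_negative)])
# ['mens joggers on sale']  -> 'womens joggers' would still be eligible to trigger ads
```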


The Proof: What (and Why) We Analyzed

We know CVs come up frequently in marketers’ conversations, and through the number of conversations we have with agencies each week, we’ve witnessed growing CV-driven frustration amongst marketers.

Internally we reached a tipping point and decided to data dive to see if it just felt like a large problem, or if it actually IS a large enough problem that we should devote resources to solving it in-app. First stop…data. 

Our study of CV performance started with thousands of Google and Microsoft Ads accounts, using the last 30 days of data to May 2024, filtered to exclude:

  • Shopping or DSA campaigns/Ad Groups.
  • Accounts with less than 10 conversions.
  • Accounts with a conversion rate above 50%.
  • For ROAS comparisons, any accounts with a ROAS below 200% or above 2500%.

Search terms in the study are therefore from keyword-based search campaigns where those accounts appear to have a reliable conversion tracking setup and have enough conversion data to be individually meaningful.

The cleaned data set comprised over 4.5 million clicks and 400,000 conversions (over 30 days) across Google and Microsoft Ads; a large enough data set to answer questions about CV performance with confidence.

Interestingly, each platform appears to have a different driver for their lower CV performance. 

CPA Results:

Google Ads was able to maintain its conversion rate, but it chased more expensive clicks to achieve it; in fact, clicks at almost double the average CPC of true match. Result: the CPA on CVs worked out to roughly double the CPA on true match.

Microsoft Ads only saw slightly poorer CPA performance within CVs; their conversion rate was much lower compared to true match, but their saving grace was that they had significantly lower CPCs, and you can afford to have a lower conversion rate if your click costs are also lower. End outcome? Microsoft Ads CPA on CVs was only slightly more expensive when compared to their CPA on true matches; a pleasant surprise 🙂.
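The arithmetic behind those two outcomes is simply CPA = CPC ÷ conversion rate. The numbers below are hypothetical, chosen only to mirror the pattern described, not taken from the study.

```python
def cpa(cpc, conversion_rate):
    """Cost per acquisition: spend per click divided by conversions per click."""
    return cpc / conversion_rate

# Hypothetical figures that mirror the pattern described above.
google_true, google_cv = cpa(1.00, 0.05), cpa(2.00, 0.05)  # CPC doubles, conversion rate holds
ms_true, ms_cv = cpa(0.80, 0.05), cpa(0.55, 0.03)          # conversion rate drops, but CPC is cheaper

print(f"Google:    true {google_true:.2f} vs CV {google_cv:.2f}")   # roughly 2x worse
print(f"Microsoft: true {ms_true:.2f} vs CV {ms_cv:.2f}")           # only slightly worse
```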


Image created by Adpulse, May 2024

ROAS Results:

Both platforms showed a similar story; CVs delivered roughly half the ROAS of their true match cousins, with Microsoft Ads again being stronger overall. 

 


Image created by Adpulse, May 2024

Underlying Data:

For the data nerds amongst us (at Adpulse, we self-identify here!):


Image created by Adpulse, May 2024


TL;DR

Close variant search terms consume, on average, more than half an advertiser’s budget while, in most cases, performing significantly worse than search terms that actually match the keywords. How much worse? Read the sections above for details. Enough that managing their impact effectively could well be one of your largest optimization levers toward driving significant improvements in account ROI.


Image Credits

Featured Image: Image by Adpulse. Used with permission.


How To Uncover Traffic Declines In Google Search Console And How To Fix Them

Google Search Console is an essential tool that offers critical insights into your website’s performance in Google search results.

Occasionally, you might observe a sudden decline in organic traffic, and it’s crucial to understand the potential causes behind this drop. The data stored within Google Search Console (GSC) can be vital in troubleshooting and understanding what has happened to your website.

Before troubleshooting GSC traffic declines, it’s important to understand first what Google says about assessing traffic graphs in GSC and how it reports on different metrics.

Understanding Google Search Console Metrics

Google’s documentation on debugging Search traffic drops is relatively comprehensive (compared to the guidance given in other areas) and can, for the most part, help prevent any immediate or unnecessary panic should there be a change in data.

Despite this, I often find that Search Console data is misunderstood both by clients and by those in their first few years of SEO who are still learning the craft.

Image from Google Search Central, May 2024

Even with these definitions, if your clicks and impressions graphs begin to resemble any of the above graph examples, there can be wider meanings.

For each of Search Central’s example descriptions, the graph could also be a sign of something else:

  • Large drop from an algorithmic update, site-wide security, or spam issue: This could also signal a serious technical issue, such as accidentally deploying a noindex onto a URL or returning the incorrect status code. I’ve seen it before where the URL renders content but returns a 410.
  • Seasonality: You will know your seasonality better than anyone, but if this graph looks inverse, it could be a sign that during peak search times, Google is rotating the search engine results pages (SERPs) and choosing not to rank your site highly. This could be because, during peak search periods, there is a slight intent shift in the queries’ dominant interpretation.
  • Technical issues across your site, changing interests: This type of graph could also represent seasonality (either as a gradual decline or increase).
  • Reporting glitch ¯\_(ツ)_/¯: This graph can represent intermittent technical issues as well as reporting glitches. Similar to the alternate reasons for seasonality graphs, it could represent a short-term shift in the SERPs and what meets the needs of an adjusted dominant interpretation of a query.

Clicks & Impressions

Google filters Click and Impression data in Google Search Console through a combination of technical methods and policies designed to ensure the accuracy, reliability, and integrity of the reported data.

Reasons for this include:

  • Spam and bot filtering.
  • Duplicate data removal.
  • User privacy/protection.
  • Removing “invalid activities.”
  • Data aggregation and sampling.

One of the main reasons I’ve seen GSC report different numbers in the UI and the API comes down to the setting of thresholds.

Google may set thresholds for including data in reports to prevent skewed metrics due to very low-frequency queries or impressions. For example, data for queries that result in very few impressions might be excluded from reports to maintain the statistical reliability of the metrics.

Average Position

Google Search Console produces the Average Position metric by calculating the average ranking of a website’s URLs for a specific query or set of queries over a defined period of time.

Each time a URL appears in the search results for a query, its position is recorded. For instance, if a URL appears in the 3rd position for one query and in the 7th position for another query, these positions are logged separately.
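In other words, the reported figure is an average over every recorded appearance. A tiny, simplified illustration with made-up positions:

```python
# Positions recorded each time the URL appears in results (one entry per impression).
recorded_positions = [3, 7, 3, 3, 7]

average_position = sum(recorded_positions) / len(recorded_positions)
print(round(average_position, 1))  # 4.6 -- an impression-weighted average, not a "typical" rank
```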

As we enter the era of AI Overviews, John Mueller has confirmed via Slack conversations that appearing in a generative snapshot will affect the average position of the query and/or URL in the Search Console UI.

Source: John Mueller via The SEO Community Slack channel

I don’t rely on the average position metric in GSC for rank tracking, but it can be useful in trying to debug whether or not Google is having issues establishing a single dominant page for specific queries.

Understanding how the tool compiles data allows you to better diagnose the reasons behind a change and to correlate the data with other events, such as Google updates or development deployments.

Google Updates

A Google broad core algorithm update is a significant change to Google’s search algorithm intended to improve the relevance and quality of search results.

These updates do not target specific sites or types of content but alter specific systems that make up the “core” to an extent it is noteworthy for Google to announce that an update is happening.

Google makes updates to the various individual systems all the time, so the lack of a Google announcement does not disqualify a Google update from being the cause of a change in traffic.

For example, the website in the below screenshot saw a decline from the March 2023 core update but then recovered in the November 2023 core update.

The website saw a decline from the March 2023 core update: screenshot by author from Google Search Console, May 2024

The following screenshot shows another example of a traffic decline correlating with a Google update, and it also shows that recovery doesn’t always occur with future updates.

Traffic decline correlating with a Google update: screenshot by author from Google Search Console, May 2024

This site is predominantly informational content supporting a handful of marketing landing pages (a traditional SaaS model) and has seen a steady decline correlating with the September 2023 helpful content update.

How To Fix This

Websites negatively impacted by a broad core update can’t fix specific issues to recover.

Webmasters should focus on providing the best possible content and improving overall site quality.

Recovery, however, may occur when the next broad core update is rolled out, if the site has improved in quality and relevance or Google adjusts specific systems and signal weightings back in favour of your site.

In SEO terminology, we also refer to these traffic changes as an algorithmic penalty, which can take time to recover from.

SERP Layout Updates

Given the launch of AI Overviews, I feel many SEO professionals will conduct this type of analysis in the coming months.

In addition to AI Overviews, Google can choose to include a number of different SERP features ranging from:

  • Shopping results.
  • Map Packs.
  • X (Twitter) carousels.
  • People Also Ask accordions.
  • Featured snippets.
  • Video thumbnails.

All of these not only draw users’ attention away from the traditional organic results, but they also cause pixel shifts.

From our testing of SGE/AI Overviews, we see traditional results being pushed down anywhere between 1,000 and 1,500 pixels.

When this happens you’re not likely to see third-party rank tracking tools show a decrease, but you will see clicks decline in GSC.

The impact of SERP features on your traffic depends on two things:

  • The type of feature introduced.
  • Whether your users predominantly use mobile or desktop.

Generally, SERP features are more impactful to mobile traffic as they greatly increase scroll depth, and the user screen is much smaller.

You can establish your dominant traffic source by looking at the device breakdown in Google Search Console:

Device breakdown by users, clicks, and impressions: image from author’s website, May 2024

You can then compare the two graphs in the UI, or export data via the API broken down by device.
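If you prefer the API route, here is a minimal sketch using the Search Console API's searchanalytics.query method. It assumes a service account that has been granted access to the property; the credentials file, property name, and dates are placeholders to adjust.

```python
from google.oauth2 import service_account
from googleapiclient.discovery import build

# Hypothetical credentials file and property; adjust to your own setup.
creds = service_account.Credentials.from_service_account_file(
    "service_account.json",
    scopes=["https://www.googleapis.com/auth/webmasters.readonly"],
)
service = build("searchconsole", "v1", credentials=creds)

response = service.searchanalytics().query(
    siteUrl="sc-domain:example.com",
    body={
        "startDate": "2024-03-01",
        "endDate": "2024-05-31",
        "dimensions": ["date", "device"],  # one row per day per device
        "rowLimit": 25000,
    },
).execute()

for row in response.get("rows", []):
    date, device = row["keys"]
    print(date, device, row["clicks"], row["impressions"])
```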

How To Fix This

When Google introduces new SERP features, you can adjust your content and site to become “more eligible” for them.

Some are driven by structured data, and others are determined by Google systems after processing your content.

If Google has introduced a feature that results in more zero-click searches for a particular query, you need to first quantify the traffic loss and then adjust your strategy to become more visible for similar and associated queries that still feature in your target audience’s overall search journey.

Seasonality Traffic Changes

Seasonality in demand refers to predictable fluctuations in consumer interest and purchasing behavior that occur at specific times of the year, influenced by factors such as holidays, weather changes, and cultural events.

Notably, a lot of ecommerce businesses will see peaks in the run-up to Christmas and Thanksgiving, whilst travel companies will see seasonality peaks at different times of the year depending on the destinations and vacation types they cater to.

The below screenshot is typical of a business that has a seasonal peak in the run-up to Christmas.

Seasonal peaks as measured in GSC: screenshot by author from Google Search Console, May 2024

You will see these trends in the Performance Report section and likely see users and sessions mirrored in other analytics platforms.

During a seasonal peak, Google may choose to alter the SERPs in terms of which websites are ranked and which SERP features appear. This occurs when the increase in search demand also brings with it a change in user intent, thus changing the dominant interpretation of the query.

In the travel sector, the shift is often from a research objective to a commercial objective. Out-of-season searchers are predominantly researching destinations or looking for deals; when it is time to book, they use the same search queries but with the intent to purchase.

As a result, webpages with a value proposition that caters more to the informational intent are either “demoted” in rankings or swapped out in favor of webpages that (in Google’s eyes) better cater to users in satisfying the commercial intent.

How To Fix This

There is no direct fix for traffic increases and decreases caused by seasonality.

However, you can adjust your overall SEO strategy to accommodate this and work to create visibility for the website outside of peak times by creating content that meets the needs of users who are in a more research- and information-gathering phase.

Penalties & Manual Actions

A Google penalty is a punitive action taken against a website by Google, reducing its search rankings or removing it from search results, typically due to violations of Google’s guidelines.

As well as receiving a notification in GSC, you’ll typically see a sharp decrease in traffic, akin to the graph below:

Google traffic decline from a penalty: screenshot by author from Google Search Console, May 2024

How bad the traffic decline is will depend on whether the penalty is partial or sitewide, and the type of (or reason for) the penalty will determine what efforts are required and how long it will take to recover.

Changes In PPC Strategies

A common issue I encounter working with organizations is a disconnect in understanding that, sometimes, altering a PPC campaign can affect organic traffic.

An example of this is brand. If you start running a paid search campaign on your brand terms, you can often expect to see a decrease in organic branded clicks and CTR. As most organizations have separate vendors for paid and organic, it often isn’t communicated that this will be the case.

The Search results performance report in GSC can help you identify whether or not you have cannibalization between your SEO and PPC. From this report, you can correlate branded and non-branded traffic drops with the changelog from those in command of the PPC campaign.
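One way to do that correlation is to export query-level data and segment it with a brand pattern around the date the PPC change went live. The file name, column names, brand term, and date below are hypothetical placeholders.

```python
import pandas as pd

# Assumed export of GSC query data with a date column; adjust names to your export.
df = pd.read_csv("gsc_queries.csv", parse_dates=["Date"])  # columns: Date, Query, Clicks

brand_pattern = r"\bacme\b"  # hypothetical brand term
df["segment"] = df["Query"].str.contains(brand_pattern, case=False, regex=True).map(
    {True: "branded", False: "non-branded"}
)

ppc_change = pd.Timestamp("2024-04-15")  # date the brand PPC campaign launched
df["period"] = (df["Date"] >= ppc_change).map({True: "after", False: "before"})

# Clicks by segment before and after the PPC change.
print(df.pivot_table(index="segment", columns="period", values="Clicks", aggfunc="sum"))
```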

How To Fix This

Ensure that all stakeholders understand why there have been changes to organic traffic, and that the traffic (and the user) isn’t lost; it is now being attributed to paid.

Understanding if this is the “right decision” or not requires a conversation with those managing the PPC campaigns, and if they are performing and providing a strong ROAS, then the organic traffic loss needs to be acknowledged and accepted.

Recovering Site Traffic

Recovering from Google updates can take time.

Recently, John Mueller has said that sometimes, to recover, you need to wait for another update cycle.

However, this doesn’t mean you shouldn’t actively try to improve your website and better align with what Google wants to reward, rather than relying on Google reversing previous signal weighting changes.

It’s critical that you start doing all the right things as soon as possible. The earlier that you identify and begin to solve problems, the earlier that you open up the potential for recovery. The time it takes to recover depends on what caused the drop in the first place, and there might be multiple factors to account for. Building a better website for your audience that provides them with better experiences and better service is always the right thing to do.



Featured Image: Ground Picture/Shutterstock
