Are ChatGPT, Bard and Dolly 2.0 Trained On Pirated Content?

Large Language Models (LLMs) like ChatGPT, Bard, and even open source versions are trained on public Internet content. But there are also indications that popular AIs may have been trained on datasets created from pirated books.
Is Dolly 2.0 Trained on Pirated Content?
Dolly 2.0 is an open source AI that was recently released. The intent behind Dolly is to democratize AI by making it available to everyone who wants to create something with it, even commercial products.
But there’s also a privacy issue with concentrating AI technology in the hands of three major corporations and trusting them with private data.
Given a choice, many businesses would prefer to not hand off private data to third parties like Google, OpenAI and Meta.
Even Mozilla, the open source browser and app company, is investing in growing the open source AI ecosystem.
The intent behind open source AI is unquestionably good.
But there is an issue with the data that is used to train these large language models because some of it consists of pirated content.
The open source ChatGPT clone Dolly 2.0 was created by a company called Databricks.
Dolly 2.0 is based on an open source large language model (LLM) called Pythia, which was created by an open source research group called EleutherAI.
EleutherAI created eight versions of LLMs of different sizes within the Pythia family of LLMs.
Databricks used the 12 billion parameter version of Pythia to create Dolly 2.0, fine-tuning it with a dataset that Databricks created itself (a dataset of questions and answers used to train Dolly 2.0 to follow instructions).
The thing about the EleutherAI Pythia LLM is that it was trained using a dataset called the Pile.
The Pile dataset comprises multiple sets of English-language texts, one of which is a dataset called Books3. The Books3 dataset contains the text of books that were pirated and hosted at a pirate site called Bibliotik.
This is what the DataBricks announcement says:
“Dolly 2.0 is a 12B parameter language model based on the EleutherAI pythia model family and fine-tuned exclusively on a new, high-quality human generated instruction following dataset, crowdsourced among Databricks employees.”
Pythia LLM Was Created With the Pile Dataset
The Pythia research paper by EleutherAI mentions that Pythia was trained using the Pile dataset.
This is a quote from the Pythia research paper:
“We train 8 model sizes each on both the Pile …and the Pile after deduplication, providing 2 copies of the suite which can be compared.”
Deduplication means that redundant data was removed; it's a process for creating a cleaner dataset.
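To illustrate the idea, here's a minimal sketch of deduplication in Python using exact hashing. This is a toy example of the concept, not EleutherAI's actual pipeline, which relies on more sophisticated near-duplicate detection:

```python
import hashlib

def deduplicate(documents):
    # Keep only the first copy of each document, identified by its hash.
    # (Toy example: real dataset deduplication also catches near-duplicates.)
    seen, unique = set(), []
    for doc in documents:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

print(deduplicate(["same text", "same text", "other text"]))
# ['same text', 'other text']
```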
So what's in the Pile? There's a Pile research paper that explains what's in that dataset.
Here’s a quote from the research paper for Pile where it says that they use the Books3 dataset:
“In addition we incorporate several existing high-quality datasets: Books3 (Presser, 2020)…”
The Pile dataset research paper links to a tweet by Shawn Presser, that says what is in the Books3 dataset:
“Suppose you wanted to train a world-class GPT model, just like OpenAI. How? You have no data.
Now you do. Now everyone does.
Presenting “books3”, aka “all of bibliotik”
– 196,640 books
– in plain .txt
– reliable, direct download, for years: https://the-eye.eu/public/AI/pile_preliminary_components/books3.tar.gz”
So the chain is clear: the Books3 dataset of pirated books is part of the Pile, the Pile was used to train the Pythia LLM, and Pythia in turn served as the foundation for the Dolly 2.0 open source AI.
Is Google Bard Trained on Pirated Content?
The Washington Post recently published a review of Google's Colossal Clean Crawled Corpus dataset (also known as C4), in which it discovered that Google's dataset also contains pirated content.
The C4 dataset is important because it’s one of the datasets used to train Google’s LaMDA LLM, a version of which is what Bard is based on.
The actual dataset used to train LaMDA is called Infiniset, and the C4 dataset makes up about 12.5% of its total text.
The Washington Post article reported:
“The three biggest sites were patents.google.com No. 1, which contains text from patents issued around the world; wikipedia.org No. 2, the free online encyclopedia; and scribd.com No. 3, a subscription-only digital library.
Also high on the list: b-ok.org No. 190, a notorious market for pirated e-books that has since been seized by the U.S. Justice Department.
At least 27 other sites identified by the U.S. government as markets for piracy and counterfeits were present in the data set.”
The flaw in the Washington Post analysis is that it examined a version of C4, but not necessarily the one that LaMDA was trained on.
The research paper for the C4 dataset was published in July 2020. Within a year, another research paper discovered that the C4 dataset was biased against people of color and the LGBT community.
The research paper is titled, Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus (PDF research paper here).
The researchers discovered that the dataset contained negative sentiment against people of Arab identities and excluded documents associated with Black and Hispanic authors, as well as documents mentioning sexual orientation.
The researchers wrote:
“Our examination of the excluded data suggests that documents associated with Black and Hispanic authors and documents mentioning sexual orientations are significantly more likely to be excluded by C4.EN’s blocklist filtering, and that many excluded documents contained non-offensive or non-sexual content (e.g., legislative discussions of same-sex marriage, scientific and medical content).
This exclusion is a form of allocational harms …and exacerbates existing (language-based) racial inequality as well as stigmatization of LGBTQ+ identities…
In addition, a direct consequence of removing such text from datasets used to train language models is that the models will perform poorly when applied to text from and about people with minority identities, effectively excluding them from the benefits of technology like machine translation or search.”
The researchers concluded that filtering out “bad words” and other attempts to “clean” the dataset were too simplistic and warranted a more nuanced approach.
Those conclusions are important because they show that it was well known that the C4 dataset was flawed.
LaMDA was developed in 2022 (two years after the C4 dataset) and the associated LaMDA research paper says that it was trained with C4.
But that’s just a research paper. What happens in real-life on a production model can be vastly different from what’s in the research paper.
When discussing a research paper it’s important to remember that Google consistently says that what’s in a patent or research paper isn’t necessarily what’s in use in Google’s algorithm.
Google is highly likely to be aware of those conclusions and it’s not unreasonable to assume that Google developed a new version of C4 for the production model, not just to address inequities in the dataset but to bring it up to date.
Google doesn’t say what’s in their algorithm, it’s a black box. So we can’t say with certainty that the technology underlying Google Bard was trained on pirated content.
To make it even clearer, Bard was released in 2023, using a lightweight version of LaMDA. Google has not defined what a lightweight version of LaMDA is.
So there’s no way to know what content was contained within the datasets used to train the lightweight version of LaMDA that powers Bard.
One can only speculate as to what content was used to train Bard.
Does GPT-4 Use Pirated Content?
OpenAI is extremely private about the datasets used to train GPT-4. The last time OpenAI disclosed its datasets was in the research paper for GPT-3, published in 2020, and even there it's somewhat vague and imprecise about what's in them.
In 2021, the Towards Data Science website published an interesting review of the available information, concluding that some pirated content was indeed used to train early versions of GPT.
They write:
“…we find evidence that BookCorpus directly violated copyright restrictions for hundreds of books that should not have been redistributed through a free dataset.
For example, over 200 books in BookCorpus explicitly state that they “may not be reproduced, copied and distributed for commercial or non-commercial purposes.””
It’s difficult to conclude whether GPT-4 used any pirated content.
Is There A Problem With Using Pirated Content?
One would think that it may be unethical to use pirated content to train a large language model and profit from the use of that content.
But the laws may actually allow this kind of use.
I asked Kenton J. Hutcherson, Internet Attorney at Hutcherson Law what he thought about the use of pirated content in the context of training large language models.
Specifically, I asked if someone uses Dolly 2.0, which may be partially created with pirated books, would commercial entities who create applications with Dolly 2.0 be exposed to copyright infringement claims?
Kenton answered:
“A claim for copyright infringement from the copyright holders of the pirated books would likely fail because of fair use.
Fair use protects transformative uses of copyrighted works.
Here, the pirated books are not being used as books for people to read, but as inputs to an artificial intelligence training dataset.
A similar example came into play with the use of thumbnails on search results pages. The thumbnails are not there to replace the webpages they preview. They serve a completely different function—they preview the page.
That is transformative use.”
Karen J. Bernstein of Bernstein IP offered a similar opinion.
“Is the use of the pirated content a fair use? Fair use is a commonly used defense in these instances.
The concept of the fair use defense only exists under US copyright law.
Fair use is analyzed under a multi-factor analysis that the Supreme Court set forth in a 1994 landmark case.
Under this scenario, there will be questions of how much of the pirated content was taken from the books and what was done to the content (was it “transformative”), and whether such content is taking the market away from the copyright creator.”
AI technology is bounding forward at an unprecedented pace, seemingly evolving on a week-to-week basis. Perhaps in a reflection of the competition and the financial windfall to be gained from success, Google and OpenAI are becoming increasingly private about how their AI models are trained.
Should they be more open about such information? Can they be trusted that their datasets are fair and non-biased?
The use of pirated content to create these AI models may be legally protected as fair use, but just because one can, does that mean one should?
Featured image by Shutterstock/Roman Samborskyi
Link Building Outreach for Noobs

Link outreach is the process of contacting other websites to ask for a backlink to your website.
For example, here’s an outreach email we sent as part of a broken link building campaign:
In this guide, you’ll learn how to get started with link outreach and how to get better results.
How to do link outreach
Link outreach is a four-step process:
1. Find prospects
No matter how amazing your email is, you won't get responses if it's not relevant to the person you're contacting. This makes finding the right person to contact just as important as crafting a great email.
Who to reach out to depends on your link building strategy. Here’s a table summarizing who you should find for the following link building tactics:
As a quick example, here’s how you would find sites likely to accept your guest posts:
- Go to Content Explorer
- Enter a related topic and change the dropdown to “In title”
- Filter for English results
- Filter for results with 500+ words
- Go to the “Websites” tab


This shows you the websites getting the most search traffic to content about your target topic.
From here, you’d want to look at the Authors column to prioritize sites with multiple authors, as this suggests that they may accept guest posts.


If you want to learn how to find prospects for different link building tactics, I recommend reading the resource below.
2. Find their contact details
Once you’ve curated a list of people to reach out to, you’ll need to find their contact information.
Typically, this is their email address. The easiest way to find this is to use an email lookup tool like Hunter.io. All you need to do is enter the first name, last name, and domain of your target prospect. Hunter will find their email for you:


To avoid tearing your hair out searching for hundreds of emails one by one, most email lookup tools allow you to upload a CSV list of names and domains. Hunter also has a Google Sheets add-on to make this even easier.
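If you prefer to script lookups instead of clicking through the UI, Hunter also offers an API. Here's a minimal sketch in Python; the endpoint and parameters reflect Hunter's public Email Finder API, but verify the details (and the response fields) against their current documentation:

```python
import requests

API_KEY = "your_hunter_api_key"  # from your Hunter account settings

def find_email(first_name, last_name, domain):
    # Ask Hunter's Email Finder for the most likely email address.
    resp = requests.get(
        "https://api.hunter.io/v2/email-finder",
        params={
            "first_name": first_name,
            "last_name": last_name,
            "domain": domain,
            "api_key": API_KEY,
        },
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json().get("data", {})
    return data.get("email"), data.get("score")  # score = Hunter's confidence

print(find_email("Jane", "Doe", "example.com"))
```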


3. Send a personalized pitch
Knowing who to reach out to is half the battle won. The next ‘battle’ to win is actually getting the person to care.
Think about it. For someone to link to you, the following things need to happen:
- They must read your email
- They must be convinced to check out your content
- They must open the target page and complete all administrative tasks (log in to their CMS, find the link, etc.)
- They must link to you or swap out links
That’s a lot of steps. Most people don’t care enough to do this. That’s why there’s more to link outreach than just writing the perfect email (I’ll cover this in the next section).
For now, let’s look at how to craft an amazing email. To do that, you need to answer three questions:
- Why should they open your email? — The subject line needs to capture attention in a busy inbox.
- Why should they read your email? — The body needs to be short and hook the reader in.
- Why should they link to you? — Your pitch needs to be compelling: What’s in it for them and why is your content link-worthy?
For example, here’s how we wrote our outreach email based on the three questions:


Here’s another outreach email we wrote, this time for a campaign building links to our content marketing statistics post:


4. Follow up, once
People are busy and their inboxes are crowded. They might have missed your email, or read it and forgotten.
Solve this by sending a short polite follow-up.


One is good enough. There’s no need to spam the other person with countless follow-up emails hoping for a different outcome. If they’re not interested, they’re not interested.
Link outreach tips
In theory, link outreach is simply finding the right person and asking them for a link. But there is more to it than that. I’ll explore some additional tips to help improve your outreach.
Don’t over-personalize
Some SEOs swear by the sniper approach to link outreach. That is: Each email is 100% customized to the person you are targeting.
But our experience taught us that over-personalization isn’t better. We ran link-building campaigns that sent hyper-personalized emails and got no results.
It makes logical sense: Most people just don’t do favors for strangers. I’m not saying it doesn’t happen—it does—but rarely will your amazing, hyper-personalized pitch change someone’s mind.
So, don’t spend all your time tweaking your email just to eke out minute gains.
Avoid common templates
My first reaction on seeing this email was to delete it:


Why? Because it’s a template I’ve seen many times in my inbox. And so have many others.
Another reason: not only did he reference a post I wrote six years ago, but it was a guest post, i.e., on a site I have no control over. This shows why finding the right prospects is important. He even got my name wrong.
Templates do work, but bad ones don’t. You can’t expect to copy-paste one from a blog post and hope to achieve success.
A better approach is to use the scoped shotgun approach: use a template but with dynamic variables.


You can do this with tools like Pitchbox and Buzzstream.
This can help you achieve a decent level of personalization so your email isn't spammy, without spending all your time writing customized emails for every prospect.
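To make the "dynamic variables" idea concrete, here's a minimal sketch of what tools like Pitchbox and Buzzstream do under the hood; the template wording and field names are just examples:

```python
from string import Template

# One template, with dynamic variables filled in per prospect.
template = Template(
    "Hi $first_name,\n\n"
    "I enjoyed your post on $topic, especially the section on $detail.\n"
    "I just published a guide that builds on it: $url\n\n"
    "Think it might be worth a mention?\n"
)

prospects = [
    {"first_name": "Sam", "topic": "broken link building",
     "detail": "prospecting", "url": "https://example.com/guide"},
    {"first_name": "Alex", "topic": "content marketing stats",
     "detail": "data sources", "url": "https://example.com/guide"},
]

for prospect in prospects:
    print(template.substitute(prospect))
    print("---")
```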
Send lots of emails
When we polled 800+ people on X and LinkedIn about their link outreach results, the typical conversion rate was only 1-5%.


This is why you need to send more emails. If you run the numbers, it just makes sense:
- 100 outreach emails with a 1% success rate = 1 link
- 1,000 outreach emails with a 1% success rate = 10 links
I’m not saying to spam everyone. But if you want more high-quality links, you need to reach out to more high-quality prospects.
Build a brand
A few years ago, we published a link building case study:
- 515 outreach emails
- 17.55% reply rate
- 5.75% conversion rate
Pretty good results! Except the top comments were about how we only succeeded because of our brand:


It’s true; we acknowledge it. But I think the takeaway here isn’t that we should repeat the experiment with an unknown website. The takeaway is that more SEOs should be focused on building a brand.
We’re all humans—we rely on heuristics to make judgments. In this case, it’s branding. If your brand is recognizable, it solves the “stranger” problem—people know you, like you, and are more likely to link.
The question then: How do you build a brand?
I’d like to quote our Chief Marketing Officer Tim Soulo here:
What is a strong brand if not a consistent output of high-quality work that people enjoy? Ahrefs’ content team has been publishing top-notch content for quite a few years on our blog and YouTube channel. Slowly but surely, we were able to reach tens of millions of people and instill the idea that “Ahrefs’ content = quality content”—which now clearly works to our advantage.
Ahrefs was once unknown, too. So, don’t be disheartened if no one is willing to link to you today. Rome wasn’t built in a day.
Trust the process and create incredible content. Show it to people. You’ll build your brand and reputation that way.
Build relationships with people in your industry
Outreach starts before you even ask for a link.
Think about it: people don't do favors for strangers, but they will for friends. So build and maintain relationships in the industry well before you start any link outreach campaigns.
Don’t just rely on emails either. Direct messages (DMs) on LinkedIn and X, phone calls—they all work. For example, Patrick Stox, our Product Advisor, used to have a list of contacts he regularly reached out to. He’d hop on calls and even send fruit baskets.
Create systems and automations
In its most fundamental form, link outreach is really about finding more people and sending more emails.
Doing this well is all about building systems and automations.
We have a few videos on how to build a team and a link-building system, so I recommend that you check them out.
Final thoughts
Good link outreach is indistinguishable from good business development.
In business development, your chances of success will increase if you:
- Pitch the right partners
- Have a strong brand
- Have prior relationships with them
- Pitch the right collaboration ideas
The same goes for link outreach. Follow the principles above and you will see more success for your link outreach campaigns.
Any questions or comments? Let me know on X (formerly Twitter).
Research Shows Tree Of Thought Prompting Better Than Chain Of Thought

Researchers discovered a way to defeat the safety guardrails in GPT-4 and GPT-4 Turbo, unlocking the ability to generate harmful and toxic content, essentially beating one large language model with another large language model.
The researchers discovered that using tree-of-thought (ToT) reasoning to repeat and refine a line of attack was useful for jailbreaking another large language model.
They found that the ToT approach was successful against GPT-4, GPT-4 Turbo, and PaLM 2, using a remarkably low number of queries to obtain a jailbreak: on average, fewer than thirty.
Tree Of Thoughts Reasoning
A Google research paper from around May 2022 introduced Chain of Thought prompting.
Chain of Thought (CoT) is a prompting strategy used on a generative AI to make it follow a sequence of steps in order to solve a problem and complete a task. The CoT method is often accompanied with examples to show the LLM how the steps work in a reasoning task.
So, rather than just asking a generative AI like ChatGPT to do a task, the chain of thought method instructs the AI to follow a path of reasoning composed of a series of steps.
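For example, a classic chain of thought prompt (paraphrased from the examples in Google's research) includes a worked answer: "Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls each. How many tennis balls does he have now? A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11. The answer is 11." The model is then asked a new question and imitates that step-by-step reasoning.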
Tree of Thoughts (ToT) reasoning, sometimes referred to as Tree of Thought (singular), is essentially a variation and improvement on CoT, but the two are distinct.
Tree of Thoughts reasoning is similar to CoT. The difference is that rather than prompting a generative AI to follow a single path of reasoning, ToT allows for multiple paths, so the AI can stop, self-assess, and then come up with alternate steps.
Tree of Thoughts reasoning was introduced in May 2023 in a research paper titled Tree of Thoughts: Deliberate Problem Solving with Large Language Models (PDF).
The research paper describes Tree of Thought:
“…we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving.
ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices.
Our experiments show that ToT significantly enhances language models’ problem-solving abilities…”
Tree Of Attacks With Pruning (TAP)
This new method of jailbreaking large language models is called Tree of Attacks with Pruning, TAP. TAP uses two LLMs, one for attacking and the other for evaluating.
TAP is able to outperform other jailbreaking methods by significant margins, only requiring black-box access to the LLM.
A black box, in computing, is where one can see what goes into an algorithm and what comes out. But what happens in the middle is unknown, thus it’s said to be in a black box.
TAP applies tree-of-thoughts reasoning against a targeted LLM like GPT-4, repeatedly trying different prompts, assessing the results, and changing course if an attempt isn't promising.
This is a process of iteration and pruning. Each prompting attempt is analyzed for its probability of success. If a path of attack is judged to be a dead end, the LLM will “prune” that path and begin another, more promising series of prompting attacks.
This is why it's called a “tree”: rather than using the linear reasoning process that is the hallmark of chain of thought (CoT) prompting, tree of thought prompting is non-linear because the reasoning branches off into other areas, much like a human might do.
The attacker issues a series of prompts. The evaluator assesses the responses to those prompts and decides on the next path of attack, judging whether the current path is irrelevant and estimating the likely success of prompts that have not yet been tried.
What's remarkable about this approach is that it reduces the number of prompts needed to jailbreak GPT-4. Additionally, TAP discovers a greater number of jailbreaking prompts than any other jailbreaking method.
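In pseudocode, the loop looks something like the sketch below. This is a simplified illustration of the process the paper describes, not the researchers' actual implementation; `attacker`, `evaluator`, and `target` stand in for calls to the respective LLMs, and the scoring threshold is arbitrary:

```python
def tap_jailbreak(goal, attacker, evaluator, target,
                  branches=4, max_depth=10):
    # Simplified sketch of Tree of Attacks with Pruning (TAP).
    # attacker(goal, history)   -> a refined candidate attack prompt
    # evaluator.on_topic(p)     -> False if the prompt drifted off-goal
    # evaluator.score(p, resp)  -> how close the response is to a jailbreak
    # target(p)                 -> the target LLM's response
    frontier = [[]]  # each element is one path through the attack tree
    for _ in range(max_depth):
        candidates = []
        for history in frontier:
            for _ in range(branches):  # branch: try several refinements
                prompt = attacker(goal, history)
                if not evaluator.on_topic(prompt):
                    continue  # prune off-topic branches before querying target
                response = target(prompt)
                score = evaluator.score(prompt, response)
                if score >= 10:  # evaluator judges the target jailbroken
                    return prompt
                candidates.append((score, history + [(prompt, response)]))
        # Keep only the most promising paths (pruning the tree).
        candidates.sort(key=lambda c: c[0], reverse=True)
        frontier = [history for _, history in candidates[:branches]]
        if not frontier:
            break
    return None  # no jailbreak found within the query budget
```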
The researchers observe:
“In this work, we present Tree of Attacks with Pruning (TAP), an automated method for generating jailbreaks that only requires black-box access to the target LLM.
TAP utilizes an LLM to iteratively refine candidate (attack) prompts using tree-of-thoughts reasoning until one of the generated prompts jailbreaks the target.
Crucially, before sending prompts to the target, TAP assesses them and prunes the ones unlikely to result in jailbreaks.
Using tree-of-thought reasoning allows TAP to navigate a large search space of prompts and pruning reduces the total number of queries sent to the target.
In empirical evaluations, we observe that TAP generates prompts that jailbreak state-of-the-art LLMs (including GPT4 and GPT4-Turbo) for more than 80% of the prompts using only a small number of queries. This significantly improves upon the previous state-of-the-art black-box method for generating jailbreaks.”
Tree Of Thought (ToT) Outperforms Chain Of Thought (CoT) Reasoning
Another interesting conclusion reached in the research paper is that, for this particular task, ToT reasoning outperforms CoT reasoning, even when adding pruning to the CoT method, where off topic prompting is pruned and discarded.
ToT Underperforms With GPT 3.5 Turbo
The researchers discovered that GPT-3.5 Turbo didn't perform well with ToT, revealing the limitations of that model. In fact, GPT-3.5 performed exceedingly poorly, dropping from an 84% success rate to only 4.2%.
This is their observation about why GPT 3.5 underperforms:
“We observe that the choice of the evaluator can affect the performance of TAP: changing the attacker from GPT4 to GPT3.5-Turbo reduces the success rate from 84% to 4.2%.
The reason for the reduction in success rate is that GPT3.5-Turbo incorrectly determines that the target model is jailbroken (for the provided goal) and, hence, preemptively stops the method.
As a consequence, the variant sends significantly fewer queries than the original method…”
What This Means For You
While it’s amusing that the researchers use the ToT method to beat an LLM with another LLM, it also highlights the usefulness of ToT for generating surprising new directions in prompting in order to achieve higher levels of output.
TL/DR takeaways:
- Tree of Thought prompting outperformed Chain of Thought methods
- GPT-3.5 performed significantly worse than GPT-4 in ToT
- Pruning is a useful part of a prompting strategy
- Research showed that ToT is superior to CoT in an intensive reasoning task like jailbreaking an LLM
Read the original research paper:
Tree of Attacks: Jailbreaking Black-Box LLMs Automatically (PDF)
Featured Image by Shutterstock/THE.STUDIO
Competitive Analysis: The Lean Guide (With Template)

A competitive analysis (or market competitive analysis) is a process where you collect information about competitors to gain an edge over them and get more customers.
However, the problem is that “traditional” competitive analysis is overkill for most businesses — it requires impractical data and takes too long to complete (and it’s very expensive if you choose to outsource).
A solution to that is a lean approach to the process — and that’s what this guide is about.
In other words, we’ll focus on the most important data you need to answer the question: “Why would people choose them over you?”. No boring theory, outtakes from marketing history, or spending hours digging up nice-to-have information.
In this guide, you will find:
- A real-life competitive analysis example.
- Templates: one for input data and one for a slide deck to present your analysis to others.
- Step-by-step instructions.
Our template consists of two documents: a slide deck and a spreadsheet.
The slide deck is the output document. It will help you present the analysis to your boss or your teammates.
The spreadsheet is the input document. You will find tables that act as the data source for the charts from the slide deck, as well as a prompt to use in ChatGPT to help you with user review research.


We didn’t focus on aesthetics here; every marketer likes to do slide decks their own way, so feel free to edit everything you’ll find there.
With that out of the way, let’s talk about the process. The template consists of these six tasks:
- Identify your direct competitors.
- Compare share of voice.
- Compare pricing and features.
- Find strong and weak points based on reviews.
- Compare purchasing convenience.
- Present conclusions.
Going forward, we’ll explain why these steps matter and show how to complete them.
Direct competitors are businesses that offer a similar solution to the same audience.
They matter a lot more than indirect competitors (i.e. businesses with different products but targeting the same audience as you) because you’ll be compared with them often (e.g. in product reviews and rankings). Plus, your audience is more likely to gravitate towards them when considering different options.
You probably have a few direct competitors in mind already, but here are a few ways to find others based on organic search and paid search ads.
Our basis for the analysis was Landingi, a SaaS for building landing pages (we chose that company randomly). So in our case, we found these 3 direct competitors.


Look at keyword overlap
Keyword overlap uncovers sites that target the same organic keywords as you. Some sites will compete with you for traffic but not for customers (e.g. G2 may share some keywords with Landingi but they’re a different business). However, in many cases, you will find direct competitors just by looking at this marketing channel.
- Go to Ahrefs’ Site Explorer and enter your site’s address.
- Scroll down to Organic competitors.
- Visit the URLs to pick 3 – 5 direct competitors.


To double-check the choice of competitors, we also looked at who was bidding for search ads on Google.
See who’s advertising
If someone is spending money to show ads for keywords related to what you do, that’s a strong indication they are a direct competitor.
- Go to Ahrefs’ Keywords Explorer.
- Type in a few broad keywords related to your niche, like “landing page builder” or “landing page tool”.
- Go to the Ads history report.
- Visit the sites that have a high presence of ads in the SERPs (Search Engine Result Pages).


Once you’re done checking both reports, write down competitors in the deck.
You can also take screenshots of the reports and add them to your deck to show the supporting data for your argument.


Share of voice is a measure of your reach in any given channel compared to competitors.
A bigger share of voice (SOV) means that your competitors are more likely to reach your audience. In other words, they may be promoting more effectively than you.
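As an illustrative example: if content about your target keywords attracts an estimated 10,000 monthly visits in total and your pages capture 1,500 of them, your SOV for that channel is 15%; a competitor capturing 3,000 visits has double your reach.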
In our example, we found that Landingi’s SOV was the lowest in both of these channels.
Organic:


And social media:


Here’s how we got that data using Ahrefs and Brand24.
Organic share of voice
Before we start, make sure you have a project set up in Ahrefs’ Rank Tracker.


Now:
- Go to Ahrefs’ Competitive Analysis and enter your site and your competitors’ sites as shown below.


- On the next screen, set the country with the most important market for your business and set the filters like this:


- Select keywords that sound most relevant to your business (even if you don’t rank for them yet) and Add them to Rank Tracker.


- Go to Rank Tracker, open your project, and look for Competitors/Overview. This report will uncover automatically calculated Share of Voice.


- Add the numbers in corresponding cells inside the sheet and paste the graph inside the slide deck.


It’s normal that the numbers don’t add up to 100%. SOV is calculated by including sites that compete with you in traffic but are not your direct competitors, e.g. blogs.
Social share of voice
We can also measure our share of voice across social media channels using Brand24.
- Go to Brand24.
- Start a New project for your brand and each competitor. Use the competitors’ brand name as the keyword to monitor.
- Go to the Comparison report and compare your project with competitors.


- Take a screenshot of the SOV charts and paste them into the slide deck. Make sure the charts are set to “social media”.


Consumers often choose solutions that offer the best value for money — simple as that. And that typically comes down to two things:
- Whether you have the features they care about. We’ll use all features available across all plans to see how likely the product is to satisfy user needs.
- How much they will need to pay. Thing is, the topic of pricing is tricky: a) when assessing affordability, people often focus on the least expensive option available and use it as a benchmark, b) businesses in the SaaS niche offer custom plans. So to make things more practical, we’ll compare the cheapest plans, but feel free to run this analysis across all pricing tiers.
After comparing our example company to competitors, we found that it goes head-to-head with Unbounce as the most feature-rich solution on the market.


Here’s how we got that data.
- Note down your and your competitors’ product features. One of the best places to get this information is pricing pages. Some brands even publish their own competitor comparisons — you may find them helpful too.
- While making the list, place a “1” in the cell corresponding to the brand that offers the solution.


- Enter the price of the cheapest plan (excluding free plans).


- Once finished, copy the chart and paste it inside the deck.
User reviews can show incredibly valuable insight into your competitors’ strong and weak points. Here’s why this matters:
- Improving on what your competitors’ customers appreciate could help you attract similar customers and possibly win some over.
- Dissatisfaction with competitors is a huge opportunity. Some businesses are built solely to fix what other companies can’t fix.
Here’s a sample from our analysis:


And here’s how we collated the data using ChatGPT. Important: repeat the process for each competitor.
- Open ChatGPT and enter the prompt from the template.


- Go to G2, Capterra, or Trustpilot and find a competitor’s reviews with ratings from 2 to 4 (i.e., one rating above the lowest and one below the highest possible). The reason: businesses sometimes solicit five-star reviews, whereas dissatisfied customers tend to leave one-star reviews in a moment of frustration. The most actionable feedback usually comes in between.
- Copy and paste the content of the reviews into ChatGPT (don’t hit enter yet).
- Once you’re done pasting all reviews, hit enter in ChatGPT to run the analysis.


- Paste the graphs into the deck. If you want the graphs to look different, don’t hesitate to ask the AI.
There’s a faster alternative, but it’s a bit more advanced.
Instead of copy-pasting, you can use a scraping tool to get all reviews at once. The downside here is that not all review sources will have a scraping tool available.
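If no ready-made scraper exists for your source, a bare-bones DIY version looks something like the sketch below. The URL and CSS selector are placeholders you'd adapt to the actual site's markup, and you should check the site's terms of service before scraping:

```python
import requests
from bs4 import BeautifulSoup

def fetch_reviews(url, selector=".review-text"):
    # Fetch a page and extract the text of every element matching the
    # selector. Both the URL and the selector here are hypothetical.
    resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    return [el.get_text(strip=True) for el in soup.select(selector)]

reviews = fetch_reviews("https://example.com/product/reviews")
print("\n\n".join(reviews))
```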
Lastly, we’ll see how easy it is to actually buy your products, and compare the experience to your competitors.
This is a chance to simplify your checkout process, and even learn from any good habits your competitors have adopted.
For example, we found that our sample company probably had nothing to worry about in this area — they ticked almost all of the boxes.


Here’s how to complete this step:
- Place a “1” if you or any of your competitors offer convenience features listed in the template.
- Once done, copy the chart and paste it into the deck.
This is the part of the presentation where you sum up all of your findings and suggest a course of action.
Here are two examples:
- Landingi had the lowest SOV in the niche, and that is never good. So the conclusion might be to go a level deeper and do an SEO competitive analysis, and to increase social media presence by creating more share-worthy content like industry surveys, design/CRO tips, or in-house data studies.
- Although the brand had a very high purchasing convenience score, during the analysis we found that there was an $850 gap between the monthly full plan and the previous tier. The conclusion here might be to offer a custom plan (like competitors do) to fill that gap.
We encourage you to take your time here and think about what would make the most sense for your business.
Tip
It’s good to be specific in your conclusions, but don’t go too deep. Competitive analysis concerns many aspects of the business, so it’s best to give other departments a chance to chime in. Just because your competitors have a few unique features doesn’t necessarily mean you need to build them too.
Final thoughts
A competitive analysis is one of the most fruitful exercises in marketing. It can show you areas for improvement, give ideas for new features, and help you discover gaps in your strategy. It wouldn’t be an exaggeration to say that it’s fundamental to running a successful business.
Just don’t forget to balance “spying” on your competitors with innovation. After all, you probably don’t want to become an exact copy of someone else’s brand.
In other words, use competitive analysis to keep up with your competitors, but don’t let that erase what’s unique about your brand or make you forget your big vision.
Got comments or questions? Ping me on X.