14 Must-Know Tips For Crawling Millions Of Webpages

Crawling enterprise sites has all the complexities of any normal crawl plus several additional factors that need to be considered before beginning the crawl.

The following approaches show how to accomplish a large-scale crawl and achieve the given objectives, whether it’s part of an ongoing checkup or a site audit.

1. Make The Site Ready For Crawling

An important thing to consider before crawling is the website itself.

It’s helpful to fix issues that may slow down a crawl before starting the crawl.

It may sound counterintuitive to fix a site before auditing it, but when it comes to really big sites, a small problem multiplied by five million pages becomes a significant problem.

Adam Humphreys, the founder of Making 8 Inc. digital marketing agency, shared a clever solution he uses for identifying what is causing a slow TTFB (time to first byte), a metric that measures how responsive a web server is.

A byte is a unit of data, so TTFB measures how long it takes for the first byte of data to be delivered to the browser.

More precisely, TTFB is the time between the server receiving a request for a file and the first byte being delivered to the browser, which makes it a useful measurement of how fast the server is.

A way to measure TTFB is to enter a URL in Google’s PageSpeed Insights tool, which is powered by Google’s Lighthouse measurement technology.

Screenshot from PageSpeed Insights Tool, July 2022

Adam shared: “So a lot of times, Core Web Vitals will flag a slow TTFB for pages that are being audited. To get a truly accurate TTFB reading one can compare the raw text file, just a simple text file with no html, loading up on the server to the actual website.

Throw some Lorem ipsum or something on a text file and upload it then measure the TTFB. The idea is to see server response times in TTFB and then isolate what resources on the site are causing the latency.

More often than not it’s excessive plugins that people love. I refresh both Lighthouse in incognito and web.dev/measure to average out measurements. When I see 30–50 plugins or tons of JavaScript in the source code, it’s almost an immediate problem before even starting any crawling.”

When Adam says he refreshes the Lighthouse scores, he means he tests the URL multiple times, because every test yields a slightly different score (the speed at which data is routed through the Internet is constantly changing, just like road traffic).

So what Adam does is collect multiple TTFB scores and average them to come up with a final score that then tells him how responsive a web server is.

If the server is not responsive, the PageSpeed Insights tool can provide an idea of why the server is not responsive and what needs to be fixed.
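If you'd rather script this check than re-run PageSpeed Insights by hand, here's a minimal sketch in Python using the requests library; response.elapsed stops at the response headers, which is a reasonable proxy for TTFB:

```python
# Minimal sketch: average TTFB over several runs, since any single
# measurement varies with network conditions. requests' response.elapsed
# measures time from sending the request to parsing the response headers,
# a reasonable proxy for time to first byte.
import requests

def average_ttfb(url: str, runs: int = 5) -> float:
    timings = []
    for _ in range(runs):
        response = requests.get(url, stream=True, timeout=30)
        timings.append(response.elapsed.total_seconds())
        response.close()
    return sum(timings) / len(timings)

print(f"Average TTFB: {average_ttfb('https://example.com/'):.3f}s")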

2. Ensure Full Access To Server: Whitelist Crawler IP

Firewalls and CDNs (Content Delivery Networks) can block or slow down an IP from crawling a website.

So it’s important to identify all security plugins, server-level intrusion prevention software, and CDNs that may impede a site crawl.

Typical WordPress security plugins where an IP would need to be whitelisted include Sucuri Web Application Firewall (WAF) and Wordfence.

3. Crawl During Off-Peak Hours

Crawling a site should ideally be unintrusive.

Under the best-case scenario, a server should be able to handle being aggressively crawled while also serving web pages to actual site visitors.

But on the other hand, it could be useful to test how well the server responds under load.

This is where real-time analytics or server log access will be useful, because you can immediately see how the crawl may be affecting site visitors, although the pace of crawling and 503 server responses are also a clue that the server is under strain.

If the server is indeed straining to keep up, make note of that response and crawl the site during off-peak hours.

A CDN should in any case mitigate the effects of an aggressive crawl.
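If you script any part of the crawl yourself, a simple backoff on 503 responses keeps it unintrusive. A minimal sketch (not modeled on any particular crawler's API):

```python
# Minimal sketch: back off when the server signals strain with 503s.
import time
import requests

def polite_fetch(url: str, max_retries: int = 3):
    delay = 5  # seconds; start conservatively
    for _ in range(max_retries):
        response = requests.get(url, timeout=30)
        if response.status_code != 503:
            return response
        time.sleep(delay)  # server under strain: wait before retrying
        delay *= 2
    return None  # still failing; consider resuming during off-peak hours
```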

4. Are There Server Errors?

The Google Search Console Crawl Stats report should be the first place to research if the server is having trouble serving pages to Googlebot.

Any issues in the Crawl Stats report should have the cause identified and fixed before crawling an enterprise-level website.

Server error logs are a gold mine of data that can reveal a wide range of errors that may affect how well a site is crawled. Of particular importance is being able to debug otherwise invisible PHP errors.
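As an illustration, here's a minimal sketch that tallies 5xx errors, assuming an Apache-style access log where the status code is the ninth whitespace-separated field:

```python
# Minimal sketch: count 5xx responses in an Apache-style access log.
from collections import Counter

def count_server_errors(log_path: str) -> Counter:
    errors = Counter()
    with open(log_path) as log:
        for line in log:
            fields = line.split()
            # Field 9 holds the status code in common/combined log format.
            if len(fields) > 8 and fields[8].startswith("5"):
                errors[fields[8]] += 1
    return errors

print(count_server_errors("/var/log/apache2/access.log"))
```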

5. Server Memory

Perhaps something that’s not routinely considered for SEO is the amount of RAM (random access memory) that a server has.

RAM is like short-term memory, a place where a server stores information that it’s using in order to serve web pages to site visitors.

A server with insufficient RAM will become slow.

So if a server becomes slow during a crawl or doesn't seem able to cope with crawling, then this could be an SEO problem that affects how well Google is able to crawl and index web pages.

Take a look at how much RAM the server has.

A VPS (virtual private server) may need a minimum of 1GB of RAM.

However, 2GB to 4GB of RAM may be recommended if the website is an online store with high traffic.

More RAM is generally better.

If the server has a sufficient amount of RAM but the server slows down then the problem might be something else, like the software (or a plugin) that’s inefficient and causing excessive memory requirements.

6. Periodically Verify The Crawl Data

Keep an eye out for crawl anomalies as the website is crawled.

Sometimes the crawler may report that the server was unable to respond to a request for a web page, generating something like a 503 Service Unavailable server response message.

So it’s useful to pause the crawl and check out what’s going on that might need fixing in order to proceed with a crawl that provides more useful information.

Sometimes reaching the end of the crawl isn't the goal.

The crawl itself is an important data point, so don't feel frustrated that the crawl needs to be paused in order to fix something, because the discovery is a good thing.
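One way to make these periodic checks quick is to summarize response codes from the crawler's export. A minimal sketch, assuming a CSV export (such as Screaming Frog's "Internal: All" report) with a "Status Code" column:

```python
# Minimal sketch: summarize response codes from a crawl export CSV.
import csv
from collections import Counter

def summarize_status_codes(csv_path: str) -> Counter:
    with open(csv_path, newline="", encoding="utf-8") as f:
        return Counter(row["Status Code"] for row in csv.DictReader(f))

# A growing share of 503s mid-crawl is the cue to pause and investigate.
print(summarize_status_codes("internal_all.csv"))
```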

7. Configure Your Crawler For Scale

Out of the box, a crawler like Screaming Frog is set up for speed, which is probably great for the majority of users. But it needs to be adjusted in order to crawl a large website with millions of pages.

Screaming Frog uses RAM for its crawl, which is fine for a normal site but becomes a limitation for an enterprise-sized website.

Overcoming this shortcoming is easy by adjusting the Storage Setting in Screaming Frog.

This is the menu path for adjusting the storage settings:

Configuration > System > Storage > Database Storage

If possible, it's highly recommended (but not absolutely required) to use an internal SSD (solid-state drive).

Most computers use a standard hard drive with moving parts inside.

An SSD has no moving parts and can transfer data at speeds from 10 to 100 times faster than a regular hard drive.

Using a computer with an SSD will help achieve the amazingly fast crawl needed to efficiently download millions of web pages.

To ensure an optimal crawl, allocate 4 GB of RAM, and no more than 4 GB, for a crawl of up to 2 million URLs.

For crawls of up to 5 million URLs, it is recommended that 8 GB of RAM are allocated.

Adam Humphreys shared: “Crawling sites is incredibly resource intensive and requires a lot of memory. A dedicated desktop or renting a server is a much faster method than a laptop.

I once spent almost two weeks waiting for a crawl to complete. I learned from that and got partners to build remote software so I can perform audits anywhere at any time.”

8. Connect To A Fast Internet

If you are crawling from your office then it’s paramount to use the fastest Internet connection possible.

Using the fastest available Internet can mean the difference between a crawl that takes hours and one that takes days.

In general, the fastest available Internet is over an ethernet connection and not over a Wi-Fi connection.

If your Internet access is over Wi-Fi, it's still possible to get an ethernet connection by moving a laptop or desktop closer to the Wi-Fi router, which has ethernet ports on the rear.

This seems like one of those “it goes without saying” pieces of advice but it’s easy to overlook because most people use Wi-Fi by default, without really thinking about how much faster it would be to connect the computer straight to the router with an ethernet cord.

9. Cloud Crawling

Another option, particularly for extraordinarily large and complex site crawls of over 5 million web pages, is to crawl from a server.

All normal constraints from a desktop crawl are off when using a cloud server.

Ash Nallawalla, an Enterprise SEO specialist and author, has over 20 years of experience working with some of the world’s biggest enterprise technology firms.

So I asked him about crawling millions of pages.

He responded that he recommends crawling from the cloud for sites with over 5 million URLs.

Ash shared: “Crawling huge websites is best done in the cloud. I do up to 5 million URIs with Screaming Frog on my laptop in database storage mode, but our sites have far more pages, so we run virtual machines in the cloud to crawl them.

Our content is popular with scrapers for competitive data intelligence reasons, more so than copying the articles for their textual content.

We use firewall technology to stop anyone from collecting too many pages at high speed. It is good enough to detect scrapers acting in so-called “human emulation mode.” Therefore, we can only crawl from whitelisted IP addresses and a further layer of authentication.”

Adam Humphreys agreed with the advice to crawl from the cloud.


10. Partial Crawls

A technique for crawling large websites is to divide the site into parts and crawl each part in sequence so that the result is a sectional view of the website.

Another way to do a partial crawl is to divide the site into parts and crawl on a continual basis so that the snapshot of each section is not only kept up to date but any changes made to the site can be instantly viewed.

So rather than doing one rolling update crawl of the entire site, do partial crawls of each section on a schedule.

This is an approach that Ash strongly recommends.

Ash explained: “I have a crawl going on all the time. I am running one right now on one product brand. It is configured to stop crawling at the default limit of 5 million URLs.”

When I asked him the reason for a continual crawl he said it was because of issues beyond his control which can happen with businesses of this size where many stakeholders are involved.

Ash said: “For my situation, I have an ongoing crawl to address known issues in a specific area.”
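Here's a minimal sketch of the rotation idea, assuming the site can be split by URL prefix (the section names are made up for illustration):

```python
# Minimal sketch: rotate partial crawls through site sections so each
# section's snapshot is refreshed on a fixed cycle.
SECTIONS = ["/products/", "/blog/", "/support/", "/docs/"]  # illustrative

def section_for_day(day_index: int) -> str:
    return SECTIONS[day_index % len(SECTIONS)]

# Day 0 crawls /products/, day 1 /blog/, and so on, then the cycle repeats.
for day in range(6):
    print(day, section_for_day(day))
```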

11. Overall Snapshot: Limited Crawls

A way to get a high-level view of what a website looks like is to limit the crawl to just a sample of the site.

This is also useful for competitive intelligence crawls.

For example, on a Your Money Or Your Life project I worked on, I crawled about 50,000 pages from a competitor's website to see what kinds of sites they were linking out to.

I used that data to convince the client that their outbound linking patterns were poor and showed them the high-quality sites their top-ranked competitors were linking to.

So sometimes, a limited crawl can yield enough of a certain kind of data to give a good idea of the overall health of the site.

12. Crawl For Site Structure Overview

Sometimes one only needs to understand the site structure.

To do this faster, one can set the crawler to skip external links and internal images.

There are other crawler settings that can be un-ticked in order to produce a faster crawl so that the only thing the crawler is focusing on is downloading the URL and the link structure.

13. How To Handle Duplicate Pages And Canonicals

Unless there’s a reason for indexing duplicate pages, it can be useful to set the crawler to ignore URL parameters and other URLs that are duplicates of a canonical URL.

It's possible to set a crawler to only crawl canonical pages. But if someone has set paginated pages to canonicalize to the first page in the sequence, then you'll never discover this error.

For a similar reason, at least on the initial crawl, one might want to disobey noindex tags in order to identify instances of the noindex directive on pages that should be indexed.
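As an illustration, here's a minimal sketch for spotting the paginated-canonical issue by hand: flag URLs whose canonical tag points somewhere else. It uses requests and BeautifulSoup, and the example URL is hypothetical:

```python
# Minimal sketch: flag URLs whose <link rel="canonical"> points elsewhere.
import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4

def canonical_of(url: str):
    html = requests.get(url, timeout=30).text
    tag = BeautifulSoup(html, "html.parser").find("link", rel="canonical")
    return tag.get("href") if tag else None

url = "https://example.com/category?page=2"  # hypothetical paginated URL
canonical = canonical_of(url)
if canonical and canonical != url:
    print(f"{url} canonicalizes to {canonical} - check if that's intended")
```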

14. See What Google Sees

As you’ve no doubt noticed, there are many different ways to crawl a website consisting of millions of web pages.

Crawl budget is the amount of resources Google devotes to crawling a website for indexing.

The more webpages are successfully indexed, the more pages have the opportunity to rank.

Small sites don’t really have to worry about Google’s crawl budget.

But maximizing Google’s crawl budget is a priority for enterprise websites.

In the scenario above, I advised disobeying noindex tags.

For this kind of crawl, however, you will actually want to obey noindex directives, because the goal is to get a snapshot of the website that shows how Google sees the entire site.

Google Search Console provides lots of information, but crawling the website yourself with a user agent disguised as Google may yield useful information that can help get more of the right pages indexed while discovering which pages Google might be wasting the crawl budget on.

For that kind of crawl, it’s important to set the crawler user agent to Googlebot, set the crawler to obey robots.txt, and set the crawler to obey the noindex directive.

That way, if the site is set to not show certain page elements to Googlebot you’ll be able to see a map of the site as Google sees it.

This is a great way to diagnose potential issues such as discovering pages that should be crawled but are getting missed.

For other sites, Google might be finding its way to pages that are useful to users but might be perceived as low quality by Google, like pages with sign-up forms.

Crawling with the Google user agent is useful for understanding how Google sees the site and helps maximize the crawl budget.
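Here's a minimal sketch of that configuration applied to a single URL: a Googlebot user agent, a robots.txt check first, and a crude noindex check (the meta detection is a naive substring match, for illustration only):

```python
# Minimal sketch: fetch one URL the way this audit crawl is configured.
import urllib.robotparser
import requests

UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def fetch_as_googlebot(url: str, robots_url: str):
    robots = urllib.robotparser.RobotFileParser(robots_url)
    robots.read()
    if not robots.can_fetch(UA, url):
        return "blocked by robots.txt"
    response = requests.get(url, headers={"User-Agent": UA}, timeout=30)
    # Naive noindex check: header first, then a crude substring match.
    if "noindex" in response.headers.get("X-Robots-Tag", "").lower():
        return "noindex (header)"
    if 'content="noindex' in response.text.lower():
        return "noindex (meta)"
    return f"indexable ({response.status_code})"

print(fetch_as_googlebot("https://example.com/", "https://example.com/robots.txt"))
```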

Beating The Learning Curve

One can learn how to crawl enterprise websites the hard way. These fourteen tips should hopefully shave some time off the learning curve and leave you better prepared to take on enterprise-level clients with gigantic websites.


Featured Image: SvetaZi/Shutterstock

Link Building Outreach for Noobs


Link outreach is the process of contacting other websites to ask for a backlink to your website.

For example, here’s an outreach email we sent as part of a broken link building campaign:

In this guide, you’ll learn how to get started with link outreach and how to get better results. 

How to do link outreach

Link outreach is a four-step process:

1. Find prospects

No matter how amazing your email is, you won't get responses if it's not relevant to the person you're contacting. This makes finding the right person to contact just as important as crafting a great email.

Who to reach out to depends on your link building strategy. Here’s a table summarizing who you should find for the following link building tactics:

As a quick example, here’s how you would find sites likely to accept your guest posts:

  1. Go to Content Explorer
  2. Enter a related topic and change the dropdown to “In title”
  3. Filter for English results
  4. Filter for results with 500+ words
  5. Go to the “Websites” tab
Finding guest blogging opportunities via Content Explorer

This shows you the websites getting the most search traffic to content about your target topic.

From here, you’d want to look at the Authors column to prioritize sites with multiple authors, as this suggests that they may accept guest posts.

The Authors column indicates how many authors have written for the site

If you want to learn how to find prospects for different link building tactics, I recommend reading the resource below.

2. Find their contact details

Once you’ve curated a list of people to reach out to, you’ll need to find their contact information.

Typically, this is their email address. The easiest way to find this is to use an email lookup tool like Hunter.io. All you need to do is enter the first name, last name, and domain of your target prospect. Hunter will find their email for you:

Finding Tim's email with Hunter.io

To avoid tearing your hair out searching for hundreds of emails one by one, most email lookup tools allow you to upload a CSV list of names and domains. Hunter also has a Google Sheets add-on to make this even easier.

Using the Hunter for Sheets add-on to find emails in bulk directly in Google Sheets
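If you prefer to script lookups, Hunter also offers a REST API. A minimal sketch, assuming the v2 email-finder endpoint and an API key from your account (check Hunter's docs for the current field names):

```python
# Minimal sketch: look up an email via Hunter's v2 email-finder endpoint.
import requests

def find_email(first_name: str, last_name: str, domain: str, api_key: str):
    response = requests.get(
        "https://api.hunter.io/v2/email-finder",
        params={
            "first_name": first_name,
            "last_name": last_name,
            "domain": domain,
            "api_key": api_key,
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json().get("data", {}).get("email")

print(find_email("Tim", "Soulo", "ahrefs.com", "YOUR_API_KEY"))
```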

3. Send a personalized pitch

Knowing who to reach out to is half the battle won. The next ‘battle’ to win is actually getting the person to care.

Think about it. For someone to link to you, the following things need to happen:

  • They must read your email
  • They must be convinced to check out your content
  • They must open the target page and complete all administrative tasks (log in to their CMS, find the link, etc.)
  • They must link to you or swap out links

That’s a lot of steps. Most people don’t care enough to do this. That’s why there’s more to link outreach than just writing the perfect email (I’ll cover this in the next section).

For now, let’s look at how to craft an amazing email. To do that, you need to answer three questions:

  1. Why should they open your email? — The subject line needs to capture attention in a busy inbox.
  2. Why should they read your email? — The body needs to be short and hook the reader in.
  3. Why should they link to you? — Your pitch needs to be compelling: What’s in it for them and why is your content link-worthy?

For example, here’s how we wrote our outreach email based on the three questions:

An analysis of our outreach email based on three questions

Here’s another outreach email we wrote, this time for a campaign building links to our content marketing statistics post:

An analysis of our outreach email based on three questions

4. Follow up, once

People are busy and their inboxes are crowded. They might have missed your email or read it and forgot.

Solve this by sending a short polite follow-up.

Example follow-up email

One is good enough. There’s no need to spam the other person with countless follow-up emails hoping for a different outcome. If they’re not interested, they’re not interested.

Link outreach tips

In theory, link outreach is simply finding the right person and asking them for a link. But there is more to it than that. I’ll explore some additional tips to help improve your outreach.

Don’t over-personalize

Some SEOs swear by the sniper approach to link outreach. That is: Each email is 100% customized to the person you are targeting.

But our experience taught us that over-personalization isn’t better. We ran link-building campaigns that sent hyper-personalized emails and got no results.

It makes logical sense: Most people just don’t do favors for strangers. I’m not saying it doesn’t happen—it does—but rarely will your amazing, hyper-personalized pitch change someone’s mind.

So, don’t spend all your time tweaking your email just to eke out minute gains.

Avoid common templates

My first reaction seeing this email is to delete it:

A bad outreach email

Why? Because it’s a template I’ve seen many times in my inbox. And so have many others.

Another reason: Not only did he reference a post I wrote six years ago, but it was a guest post, i.e., I have no control over the site. This shows why finding the right prospects is important. He even got my name wrong.

Templates do work, but bad ones don’t. You can’t expect to copy-paste one from a blog post and hope to achieve success.

A better approach is to use the scoped shotgun approach: use a template but with dynamic variables.

Email outreach template with dynamic variables

You can do this with tools like Pitchbox and Buzzstream.

This can help achieve a decent level of personalization so your email isn't spammy, without requiring you to spend all your time writing customized emails for every prospect.
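To make the idea concrete, here's a minimal sketch using Python's built-in string templates; the variable names are made up for illustration, and tools like Pitchbox and Buzzstream do the same thing at scale:

```python
# Minimal sketch: one template, dynamic variables per prospect.
from string import Template

OUTREACH = Template(
    "Hi $first_name,\n\n"
    "I enjoyed your post on $topic - especially the section on $detail.\n"
    "We just published a guide on $our_topic that could be a useful "
    "addition to that post: $our_url\n\n"
    "Either way, keep up the great work!"
)

email = OUTREACH.substitute(
    first_name="Alex",
    topic="broken link building",
    detail="qualifying prospects",
    our_topic="outreach emails",
    our_url="https://example.com/outreach-guide",
)
print(email)
```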

Send lots of emails

When we polled 800+ people on X and LinkedIn about their link outreach results, the average conversion rate was only 1-5%.

Link outreach conversion rates in 2023

This is why you need to send more emails. If you run the numbers, it just makes sense:

  • 100 outreach emails with a 1% success rate = 1 link
  • 1,000 outreach emails with a 1% success rate = 10 links

I’m not saying to spam everyone. But if you want more high-quality links, you need to reach out to more high-quality prospects.

Build a brand

A few years ago, we published a link building case study:

  • 515 outreach emails
  • 17.55% reply rate
  • 5.75% conversion rate

Pretty good results! Except the top comments were about how we only succeeded because of our brand:

Comments on our YouTube video saying we succeeded because of our brand

It’s true; we acknowledge it. But I think the takeaway here isn’t that we should repeat the experiment with an unknown website. The takeaway is that more SEOs should be focused on building a brand.

We’re all humans—we rely on heuristics to make judgments. In this case, it’s branding. If your brand is recognizable, it solves the “stranger” problem—people know you, like you, and are more likely to link.

The question then: How do you build a brand?

I’d like to quote our Chief Marketing Officer Tim Soulo here:

What is a strong brand if not a consistent output of high-quality work that people enjoy? Ahrefs’ content team has been publishing top-notch content for quite a few years on our blog and YouTube channel. Slowly but surely, we were able to reach tens of millions of people and instill the idea that “Ahrefs’ content = quality content”—which now clearly works to our advantage.

Tim Soulo

Ahrefs was once unknown, too. So, don’t be disheartened if no one is willing to link to you today. Rome wasn’t built in a day.

Trust the process and create incredible content. Show it to people. You’ll build your brand and reputation that way.

Build relationships with people in your industry

Outreach starts before you even ask for a link.

Think about it: People don't do favors for strangers, but they will for friends. So build and maintain relationships in the industry well before you start any link outreach campaigns.

Don’t just rely on emails either. Direct messages (DMs) on LinkedIn and X, phone calls—they all work. For example, Patrick Stox, our Product Advisor, used to have a list of contacts he regularly reached out to. He’d hop on calls and even send fruit baskets.

Create systems and automations

In its most fundamental form, link outreach is really about finding more people and sending more emails.

Doing this well is all about building systems and automations.

We have a few videos on how to build a team and a link-building system, so I recommend that you check them out.

Final thoughts

Good link outreach is indistinguishable from good business development.

In business development, your chances of success will increase if you:

  • Pitch the right partners
  • Have a strong brand
  • Have prior relationships with them
  • Pitch the right collaboration ideas

The same goes for link outreach. Follow the principles above and you will see more success for your link outreach campaigns.

Any questions or comments? Let me know on X (Twitter).



Research Shows Tree Of Thought Prompting Better Than Chain Of Thought


Researchers discovered a way to defeat the safety guardrails in GPT4 and GPT4-Turbo, unlocking the ability to generate harmful and toxic content, essentially beating a large language model with another large language model.

The researchers discovered that the use of tree-of-thought (ToT)reasoning to repeat and refine a line of attack was useful for jailbreaking another large language model.

What they found is that the ToT approach was successful against GPT4, GPT4-Turbo, and PaLM-2, using a remarkably low number of queries to obtain a jailbreak, on average less than thirty queries.

Tree Of Thoughts Reasoning

A Google research paper from around May 2022 introduced Chain of Thought prompting.

Chain of Thought (CoT) is a prompting strategy used on a generative AI to make it follow a sequence of steps in order to solve a problem and complete a task. The CoT method is often accompanied by examples that show the LLM how the steps work in a reasoning task.

So, rather than just ask a generative AI like Midjourney or ChatGPT to do a task, the chain of thought method instructs the AI how to follow a path of reasoning that’s composed of a series of steps.

Tree of Thoughts (ToT) reasoning, sometimes referred to as Tree of Thought (singular), is essentially a variation and improvement of CoT, but the two are different things.

Tree of Thoughts reasoning is similar to CoT. The difference is that rather than prompting a generative AI to follow a single path of reasoning, ToT allows for multiple paths, so the AI can stop, self-assess, and come up with alternate steps.

Tree of Thoughts reasoning was introduced in May 2023 in a research paper titled Tree of Thoughts: Deliberate Problem Solving with Large Language Models (PDF).

The research paper describes Tree of Thought:

“…we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving.

ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices.

Our experiments show that ToT significantly enhances language models’ problem-solving abilities…”
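To make the structural difference concrete, here's a toy sketch contrasting the two shapes of reasoning. The step() and evaluate() helpers are hypothetical stand-ins for LLM calls, not the paper's actual implementation:

```python
# Toy sketch: CoT follows one linear path; ToT branches, self-evaluates,
# and keeps only the most promising paths. step() and evaluate() are
# hypothetical stand-ins for LLM calls.
def chain_of_thought(problem, step, n_steps=4):
    state = problem
    for _ in range(n_steps):
        state = step(state)  # single path, no backtracking
    return state

def tree_of_thoughts(problem, step, evaluate, branch=3, depth=4):
    frontier = [problem]
    for _ in range(depth):
        candidates = [step(s) for s in frontier for _ in range(branch)]
        # Self-assessment: keep only the most promising branches.
        frontier = sorted(candidates, key=evaluate, reverse=True)[:branch]
    return max(frontier, key=evaluate)
```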

Tree Of Attacks With Pruning (TAP)

This new method of jailbreaking large language models is called Tree of Attacks with Pruning, TAP. TAP uses two LLMs, one for attacking and the other for evaluating.

TAP is able to outperform other jailbreaking methods by significant margins, only requiring black-box access to the LLM.

A black box, in computing, is where one can see what goes into an algorithm and what comes out. But what happens in the middle is unknown, thus it’s said to be in a black box.

TAP uses tree-of-thought reasoning against a targeted LLM like GPT-4 to repeatedly try different prompts, assess the results, and change course if an attempt is not promising.

This is called a process of iteration and pruning. Each prompting attempt is analyzed for its probability of success. If a path of attack is judged to be a dead end, the LLM will "prune" it and begin another, more promising series of prompting attacks.

This is why it's called a "tree": rather than using the linear reasoning process that is the hallmark of chain-of-thought (CoT) prompting, tree-of-thought prompting is non-linear, branching off into other areas of reasoning, much like a human might.

The attacker issues a series of prompts, and the evaluator assesses the responses, deciding whether the current path of attack is a dead end and what the next path should be. The evaluator also estimates the likely success of prompts that have not yet been tried.

What’s remarkable about this approach is that this process reduces the number of prompts needed to jailbreak GPT-4. Additionally, a greater number of jailbreaking prompts are discovered with TAP than with any other jailbreaking method.

The researchers observe:

“In this work, we present Tree of Attacks with Pruning (TAP), an automated method for generating jailbreaks that only requires black-box access to the target LLM.

TAP utilizes an LLM to iteratively refine candidate (attack) prompts using tree-of-thoughts reasoning until one of the generated prompts jailbreaks the target.

Crucially, before sending prompts to the target, TAP assesses them and prunes the ones unlikely to result in jailbreaks.

Using tree-of-thought reasoning allows TAP to navigate a large search space of prompts and pruning reduces the total number of queries sent to the target.

In empirical evaluations, we observe that TAP generates prompts that jailbreak state-of-the-art LLMs (including GPT4 and GPT4-Turbo) for more than 80% of the prompts using only a small number of queries. This significantly improves upon the previous state-of-the-art black-box method for generating jailbreaks.”
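Here's a highly simplified sketch of that iterate-evaluate-prune loop. The attack(), score(), and is_jailbroken() helpers are hypothetical stand-ins for the attacker and evaluator LLM calls described in the paper:

```python
# Highly simplified sketch of TAP's iterate-evaluate-prune loop.
def tap_search(goal, attack, score, is_jailbroken,
               branch=3, depth=5, threshold=0.5):
    frontier = [goal]  # candidate attack prompts (nodes of the tree)
    for _ in range(depth):
        children = []
        for prompt in frontier:
            for candidate in attack(prompt, n=branch):  # branch out
                if score(candidate) < threshold:
                    continue  # prune: unlikely to succeed, never sent
                if is_jailbroken(candidate):
                    return candidate  # target produced a jailbreak
                children.append(candidate)
        frontier = children
    return None
```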

Tree Of Thought (ToT) Outperforms Chain Of Thought (CoT) Reasoning

Another interesting conclusion reached in the research paper is that, for this particular task, ToT reasoning outperforms CoT reasoning, even when pruning is added to the CoT method so that off-topic prompts are discarded.

ToT Underperforms With GPT 3.5 Turbo

The researchers discovered that GPT-3.5 Turbo didn't perform well with ToT, revealing the limitations of the model. In fact, GPT-3.5 performed exceedingly poorly, dropping from an 84% success rate to only a 4.2% success rate.

This is their observation about why GPT 3.5 underperforms:

“We observe that the choice of the evaluator can affect the performance of TAP: changing the attacker from GPT4 to GPT3.5-Turbo reduces the success rate from 84% to 4.2%.

The reason for the reduction in success rate is that GPT3.5-Turbo incorrectly determines that the target model is jailbroken (for the provided goal) and, hence, preemptively stops the method.

As a consequence, the variant sends significantly fewer queries than the original method…”

What This Means For You

While it’s amusing that the researchers use the ToT method to beat an LLM with another LLM, it also highlights the usefulness of ToT for generating surprising new directions in prompting in order to achieve higher levels of output.

TL;DR takeaways:

  • Tree of Thought prompting outperformed Chain of Thought methods
  • GPT-3.5 performed significantly worse than GPT-4 in ToT
  • Pruning is a useful part of a prompting strategy
  • Research showed that ToT is superior to CoT in an intensive reasoning task like jailbreaking an LLM

Read the original research paper:

Tree of Attacks: Jailbreaking Black-Box LLMs Automatically (PDF)

Featured Image by Shutterstock/THE.STUDIO

The Lean Guide (With Template)


A competitive analysis (or market competitive analysis) is a process where you collect information about competitors to gain an edge over them and get more customers.

However, the problem is that “traditional” competitive analysis is overkill for most businesses — it requires impractical data and takes too long to complete (and it’s very expensive if you choose to outsource). 

A solution to that is a lean approach to the process — and that’s what this guide is about. 

In other words, we'll focus on the most important data you need to answer the question: "Why would people choose them over you?" No boring theory, outtakes from marketing history, or hours spent digging up nice-to-have information.

In this guide, you will find:

  • A real-life competitive analysis example.
  • Templates: one for input data and one for a slide deck to present your analysis to others.
  • Step-by-step instructions.

Our template consists of two documents: a slide deck and a spreadsheet. 

The slide deck is the output document. It will help you present the analysis to your boss or your teammates.

The spreadsheet is the input document. You will find tables that act as the data source for the charts from the slide deck, as well as a prompt to use in ChatGPT to help you with user review research.

Competitive analysis template — spreadsheet sneak peek.

We didn’t focus on aesthetics here; every marketer likes to do slide decks their own way, so feel free to edit everything you’ll find there. 

With that out of the way, let’s talk about the process. The template consists of these six tasks: 

  1. Identify your direct competitors. 
  2. Compare share of voice. 
  3. Compare pricing and features.
  4. Find strong and weak points based on reviews.
  5. Compare purchasing convenience.
  6. Present conclusions.

Going forward, we’ll explain why these steps matter and show how to complete them. 

1. Identify your direct competitors

Direct competitors are businesses that offer a similar solution to the same audience. 

They matter a lot more than indirect competitors (i.e. businesses with different products but targeting the same audience as you) because you’ll be compared with them often (e.g. in product reviews and rankings). Plus, your audience is more likely to gravitate towards them when considering different options. 

You probably have a few direct competitors in mind already, but here are a few ways to find others based on organic search and paid search ads.

Our basis for the analysis was Landingi, a SaaS for building landing pages (we chose that company randomly). So in our case, we found these 3 direct competitors. 

Slide 1 — direct competitors.

Look at keyword overlap

Keyword overlap uncovers sites that target the same organic keywords as you. Some sites will compete with you for traffic but not for customers (e.g. G2 may share some keywords with Landingi but they’re a different business). However, in many cases, you will find direct competitors just by looking at this marketing channel. 

  • Go to Ahrefs’ Site Explorer and enter your site’s address. 
  • Scroll down to Organic competitors
  • Visit the URLs to pick 3–5 direct competitors.
Top organic competitors data from Ahrefs.

To double-check the choice of competitors, we also looked at who was bidding for search ads on Google.

See who’s advertising 

If someone is spending money to show ads for keywords related to what you do, that’s a strong indication they are a direct competitor. 

  • Go to Ahrefs’ Keywords Explorer.
  • Type in a few broad keywords related to your niche, like “landing page builder” or “landing page tool”. 
  • Go to the Ads history report. 
  • Visit the sites that have a high presence of ads in the SERPs (Search Engine Result Pages). 
Ads history report in Ahrefs' Keywords Explorer.

Once you’re done checking both reports, write down competitors in the deck. 

You can also take screenshots of the reports and add them to your deck to show the supporting data for your argument. 

Slide 2 — direct competitors by organic traffic.

2. Compare share of voice

Share of voice is a measure of your reach in any given channel compared to competitors. 

A bigger share of voice (SOV) means that your competitors are more likely to reach your audience. In other words, they may be promoting more effectively than you. 

In our example, we found that Landingi’s SOV was the lowest in both of these channels. 

Organic: 

Slide 3 — share of voice on Google Search.

And social media:

Slide 4 — share of voice on social media.

Here’s how we got that data using Ahrefs and Brand24.

Organic share of voice 

Before we start, make sure you have a project set up in Ahrefs’ Rank Tracker

Create a new project in Ahrefs' Rank Tracker.

Now: 

  • Go to Ahrefs' Competitive Analysis and enter your and your competitors' sites as shown below.
Create a new project in Ahrefs' Rank Tracker.
  • On the next screen, set the country with the most important market for your business and set the filters like this:
Content gap analysis filter setup.
  • Select keywords that sound most relevant to your business (even if you don’t rank for them yet) and Add them to Rank Tracker
Common keywords found via Ahrefs' Competitive Analysis.
  • Go to Rank Tracker, open your project, and look for Competitors/Overview. This report will uncover automatically calculated Share of Voice
Organic share of voice data in Ahrefs.
  • Add the numbers in corresponding cells inside the sheet and paste the graph inside the slide deck. 
Filling the share of voice template with data.

It’s normal that the numbers don’t add up to 100%. SOV is calculated by including sites that compete with you in traffic but are not your direct competitors, e.g. blogs. 
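The underlying math is simple: each site's SOV is its slice of the estimated traffic for the tracked keywords. A minimal sketch with illustrative numbers (not real Ahrefs data):

```python
# Minimal sketch: share of voice as each site's slice of the estimated
# traffic for the tracked keywords. Numbers are illustrative only.
keyword_traffic = {
    "landingi.com": 1200,
    "unbounce.com": 5400,
    "instapage.com": 3100,
    "other sites": 9000,  # blogs etc. that rank but aren't competitors
}

total = sum(keyword_traffic.values())
for site, traffic in keyword_traffic.items():
    print(f"{site}: {traffic / total:.1%}")
```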

Social share of voice 

We can also measure our share of voice across social media channels using Brand24.

  • Go to Brand24.
  • Start a New project for your brand and each competitor. Use the competitors’ brand name as the keyword to monitor. 
  • Go to the Comparison report and compare your project with competitors. 
Using Brand24's Comparison tool for competitive analysis.
  • Take a screenshot of the SOV charts and paste them into the slide deck. Make sure the charts are set to “social media”.
Social media tab in share of voice report.

3. Compare pricing and features

Consumers often choose solutions that offer the best value for money — simple as that. And that typically comes down to two things: 

  • Whether you have the features they care about. We’ll use all features available across all plans to see how likely the product is to satisfy user needs.
  • How much they will need to pay. Thing is, the topic of pricing is tricky: a) when assessing affordability, people often focus on the least expensive option available and use it as a benchmark, b) businesses in the SaaS niche offer custom plans. So to make things more practical, we’ll compare the cheapest plans, but feel free to run this analysis across all pricing tiers.

After comparing our example company to competitors, we found that it goes head-to-head with Unbounce as the most feature-rich solution on the market. 

Slide 5 — features vs. pricing.

Here’s how we got that data. 

  • Note down your and your competitors’ product features. One of the best places to get this information is pricing pages. Some brands even publish their own competitor comparisons — you may find them helpful too. 
  • While making the list, place a “1” in the cell corresponding to the brand that offers the solution.
Filling data in the spreadsheet.
  • Enter the price of the cheapest plan (excluding free plans). 
Adding pricing data inside the spreadsheet.
  • Once finished, copy the chart and paste it inside the deck. 

4. Find strong and weak points based on user reviews

User reviews can show incredibly valuable insight into your competitors’ strong and weak points. Here’s why this matters:

  • Improving on what your competitors’ customers appreciate could help you attract similar customers and possibly win some over.
  • Dissatisfaction with competitors is a huge opportunity. Some businesses are built solely to fix what other companies can’t fix. 

Here’s a sample from our analysis: 

Slide 6 — likes and dislikes about competitors.

And here’s how we collated the data using ChatGPT. Important: repeat the process for each competitor.

  • Open ChatGPT and enter the prompt from the template.
ChatGPT prompt for competitive analysis.
  • Go to G2, Capterra, or Trustpilot and find a competitor's reviews with ratings from 2–4 (i.e. one rating above the lowest and one below the highest possible). Reason: businesses sometimes solicit five-star reviews, whereas dissatisfied customers tend to leave one-star reviews in a moment of frustration. The most actionable feedback usually comes in between.

  • Copy and paste the content of the reviews into ChatGPT (don’t hit enter yet). 
  • Once you’re done pasting all reviews, hit enter in ChatGPT to run the analysis.
Sample of ChatGPT output with charts.
  • Paste the graphs into the deck. If you want the graphs to look different, don’t hesitate to ask the AI. 
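If you'd rather script this step than paste reviews into the ChatGPT UI, here's a minimal sketch using the OpenAI Python SDK; the prompt is abbreviated and the model name is an assumption, so swap in whichever model you use:

```python
# Minimal sketch: summarize competitor reviews via the OpenAI API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize_reviews(reviews: list[str]) -> str:
    prompt = (
        "Based on the reviews below, list what users like and dislike "
        "about this product as two bullet lists.\n\n" + "\n---\n".join(reviews)
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: pick any current model
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(summarize_reviews(["Great editor, but support is slow.", "..."]))
```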

There’s a faster alternative, but it’s a bit more advanced. 

Instead of copy-pasting, you can use a scraping tool like this one to get all reviews at once. The downside here is that not all review sources will have a scraping tool available.

5. Compare purchasing convenience

Lastly, we’ll see how easy it is to actually buy your products, and compare the experience to your competitors. 

This is a chance to simplify your checkout process, and even learn from any good habits your competitors have adopted.

For example, we found that our sample company had probably nothing to worry about in this area — they ticked almost all of the boxes. 

Slide 7 — purchasing convenience.

Here’s how to complete this step:

  • Place a “1” if you or any of your competitors offer convenience features listed in the template. 
  • Once done, copy the chart and paste it into the deck.

6. Present conclusions

This is the part of the presentation where you sum up all of your findings and suggest a course of action. 

Here are two examples: 

  • Landingi had the lowest SOV in the niche, and that is never good. So the conclusion might be to go a level deeper and do an SEO competitive analysis, and to increase social media presence by creating more share-worthy content like industry surveys, design/CRO tips, or in-house data studies.
  • Although the brand had a very high purchasing convenience score, during the analysis we found that there was an $850 gap between the monthly full plan and the previous tier. The conclusion here might be to offer a custom plan (like competitors do) to fill that gap.

We encourage you to take your time here and think about what would make the most sense for your business. 

Tip

It’s good to be specific in your conclusions, but don’t go too deep. Competitive analysis concerns many aspects of the business, so it’s best to give other departments a chance to chime in. Just because your competitors have a few unique features doesn’t necessarily mean you need to build them too.

Final thoughts 

A competitive analysis is one of the most fruitful exercises in marketing. It can show you areas for improvement, give ideas for new features, and help you discover gaps in your strategy. It wouldn’t be an exaggeration to say that it’s fundamental to running a successful business. 

Just don’t forget to balance “spying” on your competitors with innovation. After all, you probably don’t want to become an exact copy of someone else’s brand. 

In other words, use competitive analysis to keep up with your competitors, but don’t let that erase what’s unique about your brand or make you forget your big vision. 

Got comments or questions? Ping me on X.


