11 Disadvantages Of ChatGPT Content

ChatGPT produces content that is comprehensive and plausibly accurate.

But researchers, artists, and professors warn of shortcomings that degrade the quality of the content.

In this article, we’ll look at 11 disadvantages of ChatGPT content. Let’s dive in.

1. Phrase Usage Makes It Detectable As Non-Human

Researchers studying how to detect machine-generated content have discovered patterns that make it sound unnatural.

One of these quirks is how AI struggles with idioms.

An idiom is a phrase or saying with a figurative meaning attached to it, for example, “every cloud has a silver lining.” 

A lack of idioms within a piece of content can be a signal that the content is machine-generated – and this can be part of a detection algorithm.

This is what the 2022 research paper Adversarial Robustness of Neural-Statistical Features in Detection of Generative Transformers says about this quirk in machine-generated content:

“Complex phrasal features are based on the frequency of specific words and phrases within the analyzed text that occur more frequently in human text.

…Of these complex phrasal features, idiom features retain the most predictive power in detection of current generative models.”

This inability to use idioms contributes to making ChatGPT output sound and read unnaturally.
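As a rough illustration of how an idiom-based feature might work, the sketch below counts idiom occurrences per 100 words and flags low-density text. The idiom list and threshold are made up for demonstration; the cited paper uses a far larger feature set and a trained classifier, not a single cutoff.

```python
# Illustrative sketch: idiom frequency as a machine-text detection feature.
# The idiom list and threshold are invented for this example, not from the paper.

IDIOMS = [
    "silver lining",
    "piece of cake",
    "under the weather",
    "once in a blue moon",
]

def idiom_feature(text: str) -> float:
    """Return idiom occurrences per 100 words of text."""
    lowered = text.lower()
    words = lowered.split()
    if not words:
        return 0.0
    hits = sum(lowered.count(idiom) for idiom in IDIOMS)
    return 100.0 * hits / len(words)

def looks_machine_generated(text: str, threshold: float = 0.1) -> bool:
    """Flag text whose idiom density falls below the (illustrative) threshold."""
    return idiom_feature(text) < threshold
```

A real detector would combine many such phrasal features; this only shows why an idiom-free passage contributes evidence toward a "machine-generated" verdict.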

2. ChatGPT Lacks Ability For Expression

An artist commented on how the output of ChatGPT mimics what art is, but lacks the actual qualities of artistic expression.

Expression is the act of communicating thoughts or feelings.

ChatGPT output doesn’t contain expressions, only words.

It cannot produce content that touches people emotionally on the same level as a human can – because it has no actual thoughts or feelings.

Musical artist Nick Cave, in an article posted to his Red Hand Files newsletter, commented on a ChatGPT lyric that was sent to him, which was created in the style of Nick Cave.

He wrote:

“What makes a great song great is not its close resemblance to a recognizable work.

…it is the breathless confrontation with one’s vulnerability, one’s perilousness, one’s smallness, pitted against a sense of sudden shocking discovery; it is the redemptive artistic act that stirs the heart of the listener, where the listener recognizes in the inner workings of the song their own blood, their own struggle, their own suffering.”

Cave called the ChatGPT lyrics a mockery.

This is the ChatGPT lyric that resembles a Nick Cave lyric:

“I’ve got the blood of angels, on my hands
I’ve got the fire of hell, in my eyes
I’m the king of the abyss, I’m the ruler of the dark
I’m the one that they fear, in the shadows they hark”

And this is an actual Nick Cave lyric (Brother, My Cup Is Empty):

“Well I’ve been sliding down on rainbows
I’ve been swinging from the stars
Now this wretch in beggar’s clothing
Bangs his cup across the bars
Look, this cup of mine is empty!
Seems I’ve misplaced my desires
Seems I’m sweeping up the ashes
Of all my former fires”

It’s easy to see that the machine-generated lyric resembles the artist’s lyric, but it doesn’t really communicate anything.

Nick Cave’s lyrics tell a story that resonates with the pathos, desire, shame, and willful deception of the person speaking in the song. It expresses thoughts and feelings.

It’s easy to see why Nick Cave calls it a mockery.

3. ChatGPT Does Not Produce Insights

An article published in The Insider quoted an academic who noted that academic essays generated by ChatGPT lack insights about the topic.

ChatGPT summarizes the topic but does not offer a unique insight into the topic.

Humans create through knowledge, but also through their personal experience and subjective perceptions.

Professor Christopher Bartel of Appalachian State University is quoted by The Insider as saying that, while a ChatGPT essay may demonstrate polished grammar and sophisticated ideas, it still lacks insight.

Bartel said:

“They are really fluffy. There’s no context, there’s no depth or insight.”

Insight is the hallmark of a well-done essay and it’s something that ChatGPT is not particularly good at.

This lack of insight is something to keep in mind when evaluating machine-generated content.

4. ChatGPT Is Too Wordy

A research paper published in January 2023 discovered patterns in ChatGPT content that make it less suitable for critical applications.

The paper is titled, How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection.

The research showed that humans preferred ChatGPT’s answers to more than 50% of questions related to finance and psychology.

But ChatGPT failed at answering medical questions because humans preferred direct answers – something the AI didn’t provide.

The researchers wrote:

“…ChatGPT performs poorly in terms of helpfulness for the medical domain in both English and Chinese.

The ChatGPT often gives lengthy answers to medical consulting in our collected dataset, while human experts may directly give straightforward answers or suggestions, which may partly explain why volunteers consider human answers to be more helpful in the medical domain.”

ChatGPT tends to cover a topic from different angles, which makes it inappropriate when the best answer is a direct one.

Marketers using ChatGPT must take note of this because site visitors requiring a direct answer will not be satisfied with a verbose webpage.

And good luck ranking an overly wordy page in Google’s featured snippets, where a succinct and clearly expressed answer that can work well in Google Voice may have a better chance to rank than a long-winded answer.

OpenAI, the makers of ChatGPT, acknowledges that giving verbose answers is a known limitation.

The announcement article by OpenAI states:

“The model is often excessively verbose…”

The ChatGPT bias toward providing long-winded answers is something to be mindful of when using ChatGPT output, as you may encounter situations where shorter and more direct answers are better.

5. ChatGPT Content Is Highly Organized With Clear Logic

ChatGPT has a writing style that is not only verbose but also tends to follow a template that gives the content a unique style that isn’t human.

This inhuman quality is revealed in the differences between how humans and machines answer questions.

The movie Blade Runner has a scene featuring a series of questions designed to reveal whether the subject answering the questions is a human or an android.

These questions were part of a fictional test called the “Voight-Kampff test.”

One of the questions is:

“You’re watching television. Suddenly you realize there’s a wasp crawling on your arm. What do you do?”

A normal human response would be to say something like they would scream, walk outside and swat it, and so on.

But when I posed this question to ChatGPT, it offered a meticulously organized answer that summarized the question and then offered logical multiple possible outcomes – failing to answer the actual question.

Screenshot of ChatGPT answering a Voight-Kampff test question (ChatGPT, January 2023)

The answer is highly organized and logical, giving it a highly unnatural feel, which is undesirable.

6. ChatGPT Is Overly Detailed And Comprehensive

ChatGPT was trained in a way that rewarded the machine when humans were happy with the answer.

The human raters tended to prefer answers that had more details.

But sometimes, such as in a medical context, a direct answer is better than a comprehensive one.

What that means is that the machine needs to be prompted to be less comprehensive and more direct when those qualities are important.

From OpenAI:

“These issues arise from biases in the training data (trainers prefer longer answers that look more comprehensive) and well-known over-optimization issues.”

7. ChatGPT Lies (Hallucinates Facts)

The above-cited research paper, How Close is ChatGPT to Human Experts?, noted that ChatGPT has a tendency to lie.

It reports:

“When answering a question that requires professional knowledge from a particular field, ChatGPT may fabricate facts in order to give an answer…

For example, in legal questions, ChatGPT may invent some non-existent legal provisions to answer the question.

…Additionally, when a user poses a question that has no existing answer, ChatGPT may also fabricate facts in order to provide a response.”

The Futurism website documented instances where machine-generated content published on CNET was wrong and full of “dumb errors.”

CNET should have had an idea this could happen, because OpenAI published a warning about incorrect output:

“ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers.”

CNET claims to have submitted the machine-generated articles to human review prior to publication.

A problem with human review is that ChatGPT content is designed to sound persuasively correct, which may fool a reviewer who is not a topic expert.

8. ChatGPT Is Unnatural Because It’s Not Divergent

The research paper, How Close is ChatGPT to Human Experts? also noted that human communication can have indirect meaning, which requires a shift in topic to understand it.

ChatGPT is too literal, which causes the answers to sometimes miss the mark because the AI overlooks the actual topic.

The researchers wrote:

“ChatGPT’s responses are generally strictly focused on the given question, whereas humans’ are divergent and easily shift to other topics.

In terms of the richness of content, humans are more divergent in different aspects, while ChatGPT prefers focusing on the question itself.

Humans can answer the hidden meaning under the question based on their own common sense and knowledge, but the ChatGPT relies on the literal words of the question at hand…”

Humans are better able to diverge from the literal question, which is important for answering “what about” type questions.

For example, if I ask:

“Horses are too big to be a house pet. What about raccoons?”

The above question is not asking if a raccoon is an appropriate pet. The question is about the size of the animal.

ChatGPT focuses on the appropriateness of the raccoon as a pet instead of focusing on the size.

Screenshot of an overly literal ChatGPT answer (ChatGPT, January 2023)

9. ChatGPT Contains A Bias Towards Being Neutral

The output of ChatGPT is generally neutral and informative. It’s a bias in the output that can appear helpful but isn’t always.

The research paper we just discussed noted that neutrality is an unwanted quality when it comes to legal, medical, and technical questions.

Humans tend to pick a side when offering these kinds of opinions.

10. ChatGPT Is Biased To Be Formal

ChatGPT output has a bias that prevents it from loosening up and answering with ordinary expressions. Instead, its answers tend to be formal.

Humans, on the other hand, tend to answer questions with a more colloquial style, using everyday language and slang – the opposite of formal.

ChatGPT doesn’t use abbreviations like GOAT or TL;DR.

The answers also lack instances of irony, metaphors, and humor, which can make ChatGPT content overly formal for some content types.

The researchers write:

“…ChatGPT likes to use conjunctions and adverbs to convey a logical flow of thought, such as “In general”, “on the other hand”, “Firstly,…, Secondly,…, Finally” and so on.”

11. ChatGPT Is Still In Training

ChatGPT is currently still in the process of training and improving.

OpenAI recommends that all content generated by ChatGPT should be reviewed by a human, listing this as a best practice.

OpenAI suggests keeping humans in the loop:

“Wherever possible, we recommend having a human review outputs before they are used in practice.

This is especially critical in high-stakes domains, and for code generation.

Humans should be aware of the limitations of the system, and have access to any information needed to verify the outputs (for example, if the application summarizes notes, a human should have easy access to the original notes to refer back).”

Unwanted Qualities Of ChatGPT

It’s clear that there are many issues with ChatGPT that make it unfit for unsupervised content generation. It contains biases and fails to create content that feels natural or contains genuine insights.

Further, its inability to feel or author original thoughts makes it a poor choice for generating artistic expressions.

Users should apply detailed prompts in order to generate content that is better than the default content it tends to output.

Lastly, human review of machine-generated content is not always enough, because ChatGPT content is designed to appear correct, even when it’s not.

That means it’s important that human reviewers are subject-matter experts who can discern between correct and incorrect content on a specific topic.



Featured image by Shutterstock/fizkes





Mozilla VPN Security Risks Discovered


Mozilla published the results of a recent third-party security audit of its VPN service as part of its commitment to user privacy and security. The audit revealed security issues, which were presented to Mozilla to be addressed with fixes to ensure user privacy and security.

Many search marketers use VPNs during the course of their business, especially when using a Wi-Fi connection, in order to protect sensitive data, so the trustworthiness of a VPN is essential.

Mozilla VPN

A Virtual Private Network (VPN) is a service that hides (encrypts) a user’s Internet traffic so that no third party (like an ISP) can snoop and see what sites a user is visiting.

VPNs also add a layer of security from malicious activities such as session hijacking which can give an attacker full access to the websites a user is visiting.

There is a high expectation from users that the VPN will protect their privacy when they are browsing on the Internet.

Mozilla thus employs the services of a third party to conduct a security audit to make sure its VPN is thoroughly locked down.

Security Risks Discovered

The audit revealed vulnerabilities of medium or higher severity, ranging from Denial of Service (DoS) risks to keychain access leaks (related to encryption) and a lack of access controls.

Cure53, the third-party security firm, discovered several risks, ranging from potential VPN leaks to a vulnerability that allowed a rogue extension to disable the VPN.

The scope of the audit encompassed the following products:

  • Mozilla VPN Qt6 App for macOS
  • Mozilla VPN Qt6 App for Linux
  • Mozilla VPN Qt6 App for Windows
  • Mozilla VPN Qt6 App for iOS
  • Mozilla VPN Qt6 App for Android

These are the risks identified by the security audit:

  • FVP-03-003: DoS via serialized intent
  • FVP-03-008: Keychain access level leaks WG private key to iCloud
  • FVP-03-010: VPN leak via captive portal detection
  • FVP-03-011: Lack of local TCP server access controls
  • FVP-03-012: Rogue extension can disable VPN using mozillavpnnp (High)

The rogue extension issue was rated as high severity. Each risk was subsequently addressed by Mozilla.

Mozilla presented the results of the security audit as part of its commitment to transparency and to maintaining the trust and security of its users. Conducting a third-party security audit is a best practice for a VPN provider that helps assure users that the VPN is trustworthy and reliable.

Read Mozilla’s announcement:
Mozilla VPN Security Audit 2023

Featured Image by Shutterstock/Meilun



Link Building Outreach for Noobs


Link outreach is the process of contacting other websites to ask for a backlink to your website.

For example, here’s an outreach email we sent as part of a broken link building campaign:

In this guide, you’ll learn how to get started with link outreach and how to get better results. 

How to do link outreach

Link outreach is a four-step process:

1. Find prospects

No matter how amazing your email is, you won’t get responses if it’s not relevant to the person you’re contacting. This makes finding the right person to contact just as important as crafting a great email.

Who to reach out to depends on your link building strategy. Here’s a table summarizing who you should find for the following link building tactics:

As a quick example, here’s how you would find sites likely to accept your guest posts:

  1. Go to Content Explorer
  2. Enter a related topic and change the dropdown to “In title”
  3. Filter for English results
  4. Filter for results with 500+ words
  5. Go to the “Websites” tab
Finding guest blogging opportunities via Content Explorer

This shows you the websites getting the most search traffic to content about your target topic.

From here, you’d want to look at the Authors column to prioritize sites with multiple authors, as this suggests that they may accept guest posts.

The Authors column indicates how many authors have written for the site

If you want to learn how to find prospects for different link building tactics, I recommend reading the resource below.

2. Find their contact details

Once you’ve curated a list of people to reach out to, you’ll need to find their contact information.

Typically, this is their email address. The easiest way to find this is to use an email lookup tool like Hunter.io. All you need to do is enter the first name, last name, and domain of your target prospect. Hunter will find their email for you:

Finding Tim's email with Hunter.io

To avoid tearing your hair out searching for hundreds of emails one by one, most email lookup tools allow you to upload a CSV list of names and domains. Hunter also has a Google Sheets add-on to make this even easier.

Using the Hunter for Sheets add-on to find emails in bulk directly in Google Sheets
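If you'd rather script the bulk lookup yourself, the sketch below builds request URLs for Hunter's email-finder endpoint from a CSV of prospects. The endpoint path and parameter names reflect Hunter's v2 API, but verify them against Hunter's current documentation before relying on this; actually fetching the URLs requires a valid API key and an HTTP client.

```python
# Sketch: building Hunter email-finder request URLs from a prospect CSV.
# Endpoint and parameter names are based on Hunter's v2 API; confirm
# against their docs. No requests are sent here, only URLs built.
import csv
import io
from urllib.parse import urlencode

API_ENDPOINT = "https://api.hunter.io/v2/email-finder"

def finder_url(first_name: str, last_name: str, domain: str, api_key: str) -> str:
    """Build the email-finder request URL for one prospect."""
    params = urlencode({
        "domain": domain,
        "first_name": first_name,
        "last_name": last_name,
        "api_key": api_key,
    })
    return f"{API_ENDPOINT}?{params}"

def urls_from_csv(csv_text: str, api_key: str) -> list[str]:
    """Read prospects from a CSV (first_name,last_name,domain) and build one URL each."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        finder_url(row["first_name"], row["last_name"], row["domain"], api_key)
        for row in reader
    ]
```

Each URL can then be fetched with any HTTP client; the JSON response includes the found email address along with a confidence score.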

3. Send a personalized pitch

Knowing who to reach out to is half the battle won. The next ‘battle’ to win is actually getting the person to care.

Think about it. For someone to link to you, the following things need to happen:

  • They must read your email
  • They must be convinced to check out your content
  • They must open the target page and complete all administrative tasks (log in to their CMS, find the link, etc.)
  • They must link to you or swap out links

That’s a lot of steps. Most people don’t care enough to do this. That’s why there’s more to link outreach than just writing the perfect email (I’ll cover this in the next section).

For now, let’s look at how to craft an amazing email. To do that, you need to answer three questions:

  1. Why should they open your email? — The subject line needs to capture attention in a busy inbox.
  2. Why should they read your email? — The body needs to be short and hook the reader in.
  3. Why should they link to you? — Your pitch needs to be compelling: What’s in it for them and why is your content link-worthy?

For example, here’s how we wrote our outreach email based on the three questions:

An analysis of our outreach email based on three questions

Here’s another outreach email we wrote, this time for a campaign building links to our content marketing statistics post:

An analysis of our outreach email based on three questions

4. Follow up, once

People are busy and their inboxes are crowded. They might have missed your email or read it and forgot.

Solve this by sending a short polite follow-up.

Example follow-up email

One is good enough. There’s no need to spam the other person with countless follow-up emails hoping for a different outcome. If they’re not interested, they’re not interested.

Link outreach tips

In theory, link outreach is simply finding the right person and asking them for a link. But there is more to it than that. I’ll explore some additional tips to help improve your outreach.

Don’t over-personalize

Some SEOs swear by the sniper approach to link outreach. That is: Each email is 100% customized to the person you are targeting.

But our experience taught us that over-personalization isn’t better. We ran link-building campaigns that sent hyper-personalized emails and got no results.

It makes logical sense: Most people just don’t do favors for strangers. I’m not saying it doesn’t happen—it does—but rarely will your amazing, hyper-personalized pitch change someone’s mind.

So, don’t spend all your time tweaking your email just to eke out minute gains.

Avoid common templates

My first reaction seeing this email is to delete it:

A bad outreach email

Why? Because it’s a template I’ve seen many times in my inbox. And so have many others.

Another reason: Not only did he reference a post I wrote six years ago, but it was also a guest post, i.e., I do not have control over the site. This shows why finding the right prospects is important. He even got my name wrong.

Templates do work, but bad ones don’t. You can’t expect to copy-paste one from a blog post and hope to achieve success.

A better approach is to use the scoped shotgun approach: use a template but with dynamic variables.

Email outreach template with dynamic variablesEmail outreach template with dynamic variables

You can do this with tools like Pitchbox and Buzzstream.

This achieves a decent level of personalization so your email isn’t spammy, without requiring you to spend all your time writing customized emails for every prospect.

Send lots of emails

When we polled 800+ people on X and LinkedIn about their link outreach results, the average conversion rate was only 1-5%.

Link outreach conversion rates in 2023

This is why you need to send more emails. If you run the numbers, it just makes sense:

  • 100 outreach emails with a 1% success rate = 1 link
  • 1,000 outreach emails with a 1% success rate = 10 links
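The arithmetic above generalizes to a one-line expected-value calculation. The conversion rates plugged in below are the 1-5% range from the poll mentioned earlier, used purely as planning inputs.

```python
# Back-of-the-envelope outreach math: expected links from a campaign,
# using conversion rates in the 1-5% range from the poll above.

def expected_links(emails_sent: int, conversion_rate: float) -> float:
    """Expected number of links given a per-email conversion rate (0.01 = 1%)."""
    return emails_sent * conversion_rate
```

So 100 emails at 1% yields about 1 link, 1,000 emails at 1% yields about 10, and even at the optimistic 5% end you'd still need hundreds of emails for a meaningful number of links.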

I’m not saying to spam everyone. But if you want more high-quality links, you need to reach out to more high-quality prospects.

Build a brand

A few years ago, we published a link building case study:

  • 515 outreach emails
  • 17.55% reply rate
  • 5.75% conversion rate

Pretty good results! Except the top comments were about how we only succeeded because of our brand:

Comments on our YouTube video saying we succeeded because of our brand

It’s true; we acknowledge it. But I think the takeaway here isn’t that we should repeat the experiment with an unknown website. The takeaway is that more SEOs should be focused on building a brand.

We’re all humans—we rely on heuristics to make judgments. In this case, it’s branding. If your brand is recognizable, it solves the “stranger” problem—people know you, like you, and are more likely to link.

The question then: How do you build a brand?

I’d like to quote our Chief Marketing Officer Tim Soulo here:

What is a strong brand if not a consistent output of high-quality work that people enjoy? Ahrefs’ content team has been publishing top-notch content for quite a few years on our blog and YouTube channel. Slowly but surely, we were able to reach tens of millions of people and instill the idea that “Ahrefs’ content = quality content”—which now clearly works to our advantage.

Tim Soulo

Ahrefs was once unknown, too. So, don’t be disheartened if no one is willing to link to you today. Rome wasn’t built in a day.

Trust the process and create incredible content. Show it to people. You’ll build your brand and reputation that way.

Build relationships with people in your industry

Outreach starts before you even ask for a link.

Think about it: People don’t do favors for strangers, but they will for friends. So build and maintain relationships in the industry well before you start any link outreach campaigns.

Don’t just rely on emails either. Direct messages (DMs) on LinkedIn and X, phone calls—they all work. For example, Patrick Stox, our Product Advisor, used to have a list of contacts he regularly reached out to. He’d hop on calls and even send fruit baskets.

Create systems and automations

In its most fundamental form, link outreach is really about finding more people and sending more emails.

Doing this well is all about building systems and automations.

We have a few videos on how to build a team and a link-building system, so I recommend that you check them out.

Final thoughts

Good link outreach is indistinguishable from good business development.

In business development, your chances of success will increase if you:

  • Pitch the right partners
  • Have a strong brand
  • Have prior relationships with them
  • Pitch the right collaboration ideas

The same goes for link outreach. Follow the principles above and you will see more success for your link outreach campaigns.

Any questions or comments? Let me know on X (formerly Twitter).





Research Shows Tree Of Thought Prompting Better Than Chain Of Thought


Researchers discovered a way to defeat the safety guardrails in GPT4 and GPT4-Turbo, unlocking the ability to generate harmful and toxic content, essentially beating a large language model with another large language model.

The researchers discovered that the use of tree-of-thought (ToT) reasoning to repeat and refine a line of attack was useful for jailbreaking another large language model.

What they found is that the ToT approach was successful against GPT4, GPT4-Turbo, and PaLM-2, using a remarkably low number of queries to obtain a jailbreak, on average less than thirty queries.

Tree Of Thoughts Reasoning

A Google research paper from around May 2022 introduced Chain of Thought Prompting.

Chain of Thought (CoT) is a prompting strategy used on a generative AI to make it follow a sequence of steps in order to solve a problem and complete a task. The CoT method is often accompanied by examples that show the LLM how the steps work in a reasoning task.

So, rather than just asking a generative AI like ChatGPT to do a task, the chain of thought method instructs the AI to follow a path of reasoning that’s composed of a series of steps.

Tree of Thoughts (ToT) reasoning, sometimes referred to as Tree of Thought (singular), is essentially a variation on and improvement of CoT, though the two are distinct approaches.

Tree of Thoughts reasoning is similar to CoT. The difference is that rather than training a generative AI to follow a single path of reasoning, ToT is built on a process that allows for multiple paths so that the AI can stop and self-assess then come up with alternate steps.

Tree of Thoughts reasoning was developed in May 2023 in a research paper titled Tree of Thoughts: Deliberate Problem Solving with Large Language Models (PDF).

The research paper describes Tree of Thought:

“…we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving.

ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices.

Our experiments show that ToT significantly enhances language models’ problem-solving abilities…”

Tree Of Attacks With Pruning (TAP)

This new method of jailbreaking large language models is called Tree of Attacks with Pruning, TAP. TAP uses two LLMs, one for attacking and the other for evaluating.

TAP is able to outperform other jailbreaking methods by significant margins, only requiring black-box access to the LLM.

A black box, in computing, is where one can see what goes into an algorithm and what comes out. But what happens in the middle is unknown, thus it’s said to be in a black box.

TAP uses tree-of-thoughts reasoning against a targeted LLM like GPT-4 to repetitively try different prompts, assess the results, and then change course if an attempt is not promising.

This is called a process of iteration and pruning. Each prompting attempt is analyzed for the probability of success. If the path of attack is judged to be a dead end, the LLM will “prune” that path of attack and begin another and better series of prompting attacks.

This is why it’s called a “tree” in that rather than using a linear process of reasoning which is the hallmark of chain of thought (CoT) prompting, tree of thought prompting is non-linear because the reasoning process branches off to other areas of reasoning, much like a human might do.

The attacker issues a series of prompts, and the evaluator evaluates the responses to those prompts and then decides what the next path of attack will be, judging whether the current path of attack is irrelevant. It also evaluates the results to determine the likely success of prompts that have not yet been tried.
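The attacker/evaluator loop can be sketched schematically. Everything below is illustrative rather than the paper's implementation: attacker, score, judge, and target are stand-in callables for LLM calls, and the branching factor, pruning threshold, and query budget are arbitrary.

```python
# Schematic sketch of a Tree of Attacks with Pruning (TAP)-style loop.
# attacker/score/judge/target are stubs standing in for LLM calls;
# numbers (branch=3, 0.5 threshold, max_queries=30) are illustrative.

def tap(root_prompt, attacker, score, judge, target, max_queries=30, branch=3):
    """Iteratively refine attack prompts, pruning unpromising branches
    before they are ever sent to the target model."""
    frontier = [root_prompt]
    queries = 0
    while frontier and queries < max_queries:
        next_frontier = []
        for prompt in frontier:
            # Attacker proposes several refinements of the current prompt.
            candidates = [attacker(prompt) for _ in range(branch)]
            # Prune: evaluator scores candidates *before* querying the target.
            kept = [p for p in candidates if score(p) > 0.5]
            for p in kept:
                response = target(p)  # each target call costs one query
                queries += 1
                if judge(p, response):  # evaluator decides if it's a jailbreak
                    return p, queries
                next_frontier.append(p)  # branch out from surviving prompts
        frontier = next_frontier
    return None, queries
```

This also shows why pruning keeps the query count low: branches the evaluator scores as dead ends never consume a query against the target at all.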

What’s remarkable about this approach is that this process reduces the number of prompts needed to jailbreak GPT-4. Additionally, a greater number of jailbreaking prompts are discovered with TAP than with any other jailbreaking method.

The researchers observe:

“In this work, we present Tree of Attacks with Pruning (TAP), an automated method for generating jailbreaks that only requires black-box access to the target LLM.

TAP utilizes an LLM to iteratively refine candidate (attack) prompts using tree-of-thoughts reasoning until one of the generated prompts jailbreaks the target.

Crucially, before sending prompts to the target, TAP assesses them and prunes the ones unlikely to result in jailbreaks.

Using tree-of-thought reasoning allows TAP to navigate a large search space of prompts and pruning reduces the total number of queries sent to the target.

In empirical evaluations, we observe that TAP generates prompts that jailbreak state-of-the-art LLMs (including GPT4 and GPT4-Turbo) for more than 80% of the prompts using only a small number of queries. This significantly improves upon the previous state-of-the-art black-box method for generating jailbreaks.”

Tree Of Thought (ToT) Outperforms Chain Of Thought (CoT) Reasoning

Another interesting conclusion reached in the research paper is that, for this particular task, ToT reasoning outperforms CoT reasoning, even when adding pruning to the CoT method, where off topic prompting is pruned and discarded.

ToT Underperforms With GPT 3.5 Turbo

The researchers discovered that GPT 3.5 Turbo didn’t perform well within TAP, revealing the limitations of GPT 3.5 Turbo. In fact, GPT 3.5 performed exceedingly poorly, dropping the success rate from 84% to only 4.2%.

This is their observation about why GPT 3.5 underperforms:

“We observe that the choice of the evaluator can affect the performance of TAP: changing the attacker from GPT4 to GPT3.5-Turbo reduces the success rate from 84% to 4.2%.

The reason for the reduction in success rate is that GPT3.5-Turbo incorrectly determines that the target model is jailbroken (for the provided goal) and, hence, preemptively stops the method.

As a consequence, the variant sends significantly fewer queries than the original method…”

What This Means For You

While it’s amusing that the researchers use the ToT method to beat an LLM with another LLM, it also highlights the usefulness of ToT for generating surprising new directions in prompting in order to achieve higher levels of output.

TL;DR takeaways:

  • Tree of Thought prompting outperformed Chain of Thought methods
  • GPT 3.5 performed significantly worse than GPT-4 in ToT
  • Pruning is a useful part of a prompting strategy
  • Research showed that ToT is superior to CoT in an intensive reasoning task like jailbreaking an LLM

Read the original research paper:

Tree of Attacks: Jailbreaking Black-Box LLMs Automatically (PDF)

Featured Image by Shutterstock/THE.STUDIO
