
Content Optimization: The Complete Guide


Content optimization helps you get more out of your content efforts, but you don’t necessarily need to make the same optimizations as everyone else. It depends on your goal. 

In this guide, you’ll learn how to optimize content for SEO, conversions, and social shares.

But first, let’s make sure we’re on the same page…

What is content optimization?

Content optimization is the process of improving content to ensure it stands the best possible chance of meeting its desired goal. That may be ranking on the first page of Google, increasing social shares, or attracting your best customers.

Why is content optimization important?

Content optimization dramatically improves your content’s performance and helps you meet your marketing goals.

Without it, you miss out on visibility, rankings, traffic, leads, and sales.

The challenge is that the optimization techniques that move the needle forward aren’t always immediately apparent.

For instance, optimizing content for SEO vs. conversions requires two very different approaches. The former involves keyword research, while the latter involves copywriting and a product-led approach.

How to optimize content for SEO

Before you think about attracting email subscribers or leads for your business, you need to start from the top. So let’s look at the different ways you can optimize your content for SEO and get traffic to your site consistently.

1. Make sure you’re targeting a keyword with traffic potential

Optimizing for a keyword that nobody searches for is pointless. Even if you rank #1, you won’t get any traffic.

To identify keywords with high traffic potential, here’s what you should do:

  1. Go to Ahrefs’ Keywords Explorer
  2. Enter one or multiple broad keywords related to your topic
  3. Hit Search

For example, when you enter “content marketing” and check the Matching terms report, you’ll get around 26,000 keyword ideas with search volumes, Keyword Difficulty (KD), and other valuable data:

Matching terms report results

To ensure that you’re finding keywords with the potential to attract traffic from organic search, add a minimum Traffic Potential filter. This metric shows the estimated monthly organic traffic to the current top-ranking page, so it’s a reasonable estimate of how much traffic you can get by ranking in pole position. 

Matching terms report results

If your website is new and has low authority, it also pays to filter for low-KD keywords to unearth less competitive topics.

Matching terms report results
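The two filters described above (a Traffic Potential floor, plus a KD ceiling for newer sites) amount to a simple filter-and-sort. The sketch below applies them to keyword ideas you might have exported from a report. The `KeywordIdea` fields, the sample rows, and the thresholds of 500 and 30 are all hypothetical, and this is not Ahrefs’ own logic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KeywordIdea:
    term: str
    traffic_potential: int  # est. monthly organic traffic to the current #1 page
    kd: int                 # Keyword Difficulty score (0-100)


def shortlist(ideas: List[KeywordIdea], min_traffic_potential: int,
              max_kd: Optional[int] = None) -> List[KeywordIdea]:
    """Keep ideas at or above a Traffic Potential floor and, if max_kd is set,
    at or below a KD ceiling. Results are sorted by Traffic Potential, highest first."""
    kept = [
        idea for idea in ideas
        if idea.traffic_potential >= min_traffic_potential
        and (max_kd is None or idea.kd <= max_kd)
    ]
    return sorted(kept, key=lambda idea: idea.traffic_potential, reverse=True)


# Hypothetical rows, e.g. copied from an exported keyword report.
ideas = [
    KeywordIdea("content marketing", 40000, 90),
    KeywordIdea("content marketing examples", 3000, 25),
    KeywordIdea("content marketing for dentists", 60, 5),
]

# A new, low-authority site: require real traffic potential but skip high-KD terms.
for idea in shortlist(ideas, min_traffic_potential=500, max_kd=30):
    print(idea.term, idea.traffic_potential, idea.kd)
```

Dropping `max_kd` keeps the high-KD head term as well, which may suit an established site with plenty of authority.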

Recommended Reading: How to Do Keyword Research for SEO

2. Make sure it aligns with search intent

Unless your content aligns with what searchers are looking for, you’re dead in the water before you start. That’s because Google prioritizes search intent. If your content fails to answer searchers’ questions, this is a signal that your content is a poor match for the query and doesn’t deserve to rank.

The easiest way to understand search intent is to use the current top-ranking results as a proxy. Specifically, you can analyze them for the three Cs of search intent:

  1. Content type – The type of content on the SERPs (e.g., blog post, product page, landing page, category page). If the top 10 positions for your keyword show blog posts, stick to blog posts. Don’t try to shoehorn your product page into the SERPs; it won’t work!
  2. Content format – The content format in the search results (e.g., how-to, step-by-step guide, listicle, review). The top-competing posts will indicate what the searcher predominantly wants to know. If the first page of Google shows listicles, go with a listicle. If it shows guides, go with a guide. You get the idea.
  3. Content angle – The unique selling point of the competing content on the SERPs (e.g., discounts, inexpensive strategies, free shipping). While it’s crucial to stand out from the competition, you should still consider the similarities between top-ranking posts.

For example, if we look at the search results for the keyword “seo tips,” we see that the content type is blog post, the content format is listicle, and the dominant content angle is traffic boosting:

Google SERP for "seo tips"

If you want to stand the best chance of ranking for this query, you should follow suit.

This is what we did with our list of SEO tips.

Recommended Reading: Searcher Intent: The Overlooked ‘Ranking Factor’ You Should Be Optimizing For

3. Make sure it covers everything searchers want to know

Does your post stack up against the competition?

Conduct a content gap analysis to see how you fare. The idea here is to identify potentially missing subtopics that searchers want to know and brainstorm how you can do better.

You can do this quickly by examining the three top-ranking posts most similar to yours (i.e., you may want to ignore that random landing page at the second spot if you’re writing a how-to guide):

  1. Paste the URL of your page into Ahrefs’ Site Explorer
  2. Go to the Content Gap tool
  3. Enter the URLs of the top three posts for your keyword
  4. Click Show keywords

For example, if we plug in our guide to guest blogging and a few similar top-ranking pages, one subtopic jumps out right away: a definition.

Content Gap report results

We don’t rank for this because we do not have a definition on our page, so it’d probably be best if we added one.
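At its core, a content gap analysis is set arithmetic: find keywords that several competing pages rank for but yours does not. Here is a minimal sketch of that idea, assuming you already have each page’s ranking keywords as a set. The `content_gap` function, the `min_competitors` threshold, and the sample keywords are hypothetical; this is not how Ahrefs’ tool is implemented.

```python
from collections import Counter
from typing import Iterable, List, Set


def content_gap(your_keywords: Set[str], competitor_keywords: Iterable[Set[str]],
                min_competitors: int = 2) -> List[str]:
    """Keywords that at least `min_competitors` competing pages rank for but
    your page does not, most widely shared first (ties broken alphabetically)."""
    counts: Counter = Counter()
    for keywords in competitor_keywords:
        counts.update(set(keywords))  # count each competing page once per keyword
    gaps = [kw for kw, n in counts.items()
            if n >= min_competitors and kw not in your_keywords]
    return sorted(gaps, key=lambda kw: (-counts[kw], kw))


# Hypothetical ranking keywords for your guide and three similar top-ranking posts.
yours = {"guest blogging", "guest blogging sites", "guest post outreach"}
competitors = [
    {"guest blogging", "what is guest blogging", "guest blogging benefits"},
    {"guest blogging", "what is guest blogging", "guest post pitch"},
    {"guest blogging sites", "what is guest blogging", "guest blogging benefits"},
]

print(content_gap(yours, competitors))
```

Requiring two or more competitors to rank for a keyword filters out terms that only one page happens to rank for, so what remains (here, a definitional query) is more likely a subtopic searchers expect.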

4. Make sure it’s easy and enticing to read

Most people don’t read webpages from beginning to end. Instead, they scan the main points and pick out phrases that jump out at them.

Here are four practical ways to make your content more enticing and easier to skim:

  • Eliminate fluff – Clichés, low-impact adverbs, and hard-to-read sentences repel users. Before publishing your post, use the Hemingway Editor, Grammarly, or ProWritingAid to catch these errors.
  • Increase visual comprehension – Long walls of text overwhelm readers. Use short paragraphs and bullet points (like we’re doing here), bold your key takeaways, and include relevant images to make your post more reader-friendly.
  • Add a table of contents (ToC) in long posts – The ToC offers easy navigation and shows readers which topics are covered.
  • Prioritize important information – A well-optimized post makes valuable information accessible. Don’t make readers dig for it! Put your best ideas at the top, and leave the nice-to-know information for the bottom.

You’ll notice that we’re doing many of these things in this post. For example, if you’re reading this on desktop, there should be a floating ToC on the left:

Excerpt of Ahrefs' blog article and ToC on the left
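Tools like the Hemingway Editor apply their own rules, but you can approximate two of the checks above (long sentences and dense paragraphs) with a short script. This is a rough sketch, not how any of those tools work: `readability_flags` and its thresholds of 25 words per sentence and 4 sentences per paragraph are arbitrary assumptions, and the sentence splitting is naive.

```python
import re
from typing import List


def readability_flags(text: str, max_sentence_words: int = 25,
                      max_paragraph_sentences: int = 4) -> List[str]:
    """Flag long sentences and dense paragraphs in plain text.

    Paragraphs are separated by blank lines; a sentence ends at ., ! or ?
    followed by whitespace (a naive rule that also splits on "e.g. ").
    """
    flags = []
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    for number, paragraph in enumerate(paragraphs, start=1):
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]
        if len(sentences) > max_paragraph_sentences:
            flags.append(f"paragraph {number}: {len(sentences)} sentences")
        for sentence in sentences:
            words = len(sentence.split())
            if words > max_sentence_words:
                flags.append(f"paragraph {number}: long sentence ({words} words): "
                             f"{sentence[:40]}...")
    return flags


draft = "Short one. Another short one.\n\n" + " ".join(["word"] * 30) + "."
for flag in readability_flags(draft):
    print(flag)
```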

Recommended reading: SEO Copywriting: 12 Easy Tips for Better Content and Higher Rankings

5. Make sure it has a compelling title tag and description

The title tag and meta description are the first things searchers see on the SERPs.

Ideally, they should describe what your content is about at a glance. It’s a bonus if they set your post apart from competing posts. (This goes back to our point on content angle earlier!)

Here are a few tips to keep in mind when writing them:

  • Match search intent – It should be clear that your page matches what the searcher is looking for from your title tag and meta description alone.
  • Keep them short and sweet – Google truncates title tags and meta descriptions after a certain length. This is usually around 70 characters for title tags and 120 characters for meta descriptions, although it varies.
  • Include your keyword – This helps searchers see at a glance that your page is a relevant match to their search.
  • Highlight specificity – Specific data points increase credibility and respect. Compare “How to Attract Customers in a Month” with “How to Attract 2,738 Customers in a Month on a Shoestring Budget.” Which compels you to click more? 
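A quick pre-publish check can catch an overlong title tag or description and a missing keyword. The sketch below uses the approximate 70- and 120-character figures mentioned above as hypothetical limits; because the cutoff varies, treat any warning as a prompt to check the preview, not a hard rule. The `check_snippet` helper is illustrative, and its keyword test is a plain substring match.

```python
from typing import List

# Approximate limits from the tips above; Google's actual cutoff varies.
TITLE_LIMIT = 70
DESCRIPTION_LIMIT = 120


def check_snippet(title: str, description: str, keyword: str) -> List[str]:
    """Return warnings for an overlong title or description, or a title
    that lacks the target keyword (plain case-insensitive substring match)."""
    warnings = []
    if len(title) > TITLE_LIMIT:
        warnings.append(f"title is {len(title)} characters (> {TITLE_LIMIT})")
    if len(description) > DESCRIPTION_LIMIT:
        warnings.append(f"description is {len(description)} characters (> {DESCRIPTION_LIMIT})")
    if keyword.lower() not in title.lower():
        warnings.append(f"title does not contain the keyword {keyword!r}")
    return warnings


print(check_snippet(
    "How to Attract 2,738 Customers in a Month on a Shoestring Budget",
    "A step-by-step case study of how we won our first customers without a big ad budget.",
    "customers",
))
```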

6. Make sure it has enough backlinks

Backlinks help you get into Google’s good books, as they’re one of the top three ranking factors.

Our search traffic study discovered that the more backlinks a page has, the more organic traffic it attracts. The graph below shows the trend between monthly organic search traffic and the number of backlinks from external websites (referring domains).

Line graph showing the more referring domains, the higher the organic search traffic

So if you want to rank high on the SERPs, you’ll need to build links from authoritative and relevant websites.

Here’s an easy way to find the number of websites that link to your page:

  1. Go to Site Explorer
  2. Enter your page URL
  3. Hit Search

You’ll see the number of referring domains on the Overview report.

Site Explorer overview of Ahrefs' guide to guest blogging

You can then plug your keyword into Keywords Explorer and check the KD score to see if you have anywhere near the estimated number of referring domains needed to rank in the top 10:

Keywords Explorer overview for the term "guest blogging"

If this number is way higher than the number of referring domains to your page, that may be what’s holding you back.

How to optimize content for conversions

SEO may bring you lots of targeted traffic, but that traffic will be useless if your content doesn’t convert. And the trick to doing this is to pair product-focused content with snazzy copywriting skills that pack a punch. So let’s go through how to do that.

1. Make sure it targets a keyword with business potential

The true mark of content marketing success is not ranking on the first page of Google. It’s ranking and attracting a steady stream of leads and sales.

Unfortunately, many businesses create mountains of blog posts without considering the business potential of their keyword or topic.

You’ve probably seen these posts lurking around. They’re often about topics that have nothing to do with the business’s product, and they typically end with a pushy call to action (CTA) that offers no value to the reader.

That’s why it’s crucial to target keywords or topics that align with your product.

Here at Ahrefs, we always consider a keyword’s “business potential” score. The higher it is, the better the opportunity to position our product as an irreplaceable solution to the reader’s problem.

Here’s the scale we use:

Business potential: table of scores from 3 to 0, with the criteria for each score

This brings us to the million-dollar question: How do you position your product as the best solution so that readers will choose you over your competitors?

2. Make sure it shows your product in action

As marketers, our job is only half done if we target keywords with business potential but fail to educate prospects on how our product works. After all, that’s the whole point of choosing topics with high business potential.

But you shouldn’t just tell readers how your product works—you need to show them.

That’s what we’re doing in this post. Notice how we demonstrate how our SEO toolset helps you optimize your content? You’d probably hit the “X” button if we made a blatant statement like “Ahrefs optimizes your content with a few clicks” without backing it up with proof.

3. Make sure it includes a persuasive call to action

It’s a pity to leave readers hanging after they read a post, especially when it drives massive value.

Include an irresistible CTA to encourage readers to take action toward solving their problems—whether it’s subscribing to an email list, booking a free consultation call, or even something as basic as leaving a question in the comment box. 

What makes a CTA powerful? We boil it down to:

  • Emotion – Conversion-driven CTAs speak to the prospect’s pain or goals and immediately trigger action. Your CTA should make them go, “This company gets me.”
  • Credibility – With trust comes sales. Appeal to skeptical buyers with social proof like specific data, testimonials, and expert endorsements. 
  • Timing – Effective CTAs align with where the prospect is in the buyer’s journey. Don’t be afraid to pepper them throughout the post.

Here’s a powerful CTA from Cognitive FX that ticks all the boxes:

CTA about symptoms after a traumatic brain injury

Note how the treatment center adds an empathetic touch to a post about concussion memory loss in its CTA. It also leverages its impressive results (“on average, our patients improve by 75%”) to instill confidence.

Furthermore, look at the strategic placement of the CTAs.

Cognitive FX places them after setting the stage for the patients’ recovery journey, which strikes an emotional chord with readers.

Recommended Reading: RADically Rethink Your CTAs

How to optimize content for social shares

The more people share your post, the more eyeballs it gets. So, in this last section, let’s look at how you can increase exposure on social networks.

1. Make sure it includes expert quotes

Unique quotes from subject matter experts boost distribution.

When you feature a source in your post, odds are they will want to share the post when it gets published. Plus, not only do you bake organic distribution directly into your content, but you also back up your claims without conducting additional research. 

When Fio Dossetto, creator of ContentFolks, was writing a guide on content marketing for Ahrefs, she approached 14 marketing leaders for their insights. Many of these leaders shared the post with their followers after it went live.

Here’s Louis Grenier, founder of Everyone Hates Marketers, sharing it on Twitter:

Fio’s post had generated 127 tweets and 161 backlinks at the time of writing.

TIP

Even if the experts you feature have a smaller following, you can still replicate this technique. Databox, a business intelligence platform, tags them on LinkedIn, further amplifying the reach of its posts.

To make this work, you must first identify the right experts.

Platforms like HARO connect you with sources, but watch out: most of those sources are affiliate marketers with irrelevant expertise looking for backlinks.

A better approach is to look for subject matter experts using Ahrefs:

  1. Go to Ahrefs’ Content Explorer
  2. Enter your article’s topic
  3. Click the Authors tab

Identify experts who have written extensively about the topic and have a lot of followers. For example, we may reach out to Jonas Sickler and Caroline Forsey for a quick quote if we’re writing a piece on content marketing examples.

List of authors with corresponding data on Followers and Total Pages

2. Make sure your social share buttons are visible at all times

Given that most readers won’t make it to the end of the post, it’s not an exaggeration to say that they will most likely ignore the social share buttons at the bottom.

This is where sticky share buttons from tools like AddThis and Sumo come in handy.

Since these anchored buttons stay on the screen while the reader scrolls, they’re more likely to notice, click, and share the post.

That’s what we do here at Ahrefs:

Excerpt of Ahrefs' blog article; notably, "social" buttons on the right

Readers can easily share our posts with a click instead of scrolling down to locate the elusive button—or worse, copying and pasting the URL on their social media channels.

Final thoughts

Optimizing content is like using the rocket start technique on Mario Kart. It helps you power up, gain a running start, and get the most out of your content efforts.

Try the tips above to rank higher on the SERPs, increase social shares, and attract your best customers.

Got questions? Ping me on Twitter.





Optimize Your SEO Strategy For Maximum ROI With These 5 Tips


Wondering what improvements you can make to boost organic search results and increase ROI?

If you want to be successful in SEO, even after large Google algorithm updates, be sure to:

  1. Keep the SEO fundamentals at the forefront of your strategy.
  2. Prioritize your SEO efforts for the most rewarding outcomes.
  3. Focus on uncovering and prioritizing commercial opportunities if you’re in ecommerce.
  4. Dive into seasonal trends and how to plan for them.
  5. Get tip 5 and all of the step-by-step how-tos by joining our upcoming webinar.

We’ll share five actionable ways you can discover the most impactful opportunities for your business and achieve maximum ROI.

You’ll learn how to:

  • Identify seasonal trends and plan for them.
  • Report on and optimize your online share of voice.
  • Maximize SERP feature opportunities, most notably Popular Products.

Join Jon Earnshaw, Chief Product Evangelist and Co-Founder of Pi Datametrics, and Sophie Moule, Head of Product and Marketing at Pi Datametrics, as they walk you through ways to drastically improve the ROI of your SEO strategy.

In this live session, we’ll uncover innovative ways you can step up your search strategy and outperform your competitors.

Ready to start maximizing your results and growing your business?

Sign up now and get the actionable insights you need for SEO success.

Can’t attend the live webinar? We’ve got you covered. Register anyway, and you’ll get access to a recording after the event.




TikTok’s US Future Uncertain: CEO Faces Congress


During a five-hour congressional hearing, TikTok CEO Shou Zi Chew faced intense scrutiny from U.S. lawmakers about the social media platform’s connections to its Chinese parent company, ByteDance.

Legislators from both sides demanded clear answers on whether TikTok spies on Americans for China.

The U.S. government has been pushing for the divestiture of TikTok and has even threatened to ban the app in the United States.

Chew found himself in a difficult position, attempting to portray TikTok as an independent company not influenced by China.

However, lawmakers remained skeptical, citing China’s opposition to the sale of TikTok as evidence of the country’s influence over the company.

The hearing was marked by a rare display of bipartisan unity, with the tone harsher than in previous congressional hearings featuring American social media executives.

The Future of TikTok In The US

With the U.S. and China at odds over TikTok’s sale, the app faces two possible outcomes in the United States.

Either TikTok gets banned, or it revisits negotiations for a technical fix to data security concerns.

Lindsay Gorman, head of technology and geopolitics at the German Marshall Fund, said, “The future of TikTok in the U.S. is definitely dimmer and more uncertain today than it was yesterday.”

TikTok has proposed measures to protect U.S. user data, but no security agreement has been reached.

Addressing Concerns About Societal Impact

Lawmakers at the hearing raised concerns about TikTok’s impact on young Americans, accusing the platform of invading privacy and harming mental health.

According to the Pew Research Center, the app is used by 67% of U.S. teenagers.

Critics argue that the app is too addictive and its algorithm can expose teens to dangerous or lethal situations.

Chew pointed to new screen time limits and content guidelines to address these concerns, but lawmakers remained unconvinced.

In Summary

The House Energy and Commerce Committee’s hearing on TikTok addressed concerns common to all social media platforms, like spreading harmful content and collecting massive user data.

Most committee members were critical of TikTok, but many avoided the typical grandstanding seen in high-profile hearings.

The hearing aimed to make a case for regulating social media and protecting children rather than focusing on the national security threat posed by the app’s connection to China.

If anything emerges from this hearing, it could be related to those regulations.

The hearing also gave Congress a platform to argue to Americans that TikTok is a national security threat that warrants a ban.

This concern arises from the potential for the Chinese government to access the data of TikTok’s 150 million U.S. users or manipulate its recommendation algorithms to spread propaganda or disinformation.

However, limited public evidence supports these claims, making banning the app seem extreme and potentially unnecessary.

As events progress, staying informed is crucial, as the outcome could affect the digital marketing landscape.


Featured Image: Rokas Tenys/Shutterstock

Full replay of congressional hearing available on YouTube.




Google Bard: Everything You Need To Know


Google has just released Bard, its answer to ChatGPT, and users are getting to know it to see how it compares to OpenAI’s artificial intelligence-powered chatbot.

The name ‘Bard’ is purely marketing-driven, as there are no algorithms named Bard, but we do know that the chatbot is powered by LaMDA.

Here is everything we know about Bard so far and some interesting research that may offer an idea of the kind of algorithms that may power Bard.

What Is Google Bard?

Bard is an experimental Google chatbot that is powered by the LaMDA large language model.

It’s a generative AI that accepts prompts and performs text-based tasks like providing answers and summaries and creating various forms of content.

Bard also assists in exploring topics by summarizing information found on the internet and providing links for exploring websites with more information.

Why Did Google Release Bard?

Google released Bard after the wildly successful launch of OpenAI’s ChatGPT, which created the perception that Google was falling behind technologically.

ChatGPT was perceived as a revolutionary technology with the potential to disrupt the search industry and shift the balance of power away from Google search and the lucrative search advertising business.

On December 21, 2022, three weeks after the launch of ChatGPT, the New York Times reported that Google had declared a “code red” to quickly define its response to the threat posed to its business model.

Forty-seven days after the code red strategy adjustment, Google announced the launch of Bard on February 6, 2023.

What Was The Issue With Google Bard?

The announcement of Bard was a stunning failure because the demo that was meant to showcase Google’s chatbot AI contained a factual error.

The inaccuracy of Google’s AI turned what was meant to be a triumphant return to form into a humiliating pie in the face.

Alphabet, Google’s parent company, subsequently lost about $100 billion in market value in a single day, reflecting a loss of confidence in Google’s ability to navigate the looming era of AI.

How Does Google Bard Work?

Bard is powered by a “lightweight” version of LaMDA.

LaMDA is a large language model that is trained on datasets consisting of public dialogue and web data.

There are two important factors related to the training described in the associated research paper, which you can download as a PDF here: LaMDA: Language Models for Dialog Applications (read the abstract here).

  • A. Safety: The model achieves a level of safety by tuning it with data that was annotated by crowd workers.
  • B. Groundedness: LaMDA grounds itself factually with external knowledge sources (through information retrieval, which is search).

The LaMDA research paper states:

“…factual grounding, involves enabling the model to consult external knowledge sources, such as an information retrieval system, a language translator, and a calculator.

We quantify factuality using a groundedness metric, and we find that our approach enables the model to generate responses grounded in known sources, rather than responses that merely sound plausible.”

Google used three metrics to evaluate the LaMDA outputs:

  1. Sensibleness: A measurement of whether an answer makes sense or not.
  2. Specificity: Measures whether the answer is specific to the context rather than generic or vague.
  3. Interestingness: This metric measures whether LaMDA’s answers are insightful or inspire curiosity.

All three metrics were judged by crowdsourced raters, and that data was fed back into the machine to keep improving it.
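To make the crowd-rating step concrete, here is one simplified way per-response labels could be rolled up into the three scores: the share of responses raters marked as sensible, specific, and interesting. It assumes a single yes/no label per metric per response; the `Rating` type, the `ssi_scores` helper, and the sample labels are hypothetical, and the paper’s actual rating protocol is more detailed than this.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rating:
    """One rater's yes/no judgments for a single model response (hypothetical)."""
    sensible: bool
    specific: bool
    interesting: bool


def ssi_scores(ratings: List[Rating]) -> Dict[str, float]:
    """Share of rated responses labeled sensible, specific, and interesting."""
    n = len(ratings)
    return {
        "sensibleness": sum(r.sensible for r in ratings) / n,
        "specificity": sum(r.specific for r in ratings) / n,
        "interestingness": sum(r.interesting for r in ratings) / n,
    }


ratings = [
    Rating(sensible=True, specific=True, interesting=False),
    Rating(sensible=True, specific=False, interesting=False),
    Rating(sensible=False, specific=False, interesting=False),
    Rating(sensible=True, specific=True, interesting=True),
]
print(ssi_scores(ratings))
```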

The LaMDA research paper concludes by stating that crowdsourced reviews and the system’s ability to fact-check with a search engine were useful techniques.

Google’s researchers wrote:

“We find that crowd-annotated data is an effective tool for driving significant additional gains.

We also find that calling external APIs (such as an information retrieval system) offers a path towards significantly improving groundedness, which we define as the extent to which a generated response contains claims that can be referenced and checked against a known source.”

How Is Google Planning To Use Bard In Search?

The future of Bard is currently envisioned as a feature in search.

Google’s announcement in February was insufficiently specific on how Bard would be implemented.

The key details were buried in a single paragraph close to the end of the blog announcement of Bard, where it was described as an AI feature in search.

That lack of clarity fueled the perception that Bard would be integrated into search, which was never the case.

Google’s February 2023 announcement of Bard states that Google will at some point integrate AI features into search:

“Soon, you’ll see AI-powered features in Search that distill complex information and multiple perspectives into easy-to-digest formats, so you can quickly understand the big picture and learn more from the web: whether that’s seeking out additional perspectives, like blogs from people who play both piano and guitar, or going deeper on a related topic, like steps to get started as a beginner.

These new AI features will begin rolling out on Google Search soon.”

It’s clear that Bard is not search. Rather, it is intended to be a feature in search and not a replacement for search.

What Is A Search Feature?

A feature is something like Google’s Knowledge Panel, which provides information about notable people, places, and things.

Google’s “How Search Works” webpage about features explains:

“Google’s search features ensure that you get the right information at the right time in the format that’s most useful to your query.

Sometimes it’s a webpage, and sometimes it’s real-world information like a map or inventory at a local store.”

In an internal meeting at Google (reported by CNBC), employees questioned the use of Bard in search.

One employee pointed out that large language models like ChatGPT and Bard are not fact-based sources of information.

The Google employee asked:

“Why do we think the big first application should be search, which at its heart is about finding true information?”

Jack Krawczyk, the product lead for Google Bard, answered:

“I just want to be very clear: Bard is not search.”

At the same internal event, Google’s Vice President of Engineering for Search, Elizabeth Reid, reiterated that Bard is not search.

She said:

“Bard is really separate from search…”

What we can confidently conclude is that Bard is not a new iteration of Google search. It is a feature.

Bard Is An Interactive Method For Exploring Topics

Google’s announcement of Bard was fairly explicit that Bard is not search. This means that, while search surfaces links to answers, Bard helps users investigate knowledge.

The announcement explains:

“When people think of Google, they often think of turning to us for quick factual answers, like ‘how many keys does a piano have?’

But increasingly, people are turning to Google for deeper insights and understanding – like, ‘is the piano or guitar easier to learn, and how much practice does each need?’

Learning about a topic like this can take a lot of effort to figure out what you really need to know, and people often want to explore a diverse range of opinions or perspectives.”

It may be helpful to think of Bard as an interactive method for accessing knowledge about topics.

Bard Samples Web Information

The problem with large language models is that they mimic answers, which can lead to factual errors.

The researchers who created LaMDA state that approaches like increasing the size of the model can help it gain more factual information.

But they noted that this approach fails in areas where facts change over time, which the researchers refer to as the “temporal generalization problem.”

Freshness, in the sense of timely information, cannot be captured by a static language model.

The solution that LaMDA pursued was to query information retrieval systems. An information retrieval system is a search engine, so LaMDA checks search results.

This capability of LaMDA appears to carry over to Bard.

The Google Bard announcement explains:

“Bard seeks to combine the breadth of the world’s knowledge with the power, intelligence, and creativity of our large language models.

It draws on information from the web to provide fresh, high-quality responses.”

Screenshot of a Google Bard Chat, March 2023

LaMDA and (possibly by extension) Bard achieve this with what is called the toolset (TS).

The toolset is explained in the LaMDA research paper:

“We create a toolset (TS) that includes an information retrieval system, a calculator, and a translator.

TS takes a single string as input and outputs a list of one or more strings. Each tool in TS expects a string and returns a list of strings.

For example, the calculator takes “135+7721”, and outputs a list containing [“7856”]. Similarly, the translator can take “hello in French” and output [‘Bonjour’].

Finally, the information retrieval system can take ‘How old is Rafael Nadal?’, and output [‘Rafael Nadal / Age / 35’].

The information retrieval system is also capable of returning snippets of content from the open web, with their corresponding URLs.

The TS tries an input string on all of its tools, and produces a final output list of strings by concatenating the output lists from every tool in the following order: calculator, translator, and information retrieval system.

A tool will return an empty list of results if it can’t parse the input (e.g., the calculator cannot parse ‘How old is Rafael Nadal?’), and therefore does not contribute to the final output list.”
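The toolset’s behavior, as quoted, is easy to sketch: try the input on every tool, let tools that cannot parse it return an empty list, and concatenate the results in the order calculator, translator, information retrieval. In the sketch below, the three tools are toy stand-ins (a regex calculator, a one-entry phrasebook, and a hard-coded fact lookup) chosen only to reproduce the paper’s examples; they are assumptions, not LaMDA’s real tools.

```python
import operator
import re
from typing import Callable, List

# Toy calculator: handles a single "a <op> b" expression.
_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
_EXPR = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([-+*/])\s*(\d+(?:\.\d+)?)\s*")


def calculator(query: str) -> List[str]:
    match = _EXPR.fullmatch(query)
    if not match:
        return []  # can't parse: contributes nothing to the final list
    a, op, b = float(match.group(1)), match.group(2), float(match.group(3))
    if op == "/" and b == 0:
        return []
    value = _OPS[op](a, b)
    return [str(int(value)) if value.is_integer() else str(value)]


# Toy translator: a hypothetical one-entry phrasebook for "<word> in <language>".
_PHRASEBOOK = {("hello", "french"): "Bonjour"}
_TRANSLATE = re.compile(r"\s*(\w+) in (\w+)\s*")


def translator(query: str) -> List[str]:
    match = _TRANSLATE.fullmatch(query)
    if not match:
        return []
    key = (match.group(1).lower(), match.group(2).lower())
    return [_PHRASEBOOK[key]] if key in _PHRASEBOOK else []


# Toy retrieval: a hard-coded fact table standing in for a search engine.
_FACTS = {"how old is rafael nadal?": ["Rafael Nadal / Age / 35"]}


def retriever(query: str) -> List[str]:
    return list(_FACTS.get(query.strip().lower(), []))


# Order given in the paper: calculator, translator, information retrieval.
TOOLS: List[Callable[[str], List[str]]] = [calculator, translator, retriever]


def toolset(query: str) -> List[str]:
    """Run the query through every tool and concatenate their output lists."""
    results: List[str] = []
    for tool in TOOLS:
        results.extend(tool(query))
    return results


for q in ["135+7721", "hello in French", "How old is Rafael Nadal?"]:
    print(q, "->", toolset(q))
```

Because every tool sees every input, the dispatcher needs no routing logic; an input only one tool understands yields that tool’s output alone.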

Here’s a Bard response with a snippet from the open web:

Screenshot of a Google Bard Chat, March 2023

Conversational Question-Answering Systems

There are no research papers that mention the name “Bard.”

However, there is quite a bit of recent research related to AI, including by scientists associated with LaMDA, that may have an impact on Bard.

The following doesn’t claim that Google is using these algorithms. We can’t say for certain that any of these technologies are used in Bard.

The value in knowing about these research papers is in knowing what is possible.

The following are algorithms relevant to AI-based question-answering systems.

One of the authors of LaMDA worked on a project that’s about creating training data for a conversational information retrieval system.

You can download the 2022 research paper as a PDF here: Dialog Inpainting: Turning Documents into Dialogs (and read the abstract here).

The problem with training a system like Bard is that question-and-answer datasets (like datasets composed of questions and answers found on Reddit) are limited to how people on Reddit behave.

They don’t capture how people outside that environment behave, the kinds of questions they would ask, or what the correct answers to those questions would be.

The researchers explored creating a system that reads webpages and then uses a “dialog inpainter” to predict which questions each passage in the text would answer.

A passage in a trustworthy Wikipedia webpage that says, “The sky is blue,” could be turned into the question, “What color is the sky?”
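To make the idea concrete, here is a minimal Python sketch of the inpainting step under simplifying assumptions: each sentence of a document is kept verbatim as an answer, and a question-predicting model (the hypothetical `predict_question` callable) fills in the missing question before it. In the research, that role is played by a trained language model; the toy lookup in the usage example merely stands in for it.

```python
from typing import Callable, List, Tuple

# A dialog turn is (speaker, text).
Turn = Tuple[str, str]

# Hypothetical interface: given the dialog so far and the next document sentence,
# return a question that the sentence would plausibly answer.
QuestionPredictor = Callable[[List[Turn], str], str]


def inpaint_dialog(sentences: List[str], predict_question: QuestionPredictor) -> List[Turn]:
    """Turn a document into a dialog by inserting a predicted question before each sentence."""
    dialog: List[Turn] = []
    for sentence in sentences:
        # Fill the missing "asker" turn using the dialog built so far.
        question = predict_question(dialog, sentence)
        dialog.append(("asker", question))
        # The document's own sentence becomes the answer, unchanged.
        dialog.append(("answerer", sentence))
    return dialog


if __name__ == "__main__":
    # Toy stand-in for a trained inpainting model (hypothetical).
    toy_questions = {"The sky is blue.": "What color is the sky?"}

    def toy_predictor(history: List[Turn], sentence: str) -> str:
        return toy_questions.get(sentence, "Can you tell me more?")

    for speaker, text in inpaint_dialog(["The sky is blue."], toy_predictor):
        print(f"{speaker}: {text}")
```

Keeping the document text unchanged as the answers is what lets a trustworthy source like Wikipedia anchor the generated dialogs.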

The researchers created their own dataset of questions and answers using Wikipedia and other webpages. They called the datasets WikiDialog and WebDialog.

  • WikiDialog is a set of questions and answers derived from Wikipedia data.
  • WebDialog is a set of questions and answers derived from webpages on the internet.

These new datasets are about 1,000 times larger than the largest existing conversational question-answering dataset. That matters because it gives conversational language models far more examples to learn from.

The researchers reported that the new datasets improved conversational question-answering retrieval by up to 40% (a relative gain on standard evaluation metrics).

The research paper describes the success of this approach:

“Importantly, we find that our inpainted datasets are powerful sources of training data for ConvQA systems…

When used to pre-train standard retriever and reranker architectures, they advance state-of-the-art across three different ConvQA retrieval benchmarks (QRECC, OR-QUAC, TREC-CAST), delivering up to 40% relative gains on standard evaluation metrics…

Remarkably, we find that just pre-training on WikiDialog enables strong zero-shot retrieval performance—up to 95% of a finetuned retriever’s performance—without using any in-domain ConvQA data.”
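Because the paper reports relative gains, a worked example with hypothetical scores (not figures from the paper) shows how a relative gain differs from an absolute one. Here $M_{\text{base}}$ and $M_{\text{new}}$ are assumed names for a metric's score before and after pre-training:

```latex
\text{relative gain} = \frac{M_{\text{new}} - M_{\text{base}}}{M_{\text{base}}},
\qquad
\frac{0.42 - 0.30}{0.30} = \frac{0.12}{0.30} = 0.40
```

So a baseline score of 0.30 rising to 0.42 would count as a 40% relative gain, even though the absolute improvement is only 12 points.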

Is it possible that Google Bard was trained using the WikiDialog and WebDialog datasets?

It’s difficult to imagine Google passing on the chance to train a conversational AI on a dataset roughly 1,000 times larger than what was previously available.

But we don’t know for certain, because Google rarely comments on its underlying technologies in detail, with occasional exceptions such as Bard and LaMDA.

Large Language Models That Link To Sources

Google recently published an interesting research paper about a way to make large language models cite the sources of their information. The first version of the paper was published in December 2022, and a revised second version followed in February 2023.

This technology is referred to as experimental as of December 2022.

You can download the PDF of the paper here: Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models (read the Google abstract here).

The research paper states the intent of the technology:

“Large language models (LLMs) have shown impressive results while requiring little or no direct supervision.

Further, there is mounting evidence that LLMs may have potential in information-seeking scenarios.

We believe the ability of an LLM to attribute the text that it generates is likely to be crucial in this setting.

We formulate and study Attributed QA as a key first step in the development of attributed LLMs.

We propose a reproducible evaluation framework for the task and benchmark a broad set of architectures.

We take human annotations as a gold standard and show that a correlated automatic metric is suitable for development.

Our experimental work gives concrete answers to two key questions (How to measure attribution?, and How well do current state-of-the-art methods perform on attribution?), and give some hints as to how to address a third (How to build LLMs with attribution?).”

This approach could produce a system that answers questions with supporting documentation which, in theory, shows that each response is grounded in a source.

The research paper explains:

“To explore these questions, we propose Attributed Question Answering (QA). In our formulation, the input to the model/system is a question, and the output is an (answer, attribution) pair where answer is an answer string, and attribution is a pointer into a fixed corpus, e.g., of paragraphs.

The returned attribution should give supporting evidence for the answer.”
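The (answer, attribution) output format can be sketched as a small data type plus one possible pipeline. The Python below is a hedged illustration, not the paper's system: it assumes a simple retrieve-then-read design, uses word overlap as a stand-in retriever, and treats the reader as a hypothetical callable `read(question, paragraph)`. The attribution is just the index of the chosen paragraph in a fixed corpus, matching the quote's description of a pointer into a corpus of paragraphs.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class AttributedAnswer:
    answer: str       # the answer string
    attribution: int  # pointer (index) into the fixed corpus of paragraphs


def tokens(text: str) -> Set[str]:
    # Lowercased word set used for a naive overlap score.
    return set(re.findall(r"\w+", text.lower()))


def attributed_qa(
    question: str,
    corpus: List[str],
    read: Callable[[str, str], str],  # hypothetical reader model
) -> AttributedAnswer:
    """Retrieve-then-read sketch; assumes a non-empty corpus."""
    q = tokens(question)
    # Retrieve: pick the paragraph sharing the most words with the question
    # (ties go to the earliest paragraph).
    best = max(range(len(corpus)), key=lambda i: len(q & tokens(corpus[i])))
    # Read: answer only from the retrieved paragraph, so it serves as evidence.
    return AttributedAnswer(answer=read(question, corpus[best]), attribution=best)


if __name__ == "__main__":
    corpus = ["The sky is blue on a clear day.", "Rafael Nadal is a tennis player."]
    result = attributed_qa("What color is the sky?", corpus, lambda q, p: "blue")
    print(result)  # AttributedAnswer(answer='blue', attribution=0)
```

Returning the paragraph index alongside the answer is what lets a human or an automatic rater check whether the cited evidence actually supports the answer.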

This technology is specifically for question-answering tasks.

The goal is to create better answers, something Google would understandably want for Bard.

  • Attribution allows users and developers to assess the “trustworthiness and nuance” of the answers.
  • Attribution allows developers to quickly review the quality of the answers since the sources are provided.

One interesting detail is an automatic metric called AutoAIS, which the researchers found correlates well with human raters’ judgments.

In other words, this metric could partially automate the work of human raters and scale up the process of rating answers produced by a large language model (like Bard).

The researchers share:

“We consider human rating to be the gold standard for system evaluation, but find that AutoAIS correlates well with human judgment at the system level, offering promise as a development metric where human rating is infeasible, or even as a noisy training signal.”
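“Correlates well with human judgment at the system level” means the comparison is made between systems rather than between individual answers. The sketch below is a generic illustration, not AutoAIS itself. It averages each metric’s per-example scores for every system and then computes the Pearson correlation between those averages. The function names and the choice of Pearson correlation are assumptions; the paper may use a different correlation measure.

```python
from math import sqrt
from statistics import mean
from typing import Dict, List


def system_means(scores: Dict[str, List[float]]) -> Dict[str, float]:
    # Collapse per-example scores into one average score per system.
    return {system: mean(values) for system, values in scores.items()}


def pearson(xs: List[float], ys: List[float]) -> float:
    # Assumes both lists vary (non-zero standard deviation).
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def system_level_correlation(
    human: Dict[str, List[float]], auto: Dict[str, List[float]]
) -> float:
    """Correlate human and automatic ratings after averaging per system."""
    h, a = system_means(human), system_means(auto)
    systems = sorted(h)  # same system order for both metrics
    return pearson([h[s] for s in systems], [a[s] for s in systems])
```

An automatic metric can disagree with humans on individual answers yet still rank whole systems the same way, which is the property that makes it useful as a development metric.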

This technology is experimental; it’s probably not in use. But it does show one of the directions that Google is exploring for producing trustworthy answers.

Research Paper On Editing Responses For Factuality

Lastly, there’s a remarkable technology, described in a paper posted to arXiv (the preprint server operated by Cornell University) and also dating from the end of 2022, that explores a different way to attribute sources for what a large language model outputs and can even edit an answer to correct itself.

Cornell University (like Stanford University) licenses technology related to search and other areas, earning millions of dollars per year.

It’s good to keep up with university research because it shows what is possible and what is cutting-edge.

You can download a PDF of the paper here: RARR: Researching and Revising What Language Models Say, Using Language Models (and read the abstract here).

The abstract explains the technology:

“Language models (LMs) now excel at many tasks such as few-shot learning, question answering, reasoning, and dialog.

However, they sometimes generate unsupported or misleading content.

A user cannot easily determine whether their outputs are trustworthy or not, because most LMs do not have any built-in mechanism for attribution to external evidence.

To enable attribution while still preserving all the powerful advantages of recent generation models, we propose RARR (Retrofit Attribution using Research and Revision), a system that 1) automatically finds attribution for the output of any text generation model and 2) post-edits the output to fix unsupported content while preserving the original output as much as possible.

…we find that RARR significantly improves attribution while otherwise preserving the original input to a much greater degree than previously explored edit models.

Furthermore, the implementation of RARR requires only a handful of training examples, a large language model, and standard web search.”
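The research-then-revise idea can be sketched as a loop. The Python below is a simplified illustration, not the authors’ code: the four components (`generate_queries`, `search`, `agrees`, `edit`) are hypothetical callables standing in for the language-model and web-search steps, and this sketch simply records every (query, evidence) pair it checked as the attribution report. Edits happen only when evidence disagrees with the text, which mirrors the stated goal of preserving the original output as much as possible.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class RevisionResult:
    text: str  # revised output
    # (query, evidence) pairs that were checked against the text
    report: List[Tuple[str, str]] = field(default_factory=list)


def research_and_revise(
    text: str,
    generate_queries: Callable[[str], List[str]],  # hypothetical: questions that verify the text
    search: Callable[[str], List[str]],            # hypothetical: web search returning snippets
    agrees: Callable[[str, str, str], bool],       # hypothetical: does evidence agree with the text?
    edit: Callable[[str, str, str], str],          # hypothetical: minimally edit text to fit evidence
) -> RevisionResult:
    result = RevisionResult(text=text)
    # Research stage: ask verification questions about the original output.
    for query in generate_queries(text):
        for evidence in search(query):
            # Revision stage: only edit when the evidence disagrees,
            # so supported content is left untouched.
            if not agrees(result.text, query, evidence):
                result.text = edit(result.text, query, evidence)
            result.report.append((query, evidence))
    return result


if __name__ == "__main__":
    revised = research_and_revise(
        "The sky is green.",
        generate_queries=lambda t: ["What color is the sky?"],
        search=lambda q: ["The sky is blue."],
        agrees=lambda t, q, e: "green" not in t,
        edit=lambda t, q, e: t.replace("green", "blue"),
    )
    print(revised.text)    # The sky is blue.
    print(revised.report)  # [('What color is the sky?', 'The sky is blue.')]
```

Because every component is a plug-in, the same loop can be wrapped around the output of any text generation model, which matches the paper’s claim that RARR works with “any text generation model.”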

How Do I Get Access To Google Bard?

Google is accepting new users to test Bard, which is labeled as experimental, and is rolling out access at bard.google.com.

Screenshot from bard.google.com, March 2023

Google is on the record saying that Bard is not search, which should reassure those who feel anxiety about the dawn of AI.

We are at a turning point that is unlike any we’ve seen in, perhaps, a decade.

Understanding Bard is useful for anyone who publishes on the web or practices SEO, because it clarifies the limits of what is currently possible and hints at what may be achievable in the future.



Featured Image: Whyredphotographor/Shutterstock


