How To Create Content Tagging Policies For News Publishers

September 2022 was one of the most turbulent months of recent times for news publishers.

The month started with the Helpful Content Update, which targeted low-quality, unhelpful content.

That was swiftly followed by the September 2022 Core Algorithm Update: one of Google’s broader updates, which overlapped with the September 2022 Product Reviews Update, targeting low-quality affiliate content (among other things).

For some of the biggest publishers in the game, this has caused major disruption, with publications like the Metro seeing a 40% decrease in visibility.

In an industry that often relies on fairly turbulent traffic sources, such as breaking news stories and Google Discover, appropriate content tagging can provide a safety net that makes traffic somewhat more consistent.

I’ve been working with several large news publishers across entertainment, football, gaming, music, and healthcare this year, some of which attract tens of millions of users each month. They all seemed to be lacking in one particular area: content tagging.

Why Content Tagging Is Important For News Publishers

When used appropriately, content tagging can help to boost a publication’s organic performance massively. However, many publishers are getting it wrong. Three of the main motivations for optimizing content tagging are as follows.

Stronger Topical Authority

We know that with Google’s various updates, there’s a big focus on ensuring that websites that are a genuine authority for the user’s search query are ranked higher. For news publications, becoming an authority relies on a number of factors, such as author specialties, relevant backlinks, and expert content.

With content tagging, you can pull all your expert content on a particular topic into one place, making it easier for Google to crawl, find the connections between different articles, and understand just how authoritative your publication is in this area.

This means each article is directly supported by a tag page and multiple other relevant articles, giving Google more confidence in its topical authority.

Safety Net Of Consistent Organic Traffic

Tag pages can do much more than pull all of your articles together, though. They can actually become strong landing pages that rank for high-volume, generic keywords.

Because a tag page is a central hub that can answer various questions on a specific subject, Google is more likely to rank it within the normal organic listings (rather than Top Stories, etc.) for a generic query with high search volume.

For example, when someone searches for “Love Island,” there isn’t a huge amount of context behind what the searcher is looking for about Love Island. By serving a tag page, Google gives the user a wider variety of content to consume, therefore increasing the likelihood of satisfying their search intent.

Using the Metro as an example, a quick look in Semrush shows tag pages potentially pulling in hundreds of thousands (and in many cases millions) of organic users each month.

Screenshot from Semrush, September 2022

And most of this tag page traffic, unsurprisingly, is coming from high-volume, generic keywords such as “Love Island”:

Metro traffic driving keywords

Screenshot from Semrush, September 2022

When the Metro also ranks in Top Stories for “Love Island,” they’re doubling the chances of capturing the click.

If they do happen to see a drop in organic performance for their Love Island articles, their site still has the safety net of the tag page to pull in the traffic from high-volume, generic keywords.

That is, if the tag page maintains rankings, of course.

What Can Happen Without A Tagging Policy

Having started working with a few large publications that haven’t properly implemented a tagging policy, I’ve seen firsthand how messy things can get without one.

When not properly trained on the ins and outs of content tagging and how it relates to SEO, writers have added endless random tags to articles, creating masses of tag pages that offer no real benefit to the website.

From an SEO perspective, these are the issues this causes:

  • Wasted crawl budget: When large volumes of articles are created daily, alongside masses of new tag pages, this results in Googlebot (and other bots) wasting resources by crawling low-quality tag pages rather than the articles themselves.
  • Diluted topical authority signals: When tagging is overdone, you can end up with multiple tag pages which essentially focus on the same subject but spread the articles and topical authority across multiple tags. An example of this would be writing an article about Cristiano Ronaldo breaking his nose, then creating a tag page for “Cristiano Ronaldo,” “Cristiano Ronaldo nose,” “Cristiano Ronaldo broken nose,” and “Cristiano Ronaldo nose injury.” Really, we only need the “Cristiano Ronaldo” tag here, as the article itself will be targeting the “nose” related keywords. So, not only does the main “Cristiano Ronaldo” tag page have to compete with three other related tag pages, the article itself does, too.
  • Index bloat: When niche tag pages are created (such as “Cristiano Ronaldo nose” and “Cristiano Ronaldo nose injury”), they end up having just one article tagged, resulting in thin, low-quality tag pages being indexed, which end up being almost exact duplicates of each other.
  • Hardly any traffic or rankings for tag pages: When writers don’t know how to effectively tag content and optimize tag page performance, the tag pages just end up being a wasted opportunity, as they likely won’t rank or drive traffic.
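As a rough way to spot the thin, single-article tag pages described above, the sketch below counts how many articles carry each tag and flags the ones under a threshold. The article records, the `find_thin_tags` helper, and the `min_articles` threshold are all hypothetical; a real audit would pull this data from the CMS or a crawl export.

```python
from collections import Counter

def find_thin_tags(articles, min_articles=2):
    """Return tags used by fewer than `min_articles` articles, sorted by name."""
    # Use set() so a tag repeated on one article is only counted once.
    counts = Counter(tag for article in articles for tag in set(article["tags"]))
    return sorted(tag for tag, n in counts.items() if n < min_articles)

# Hypothetical sample data, not taken from any real publication.
articles = [
    {
        "url": "/ronaldo-breaks-nose/",
        "tags": [
            "cristiano ronaldo",
            "cristiano ronaldo nose",
            "cristiano ronaldo nose injury",
        ],
    },
    {"url": "/ronaldo-scores-twice/", "tags": ["cristiano ronaldo"]},
]

thin_tags = find_thin_tags(articles)
print(thin_tags)
```

Tags flagged this way are candidates for review rather than automatic deletion, since some may still hold rankings or backlinks.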

When improper tagging has been done for a long time, the clean-up job is quite time-consuming and requires detailed analysis to ensure nothing of value is removed. Prevention is definitely better than the cure!

So, tag pages can act as the glue that holds relevant content together and as consistent, evergreen traffic drivers when article performance declines.

But how do you ensure your writers are united in an approach to tagging which benefits the site as a whole? Through tagging policies, of course!

How To Implement Tagging Policies For Writers

Every news publication that publishes content on multiple topics should have an appropriate tagging policy in place, but what should be included? And how should it be written? Below are the items I would advise publishers to focus on.

Create An Introduction To The Policy

Start with a one-paragraph explanation of why the policy is needed and what it aims to achieve. If tagging has been a historical issue for the site, then this is an opportunity to give examples of where things have gone wrong and why. This helps writers to understand the purpose of the policy.

Rule 1: Make Tags Generic Yet Relevant

As mentioned earlier, tags have real potential to rank for high search volume generic keywords, so they should ideally target just that!

Generic tags also reduce the risk of diluting topical authority signals across multiple niche tag pages that all compete for similar keywords.

Rule 2: Use A Maximum Of X Tags Per Article

A good target for tagging is to have one or two tags per article (though this does vary for each publication).

That way, writers will be less likely to create multiple, similar tags, which helps to control index bloat and crawl budget efficiency.

Rule 3: Use Existing Tags Where Possible

Hopefully, your publication creates more than one story per topic, so ensure writers are searching for an appropriate existing tag before they start creating a new one.
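One way a CMS could support this rule is to suggest a close existing tag before a new one is created. The following is a minimal sketch using Python's standard `difflib` module; the `suggest_existing_tag` helper, the tag list, and the `cutoff` value are assumptions for illustration, not features of any particular CMS.

```python
from difflib import get_close_matches

def suggest_existing_tag(proposed, existing_tags, cutoff=0.6):
    """Return the closest existing tag, or None if nothing is similar enough."""
    candidate = proposed.lower().strip()
    matches = get_close_matches(candidate, existing_tags, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# Hypothetical list of tags already in use.
existing = ["cristiano ronaldo", "love island", "premier league"]

suggestion = suggest_existing_tag("Christiano Ronaldo", existing)
no_match = suggest_existing_tag("taylor swift", existing)
print(suggestion, no_match)
```

A misspelled or differently capitalized tag resolves to the existing one, while a genuinely new topic returns `None` and can go through whatever approval step the policy defines.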

Rule 4: Use Lowercase Text And No Special Characters

Depending on the system being used, capital letters or special characters that writers enter in a tag can end up in the tag page’s URL. That isn’t ideal: it can, once again, result in duplicate tag pages being created (e.g., /Cristiano-Ronaldo/ and /cristiano-ronaldo/), or just generally unoptimized and messy URLs.
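If the CMS does not normalize tags itself, a small helper can enforce this rule before a tag is saved. The sketch below is one possible approach, assuming tag URLs should contain only lowercase ASCII letters, digits, and hyphens; the `tag_to_slug` name is hypothetical.

```python
import re
import unicodedata

def tag_to_slug(tag):
    """Lowercase a tag and reduce it to ASCII letters, digits, and hyphens."""
    # Decompose accented characters, then drop anything outside ASCII.
    decomposed = unicodedata.normalize("NFKD", tag)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower())
    return slug.strip("-")

print(tag_to_slug("Cristiano-Ronaldo"))
print(tag_to_slug("cristiano ronaldo"))
```

Both inputs produce the same slug, so the two variants can no longer create separate tag pages.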

Rule 5: Add Internal Links To Tag Pages From Articles

This one is important. While all of the other points relate to the creation of tagging pages, internal linking from articles is how you start to build up the authority of the tag page itself.

Writers should be linking to the article’s main tag page within the first paragraph if possible, and to other secondary tag pages within the rest of the article where possible.

Ensure You Are Providing Context

One of the main reasons writers end up not adhering to general rules around content tagging is that they simply haven’t been given the context behind why they should be doing things a certain way.

A publication’s SEO strategy relies on writers understanding how their efforts support that strategy, so ensure training is done to help them understand why tagging needs to be done a certain way (feel free to point them towards this article!).

The policies themselves should be simple documents that just outline the basic rules of tagging, almost like a checklist. Training should be provided with the introduction of these policies to provide the context, which can be done in video form to ensure everyone gets the exact same training.

General Tag Page Setup

Beyond the writer’s responsibilities, publication owners need to ensure they have the right technical setup to support the growth of tag pages, too. The following areas should be adequately addressed to support the writer’s efforts.

Convert Tag Pages To Landing Pages

Simple things like indexability need to be considered when setting up tag pages, as well as basic optimizations such as meta titles and descriptions, headers, and intro text.

Providing more detail than your competitors’ tag pages (bio information, introductory text with internal links to related tag pages, etc.) and ensuring that many tagged articles are made immediately available on the first page will also give you an advantage.

Break Content Up Over Multiple Pages

Pagination is an important consideration, and although Googlebot can crawl and index pages that utilize infinite scrolling, my preference would be to break content up over multiple pages, using pagination to make things simpler for Googlebot and avoid any potential issues with rendering, etc.

Add Tag Page Breadcrumbs To Articles

Although writers and editors are responsible for ensuring internal links to tag pages are included within the article’s body text, the technical setup of the page should ensure that the main tag page is linked to by default.

Add breadcrumbs to the top of each article that link to the main tag for that particular article. Article pages will often include a breadcrumb link to the main category (e.g., “Music”), but breadcrumbs also present a fantastic opportunity to promote tag pages.

Add Tag Page Breadcrumbs To Article Schema

Along with the physical breadcrumb link on the page itself, breadcrumb schema can be used within the Article or NewsArticle schema on the page to link to the tag page, giving Google another indication of the connection between the two pages.
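To illustrate what that markup might contain, here is a minimal sketch that builds a schema.org `BreadcrumbList` ending at the tag page and serializes it as JSON-LD. The URLs and the `breadcrumb_jsonld` helper are hypothetical, and how the result is combined with the page's Article or NewsArticle markup will depend on the CMS.

```python
import json

def breadcrumb_jsonld(trail):
    """Build a schema.org BreadcrumbList from (name, url) pairs, root first."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": i, "name": name, "item": url}
            for i, (name, url) in enumerate(trail, start=1)
        ],
    }

# Hypothetical trail: home page, category page, then the article's main tag page.
trail = [
    ("Home", "https://www.example.com/"),
    ("Music", "https://www.example.com/music/"),
    ("Taylor Swift", "https://www.example.com/tag/taylor-swift/"),
]

markup = breadcrumb_jsonld(trail)
print(json.dumps(markup, indent=2))
```

The printed JSON would typically be placed in a `script` element of type `application/ld+json` in the article template.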

Create A Tag Page XML Sitemap

Big news publications inevitably end up with multiple XML sitemaps, including a Google News sitemap and multiple other sitemaps for the masses of older articles, all stored within a sitemap index.

There is also a great opportunity to group tag pages together within their own sitemaps, which can be split out according to their category.

For example, “Artist” sitemaps for music publications, “Team” sitemaps for sports publications, etc., give Googlebot quick and easy access to these important pages.
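A dedicated tag sitemap is just a standard sitemap file that lists only tag page URLs. The sketch below generates one with Python's standard library; the URLs and the `build_tag_sitemap` helper are hypothetical, and a production version would read the priority tags from the CMS.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_tag_sitemap(tag_urls):
    """Return a sitemap XML string with one <url><loc> entry per tag page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in tag_urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# Hypothetical "Team" tag pages for a sports publication.
team_tag_urls = [
    "https://www.example.com/tag/arsenal/",
    "https://www.example.com/tag/liverpool/",
]

sitemap_xml = build_tag_sitemap(team_tag_urls)
print(sitemap_xml)
```

Each category sitemap built this way can then be referenced from the existing sitemap index alongside the news and archive sitemaps.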

Create HTML Sitemaps For Priority Tags

To make tag pages even more accessible to both crawlers and users, creating HTML sitemaps is a great way to ensure there are easily accessible internal links to all of your priority tag pages, which essentially become topic indexes.

Again, this might come in the form of an ‘Artists’ or ‘Teams’ page.

Conclusion

Publication owners need to lead by example when it comes to tagging, so by creating a technical setup that prioritizes tag page visibility and sharing a tagging policy that helps writers to understand what they should be doing – and why they should be doing it – everyone can work towards the same goal together.

Featured Image: Zerbor/Shutterstock



Optimize Your SEO Strategy For Maximum ROI With These 5 Tips

Wondering what improvements you can make to boost organic search results and increase ROI?

If you want to be successful in SEO, even after large Google algorithm updates, be sure to:

  1. Keep the SEO fundamentals at the forefront of your strategy.
  2. Prioritize your SEO efforts for the most rewarding outcomes.
  3. Focus on uncovering and prioritizing commercial opportunities if you’re in ecommerce.
  4. Dive into seasonal trends and how to plan for them.
  5. Get tip 5 and all of the step-by-step how-tos by joining our upcoming webinar.

We’ll share five actionable ways you can discover the most impactful opportunities for your business and achieve maximum ROI.

You’ll learn how to:

  • Identify seasonal trends and plan for them.
  • Report on and optimize your online share of voice.
  • Maximize SERP feature opportunities, most notably Popular Products.

Join Jon Earnshaw, Chief Product Evangelist and Co-Founder of Pi Datametrics, and Sophie Moule, Head of Product and Marketing at Pi Datametrics, as they walk you through ways to drastically improve the ROI of your SEO strategy.

In this live session, we’ll uncover innovative ways you can step up your search strategy and outperform your competitors.

Ready to start maximizing your results and growing your business?

Sign up now and get the actionable insights you need for SEO success.

Can’t attend the live webinar? We’ve got you covered. Register anyway and you’ll get access to a recording after the event.



TikTok’s US Future Uncertain: CEO Faces Congress

During a five-hour congressional hearing, TikTok CEO Shou Zi Chew faced intense scrutiny from U.S. lawmakers about the social media platform’s connections to its Chinese parent company, ByteDance.

Legislators from both sides demanded clear answers on whether TikTok spies on Americans for China.

The U.S. government has been pushing for the divestiture of TikTok and has even threatened to ban the app in the United States.

Chew found himself in a difficult position, attempting to portray TikTok as an independent company not influenced by China.

However, lawmakers remained skeptical, citing China’s opposition to the sale of TikTok as evidence of the country’s influence over the company.

The hearing was marked by a rare display of bipartisan unity, with the tone harsher than in previous congressional hearings featuring American social media executives.

The Future of TikTok In The US

With the U.S. and China at odds over TikTok’s sale, the app faces two possible outcomes in the United States.

Either TikTok gets banned, or it revisits negotiations for a technical fix to data security concerns.

Lindsay Gorman, head of technology and geopolitics at the German Marshall Fund, said, “The future of TikTok in the U.S. is definitely dimmer and more uncertain today than it was yesterday.”

TikTok has proposed measures to protect U.S. user data, but no security agreement has been reached.

Addressing Concerns About Societal Impact

Lawmakers at the hearing raised concerns about TikTok’s impact on young Americans, accusing the platform of invading privacy and harming mental health.

According to the Pew Research Center, the app is used by 67% of U.S. teenagers.

Critics argue that the app is too addictive and its algorithm can expose teens to dangerous or lethal situations.

Chew pointed to new screen time limits and content guidelines to address these concerns, but lawmakers remained unconvinced.

In Summary

The House Energy and Commerce Committee’s hearing on TikTok addressed concerns common to all social media platforms, like spreading harmful content and collecting massive user data.

Most committee members were critical of TikTok, but many avoided the typical grandstanding seen in high-profile hearings.

The hearing aimed to make a case for regulating social media and protecting children rather than focusing on the national security threat posed by the app’s connection to China.

If anything emerges from this hearing, it could be related to those regulations.

The hearing also gave Congress a chance to try to convince Americans that TikTok is a national security threat that warrants a ban.

This concern arises from the potential for the Chinese government to access the data of TikTok’s 150 million U.S. users or manipulate its recommendation algorithms to spread propaganda or disinformation.

However, limited public evidence supports these claims, making banning the app seem extreme and potentially unnecessary.

As events progress, staying informed is crucial as the outcome could impact the digital marketing landscape.


Featured Image: Rokas Tenys/Shutterstock

Full replay of congressional hearing available on YouTube.



Google Bard: Everything You Need To Know

Google has just released Bard, its answer to ChatGPT, and users are getting to know it to see how it compares to OpenAI’s artificial intelligence-powered chatbot.

The name ‘Bard’ is purely marketing-driven, as there are no algorithms named Bard, but we do know that the chatbot is powered by LaMDA.

Here is everything we know about Bard so far and some interesting research that may offer an idea of the kind of algorithms that may power Bard.

What Is Google Bard?

Bard is an experimental Google chatbot that is powered by the LaMDA large language model.

It’s a generative AI that accepts prompts and performs text-based tasks like providing answers and summaries and creating various forms of content.

Bard also assists in exploring topics by summarizing information found on the internet and providing links for exploring websites with more information.

Why Did Google Release Bard?

Google released Bard after the wildly successful launch of OpenAI’s ChatGPT, which created the perception that Google was falling behind technologically.

ChatGPT was perceived as a revolutionary technology with the potential to disrupt the search industry and shift the balance of power away from Google search and the lucrative search advertising business.

On December 21, 2022, three weeks after the launch of ChatGPT, the New York Times reported that Google had declared a “code red” to quickly define its response to the threat posed to its business model.

Forty-seven days after the code red strategy adjustment, Google announced the launch of Bard on February 6, 2023.

What Was The Issue With Google Bard?

The announcement of Bard was a stunning failure because the demo that was meant to showcase Google’s chatbot AI contained a factual error.

The inaccuracy of Google’s AI turned what was meant to be a triumphant return to form into a humbling pie in the face.

Google’s shares subsequently lost a hundred billion dollars in market value in a single day, reflecting a loss of confidence in Google’s ability to navigate the looming era of AI.

How Does Google Bard Work?

Bard is powered by a “lightweight” version of LaMDA.

LaMDA is a large language model that is trained on datasets consisting of public dialogue and web data.

There are two important factors related to the training described in the associated research paper, LaMDA: Language Models for Dialog Applications:

  • A. Safety: The model achieves a level of safety by tuning it with data that was annotated by crowd workers.
  • B. Groundedness: LaMDA grounds itself factually with external knowledge sources (through information retrieval, which is search).

The LaMDA research paper states:

“…factual grounding, involves enabling the model to consult external knowledge sources, such as an information retrieval system, a language translator, and a calculator.

We quantify factuality using a groundedness metric, and we find that our approach enables the model to generate responses grounded in known sources, rather than responses that merely sound plausible.”

Google used three metrics to evaluate the LaMDA outputs:

  1. Sensibleness: A measurement of whether an answer makes sense or not.
  2. Specificity: Measures whether the answer is specific to the context rather than generic or vague.
  3. Interestingness: This metric measures if LaMDA’s answers are insightful or inspire curiosity.

All three metrics were judged by crowdsourced raters, and that data was fed back into the machine to keep improving it.

The LaMDA research paper concludes by stating that crowdsourced reviews and the system’s ability to fact-check with a search engine were useful techniques.

Google’s researchers wrote:

“We find that crowd-annotated data is an effective tool for driving significant additional gains.

We also find that calling external APIs (such as an information retrieval system) offers a path towards significantly improving groundedness, which we define as the extent to which a generated response contains claims that can be referenced and checked against a known source.”

How Is Google Planning To Use Bard In Search?

The future of Bard is currently envisioned as a feature in search.

Google’s announcement in February was insufficiently specific on how Bard would be implemented.

The key details were buried in a single paragraph close to the end of the blog announcement of Bard, where it was described as an AI feature in search.

That lack of clarity fueled the perception that Bard would be integrated into search, which was never the case.

Google’s February 2023 announcement of Bard states that Google will at some point integrate AI features into search:

“Soon, you’ll see AI-powered features in Search that distill complex information and multiple perspectives into easy-to-digest formats, so you can quickly understand the big picture and learn more from the web: whether that’s seeking out additional perspectives, like blogs from people who play both piano and guitar, or going deeper on a related topic, like steps to get started as a beginner.

These new AI features will begin rolling out on Google Search soon.”

It’s clear that Bard is not search. Rather, it is intended to be a feature in search and not a replacement for search.

What Is A Search Feature?

A feature is something like Google’s Knowledge Panel, which provides knowledge information about notable people, places, and things.

Google’s “How Search Works” webpage about features explains:

“Google’s search features ensure that you get the right information at the right time in the format that’s most useful to your query.

Sometimes it’s a webpage, and sometimes it’s real-world information like a map or inventory at a local store.”

In an internal meeting at Google (reported by CNBC), employees questioned the use of Bard in search.

One employee pointed out that large language models like ChatGPT and Bard are not fact-based sources of information.

The Google employee asked:

“Why do we think the big first application should be search, which at its heart is about finding true information?”

Jack Krawczyk, the product lead for Google Bard, answered:

“I just want to be very clear: Bard is not search.”

At the same internal event, Google’s Vice President of Engineering for Search, Elizabeth Reid, reiterated that Bard is not search.

She said:

“Bard is really separate from search…”

What we can confidently conclude is that Bard is not a new iteration of Google search. It is a feature.

Bard Is An Interactive Method For Exploring Topics

Google’s announcement of Bard was fairly explicit that Bard is not search. This means that, while search surfaces links to answers, Bard helps users investigate knowledge.

The announcement explains:

“When people think of Google, they often think of turning to us for quick factual answers, like ‘how many keys does a piano have?’

But increasingly, people are turning to Google for deeper insights and understanding – like, ‘is the piano or guitar easier to learn, and how much practice does each need?’

Learning about a topic like this can take a lot of effort to figure out what you really need to know, and people often want to explore a diverse range of opinions or perspectives.”

It may be helpful to think of Bard as an interactive method for accessing knowledge about topics.

Bard Samples Web Information

The problem with large language models is that they mimic answers, which can lead to factual errors.

The researchers who created LaMDA state that approaches like increasing the size of the model can help it gain more factual information.

But they noted that this approach fails in areas where facts change over time, which researchers refer to as the “temporal generalization problem.”

Freshness in the sense of timely information cannot be trained with a static language model.

The solution that LaMDA pursued was to query information retrieval systems. An information retrieval system is a search engine, so LaMDA checks search results.

This feature from LaMDA appears to be a feature of Bard.

The Google Bard announcement explains:

“Bard seeks to combine the breadth of the world’s knowledge with the power, intelligence, and creativity of our large language models.

It draws on information from the web to provide fresh, high-quality responses.”

Screenshot of a Google Bard Chat, March 2023

LaMDA and (possibly by extension) Bard achieve this with what is called the toolset (TS).

The toolset is explained in the LaMDA research paper:

“We create a toolset (TS) that includes an information retrieval system, a calculator, and a translator.

TS takes a single string as input and outputs a list of one or more strings. Each tool in TS expects a string and returns a list of strings.

For example, the calculator takes “135+7721”, and outputs a list containing [“7856”]. Similarly, the translator can take “hello in French” and output [‘Bonjour’].

Finally, the information retrieval system can take ‘How old is Rafael Nadal?’, and output [‘Rafael Nadal / Age / 35’].

The information retrieval system is also capable of returning snippets of content from the open web, with their corresponding URLs.

The TS tries an input string on all of its tools, and produces a final output list of strings by concatenating the output lists from every tool in the following order: calculator, translator, and information retrieval system.

A tool will return an empty list of results if it can’t parse the input (e.g., the calculator cannot parse ‘How old is Rafael Nadal?’), and therefore does not contribute to the final output list.”
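The behavior described in that passage can be sketched in a few lines. To be clear, this is a toy illustration of the interface the paper describes (string in, list of strings out, outputs concatenated in a fixed order), not Google's implementation; the lookup tables stand in for real translation and retrieval systems, and all function names are hypothetical.

```python
import re

def calculator(query):
    """Handle only simple 'a+b' integer sums; return [] for anything else."""
    match = re.fullmatch(r"\s*(\d+)\s*\+\s*(\d+)\s*", query)
    if not match:
        return []
    return [str(int(match.group(1)) + int(match.group(2)))]

# Toy lookup tables standing in for real translation and retrieval systems.
TRANSLATIONS = {"hello in French": "Bonjour"}
FACTS = {"How old is Rafael Nadal?": "Rafael Nadal / Age / 35"}

def translator(query):
    return [TRANSLATIONS[query]] if query in TRANSLATIONS else []

def information_retrieval(query):
    return [FACTS[query]] if query in FACTS else []

def toolset(query):
    """Try the query on every tool and concatenate the outputs in a fixed order."""
    results = []
    for tool in (calculator, translator, information_retrieval):
        results.extend(tool(query))
    return results

print(toolset("135+7721"))
print(toolset("How old is Rafael Nadal?"))
```

A tool that cannot parse the input simply contributes an empty list, which is why the calculator adds nothing to the output for the Rafael Nadal question.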

Here’s a Bard response with a snippet from the open web:

Screenshot of a Google Bard Chat, March 2023

Conversational Question-Answering Systems

There are no research papers that mention the name “Bard.”

However, there is quite a bit of recent research related to AI, including by scientists associated with LaMDA, that may have an impact on Bard.

The following doesn’t claim that Google is using these algorithms. We can’t say for certain that any of these technologies are used in Bard.

The value in knowing about these research papers is in knowing what is possible.

The following are algorithms relevant to AI-based question-answering systems.

One of the authors of LaMDA worked on a project that’s about creating training data for a conversational information retrieval system.

The 2022 research paper is titled Dialog Inpainting: Turning Documents into Dialogs.

The problem with training a system like Bard is that question-and-answer datasets (like datasets comprised of questions and answers found on Reddit) are limited to how people on Reddit behave.

It doesn’t encompass how people outside of that environment behave and the kinds of questions they would ask, and what the correct answers to those questions would be.

The researchers explored creating a system that reads webpages, then uses a “dialog inpainter” to predict what questions would be answered by any given passage within what the machine is reading.

A passage in a trustworthy Wikipedia webpage that says, “The sky is blue,” could be turned into the question, “What color is the sky?”

The researchers created their own dataset of questions and answers using Wikipedia and other webpages. They called the datasets WikiDialog and WebDialog.

  • WikiDialog is a set of questions and answers derived from Wikipedia data.
  • WebDialog is a dataset derived from webpage dialog on the internet.

These new datasets are 1,000 times larger than existing datasets. That matters because it gives conversational language models more to learn from.

The researchers reported that this new dataset helped to improve conversational question-answering systems by over 40%.

The research paper describes the success of this approach:

“Importantly, we find that our inpainted datasets are powerful sources of training data for ConvQA systems…

When used to pre-train standard retriever and reranker architectures, they advance state-of-the-art across three different ConvQA retrieval benchmarks (QRECC, OR-QUAC, TREC-CAST), delivering up to 40% relative gains on standard evaluation metrics…

Remarkably, we find that just pre-training on WikiDialog enables strong zero-shot retrieval performance—up to 95% of a finetuned retriever’s performance—without using any in-domain ConvQA data.”

Is it possible that Google Bard was trained using the WikiDialog and WebDialog datasets?

It’s difficult to imagine a scenario where Google would pass on training a conversational AI on a dataset that is over 1,000 times larger.

But we don’t know for certain because Google doesn’t often comment on its underlying technologies in detail, except on rare occasions, as with Bard or LaMDA.

Large Language Models That Link To Sources

Google recently published an interesting research paper about a way to make large language models cite the sources for their information. The initial version of the paper was published in December 2022, and the second version was updated in February 2023.

This technology is referred to as experimental as of December 2022.

The paper is titled Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models.

The research paper states the intent of the technology:

“Large language models (LLMs) have shown impressive results while requiring little or no direct supervision.

Further, there is mounting evidence that LLMs may have potential in information-seeking scenarios.

We believe the ability of an LLM to attribute the text that it generates is likely to be crucial in this setting.

We formulate and study Attributed QA as a key first step in the development of attributed LLMs.

We propose a reproducible evaluation framework for the task and benchmark a broad set of architectures.

We take human annotations as a gold standard and show that a correlated automatic metric is suitable for development.

Our experimental work gives concrete answers to two key questions (How to measure attribution?, and How well do current state-of-the-art methods perform on attribution?), and give some hints as to how to address a third (How to build LLMs with attribution?).”

A system built this way returns supporting documentation alongside its answer, which, in theory, assures that the response is grounded in an actual source.

The research paper explains:

“To explore these questions, we propose Attributed Question Answering (QA). In our formulation, the input to the model/system is a question, and the output is an (answer, attribution) pair where answer is an answer string, and attribution is a pointer into a fixed corpus, e.g., of paragraphs.

The returned attribution should give supporting evidence for the answer.”
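To make the (answer, attribution) formulation concrete, here is a minimal Python sketch. It is not the paper's system: the two-paragraph corpus, the `AttributedAnswer` class, and the word-overlap lookup in `toy_attributed_qa` are all hypothetical stand-ins, chosen only to show the shape of the input and output.

```python
import re
from dataclasses import dataclass

# Hypothetical fixed corpus: paragraph IDs mapped to paragraph text.
CORPUS = {
    "p1": "Paris is the capital and most populous city of France.",
    "p2": "The Eiffel Tower was completed in 1889.",
}


@dataclass(frozen=True)
class AttributedAnswer:
    answer: str       # the answer string
    attribution: str  # pointer (here, a paragraph ID) into the fixed corpus


def words(text: str) -> set[str]:
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def toy_attributed_qa(question: str, corpus: dict[str, str]) -> AttributedAnswer:
    """Toy stand-in for a real model: pick the paragraph that shares
    the most words with the question and return it as the answer."""
    best = max(corpus, key=lambda pid: len(words(question) & words(corpus[pid])))
    return AttributedAnswer(answer=corpus[best], attribution=best)


result = toy_attributed_qa("What is the capital of France?", CORPUS)
print(result.attribution)          # the pointer into the corpus
print(CORPUS[result.attribution])  # the supporting evidence it resolves to
```

The point of the sketch is the return type: every answer carries a pointer that a user or developer can resolve and check against the corpus.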

This technology is specifically for question-answering tasks.

The goal is to create better answers – something that Google would understandably want for Bard.

  • Attribution allows users and developers to assess the “trustworthiness and nuance” of the answers.
  • Attribution allows developers to quickly review the quality of the answers since the sources are provided.

One interesting note is a new automatic metric called AutoAIS, whose scores correlate well with the judgments of human raters.

In other words, this technology can automate the work of human raters and scale the process of rating the answers given by a large language model (like Bard).

The researchers share:

“We consider human rating to be the gold standard for system evaluation, but find that AutoAIS correlates well with human judgment at the system level, offering promise as a development metric where human rating is infeasible, or even as a noisy training signal.”
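As a rough illustration of what "correlates at the system level" means, the sketch below computes a Pearson correlation between one human score and one automatic score per system. The four score pairs are made up for illustration and are not figures from the paper, and the paper's exact correlation procedure may differ.

```python
from math import sqrt

# Hypothetical per-system scores (one entry per QA system being compared).
human_scores = [0.2, 0.4, 0.6, 0.8]      # fraction of answers humans judged attributable
auto_scores = [0.25, 0.35, 0.65, 0.75]   # the same systems scored by an automatic metric


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson(human_scores, auto_scores)
print(f"system-level correlation: {r:.3f}")
```

A value close to 1 means the automatic metric ranks systems in much the same way human raters do, which is what would make it usable as a stand-in during development.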

This technology is experimental; it’s probably not in use. But it does show one of the directions that Google is exploring for producing trustworthy answers.

Research Paper On Editing Responses For Factuality

Lastly, there’s a remarkable technology described in a paper published on arXiv, the preprint server hosted by Cornell University (also dating from the end of 2022), that explores a different way to attribute the sources of what a large language model outputs and can even edit an answer to correct itself.

Cornell University (like Stanford University) licenses technology related to search and other areas, earning millions of dollars per year.

It’s good to keep up with university research because it shows what is possible and what is cutting-edge.

You can download a PDF of the paper here: RARR: Researching and Revising What Language Models Say, Using Language Models (and read the abstract here).

The abstract explains the technology:

“Language models (LMs) now excel at many tasks such as few-shot learning, question answering, reasoning, and dialog.

However, they sometimes generate unsupported or misleading content.

A user cannot easily determine whether their outputs are trustworthy or not, because most LMs do not have any built-in mechanism for attribution to external evidence.

To enable attribution while still preserving all the powerful advantages of recent generation models, we propose RARR (Retrofit Attribution using Research and Revision), a system that 1) automatically finds attribution for the output of any text generation model and 2) post-edits the output to fix unsupported content while preserving the original output as much as possible.

…we find that RARR significantly improves attribution while otherwise preserving the original input to a much greater degree than previously explored edit models.

Furthermore, the implementation of RARR requires only a handful of training examples, a large language model, and standard web search.”
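The two stages named in the abstract, research and then revision, can be sketched as a toy loop. This is not RARR itself: the paper uses a large language model and web search, while this sketch swaps in plain string similarity from Python's `difflib` over a hypothetical two-entry evidence store, and the 0.8 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher

# Hypothetical evidence store standing in for web search results.
EVIDENCE = {
    "doc-a": "The Eiffel Tower was completed in 1889.",
    "doc-b": "Paris is the capital of France.",
}


def research(sentence: str, evidence: dict[str, str]) -> tuple[str, float]:
    """Research stage (toy): find the closest evidence passage and its similarity."""
    def score(doc_id: str) -> float:
        return SequenceMatcher(None, sentence.lower(), evidence[doc_id].lower()).ratio()

    best = max(evidence, key=score)
    return best, score(best)


def revise(sentence: str, evidence: dict[str, str], threshold: float = 0.8):
    """Revision stage (toy): if close evidence exists, adopt its wording and
    attach the source; otherwise leave the sentence unchanged and unattributed."""
    doc_id, similarity = research(sentence, evidence)
    if similarity < threshold:
        return sentence, None
    return evidence[doc_id], doc_id


draft = [
    "The Eiffel Tower was completed in 1899.",  # contains a wrong year
    "Paris is the capital of France.",
]
revised = [revise(sentence, EVIDENCE) for sentence in draft]
for text, source in revised:
    print(f"{text}  [source: {source}]")
```

In this toy run only the wrong year in the first sentence changes, while the second sentence passes through untouched with its source attached, mirroring the abstract's goal of fixing unsupported content while preserving the original output as much as possible.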

How Do I Get Access To Google Bard?

Google is accepting new users to test Bard, which is currently labeled as experimental. Google is rolling out access for Bard here.

Image: Google Bard is Experimental. Screenshot from bard.google.com, March 2023

Google is on the record saying that Bard is not search, which should reassure those who feel anxiety about the dawn of AI.

We are at a turning point that is unlike any we’ve seen in, perhaps, a decade.

Understanding Bard is useful to anyone who publishes on the web or practices SEO because it shows the limits of what is possible today and what may be achievable in the future.


Featured Image: Whyredphotographor/Shutterstock


