Google KELM Reduces Bias and Improves Factual Accuracy via @sejournal, @martinibuster



The Google AI Blog announced KELM, an approach that could be used to reduce bias and toxic content in search (open domain question answering). It uses a method called TEKGEN to convert Knowledge Graph facts into natural language text that can then be used to improve natural language processing models.

What is KELM?

KELM is an acronym for Knowledge-Enhanced Language Model Pre-training. Natural language processing models like BERT are typically trained on web and other documents. KELM proposes adding trustworthy factual content (knowledge-enhanced) to language model pre-training in order to improve factual accuracy and reduce bias.

TEKGEN converts knowledge graph structured data to natural language text known as the KELM Corpus

KELM Uses Trustworthy Data

The Google researchers proposed using knowledge graphs for improving factual accuracy because they’re a trusted source of facts.


“Alternate sources of information are knowledge graphs (KGs), which consist of structured data. KGs are factual in nature because the information is usually extracted from more trusted sources, and post-processing filters and human editors ensure inappropriate and incorrect content are removed.”

Is Google Using KELM?

Google has not indicated whether KELM is in use. KELM is an approach to language model pre-training that shows strong promise and was summarized on the Google AI blog.

Bias, Factual Accuracy and Search Results

According to the research paper, this approach improves factual accuracy:

“It carries the further advantages of improved factual accuracy and reduced toxicity in the resulting language model.”

This research is important because reducing bias and increasing factual accuracy could impact how sites are ranked.

But until KELM is put into use, there is no way to predict what kind of impact it would have.

Google doesn’t currently fact check search results.

KELM, should it be introduced, could conceivably have an impact on sites that promote factually incorrect statements and ideas.


KELM Could Impact More than Search

The KELM Corpus has been released under a Creative Commons license (CC BY-SA 2.0).

That means, in theory, any other company (like Bing, Facebook or Twitter) can use it to improve their natural language processing pre-training as well.

It’s possible, then, that the influence of KELM could extend across many search and social media platforms.

Indirect Ties to MUM

Google has also indicated that the next-generation MUM algorithm will not be released until Google is satisfied that bias does not negatively impact the answers it gives.

According to the Google MUM announcement:

“Just as we’ve carefully tested the many applications of BERT launched since 2019, MUM will undergo the same process as we apply these models in Search.
Specifically, we’ll look for patterns that may indicate bias in machine learning to avoid introducing bias into our systems.”

The KELM approach specifically targets bias reduction, which could make it valuable for developing the MUM algorithm.

Machine Learning Can Generate Biased Results

The research paper states that the data that natural language models like BERT and GPT-3 use for training can result in “toxic content” and biases.

In computing there is an old acronym, GIGO, which stands for Garbage In – Garbage Out. It means that the quality of the output is determined by the quality of the input.

If what you’re training the algorithm with is high quality, then the result is going to be high quality.

What the researchers are proposing is to improve the quality of the data that technologies like BERT and MUM are trained on in order to remove biases.

Knowledge Graph

The knowledge graph is a collection of facts in a structured data format. Structured data is information organized in a predefined format that communicates specific facts in a manner easily consumed by machines.

In this case the information is facts about people, places and things.

The Google Knowledge Graph was introduced in 2012 as a way to help Google understand the relationships between things. So when someone asks about Washington, Google would be able to discern whether the person asking is interested in Washington the person, the state, or the District of Columbia.
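To illustrate the idea, here is a minimal Python sketch that stores facts as subject, relation, object triples keyed by entity IDs rather than by names, so that three different "Washington" entities stay distinct even though they share a label. The IDs (E1 to E6), relation names, and helper functions are hypothetical and chosen for illustration; this is not how Google's Knowledge Graph is actually stored.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    subject: str   # entity ID, not a display name
    relation: str
    obj: str       # entity ID of the object

# Hypothetical entity IDs: three different entities share the label "Washington".
labels = {
    "E1": "Washington",  # the person
    "E2": "Washington",  # the U.S. state
    "E3": "Washington",  # the District of Columbia
    "E4": "President of the United States",
    "E5": "United States",
    "E6": "Olympia",
}

facts = [
    Triple("E1", "position held", "E4"),
    Triple("E2", "country", "E5"),
    Triple("E2", "capital", "E6"),
    Triple("E3", "capital of", "E5"),
]

def candidates(name):
    """Return every entity ID whose label matches the given name."""
    return [eid for eid, label in labels.items() if label == name]

def describe(eid):
    """List one entity's facts using readable labels instead of IDs."""
    return [f"{t.relation}: {labels[t.obj]}" for t in facts if t.subject == eid]

for eid in candidates("Washington"):
    print(eid, describe(eid))
```

Because each entity has its own ID and its own facts, a system can tell the three apart; how it picks the right one for a given query is outside this sketch.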


Google announced that its knowledge graph was composed of data from trusted sources of facts.

Google’s 2012 announcement characterized the knowledge graph as a first step towards building the next generation of search, which we are currently enjoying.

Knowledge Graph and Factual Accuracy

Knowledge graph data is used in this research paper for improving Google’s algorithms because the information is trustworthy and reliable.

The Google research paper proposes integrating knowledge graph information into the training process to remove the biases and increase factual accuracy.

What the Google research proposes is two-fold.

  1. First, they need to convert knowledge bases into natural language text.
  2. Second, the resulting corpus, called the KELM corpus, can then be integrated into language model pre-training to reduce biases.

The researchers explain the problem like this:

“Large pre-trained natural language processing (NLP) models, such as BERT, RoBERTa, GPT-3, T5 and REALM, leverage natural language corpora that are derived from the Web and fine-tuned on task specific data…

However, natural language text alone represents a limited coverage of knowledge… Furthermore, existence of non-factual information and toxic content in text can eventually cause biases in the resulting models.”


From Knowledge Graph Structured Data to Natural Language Text

The researchers state that a problem with integrating knowledge base information into the training is that the knowledge base data is in the form of structured data.

The solution is to convert the knowledge graph structured data to natural language text using a natural language task called data-to-text generation.

They explained that because data-to-text generation is challenging, they created a new “pipeline” called “Text from KG Generator (TEKGEN)” to solve the problem.

Citation: Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training (PDF) 

TEKGEN Natural Language Text Improved Factual Accuracy

TEKGEN is the technology the researchers created to convert structured data to natural language text. This end result, factual text, can be used to create the KELM corpus, which can then be used as part of machine learning pre-training to help prevent bias from making its way into algorithms.

The researchers noted that adding this additional knowledge graph information (corpora) into the training data resulted in improved factual accuracy.


The TEKGEN/KELM paper states:

“We further show that verbalizing a comprehensive, encyclopedic KG like Wikidata can be used to integrate structured KGs and natural language corpora.

…our approach converts the KG into natural text, allowing it to be seamlessly integrated into existing language models. It carries the further advantages of improved factual accuracy and reduced toxicity in the resulting language model.”

The KELM article published an illustration showing how one structured data node is concatenated into text and then converted from there into natural text (verbalized).

I broke up the illustration into two parts.

Below is an image representing knowledge graph structured data. The data is concatenated into text.

Screenshot of First Part of TEKGEN Conversion Process

Google KELM Concatenation
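As a rough sketch of this concatenation step, the Python function below flattens one entity and its facts into a single string. It assumes a simple "subject, relation object, relation object" layout; the exact input format TEKGEN uses is not spelled out in this article, so the separators are an assumption, and the entity "Hypothetical Band" and its facts are made up.

```python
def linearize(subject, pairs):
    """Flatten one subject and its (relation, object) pairs into a single string.

    The "subject, relation object, ..." layout is an assumption for illustration.
    """
    parts = [subject] + [f"{relation} {obj}" for relation, obj in pairs]
    return ", ".join(parts)

example = linearize(
    "Hypothetical Band",
    [("genre", "rock music"), ("country of origin", "Canada"), ("inception", "1990")],
)
print(example)
# Hypothetical Band, genre rock music, country of origin Canada, inception 1990
```

A flat string like this is easy to feed to a text-to-text model, which is the point of concatenating before verbalizing.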

The image below represents the next step of the TEKGEN process, which takes the concatenated text and converts it into natural language text.


Screenshot of Text Turned to Natural Language Text

Google KELM Verbalized Knowledge Graph Data
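TEKGEN performs this verbalization with a fine-tuned T5 model and scores the output with a fine-tuned BERT model, neither of which fits in a short example. The Python sketch below swaps in a much simpler technique, hand-written templates per relation, plus a toy coverage score that checks whether each object value appears in the generated text. The templates, the `coverage_score` function, and the 0.75 threshold are hypothetical stand-ins, not the researchers' method.

```python
# Hypothetical per-relation templates: a simple stand-in for TEKGEN's fine-tuned T5 model.
TEMPLATES = {
    "genre": "{s} plays {o}.",
    "country of origin": "{s} is from {o}.",
    "inception": "{s} was formed in {o}.",
}

def verbalize(subject, pairs):
    """Turn (relation, object) pairs into sentences, skipping relations without a template."""
    sentences = []
    for relation, obj in pairs:
        template = TEMPLATES.get(relation)
        if template is not None:
            sentences.append(template.format(s=subject, o=obj))
    return " ".join(sentences)

def coverage_score(text, pairs):
    """Toy quality score: the fraction of object values that appear in the text.

    A hypothetical stand-in for TEKGEN's BERT-based semantic quality scorer.
    """
    if not pairs:
        return 0.0
    hits = sum(1 for _, obj in pairs if obj in text)
    return hits / len(pairs)

pairs = [("genre", "rock music"), ("country of origin", "Canada"), ("inception", "1990")]
text = verbalize("Hypothetical Band", pairs)
score = coverage_score(text, pairs)
keep = score >= 0.75  # hypothetical threshold for keeping a generated sentence
print(text)
print(score, keep)
```

Templates never drop a fact they have a template for, but they read stiffly and need one entry per relation; a learned model like T5 trades that guarantee for fluency, which is why a separate quality check is useful.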

Generating the KELM Corpus

There is another illustration that shows how the KELM natural language text that can be used for pre-training is generated.

The TEKGEN paper shows this illustration plus description:

How TEKGEN works

  • “In Step 1, KG triples are aligned with Wikipedia text using distant supervision.
  • In Steps 2 & 3, T5 is fine-tuned sequentially first on this corpus, followed by a small number of steps on the WebNLG corpus.
  • In Step 4, BERT is fine-tuned to generate a semantic quality score for generated sentences w.r.t. triples.
  • Steps 2, 3 & 4 together form TEKGEN.
  • To generate the KELM corpus, in Step 5, entity subgraphs are created using the relation pair alignment counts from the training corpus generated in step 1.
    The subgraph triples are then converted into natural text using TEKGEN.”
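Steps 1 and 5 can be sketched in Python under simplifying assumptions. Distant supervision here means assuming that a sentence expresses a triple whenever it mentions both the triple's subject and object; the sketch uses plain substring matching, then counts how often two relations are aligned to the same sentence. The data is made up, and `group_relations` is a hypothetical greedy heuristic for turning those counts into relation groups (subgraphs); the paper's actual subgraph construction may differ.

```python
from collections import Counter
from itertools import combinations

# Hypothetical triples and sentences standing in for Wikidata facts and Wikipedia text.
triples = [
    ("Hypothetical Band", "genre", "rock music"),
    ("Hypothetical Band", "country of origin", "Canada"),
    ("Hypothetical Band", "inception", "1990"),
]
sentences = [
    "Hypothetical Band is a rock music group from Canada.",
    "Hypothetical Band was formed in 1990.",
    "The weather in Canada is cold.",
]

def align(triples, sentences):
    """Step 1 (simplified): pair each sentence with every triple whose subject
    and object both appear in it, using plain substring matching."""
    aligned = []
    for sentence in sentences:
        matched = [t for t in triples if t[0] in sentence and t[2] in sentence]
        if matched:
            aligned.append((sentence, matched))
    return aligned

def relation_pair_counts(aligned):
    """Count how often two relations are aligned to the same sentence."""
    counts = Counter()
    for _, matched in aligned:
        relations = sorted({relation for _, relation, _ in matched})
        for pair in combinations(relations, 2):
            counts[pair] += 1
    return counts

def group_relations(relations, counts, min_count=1):
    """Hypothetical greedy grouping: add a relation to the first group containing
    a relation it co-occurred with at least min_count times, else start a new group."""
    groups = []
    for rel in relations:
        for group in groups:
            if any(counts[tuple(sorted((rel, other)))] >= min_count for other in group):
                group.append(rel)
                break
        else:
            groups.append([rel])
    return groups

aligned = align(triples, sentences)
counts = relation_pair_counts(aligned)
groups = group_relations([relation for _, relation, _ in triples], counts)
print(counts)
print(groups)
```

The third sentence mentions Canada but not the subject, so it is not aligned. Substring matching is crude; it misses paraphrases and can produce false matches, which is the usual weakness of distant supervision.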


KELM Works to Reduce Bias and Promote Accuracy

The KELM article published on Google’s AI blog states that KELM has real-world applications, particularly for question answering tasks which are explicitly related to information retrieval (search) and natural language processing (technologies like BERT and MUM).

Google researches many things, and some of that research explores what is possible but ends up looking like a dead end. Research that probably won’t make it into Google’s algorithm usually concludes with a statement that more research is needed because the technology doesn’t fulfill expectations in one way or another.

But that is not the case with the KELM and TEKGEN research. The article is in fact optimistic about real-world applications of the discoveries, which suggests a higher probability that KELM could eventually make it into search in one form or another.

This is how the researchers concluded the article on KELM for reducing bias:

“This has real-world applications for knowledge-intensive tasks, such as question answering, where providing factual knowledge is essential. Moreover, such corpora can be applied in pre-training of large language models, and can potentially reduce toxicity and improve factuality.”


Will KELM Be Used Soon?

The MUM algorithm that Google recently announced requires accuracy, which is what the KELM corpus was created for. But the application of KELM is not limited to MUM.

Because reducing bias and improving factual accuracy are critical concerns in society today, and because the researchers are optimistic about the results, KELM has a higher probability of being used in search in some form in the future.


Google AI Article on KELM
KELM: Integrating Knowledge Graphs with Language Model Pre-training Corpora

KELM Research Paper (PDF) 
Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training

TEKGEN Training Corpus at GitHub


OpenAI Introduces ChatGPT Plus with Monthly Subscription of $20



OpenAI, the leading artificial intelligence research laboratory, has launched a new product – ChatGPT Plus. The new product is an advanced version of its previous language model, ChatGPT, and is available for a monthly subscription of $20. The company aims to provide a more sophisticated and efficient conversational AI tool to its users through this new product.

ChatGPT Plus is a state-of-the-art language model that uses advanced deep learning algorithms to generate human-like responses to text inputs. The model has been trained on a massive corpus of text data, allowing it to generate coherent and contextually relevant responses. The model is designed to handle a wide range of conversational topics and can be integrated into various applications, such as chatbots, customer support systems, and virtual assistants.

One of the main advantages of ChatGPT Plus over its predecessor, ChatGPT, is its ability to generate responses in a more human-like manner. The model has been fine-tuned to incorporate more advanced language processing techniques, which enable it to better understand the context and tone of a conversation. This makes it possible for the model to generate more nuanced and appropriate responses, which can greatly improve the user experience.

In addition to its advanced language processing capabilities, ChatGPT Plus also offers improved performance in terms of response generation speed and efficiency. The model has been optimized to run on faster hardware and has been fine-tuned to generate responses more quickly. This makes it possible for the model to handle a larger volume of requests, making it an ideal solution for businesses with high traffic websites or customer support centers.

The monthly subscription fee of $20 makes ChatGPT Plus an affordable solution. The company designed the pricing model to be accessible to businesses regardless of their budget, which makes it possible for small businesses to take advantage of advanced conversational AI technology that can greatly improve their customer engagement and support.

OpenAI has also made it easy to integrate ChatGPT Plus into various applications. The company has provided a comprehensive API that allows developers to easily integrate the model into their applications. The API supports a wide range of programming languages, making it possible for developers to use the technology regardless of their preferred programming language. This makes it possible for businesses to quickly and easily incorporate conversational AI into their operations.

In conclusion, OpenAI’s launch of ChatGPT Plus is a significant development in the field of conversational AI. The new product offers advanced language processing capabilities and improved performance, and its affordable pricing and easy integration make it accessible to businesses of all sizes, helping them improve customer engagement and support. OpenAI’s ChatGPT Plus is set to revolutionize the conversational AI industry and bring advanced technology within reach of more businesses.


What can ChatGPT do?



ChatGPT Explained

ChatGPT is a large language model developed by OpenAI that is trained on a massive amount of text data. It is capable of generating human-like text and has been used in a variety of applications, such as chatbots, language translation, and text summarization.

One of the key features of ChatGPT is its ability to generate text that is similar to human writing. This is achieved through the use of a transformer architecture, which allows the model to understand the context and relationships between words in a sentence. The transformer architecture is a type of neural network that is designed to process sequential data, such as natural language.

Another important aspect of ChatGPT is its ability to generate text that is contextually relevant. This means that the model is able to understand the context of a conversation and generate responses that are appropriate to the conversation. This is accomplished through autoregressive (causal) language modeling, in which the model learns to predict the next word in a sentence based on the context of the previous words.
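To make next-word prediction concrete, here is a toy Python sketch: a bigram model that predicts the next word from only the single preceding word, using counts from a made-up three-sentence corpus. ChatGPT uses a large transformer network that conditions on much longer context, so this illustrates only the prediction objective, not how ChatGPT is built. The corpus and function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count, for each word, which words follow it in the training sentences."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the word most often seen after `word`, or None if `word` was never seen."""
    counter = follows.get(word.lower())
    if not counter:
        return None
    return counter.most_common(1)[0][0]

# A made-up three-sentence training corpus.
corpus = [
    "the model predicts the next word",
    "the model reads the context",
    "the model uses context",
]
follows = train_bigrams(corpus)
print(predict_next(follows, "the"))   # "model" follows "the" 3 times
print(predict_next(follows, "next"))  # "word"
```

A one-word context is exactly what makes this toy weak: it cannot tell "the next word" from "the model" by anything earlier in the sentence, which is the gap that longer-context models address.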

One of the most popular applications of ChatGPT is in the creation of chatbots. Chatbots are computer programs that simulate human conversation and can be used in customer service, sales, and other applications. ChatGPT is particularly well-suited for this task because of its ability to generate human-like text and understand context.

Another application of ChatGPT is language translation. By training the model on a large amount of text data in multiple languages, it can be used to translate text from one language to another. The model is able to understand the meaning of the text and generate a translation that is grammatically correct and semantically equivalent.

In addition to chatbots and language translation, ChatGPT can also be used for text summarization. This is the process of taking a large amount of text and condensing it into a shorter, more concise version. ChatGPT is able to understand the main ideas of the text and generate a summary that captures the most important information.

Despite its many capabilities and applications, ChatGPT is not without its limitations. One of the main challenges with using language models like ChatGPT is the risk of generating text that is biased or offensive. This can occur when the model is trained on text data that contains biases or stereotypes. To address this, OpenAI has implemented a number of techniques to reduce bias in the training data and in the model itself.

In conclusion, ChatGPT is a powerful language model that is capable of generating human-like text and understanding context. It has a wide range of applications, including chatbots, language translation, and text summarization. While there are limitations to its use, ongoing research and development is aimed at improving the model’s performance and reducing the risk of bias.

** The above article was written entirely by ChatGPT. It was published as an example of the advanced text that an automated AI can write.


Google December Product Reviews Update Affects More Than English Language Sites? via @sejournal, @martinibuster



Google announced that its Product Reviews update was rolling out for English-language content, without saying if or when it would roll out to other languages. Google’s John Mueller answered a question about whether it is rolling out to other languages.

Google December 2021 Product Reviews Update

On December 1, 2021, Google announced on Twitter that a product reviews update was rolling out and would focus on English-language web pages.

The update focused on improving the quality of reviews shown in Google Search, specifically targeting review sites.

A Googler tweeted a description of the kinds of sites that would be targeted for demotion in the search rankings:

“Mainly relevant to sites that post articles reviewing products.

Think of sites like “best TVs under $200”.com.

Goal is to improve the quality and usefulness of reviews we show users.”


Google also published a blog post with more guidance on the product review update that introduced two new best practices that Google’s algorithm would be looking for.

The first best practice was a requirement of evidence that a product was actually handled and reviewed.

The second best practice was to provide links to more than one place where a user could purchase the product.

The Twitter announcement stated that it was rolling out to English language websites. The blog post did not mention what languages it was rolling out to nor did the blog post specify that the product review update was limited to the English language.

Google’s Mueller Thinking About Product Reviews Update

Screenshot of Google's John Mueller trying to recall if December Product Review Update affects more than the English language

Product Review Update Targets More Languages?

The person asking the question was rightly under the impression that the product review update only affected English language search results.


But he said he was seeing search volatility in German-language results that appeared to be related to Google’s December 2021 Product Reviews Update.

This is his question:

“I was seeing some movements in German search as well.

So I was wondering if there could also be an effect on websites in other languages by this product reviews update… because we had lots of movement and volatility in the last weeks.

…My question is, is it possible that the product reviews update affects other sites as well?”

John Mueller answered:

“I don’t know… like other languages?

My assumption was this was global and across all languages.

But I don’t know what we announced in the blog post specifically.

But usually we try to push the engineering team to make a decision on that so that we can document it properly in the blog post.

I don’t know if that happened with the product reviews update. I don’t recall the complete blog post.

But it’s… from my point of view it seems like something that we could be doing in multiple languages and wouldn’t be tied to English.

And even if it were English initially, it feels like something that is relevant across the board, and we should try to find ways to roll that out to other languages over time as well.

So I’m not particularly surprised that you see changes in Germany.

But I also don’t know what we actually announced with regards to the locations and languages that are involved.”

Does Product Reviews Update Affect More Languages?

While the tweeted announcement specified that the product reviews update was limited to the English language, the official blog post did not mention any such limitation.

Google’s John Mueller offered his opinion that the product reviews update is something that Google could do in multiple languages.

One must wonder if the tweet was meant to communicate that the update was rolling out first in English and subsequently to other languages.

It’s unclear if the product reviews update was rolled out globally to more languages. Hopefully Google will clarify this soon.


Google Blog Post About Product Reviews Update

Product reviews update and your site

Google’s New Product Reviews Guidelines

Write high quality product reviews

John Mueller Discusses If Product Reviews Update Is Global

Watch Mueller answer the question at the 14:00 minute mark
