

How Language Model For Dialogue Applications (LaMDA) Works




Creating language models isn’t new for Google; LaMDA joins the likes of BERT and MUM as a way for machines to better understand user intent.

Google has researched language-based models for several years with the hope of training a model that could essentially hold an insightful and logical conversation on any topic.

So far, Google LaMDA appears to be the closest to reaching this milestone.

What Is Google LaMDA?

LaMDA, which stands for Language Model for Dialogue Applications, was created to enable software to engage in more fluid and natural conversation.

LaMDA is based on the same transformer architecture as other language models such as BERT and GPT-3.

However, due to its training, LaMDA can understand nuanced questions and conversations covering several different topics.

With other models, the open-ended nature of conversations means you could end up talking about something completely different, despite initially focusing on a single topic.

This behavior can easily confuse most conversational models and chatbots.

At Google I/O 2021, Google showed that LaMDA was built to overcome these issues.

The demonstration showed how the model could naturally carry on a conversation about a randomly given topic.

Despite the stream of loosely associated questions, the conversation remained on track, which was amazing to see.

How Does LaMDA work?

LaMDA was built on Transformer, the neural network architecture that Google Research invented and open-sourced in 2017, and which is widely used for natural language understanding.

The model is trained to find patterns in sentences and correlations between the words used in them, and to predict the word most likely to come next.

It does this by studying datasets that include large amounts of dialogue, rather than only general text.
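To make "learning correlations between words and predicting the next one" concrete, here is a toy bigram counter. It is only an illustration of next-word prediction from observed patterns: LaMDA itself is a Transformer that learns these relationships in neural network weights, not by counting word pairs, and the training lines below are made up.

```python
from collections import Counter, defaultdict
from typing import Optional


def train_bigrams(sentences: list[str]) -> dict[str, Counter]:
    """Count how often each word is followed by each other word."""
    follows: dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def predict_next(follows: dict[str, Counter], word: str) -> Optional[str]:
    """Return the word seen most often after `word`, or None if it was never seen."""
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Hypothetical toy dialogue lines, not LaMDA's training data.
dialogue = [
    "how are you today",
    "how are you doing",
    "how are things at work",
]
model = train_bigrams(dialogue)
print(predict_next(model, "how"))  # "are" follows "how" in every line
print(predict_next(model, "are"))  # "you" (seen twice) beats "things" (seen once)
```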

While a conversational AI system is similar to chatbot software, there are some key differences between the two.

For example, chatbots are trained on limited, specific datasets and can only hold a limited conversation based on the data and exact questions they are trained on.

On the other hand, because LaMDA is trained on multiple different datasets, it can have open-ended conversations.

During the training process, it picks up on the nuances of open-ended dialogue and adapts.

It can answer questions on many different topics, depending on the flow of the conversation.

As a result, it enables conversations that feel closer to human interaction than chatbots can usually provide.

How Is LaMDA Trained?

Google explained that LaMDA has a two-stage training process: pre-training and fine-tuning.

In total, the model is trained on 1.56 trillion words and has 137 billion parameters.


For the pre-training stage, the team at Google created a dataset of 1.56T words from public dialogue data and other public web documents.

This dataset is then tokenized (split into smaller units called tokens, such as whole words and word fragments) into 2.81T tokens, on which the model is initially trained.
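Because a single word can be split into several tokens, 1.56T words became 2.81T tokens, roughly 1.8 tokens per word on average. The sketch below illustrates the idea with a greedy longest-match subword tokenizer over a tiny, hypothetical vocabulary; it is not the tokenizer Google used, which is far more sophisticated.

```python
def tokenize(word: str, vocab: set[str]) -> list[str]:
    """Greedily split `word` into the longest pieces found in `vocab`.

    Characters not covered by any vocabulary entry fall back to
    single-character tokens, so every input can be tokenized.
    """
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest possible piece first, then shrink it.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab or end - start == 1:
                tokens.append(piece)
                start = end
                break
    return tokens


# Hypothetical vocabulary, for illustration only.
VOCAB = {"talk", "ing", "un", "believ", "able", "conversation"}

words = "talking unbelievable conversation".split()
all_tokens = [tok for w in words for tok in tokenize(w, VOCAB)]
print(all_tokens)  # ['talk', 'ing', 'un', 'believ', 'able', 'conversation']
print(len(all_tokens) / len(words))  # 6 tokens / 3 words = 2.0
```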

During pre-training, the model learns to predict the next token in the conversation based on the previous tokens it has seen; Google used general and scalable parallelization (GSPMD) to spread this training across many chips.
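Predicting each next token from the tokens before it means one tokenized sequence yields a training example at every position. A minimal sketch of how those (context, target) pairs line up; the token list is hypothetical, and in practice a Transformer processes the whole sequence at once with a causal mask rather than building pairs one by one.

```python
def next_token_examples(tokens: list[str]) -> list[tuple[list[str], str]]:
    """Pair each prefix of `tokens` with the token that follows it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


# Hypothetical tokenized dialogue turn.
turn = ["what", "is", "the", "weather", "like", "?"]
for context, target in next_token_examples(turn):
    print(context, "->", target)
```

A six-token turn gives five examples, from `['what'] -> is` up to the full prefix predicting `?`.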


LaMDA is trained to perform generation and classification tasks during the fine-tuning phase.

Essentially, the LaMDA generator, which predicts the next part of the dialogue, generates several relevant responses based on the back-and-forth conversation.

The LaMDA classifiers will then predict safety and quality scores for each possible response.

Any response with a low safety score is filtered out before the top-scored response is selected to continue the conversation.

The scores are based on safety, sensibleness, specificity, and interestingness.

Image from Google AI Blog, March 2022

The goal is to ensure the most relevant, high quality, and ultimately safest response is provided.
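The generate, filter, and rank loop described above can be sketched in a few lines. Everything specific here is an assumption for illustration: the candidate texts and scores are invented, the safety cutoff value is made up, and the 3:1:1 weighting of sensibleness, specificity, and interestingness is an illustrative choice standing in for LaMDA's fine-tuned classifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    text: str
    safety: float       # estimated probability that the response is safe
    sensible: float     # estimated probability that it is sensible
    specific: float     # estimated probability that it is specific
    interesting: float  # estimated probability that it is interesting


SAFETY_THRESHOLD = 0.8  # assumed cutoff, for illustration only


def quality(c: Candidate) -> float:
    """Weighted SSI score; the 3:1:1 weighting is an assumption."""
    return 3 * c.sensible + c.specific + c.interesting


def choose_response(candidates: list[Candidate]) -> Optional[Candidate]:
    """Filter out low-safety candidates, then pick the highest-quality one."""
    safe = [c for c in candidates if c.safety >= SAFETY_THRESHOLD]
    if not safe:
        return None  # nothing passed the safety filter
    return max(safe, key=quality)


# Invented candidates standing in for the generator's output.
candidates = [
    Candidate("Generic reply", safety=0.99, sensible=0.9, specific=0.1, interesting=0.1),
    Candidate("Specific, engaging reply", safety=0.97, sensible=0.9, specific=0.9, interesting=0.7),
    Candidate("Unsafe reply", safety=0.20, sensible=0.95, specific=0.95, interesting=0.95),
]
best = choose_response(candidates)
print(best.text if best else "no safe response")
```

Filtering before ranking means a response that scores well on quality can never be chosen if it fails the safety check, which matches the order the article describes.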

LaMDA Key Objectives And Metrics

Google defined three main objectives to guide the model’s training.

These are quality, safety, and groundedness.


Quality is based on three human rater dimensions:

  • Sensibleness
  • Specificity
  • Interestingness

The quality score is used to ensure a response makes sense in the context it is used, that it is specific to the question asked, and is considered insightful enough to create better dialogue.


To ensure safety, the model follows the standards of responsible AI. A set of safety objectives is used to capture and review the model’s behavior.

This is intended to keep the output from producing unintended responses and to reduce bias.


Groundedness is defined as the share of responses containing claims about the external world that can be supported by authoritative external sources.

This is used to ensure that responses are as “factually accurate as possible, allowing users to judge the validity of a response based on the reliability of its source.”
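In Google's definition, groundedness is a ratio: the denominator is only the responses that make claims about the external world, and the numerator is those whose claims can be supported by authoritative external sources. A minimal sketch, assuming each response has already been labeled by raters with two hypothetical booleans (`has_external_claim`, `claim_supported`):

```python
from dataclasses import dataclass


@dataclass
class RatedResponse:
    has_external_claim: bool  # rater: does the response claim something about the world?
    claim_supported: bool     # rater: can an authoritative source support that claim?


def groundedness(responses: list[RatedResponse]) -> float:
    """Percentage of claim-making responses whose claims are supported.

    Returns 0.0 when no response makes a claim (an assumption; the
    ratio is otherwise undefined).
    """
    with_claims = [r for r in responses if r.has_external_claim]
    if not with_claims:
        return 0.0
    supported = sum(r.claim_supported for r in with_claims)
    return 100 * supported / len(with_claims)


ratings = [
    RatedResponse(True, True),
    RatedResponse(True, False),
    RatedResponse(True, True),
    RatedResponse(False, False),  # chit-chat with no factual claim: excluded
]
print(groundedness(ratings))  # 2 supported out of 3 claim-making responses
```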


Through an ongoing process of quantifying progress, responses from the pre-trained model, the fine-tuned model, and human raters are evaluated against the quality, safety, and groundedness metrics above.

So far, Google has concluded that:

  • Quality metrics improve with the number of parameters.
  • Safety improves with fine-tuning.
  • Groundedness improves as the model size increases.

LaMDA progress: image from Google AI Blog, March 2022

How Will LaMDA Be Used?

While still a work in progress with no finalized release date, it is predicted that LaMDA will be used in the future to improve customer experience and enable chatbots to provide a more human-like conversation.

In addition, using LaMDA to navigate search within Google’s search engine is a genuine possibility.

LaMDA Implications For SEO

By focusing on language and conversational models, Google offers insight into its vision for the future of search and highlights a shift in how its products are set to develop.

This suggests there may well be a shift in search behavior and in the way users search for products or information.

Google is constantly working on improving the understanding of users’ search intent to ensure they receive the most useful and relevant results in SERPs.

The LaMDA model is likely to be a key tool for understanding the questions searchers ask.

This all further highlights the need to ensure content is optimized for humans rather than search engines.

Making sure content is conversational and written with your target audience in mind means that even as Google advances, content can continue to perform well.

It’s also key to regularly refresh evergreen content to ensure it evolves with time and remains relevant.

In a paper titled Rethinking Search: Making Experts out of Dilettantes, research engineers from Google shared how they envisage AI advancements such as LaMDA will further enhance “search as a conversation with experts.”

They shared an example using the search query, “What are the health benefits and risks of red wine?”

Currently, Google displays an answer box with a bulleted list in response to this question.

However, they suggest that in the future, a response may well be a paragraph explaining the benefits and risks of red wine, with links to the source information.

Therefore, ensuring content is backed up by expert sources will be more important than ever should Google LaMDA generate search results in the future.

Overcoming Challenges

As with any AI model, there are challenges to address.

The two main challenges engineers face with Google LaMDA are safety and groundedness.

Safety – Avoiding Bias

Because answers can be pulled from anywhere on the web, the output may amplify bias, reflecting the notions that are shared online.

It is important that responsibility comes first with Google LaMDA to ensure it is not generating unpredictable or harmful results.

To help overcome this, Google has open-sourced resources that researchers can use to analyze models and the data they are trained on.

This enables diverse groups to participate in creating the datasets used to train the model, help identify existing bias, and limit the spread of harmful or misleading information.

Factual Grounding

It isn’t easy to validate the reliability of answers that AI models produce, as sources are collected from all over the web.

To overcome this challenge, the team enables the model to consult multiple external sources, including information retrieval systems and even a calculator, to provide accurate results.
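As a rough illustration of handing part of a query to an external tool, the sketch below sends arithmetic to a small calculator and everything else to a lookup table. In LaMDA the model itself learns when to query its toolset; the rule-based routing, the `_KNOWLEDGE` stub, and its single entry are hypothetical stand-ins. The calculator walks Python's `ast` syntax tree instead of calling `eval()`, so only basic arithmetic is accepted.

```python
import ast
import operator

# Operators the toy calculator accepts; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

# Hypothetical stand-in for an information retrieval system.
_KNOWLEDGE = {"lamda parameters": "137 billion (Google AI Blog)"}


def calculate(expression: str) -> float:
    """Evaluate +, -, *, / over numbers by walking the syntax tree (no eval)."""
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    return walk(ast.parse(expression, mode="eval"))


def answer_with_tools(query: str) -> str:
    """Route arithmetic to the calculator and everything else to the lookup stub."""
    try:
        return str(calculate(query))
    except (SyntaxError, ValueError, ZeroDivisionError):
        return _KNOWLEDGE.get(query.lower(), "No source found")


print(answer_with_tools("12 * (3 + 4)"))
print(answer_with_tools("LaMDA parameters"))
```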

The groundedness metric described earlier measures how well responses are grounded in known sources. These sources are shared so users can validate the results given, which helps prevent the spread of misinformation.

What’s Next For Google LaMDA?

Google is clear that open-ended dialog models such as LaMDA bring both benefits and risks, and it is committed to improving safety and groundedness to ensure a more reliable and unbiased experience.

Training LaMDA models on different data, including images or videos, is another thing we may see in the future.

This could make it possible to navigate even more of the web using conversational prompts.

Google’s CEO Sundar Pichai said of LaMDA, “We believe LaMDA’s conversation capabilities have the potential to make information and computing radically more accessible and easier to use.”

While a rollout date hasn’t been confirmed yet, there is little doubt that models such as LaMDA will be central to Google’s future.


Featured Image: Andrey Suslov/Shutterstock



Google Hints At Improving Site Rankings In Next Update





Google’s John Mueller says the Search team is “explicitly evaluating” how to reward sites that produce helpful, high-quality content when the next core update rolls out.

The comments came in response to a discussion on X about the impact of March’s core update and September’s helpful content update.

In a series of tweets, Mueller acknowledged the concerns, stating:

“I imagine for most sites strongly affected, the effects will be site-wide for the time being, and it will take until the next update to see similar strong effects (assuming the new state of the site is significantly better than before).”

He added:

“I can’t make any promises, but the team working on this is explicitly evaluating how sites can / will improve in Search for the next update. It would be great to show more users the content that folks have worked hard on, and where sites have taken helpfulness to heart.”

What Does This Mean For SEO Professionals & Site Owners?

Mueller’s comments confirm Google is aware of critiques about the March core update and is refining its ability to identify high-quality sites and reward them appropriately in the next core update.

For websites, clearly demonstrating an authentic commitment to producing helpful and high-quality content remains the best strategy for improving search performance under Google’s evolving systems.

The Aftermath Of Google’s Core Updates

Google’s algorithm updates, including the September “Helpful Content Update” and the March 2024 update, have had far-reaching impacts on rankings across industries.

While some sites experienced surges in traffic, others faced substantial declines, with some reporting visibility losses of up to 90%.

As website owners implement changes to align with Google’s guidelines, many question whether their efforts will be rewarded.

There’s genuine concern about the potential for long-term or permanent demotions for affected sites.

Recovery Pathway Outlined, But Challenges Remain

In a previous statement, Mueller acknowledged the complexity of the recovery process, stating that:

“some things take much longer to be reassessed (sometimes months, at the moment), and some bigger effects require another update cycle.”

Mueller clarified that not all changes would require a new update cycle but cautioned that “stronger effects will require another update.”

While affirming that permanent changes are “not very useful in a dynamic world,” Mueller added that “recovery” implies a return to previous levels, which may be unrealistic given evolving user expectations.

“It’s never ‘just-as-before’,” Mueller stated.

Improved Rankings On The Horizon?

Despite the challenges, Mueller has offered glimmers of hope for impacted sites, stating:

“Yes, sites can grow again after being affected by the ‘HCU’ (well, core update now). This isn’t permanent. It can take a lot of work, time, and perhaps update cycles, and/but a different – updated – site will be different in search too.”

He says the process may require “deep analysis to understand how to make a website relevant in a modern world, and significant work to implement those changes — assuming that it’s something that aligns with what the website even wants.”

Looking Ahead

Google’s search team is actively working on improving site rankings and addressing concerns with the next core update.

However, recovery requires patience, thorough analysis, and persistent effort.

The best way to spend your time until the next update is to remain consistent and produce the most exceptional content in your niche.


How long does it generally take for a website to recover from the impact of a core update?

Recovery timelines can vary and depend on the extent and type of updates made to align with Google’s guidelines.

Google’s John Mueller noted that some changes might be reassessed quickly, while more substantial effects could take months and require additional update cycles.

Google acknowledges the complexity of the recovery process, indicating that significant improvements aligned with Google’s quality signals might be necessary for a more pronounced recovery.

What impact did the March and September updates have on websites, and what steps should site owners take?

The March and September updates had widespread effects on website rankings, with some sites experiencing traffic surges while others faced up to 90% visibility losses.

Publishing genuinely useful, high-quality content is key for website owners who want to bounce back from a ranking drop or maintain strong rankings. Stick to Google’s recommendations and adapt as they keep updating their systems.

To minimize future disruptions from algorithm changes, it’s a good idea to review your whole site thoroughly and build a content plan centered on what your users want and need.

Is it possible for sites affected by core updates to regain their previous ranking positions?

Sites can recover from the impact of core updates, but it requires significant effort and time.

Mueller suggested that recovery might happen over multiple update cycles and involves a deep analysis to align the site with current user expectations and modern search criteria.

While a return to previous levels isn’t guaranteed, sites can improve and grow by continually enhancing the quality and relevance of their content.

Featured Image: eamesBot/Shutterstock



Google Reveals Two New Web Crawlers





Google revealed details of two new crawlers that are optimized for scraping image and video content for “research and development” purposes. Although the documentation doesn’t say so explicitly, blocking the new crawlers presumably has no impact on rankings.

Note that the data scraped by these crawlers is not explicitly intended as AI training data; that is what the Google-Extended robots.txt token is for.

GoogleOther Crawlers

The two new crawlers are versions of Google’s GoogleOther crawler, which launched in April 2023. The original GoogleOther crawler was also designated for research and development by Google product teams, in what Google describes as one-off crawls; that description offers clues about what the new GoogleOther variants will be used for.

The purpose of the original GoogleOther crawler is officially described as:

“GoogleOther is the generic crawler that may be used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.”

Two GoogleOther Variants

There are two new GoogleOther crawlers:

  • GoogleOther-Image
  • GoogleOther-Video

The new variants are for crawling binary data, meaning data that isn’t text. HTML files are generally referred to as text files (ASCII or Unicode files): if a file can be read in a text viewer, it’s a text file. Binary files, such as images, audio, and video, can’t be read in a text viewer app.
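The text-versus-binary distinction can be approximated in code. The heuristic below (no NUL bytes and valid UTF-8) is a common rule of thumb, not a description of how Google's crawlers classify content.

```python
def looks_like_text(data: bytes) -> bool:
    """Heuristic: treat bytes as text if they hold no NUL bytes and decode as UTF-8."""
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


html = "<html><body>Café</body></html>".encode("utf-8")
png_header = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"  # first bytes of a PNG image file
print(looks_like_text(html))        # True: Unicode text, including a non-ASCII character
print(looks_like_text(png_header))  # False: binary image data
```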

The new GoogleOther variants are for image and video content. Google lists user agent tokens for both new crawlers, which can be used in robots.txt to block them.
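For example, a robots.txt group along the lines of the one below would block only the two new variants while leaving other Google crawlers alone. Since both variants also list GoogleOther as a token, a group for GoogleOther alone would presumably apply to them too. The sketch checks the rules with Python's standard `urllib.robotparser`; its matching (first matching group, case-insensitive substring match on the token) is simpler than Google's, so treat it as a rough sanity check rather than a guarantee of how Google will interpret the file. The URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: one group that blocks both new GoogleOther variants.
ROBOTS_TXT = """\
User-agent: GoogleOther-Image
User-agent: GoogleOther-Video
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://www.example.com/images/photo.jpg"
for agent in ("GoogleOther-Image", "GoogleOther-Video", "GoogleOther", "Googlebot"):
    print(agent, parser.can_fetch(agent, url))
```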

1. GoogleOther-Image

User agent tokens:

  • GoogleOther-Image
  • GoogleOther

Full user agent string:


2. GoogleOther-Video

User agent tokens:

  • GoogleOther-Video
  • GoogleOther

Full user agent string:


Newly Updated GoogleOther User Agent Strings

Google also updated the user agent strings for the regular GoogleOther crawler. For blocking purposes, you can keep using the same user agent token as before (GoogleOther). The new user agent strings are simply the data sent to servers to give a full description of the crawler, in particular the technology it uses. In this case that technology is Chrome, with the version number updated periodically to reflect which version is used (W.X.Y.Z is a placeholder for the Chrome version number in the examples listed below).

The full list of GoogleOther user agent strings:

  • Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; GoogleOther)
  • Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GoogleOther) Chrome/W.X.Y.Z Safari/537.36

GoogleOther Family Of Bots

These new bots may show up in your server logs from time to time. This information will help identify them as genuine Google crawlers and will help publishers who want to opt out of having their images and videos scraped for research and development purposes.
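A minimal sketch for tagging server-log user agents by GoogleOther variant. It checks the more specific tokens first, because both variant names contain "GoogleOther". Matching a token only shows what a client claims to be: user agent strings are easy to spoof, and Google's crawler documentation describes reverse DNS lookups for verifying that a request really comes from Google. The sample strings are assumptions: the GoogleOther string uses a made-up Chrome version in place of W.X.Y.Z, and the GoogleOther-Image string is hypothetical.

```python
import re
from typing import Optional

# Check the most specific tokens first: both variant names contain "GoogleOther".
_TOKENS = ("GoogleOther-Image", "GoogleOther-Video", "GoogleOther")


def googleother_variant(user_agent: str) -> Optional[str]:
    """Return the GoogleOther token a user agent string claims, or None."""
    for token in _TOKENS:
        # \b stops "GoogleOther" from matching inside a longer name like "GoogleOtherBot".
        if re.search(rf"\b{re.escape(token)}\b", user_agent):
            return token
    return None


# GoogleOther desktop string from the list above, with a made-up version for W.X.Y.Z.
ua_other = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GoogleOther) "
            "Chrome/120.0.0.0 Safari/537.36")
# Hypothetical string for illustration only.
ua_image = "ExampleAgent/1.0 (compatible; GoogleOther-Image)"
ua_googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

for ua in (ua_other, ua_image, ua_googlebot):
    print(googleother_variant(ua))
```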

Read the updated Google crawler documentation



Featured Image by Shutterstock/ColorMaker



ChatGPT To Surface Reddit Content Via Partnership With OpenAI





Reddit partners with OpenAI to integrate content into ChatGPT.

  • Reddit and OpenAI announce a partnership.
  • Reddit content will be used in ChatGPT.
  • Concerns about accuracy of Reddit user-generated content.
