

Google’s SMITH Algorithm Outperforms BERT


Google recently published a research paper on a new algorithm called SMITH that it claims outperforms BERT for understanding long queries and long documents. What makes this new model better is that it can understand passages within documents in the same way BERT understands words and sentences, which enables the algorithm to understand longer documents.

On November 3, 2020, I read about a Google algorithm called SMITH that claims to outperform BERT. I briefly discussed it on November 25th in Episode 395 of the SEO 101 podcast.

I’ve been waiting until I had some time to write a summary of it because SMITH seems to be an important algorithm that deserves a thoughtful write-up, which I have humbly attempted.

So here it is; I hope you enjoy it, and if you do, please share this article.

Is Google Using the SMITH Algorithm?

Google does not generally say what specific algorithms it is using. Although the researchers say that this algorithm outperforms BERT, until Google formally states that the SMITH algorithm is in use to understand passages within web pages, it is purely speculative to say whether or not it is in use.

What is the SMITH Algorithm?

SMITH is a new model for trying to understand entire documents. Models such as BERT are trained to understand words within the context of sentences.

In a very simplified description, the SMITH model is trained to understand passages within the context of the entire document.

While algorithms like BERT are trained on data sets to predict randomly hidden words from the context within sentences, the SMITH algorithm is trained to predict what the next blocks of sentences are.

This kind of training helps the algorithm understand larger documents better than the BERT algorithm, according to the researchers.

BERT Algorithm Has Limitations

This is how they present the shortcomings of BERT:

“In recent years, self-attention based models like Transformers… and BERT …have achieved state-of-the-art performance in the task of text matching. These models, however, are still limited to short text like a few sentences or one paragraph due to the quadratic computational complexity of self-attention with respect to input text length.

In this paper, we address the issue by proposing the Siamese Multi-depth Transformer-based Hierarchical (SMITH) Encoder for long-form document matching. Our model contains several innovations to adapt self-attention models for longer text input.”

According to the researchers, the BERT algorithm is limited to understanding short documents. For a variety of reasons explained in the research paper, BERT is not well suited for understanding long-form documents; chief among them is that self-attention compares every token with every other token, so its computational cost grows with the square of the input length.

The researchers propose their new algorithm, which they say outperforms BERT on longer documents.

They then explain why long documents are difficult:

“…semantic matching between long texts is a more challenging task due to a few reasons:

1) When both texts are long, matching them requires a more thorough understanding of semantic relations including matching pattern between text fragments with long distance;

2) Long documents contain internal structure like sections, passages and sentences. For human readers, document structure usually plays a key role for content understanding. Similarly, a model also needs to take document structure information into account for better document matching performance;

3) The processing of long texts is more likely to trigger practical issues like out of TPU/GPU memories without careful model design.”

Larger Input Text

BERT is limited in how long a document it can process. SMITH, as you will see further down, performs better the longer the document is.

This is a known shortcoming with BERT.

This is how they explain it:

“Experimental results on several benchmark data for long-form text matching… show that our proposed SMITH model outperforms the previous state-of-the-art models and increases the maximum input text length from 512 to 2048 when comparing with BERT based baselines.”

SMITH’s ability to do something that BERT cannot is what makes the model intriguing.

The SMITH model doesn’t replace BERT.

The SMITH model supplements BERT by doing the heavy lifting that BERT is unable to do.

The researchers tested it and said:

“Our experimental results on several benchmark datasets for long-form document matching show that our proposed SMITH model outperforms the previous state-of-the-art models including hierarchical attention…, multi-depth attention-based hierarchical recurrent neural network…, and BERT.

Comparing to BERT based baselines, our model is able to increase maximum input text length from 512 to 2048.”
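To make the hierarchical idea concrete, here is a minimal sketch in Python (using PyTorch), which is my own illustration rather than the paper’s implementation: each sentence block is encoded on its own, and a second Transformer then attends over the block embeddings instead of over every token at once. All dimensions, layer counts, and block sizes are toy values chosen for readability.

```python
import torch
import torch.nn as nn

d_model, n_heads = 64, 4

# Token-level encoder: processes each sentence block independently.
block_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
block_encoder = nn.TransformerEncoder(block_layer, num_layers=2)

# Block-level encoder: attends over block embeddings, not raw tokens.
doc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
doc_encoder = nn.TransformerEncoder(doc_layer, num_layers=2)

# 8 sentence blocks of 32 tokens each (256 tokens total), as random embeddings.
tokens = torch.randn(8, 32, d_model)            # (blocks, tokens_per_block, dim)

block_vecs = block_encoder(tokens).mean(dim=1)  # one embedding per block: (8, 64)
doc_vec = doc_encoder(block_vecs.unsqueeze(0))  # attend across blocks: (1, 8, 64)
doc_vec = doc_vec.mean(dim=1)                   # single document embedding: (1, 64)
print(doc_vec.shape)
```

Because self-attention cost grows with the square of the sequence length, attending within 32-token blocks and then across 8 block vectors is far cheaper than attending over all 256 tokens at once, which is the intuition behind raising the usable input length from 512 to 2048 tokens.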

Long to Long Matching

If I am understanding it correctly, the research paper states that the problem of matching long queries to long content has not been adequately explored.

According to the researchers:

“To the best of our knowledge, semantic matching between long document pairs, which has many important applications like news recommendation, related article recommendation and document clustering, is less explored and needs more research effort.”

Later in the document they state that there have been some studies that come close to what they are researching.

But overall there appears to be a gap in researching ways to match long queries to long documents. That is the problem the researchers are solving with the SMITH algorithm.

Details of Google’s SMITH

I won’t go deep into the details of the algorithm, but I will pick out some general features that communicate a high-level view of what it is.

The document explains that they use a pre-training model that is similar to BERT and many other algorithms.

First a little background information so the document makes more sense.

Algorithm Pre-training

Pre-training is where an algorithm is trained on a data set. For typical pre-training of these kinds of algorithms, the engineers will mask (hide) random words within sentences. The algorithm tries to predict the masked words.

As an example, if a sentence is written as, “Old McDonald had a ____,” the algorithm when fully trained might predict, “farm” is the missing word.

As the algorithm learns, it eventually becomes optimized to make fewer mistakes on the training data. The point of this pre-training is to make the machine accurate before it is fine-tuned for a specific task.
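As a rough illustration of masked word pre-training (my own toy sketch, not Google’s code), here is how random words in a sentence might be hidden to create training examples:

```python
import random

MASK = "[MASK]"

def mask_words(sentence, mask_rate=0.15):
    """Hide random words so a model can be trained to predict them."""
    words = sentence.split()
    masked, answers = [], {}
    for i, word in enumerate(words):
        if random.random() < mask_rate:
            answers[i] = word        # what the model should predict
            masked.append(MASK)
        else:
            masked.append(word)
    return " ".join(masked), answers

print(mask_words("Old McDonald had a farm", mask_rate=0.3))
# One possible result: ('Old McDonald had a [MASK]', {4: 'farm'})
```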

Here’s what the paper says:

“Inspired by the recent success of language model pre-training methods like BERT, SMITH also adopts the “unsupervised pre-training + fine-tuning” paradigm for the model training.

For the SMITH model pre-training, we propose the masked sentence block language modeling task in addition to the original masked word language modeling task used in BERT for long text inputs.”

Blocks of Sentences are Hidden in Pre-training

Here is where the researchers explain a key part of the algorithm: how relations between sentence blocks in a document are used for understanding what a document is about during the pre-training process.

“When the input text becomes long, both relations between words in a sentence block and relations between sentence blocks within a document becomes important for content understanding.

Therefore, we mask both randomly selected words and sentence blocks during model pre-training.”

The researchers next describe in more detail how this algorithm goes above and beyond the BERT algorithm.

What they’re doing is stepping up the training to go beyond word training to take on blocks of sentences.

Here’s how it is described in the research document:

“In addition to the masked word prediction task in BERT, we propose the masked sentence block prediction task to learn the relations between different sentence blocks.”

The SMITH algorithm is trained to predict blocks of sentences. My personal feeling about that is… that’s pretty cool.

This algorithm is learning the relationships between words and then leveling up to learn the context of blocks of sentences and how they relate to each other in a long document.
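Continuing the toy sketch from the pre-training section (again, my own illustration rather than the paper’s code), the same masking idea can be lifted from single words to whole blocks of sentences:

```python
import random

BLOCK_MASK = "[BLOCK_MASK]"

def mask_sentence_blocks(sentences, block_size=3, mask_rate=0.2):
    """Group sentences into fixed-size blocks and hide whole blocks,
    so a model can be trained to predict the missing passage."""
    blocks = [sentences[i:i + block_size]
              for i in range(0, len(sentences), block_size)]
    masked, answers = [], {}
    for i, block in enumerate(blocks):
        if random.random() < mask_rate:
            answers[i] = block       # the block the model should predict
            masked.append([BLOCK_MASK])
        else:
            masked.append(block)
    return masked, answers
```

Predicting an entire hidden block forces the model to learn how passages relate to one another, not just how words relate within a sentence.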

Section 4.2.2, titled, “Masked Sentence Block Prediction” provides more details on the process (research paper linked below).

Results of SMITH Testing

The researchers noted that SMITH does better with longer text documents.

“The SMITH model which enjoys longer input text lengths compared with other standard self-attention models is a better choice for long document representation learning and matching.”

In the end, the researchers concluded that the SMITH algorithm does better than BERT for long documents.

Why SMITH Research Paper is Important

One of the reasons I prefer reading research papers over patents is that research papers share details of whether the proposed model does better than existing state-of-the-art models.

Many research papers conclude by saying that more work needs to be done. To me that means that the algorithm experiment is promising but likely not ready to be put into a live environment.

A smaller percentage of research papers say that the results outperform the state of the art. These are the research papers that in my opinion are worth paying attention to because they are likelier to make it into Google’s algorithm.

When I say likelier, I don’t mean that the algorithm is, or necessarily will be, part of Google’s algorithm.

What I mean is that, relative to other algorithm experiments, the research papers that claim to outperform the state of the art are more likely to make it into Google’s algorithm.

SMITH Outperforms BERT for Long Form Documents

According to the conclusions reached in the research paper, the SMITH model outperforms many models, including BERT, for understanding long content.

“The experimental results on several benchmark datasets show that our proposed SMITH model outperforms previous state-of-the-art Siamese matching models including HAN, SMASH and BERT for long-form document matching.

Moreover, our proposed model increases the maximum input text length from 512 to 2048 when compared with BERT-based baseline methods.”

Is SMITH in Use?

As noted earlier, until Google explicitly states that it is using SMITH, there’s no way to accurately say whether the SMITH model is in use at Google.

That said, the research papers least likely to be in use are those that explicitly state that the findings are a first step toward a new kind of algorithm and that more research is necessary.

This is not the case with this research paper. The research paper authors confidently state that SMITH beats the state of the art for understanding long-form content.

That confidence in the results, together with the absence of any statement that more research is needed, makes this paper more interesting than others and well worth knowing about in case it gets folded into Google’s algorithm now or in the future.

Citation

Read the original research paper:

Description of the SMITH Algorithm

Download the SMITH Algorithm PDF Research Paper:

Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Long-Form Document Matching (PDF)



Google to pay $391.5 million settlement over location tracking, state AGs say


Google has agreed to pay a $391.5 million settlement to 40 states to resolve accusations that it tracked people’s locations in violation of state laws, including snooping on consumers’ whereabouts even after they told the tech behemoth to bug off.

Louisiana Attorney General Jeff Landry said it is time for Big Tech to recognize state laws that limit data collection efforts.

“I have been ringing the alarm bell on big tech for years, and this is why,” Mr. Landry, a Republican, said in a statement Monday. “Citizens must be able to make informed decisions about what information they release to big tech.”

The attorneys general said the investigation resulted in the largest-ever multistate privacy settlement. Connecticut Attorney General William Tong, a Democrat, said Google’s penalty is a “historic win for consumers.”

“Location data is among the most sensitive and valuable personal information Google collects, and there are so many reasons why a consumer may opt out of tracking,” Mr. Tong said. “Our investigation found that Google continued to collect this personal information even after consumers told them not to. That is an unacceptable invasion of consumer privacy, and a violation of state law.”

Location tracking can help tech companies sell digital ads to marketers looking to connect with consumers within their vicinity. It’s another tool in a data-gathering toolkit that generates more than $200 billion in annual ad revenue for Google, accounting for most of the profits pouring into the coffers of its corporate parent, Alphabet, which has a market value of $1.2 trillion.

The settlement is part of a series of legal challenges to Big Tech in the U.S. and around the world, which include consumer protection and antitrust lawsuits.

Though Google, based in Mountain View, California, said it fixed the problems several years ago, the company’s critics remained skeptical. State attorneys general who also have tussled with Google have questioned whether the tech company will follow through on its commitments.

The states aren’t dialing back their scrutiny of Google’s empire.

Last month, Texas Attorney General Ken Paxton said he was filing a lawsuit over reports that Google unlawfully collected millions of Texans’ biometric data such as “voiceprints and records of face geometry.”

The states began investigating Google’s location tracking after The Associated Press reported in 2018 that Android devices and iPhones were storing location data despite the activation of privacy settings intended to prevent the company from following along.

Arizona Attorney General Mark Brnovich went after the company in May 2020. The state’s lawsuit charged that the company had defrauded its users by misleading them into believing they could keep their whereabouts private by turning off location tracking in the settings of their software.

Arizona settled its case with Google for $85 million last month. By then, attorneys general in several other states and the District of Columbia had pounced with their own lawsuits seeking to hold Google accountable.

Along with the hefty penalty, the state attorneys general said, Google must not hide key information about location tracking, must give users detailed information about the types of location tracking information Google collects, and must show additional information to people when users turn location-related account settings to “off.”

States will receive differing sums from the settlement. Mr. Landry’s office said Louisiana would receive more than $12.7 million, and Mr. Tong’s office said Connecticut would collect more than $6.5 million.

The financial penalty will not cripple Google’s business. The company raked in $69 billion in revenue for the third quarter of 2022, according to reports, yielding about $13.9 billion in profit.

Google downplayed its location-tracking tools Monday and said it changed the products at issue long ago.

“Consistent with improvements we’ve made in recent years, we have settled this investigation which was based on outdated product policies that we changed years ago,” Google spokesman Jose Castaneda said in a statement.

Google product managers Marlo McGriff and David Monsees defended their company’s Search and Maps products’ usage of location information.

“Location information lets us offer you a more helpful experience when you use our products,” the two men wrote on Google’s blog. “From Google Maps’ driving directions that show you how to avoid traffic to Google Search surfacing local restaurants and letting you know how busy they are, location information helps connect experiences across Google to what’s most relevant and useful.”

The blog post touted transparency tools and auto-delete controls that Google has developed in recent years and said the private browsing Incognito mode prevents Google Maps from saving an account’s search history.

Mr. McGriff and Mr. Monsees said Google would make changes to its products as part of the settlement. The changes include simplifying the process for deleting location data, updating the method to set up an account and revamping information hubs.

“We’ll provide a new control that allows users to easily turn off their Location History and Web & App Activity settings and delete their past data in one simple flow,” Mr. McGriff and Mr. Monsees wrote. “We’ll also continue deleting Location History data for users who have not recently contributed new Location History data to their account.”

• This article is based in part on wire service reports.




5 Tips to Boost Your Holiday Search Strategy


With the global economic downturn, inflation, ongoing supply chain challenges, and uncertainty due to the Ukraine war, this year’s holiday shopping season promises to be very challenging. Will people be in the mood to spend despite the gloom? Or will they rein in their enthusiasm and save for the year ahead?

With these issues in mind, here are five considerations to support your search engine optimization strategy this holiday shopping season:

1. Start early.

Rising prices are likely to mean shoppers will start researching their holiday spending earlier than ever to nab the best bargains. Therefore, retailers must roll out their holiday product and category pages — and launch any promotions — sooner to ensure their pages get crawled and indexed by search engines in good time.

Some e-commerce stores manage to get their pages ranking early by updating and reusing the same section of the website for holiday content and promotions, rotating between content for Christmas, Mother’s Day, Valentine gifts, Fourth of July sales, etc. This approach can help you retain the momentum, links and authority you build up with Google and get your holiday pages visible and ranking quickly.

2. Make research an even bigger priority.

With all the uncertainty this year, it’s vital to use SEO research to identify the trending seasonal keywords and search phrases in your retail vertical — and then optimize content accordingly.

With tools such as Google Trends you can extract helpful insights based on the types of searches people are making. For example, with many fashion retailers now charging for product returns, will prioritizing keywords such as “free returns” get more search traction? And with money being tighter, will consumers stick with brands they trust rather than anything new — meaning brand searches might be higher?
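For programmatic access to this kind of data, here is a small sketch using pytrends, an unofficial Python client for Google Trends (installed with pip install pytrends); the keywords, timeframe, and region are illustrative assumptions, not recommendations:

```python
from pytrends.request import TrendReq  # unofficial Google Trends client

pytrends = TrendReq(hl="en-US")
pytrends.build_payload(
    kw_list=["free returns", "free shipping"],  # hypothetical seasonal terms
    timeframe="today 3-m",                      # the last three months
    geo="US",
)
interest = pytrends.interest_over_time()
print(interest.tail())  # relative interest scores on a 0-100 scale
```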

3. Make greater use of Google Shopping.

To get the most out of their holiday spending, consumers are more likely to turn to online marketplaces such as Google Shopping as they make it easier to compare products, features and prices, as well as to identify the best deals both online and in nearby stores.

Therefore, take a combined approach which includes listing in Google Shopping and at the same time optimizing product detail pages on your e-commerce site to ensure they’re unique and provide more value than competitors’ pages. Be precise with product names on Google Shopping (e.g., do the names contain the words people are searching for?); ensure you provide all the must-have information Google requires; and set a price that’s not too far from the competition. 

4. Give other search sources the attention they deserve.

Earlier this year Google itself acknowledged that consumers — especially younger consumers — are starting to use TikTok, Instagram and other social media sites for search. In fact, research suggests 11 percent of product searches now start on TikTok and 15 percent on Instagram. Younger consumers in particular are more engaged by visual content, which may explain why they’re embracing visually focused social sites for search. So, as part of your search strategy, create and share content on popular social media sites that your target customers visit.

Similarly, with people starting their shopping searches on marketplaces such as Amazon.com, optimizing any listings you have on the site should be part of your strategy. And thankfully, the better optimized your product detail pages are for Amazon (with unique, useful content), the better they will rank on Google as well!

5. Hold paid budget for late opportunities.

The greater uncertainty and volatility this holiday season mean you must keep a close eye on shopper behavior and be ready to embrace opportunities that emerge later on. Getting high organic rankings for late promotions is always more challenging, so hold some paid search budget back to help drive traffic to those pages — via Google Ads, for example. Important keywords to include in late season search ad campaigns include “delivery before Christmas” and “same-day-delivery.” For locally targeted search ads, consider “pick up any time before Christmas.”

The prospect of a tough, unpredictable holiday shopping season means search teams must roll out seasonal SEO plans early, closely track shoppers’ behavior, and be ready to adapt as things change.

Marcus Pentzek is chief SEO consultant at Searchmetrics, the global provider of search data, software and consulting solutions.




Google Home App Gets an Overhaul, Rolling Out Soon


Google has refreshed its Home app with a slew of new features after launching its new Nest gear. The update makes it faster and easier to pair smart devices via Matter, adds customization and personalization options, improves the Nest camera experience, and enables better intercommunication between devices.

The revamped Home app supports the Matter smart home standard, which launches later this year, especially through the Fast Pair functionality. On an Android phone, the app will instantly recognize a Matter device and let you set it up easily, bypassing the current procedure, which is often slow and difficult. Google is also updating its Nest speakers, displays, and routers to better control Matter devices.

Google Home App New Features

  • Spaces: This feature allows you to control multiple devices in different rooms. Google has listed a few preset rooms (kitchen, bedroom, living room, etc.), although the selection is pretty limited right now. Spaces let you organize devices however you see fit. For instance, you can set up a baby monitor in one room and point a different room’s camera at an area where the baby often plays. With Spaces, you can group these two devices into one Space called ‘Baby.’


  • Favorites: This one is pretty self-explanatory. It lets you mark the devices you use most frequently as favorites. Doing so brings those devices front and center within the Google Home app for easier access.


  • Media: Google adds a new media widget at the bottom of your Home feed. This will automatically determine what media is playing in your home and provide you with the appropriate controls as and when needed. There will be song controls if you listen to music on your speakers. There will be television remote controls if you’re watching TV. 

Google probably won’t roll out this Home app makeover broadly anytime soon, but you can try it for yourself in the coming week by enrolling in the public preview, available in select areas.

Source link

Continue Reading
