

How Google Analyzes Web Page Content and Weights It




In a Duda webinar, Google’s Martin Splitt explained a concept called Centerpiece Annotation, which relates to how Google analyzes the content on a web page.

I won’t reproduce the question because it’s somewhat off topic and long.

But what Martin discusses is how Google separates out the boilerplate of a web page and then summarizes from the text content structure what the web page is about.

He mentions what’s called the Centerpiece Annotation.

Martin Splitt explained:

“That’s just us analyzing the content and, I don’t know what we have publicly said about this, but I think I brought it up in one of the podcasts episodes.

So I can probably say that we have a thing called the Centerpiece Annotation, for instance, and there’s a few other annotations that we have where we look at the semantic content, as well as potentially the layout tree.

But fundamentally we can read that from the content structure in HTML already and figure out so “Oh! This looks like from all the natural language processing that we did on this entire text content here that we got, it looks like this is primarily about topic A, dog food.”


Next Martin talks about how the page analysis separates the web page into component parts, some of which aren’t relevant to the Centerpiece.

The parts of the page, he explains, are weighted differently. Weighting refers to how important a page element is considered to be. A section that receives a low weight is treated as less important than one that receives a higher weight.

Martin continued:

“And then there’s this other thing here, which seems to be like links to related products but it’s not really part of the centerpiece. It’s not really main content here. This seems to be additional stuff.

And then there’s like a bunch of boilerplate or, “Hey, we figured out that the menu looks pretty much the same on all these pages and lists. This looks pretty much like that menu that we have on all the other pages of this domain,” for instance, or we’ve seen this before. We don’t even actually go by domain or like, “Oh, this looks like a menu.”

We figure out what looks like boilerplate and then, that gets weighted differently as well.”
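To make the idea of "boilerplate" concrete, here is a minimal Python sketch. It is not Google's method, which has not been published, and Martin notes that Google doesn't simply go by domain. The sketch assumes each page has already been split into text blocks, and it treats any block that repeats on most pages as boilerplate. The function names, the 0.8 threshold and the sample pages are all hypothetical.

```python
from collections import Counter


def find_boilerplate(pages: dict[str, list[str]], min_share: float = 0.8) -> set[str]:
    """Return text blocks that appear on at least min_share of the pages."""
    counts: Counter[str] = Counter()
    for blocks in pages.values():
        # Count each block once per page, however often it repeats there.
        counts.update(set(blocks))
    threshold = min_share * len(pages)
    return {block for block, n in counts.items() if n >= threshold}


def split_page(blocks: list[str], boilerplate: set[str]) -> tuple[list[str], list[str]]:
    """Split one page's blocks into (main content, repeated boilerplate)."""
    main = [b for b in blocks if b not in boilerplate]
    repeated = [b for b in blocks if b in boilerplate]
    return main, repeated


# Hypothetical site: the menu and footer repeat, the middle block does not.
pages = {
    "/dog-food": ["Home | Shop | Contact", "Our guide to dry dog food.", "(c) Example Store"],
    "/cat-toys": ["Home | Shop | Contact", "Ten toys your cat will love.", "(c) Example Store"],
    "/bird-seed": ["Home | Shop | Contact", "Choosing seed for finches.", "(c) Example Store"],
}

boilerplate = find_boilerplate(pages)
main, repeated = split_page(pages["/dog-food"], boilerplate)
print("Main content:", main)
print("Boilerplate:", repeated)
```

In this toy example the menu and footer are separated from the one block that is unique to the dog food page, and a downstream step could then weight the two groups differently.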

Off-topic Content Given Less Consideration

Martin next explains that once Google establishes what a web page is about, a section that is off-topic is not given as much consideration, presumably for ranking purposes.

Martin explains:

“So if you happen to have content on a page that is not related to the main topic of the rest of the content, we might not give it as much of a consideration as you think.

We still use that information for the link discovery and figuring out your site structure and all of that.

But if a page has 10,000 words on dog food and then 3000 or 2000 or 1000 words on bikes, then probably this is not good content for bikes.”

That’s interesting because it suggests that once Google determines what a page is about, the off-topic content may have little chance of ranking or, as Martin puts it, may not be given “as much of a consideration.”
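Martin's numbers can be turned into simple arithmetic. The sketch below only computes what share of a page's words each topic takes up, using his dog food and bikes example. How Google actually scores off-topic content is not public, so the `topic_shares` function is a hypothetical illustration and not a ranking formula.

```python
def topic_shares(word_counts: dict[str, int]) -> dict[str, float]:
    """Return each topic's fraction of the page's total word count."""
    total = sum(word_counts.values())
    if total == 0:
        return {topic: 0.0 for topic in word_counts}
    return {topic: count / total for topic, count in word_counts.items()}


# Martin's example: 10,000 words on dog food plus 3000, 2000 or 1000 on bikes.
for bike_words in (3000, 2000, 1000):
    shares = topic_shares({"dog food": 10_000, "bikes": bike_words})
    print(f"{bike_words} words on bikes -> {shares['bikes']:.0%} of the page")

# 3000 words on bikes -> 23% of the page
# 2000 words on bikes -> 17% of the page
# 1000 words on bikes -> 9% of the page
```

Even at 3000 words, the bikes section is less than a quarter of the page, which fits Martin's point that the page is primarily about dog food.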

Jason Barnard asked:

“So that sounds to me like you’re guessing the semantic HTML5. Does semantic HTML5 give you any help or do you just not care? There’s no point?”

What Jason was referencing was the HTML5 markup that defines the different sections of a web page, like the header, navigation, footer, etc.

At the beginning of his discussion, Martin was referring to analyzing the content structure and the actual text. Jason’s question shifts the topic slightly, toward the HTML5 semantic structure.

Martin answered:

“It does help us, but it’s not the only thing that we look for. Yes.”
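To show what semantic HTML5 makes available to any parser, here is a small Python sketch that uses the standard library's `html.parser` to group a page's visible text by its nearest enclosing semantic element. It says nothing about how Google uses this markup. The `RegionTextParser` class, the choice of tags in `REGION_TAGS` and the sample page are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Assumption for this sketch: these HTML5 elements mark the regions of interest.
REGION_TAGS = {"header", "nav", "main", "article", "aside", "footer"}


class RegionTextParser(HTMLParser):
    """Collect visible text grouped by the nearest enclosing semantic element."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []  # open semantic elements, innermost last
        self.regions: dict[str, list[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag in REGION_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in REGION_TAGS and self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            # Text outside any semantic element is filed under "unlabeled".
            region = self.stack[-1] if self.stack else "unlabeled"
            self.regions.setdefault(region, []).append(text)


sample_html = """
<header>Example Pet Store</header>
<nav><a href="/">Home</a> <a href="/shop">Shop</a></nav>
<main><h1>Dry dog food guide</h1><p>How to pick a kibble.</p></main>
<aside>Related products</aside>
<footer>Contact us</footer>
"""

parser = RegionTextParser()
parser.feed(sample_html)
parser.close()
regions = parser.regions

for name, texts in regions.items():
    print(f"{name}: {texts}")
```

On a page without these elements, all of the text would land in the "unlabeled" group, which may be why, as Martin says, the markup helps but is not the only thing Google looks for.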

Centerpiece Annotation

An annotation is a note that explains something. A centerpiece is something that is intended as the center of attention.

A centerpiece annotation, then, seems to be something like a summary of the topic of the main content.

Martin explains how Google splits the page out into different sections and weights the parts outside of the centerpiece annotation differently.

He also mentions that parts of a page that differ from the main topic aren’t given much consideration, which seems to mean that such content might not be able to rank.


Duda Webinar on Essential Rendering

Watch Martin Splitt explain how Google analyzes a web page at the 28:42 minute mark:


Google to pay $391.5 million settlement over location tracking, state AGs say




Google has agreed to pay a $391.5 million settlement to 40 states to resolve accusations that it tracked people’s locations in violation of state laws, including snooping on consumers’ whereabouts even after they told the tech behemoth to bug off.

Louisiana Attorney General Jeff Landry said it is time for Big Tech to recognize state laws that limit data collection efforts.

“I have been ringing the alarm bell on big tech for years, and this is why,” Mr. Landry, a Republican, said in a statement Monday. “Citizens must be able to make informed decisions about what information they release to big tech.”

The attorneys general said the investigation resulted in the largest-ever multistate privacy settlement. Connecticut Attorney General William Tong, a Democrat, said Google’s penalty is a “historic win for consumers.”

“Location data is among the most sensitive and valuable personal information Google collects, and there are so many reasons why a consumer may opt out of tracking,” Mr. Tong said. “Our investigation found that Google continued to collect this personal information even after consumers told them not to. That is an unacceptable invasion of consumer privacy, and a violation of state law.”

Location tracking can help tech companies sell digital ads to marketers looking to connect with consumers within their vicinity. It’s another tool in a data-gathering toolkit that generates more than $200 billion in annual ad revenue for Google, accounting for most of the profits pouring into the coffers of its corporate parent, Alphabet, which has a market value of $1.2 trillion.

The settlement is part of a series of legal challenges to Big Tech in the U.S. and around the world, which include consumer protection and antitrust lawsuits.

Though Google, based in Mountain View, California, said it fixed the problems several years ago, the company’s critics remained skeptical. State attorneys general who also have tussled with Google have questioned whether the tech company will follow through on its commitments.

The states aren’t dialing back their scrutiny of Google’s empire.

Last month, Texas Attorney General Ken Paxton said he was filing a lawsuit over reports that Google unlawfully collected millions of Texans’ biometric data such as “voiceprints and records of face geometry.”

The states began investigating Google’s location tracking after The Associated Press reported in 2018 that Android devices and iPhones were storing location data despite the activation of privacy settings intended to prevent the company from following along.

Arizona Attorney General Mark Brnovich went after the company in May 2020. The state’s lawsuit charged that the company had defrauded its users by misleading them into believing they could keep their whereabouts private by turning off location tracking in the settings of their software.

Arizona settled its case with Google for $85 million last month. By then, attorneys general in several other states and the District of Columbia had pounced with their own lawsuits seeking to hold Google accountable.

Along with the hefty penalty, the state attorneys general said, Google must not hide key information about location tracking, must give users detailed information about the types of location tracking information Google collects, and must show additional information to people when users turn location-related account settings to “off.”

States will receive differing sums from the settlement. Mr. Landry’s office said Louisiana would receive more than $12.7 million, and Mr. Tong’s office said Connecticut would collect more than $6.5 million.

The financial penalty will not cripple Google’s business. The company raked in $69 billion in revenue for the third quarter of 2022, according to reports, yielding about $13.9 billion in profit.

Google downplayed its location-tracking tools Monday and said it changed the products at issue long ago.

“Consistent with improvements we’ve made in recent years, we have settled this investigation which was based on outdated product policies that we changed years ago,” Google spokesman Jose Castaneda said in a statement.

Google product managers Marlo McGriff and David Monsees defended their company’s Search and Maps products’ usage of location information.

“Location information lets us offer you a more helpful experience when you use our products,” the two men wrote on Google’s blog. “From Google Maps’ driving directions that show you how to avoid traffic to Google Search surfacing local restaurants and letting you know how busy they are, location information helps connect experiences across Google to what’s most relevant and useful.”

The blog post touted transparency tools and auto-delete controls that Google has developed in recent years and said the private browsing Incognito mode prevents Google Maps from saving an account’s search history.

Mr. McGriff and Mr. Monsees said Google would make changes to its products as part of the settlement. The changes include simplifying the process for deleting location data, updating the method to set up an account and revamping information hubs.

“We’ll provide a new control that allows users to easily turn off their Location History and Web & App Activity settings and delete their past data in one simple flow,” Mr. McGriff and Mr. Monsees wrote. “We’ll also continue deleting Location History data for users who have not recently contributed new Location History data to their account.”

• This article is based in part on wire service reports.
