How Google Analyzes Web Page Content and Weights It
In a Duda webinar, Martin Splitt explained a concept called Centerpiece Annotation, which relates to how Google analyzes the content of a web page.
I won’t reproduce the question because it’s somewhat off topic and long.
But what Martin discusses is how Google separates out the boilerplate of a web page and then determines, from the text and the content structure, what the web page is about.
He mentions what’s called the Centerpiece Annotation.
Martin Splitt explained:
“That’s just us analyzing the content and, I don’t know what we have publicly said about this, but I think I brought it up in one of the podcasts episodes.
So I can probably say that we have a thing called the Centerpiece Annotation, for instance, and there’s a few other annotations that we have where we look at the semantic content, as well as potentially the layout tree.
But fundamentally we can read that from the content structure in HTML already and figure out so “Oh! This looks like from all the natural language processing that we did on this entire text content here that we got, it looks like this is primarily about topic A, dog food.”
Screenshot of Martin Splitt Discussing Centerpiece Annotation
Next Martin talks about how the page analysis separates the web page into component parts, some of which aren’t relevant to the Centerpiece.
The parts of the page, he explains, are weighted differently. Weighting is a reference to how important a page element is. So if a section receives a light weighting score, then it’s not as important as a section that is weighted with a higher score.
Martin continued:
“And then there’s this other thing here, which seems to be like links to related products but it’s not really part of the centerpiece. It’s not really main content here. This seems to be additional stuff.
And then there’s like a bunch of boilerplate or, “Hey, we figured out that the menu looks pretty much the same on all these pages and lists. This looks pretty much like that menu that we have on all the other pages of this domain,” for instance, or we’ve seen this before. We don’t even actually go by domain or like, “Oh, this looks like a menu.”
We figure out what looks like boilerplate and then, that gets weighted differently as well.”
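To make the idea of boilerplate being “weighted differently” more concrete, here is a minimal Python sketch. It is not Google’s method, which Martin does not describe in detail. It only assumes that a text block repeated on most pages of a site can be treated as boilerplate. The function name `weight_blocks`, the 0.8 threshold, and the 0.1 weight are all hypothetical values chosen for illustration.

```python
from collections import Counter


def weight_blocks(pages, boilerplate_ratio=0.8, boilerplate_weight=0.1):
    """Assign a weight to every text block on every page.

    pages: dict mapping a page name to a list of text blocks.
    A block that appears on at least `boilerplate_ratio` of the pages is
    treated as boilerplate and gets `boilerplate_weight`; every other block
    gets 1.0. Both numbers are made-up illustrative values.
    """
    # Count on how many pages each distinct block appears.
    page_counts = Counter()
    for blocks in pages.values():
        page_counts.update(set(blocks))

    weights = {}
    for name, blocks in pages.items():
        weights[name] = [
            (
                block,
                boilerplate_weight
                if page_counts[block] / len(pages) >= boilerplate_ratio
                else 1.0,
            )
            for block in blocks
        ]
    return weights


# Hypothetical three-page site that shares one menu.
pages = {
    "dog-food": ["Home | Shop | Contact", "Our guide to dog food"],
    "cat-toys": ["Home | Shop | Contact", "Our guide to cat toys"],
    "about": ["Home | Shop | Contact", "About our store"],
}
result = weight_blocks(pages)
for block, weight in result["dog-food"]:
    print(f"{weight:.1f}  {block}")
```

In this toy example the shared menu receives the low weight and the page-specific text keeps the full weight, which mirrors the distinction Martin draws between boilerplate and main content.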
Off-topic Content Given Less Consideration
Martin next mentions that after Google establishes what a web page is about, a section that is off-topic is not given as much consideration, presumably for ranking purposes.
Martin explains:
“So if you happen to have content on a page that is not related to the main topic of the rest of the content, we might not give it as much of a consideration as you think.
We still use that information for the link discovery and figuring out your site structure and all of that.
But if a page has 10,000 words on dog food and then 3000 or 2000 or 1000 words on bikes, then probably this is not good content for bikes.”
That’s really interesting because it seems to show that once Google determines what a page is about, the off-topic content might not have a chance of ranking or, as Martin says, is not given “as much of a consideration.”
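Martin’s dog food and bikes example can be put into rough numbers. The sketch below simply computes each topic’s share of the total word count, using his figures of 10,000 and 2,000 words. Treating word share as a signal is an assumption made here for illustration. Martin does not say that Google computes such a ratio, and the helper `topic_shares` is hypothetical.

```python
def topic_shares(word_counts):
    """Return each topic's fraction of the page's total word count."""
    total = sum(word_counts.values())
    return {topic: count / total for topic, count in word_counts.items()}


# Word counts taken from Martin's example; the ratio itself is illustrative.
shares = topic_shares({"dog food": 10_000, "bikes": 2_000})
for topic, share in shares.items():
    print(f"{topic}: {share:.0%}")
```

On these numbers the bike content is about one sixth of the page, which helps show why a page dominated by dog food is, in Martin’s words, probably “not good content for bikes.”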
Jason Barnard asked:
“So that sounds to me like you’re guessing the semantic HTML5. Does semantic HTML5 give you any help or do you just not care? There’s no point?”
What Jason was referencing was the HTML5 markup that defines the different sections of a web page, like the header, navigation, footer, etc.
At the beginning of his discussion, Martin was referring to analyzing the content structure and the actual text. At this point the topic drifts a little toward the HTML5 semantic structure.
Martin answered:
“It does help us, but it’s not the only thing that we look for. Yes.”
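For readers unfamiliar with the semantic HTML5 elements Jason refers to, the following Python sketch shows how those elements could be used to separate the text of a page into main content and everything else. It uses only the standard library `html.parser` module. The choice of `header`, `nav`, `footer`, and `aside` as non-centerpiece regions, and the class name `SectionSplitter`, are assumptions made for this example. As Martin notes, the markup is only one of the things Google looks at, so this is not a description of Google’s own process.

```python
from html.parser import HTMLParser

# Elements assumed here to mark regions outside the main content.
# This set is an illustrative choice, not a rule stated by Google.
BOILERPLATE_TAGS = {"header", "nav", "footer", "aside"}


class SectionSplitter(HTMLParser):
    """Collect text inside and outside the assumed boilerplate elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # number of boilerplate elements currently open
        self.main_text = []
        self.boilerplate_text = []

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in BOILERPLATE_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # skip whitespace between elements
        target = self.boilerplate_text if self.depth else self.main_text
        target.append(text)


# Hypothetical page markup.
html = """
<header><nav>Home Shop Contact</nav></header>
<main><h1>Choosing dog food</h1><p>Look at the protein content first.</p></main>
<aside>Related: bike helmets</aside>
<footer>Copyright notice</footer>
"""

splitter = SectionSplitter()
splitter.feed(html)
splitter.close()
print("Main:", splitter.main_text)
print("Other:", splitter.boilerplate_text)
```

A page without semantic elements would put all of its text into `main_text` here, which is one reason markup alone cannot be the only signal.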
Centerpiece Annotation
An annotation is a note that explains something. A centerpiece is something that is intended as the center of attention.
A centerpiece annotation seems to be like a summary of the topic of the main content.
Martin explains how Google splits the page into different sections and weights the parts outside of the centerpiece differently.
He also mentions that parts of a page that differ from the main topic aren’t given much consideration, which seems to mean that they might not be content that can rank.
Citation
Duda Webinar on Essential Rendering
Watch Martin Splitt explain how Google analyzes a web page at the 28:42 minute mark: