How Google Analyzes Web Page Content and Weights It
In a Duda webinar, Google's Martin Splitt explained a concept called the Centerpiece Annotation, which relates to how Google analyzes the content of a web page.
I won't reproduce the question, because it's long and somewhat off-topic.
What Martin discusses is how Google separates out the boilerplate of a web page and then works out, from the structure of the text content, what the page is about.
He mentions what’s called the Centerpiece Annotation.
Martin Splitt explained:
“That’s just us analyzing the content and, I don’t know what we have publicly said about this, but I think I brought it up in one of the podcasts episodes.
So I can probably say that we have a thing called the Centerpiece Annotation, for instance, and there’s a few other annotations that we have where we look at the semantic content, as well as potentially the layout tree.
But fundamentally we can read that from the content structure in HTML already and figure out so ‘Oh! This looks like from all the natural language processing that we did on this entire text content here that we got, it looks like this is primarily about topic A, dog food.’”
Screenshot of Martin Splitt Discussing Centerpiece Annotation
Next, Martin describes how the analysis splits the web page into component parts, some of which aren't relevant to the centerpiece.
The parts of the page, he explains, are weighted differently. Weighting refers to how important a page element is: a section that receives a lower weight counts for less than a section that receives a higher weight.
Martin continued:
“And then there’s this other thing here, which seems to be like links to related products but it’s not really part of the centerpiece. It’s not really main content here. This seems to be additional stuff.
And then there’s like a bunch of boilerplate or, “Hey, we figured out that the menu looks pretty much the same on all these pages and lists. This looks pretty much like that menu that we have on all the other pages of this domain,” for instance, or we’ve seen this before. We don’t even actually go by domain or like, “Oh, this looks like a menu.”
We figure out what looks like boilerplate and then, that gets weighted differently as well.”
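As a rough illustration of the boilerplate idea, the Python sketch below flags text blocks that repeat across most pages of a site and gives them a low weight. It is a hypothetical heuristic, not Google's implementation: the function names, the 80% `min_share` threshold, and the 0.1 boilerplate weight are all assumptions. Martin also says Google doesn't strictly go by domain, so cross-page repetition is at best one possible signal among several.

```python
from collections import Counter

def normalize(block: str) -> str:
    """Collapse whitespace and case so near-identical blocks compare equal."""
    return " ".join(block.split()).lower()

def find_boilerplate(pages: list[list[str]], min_share: float = 0.8) -> set[str]:
    """Return normalized blocks that appear on at least `min_share` of the pages."""
    counts: Counter[str] = Counter()
    for blocks in pages:
        # A set, so a block repeated within one page is only counted once.
        counts.update({normalize(b) for b in blocks})
    threshold = min_share * len(pages)
    return {block for block, n in counts.items() if n >= threshold}

def weight_blocks(blocks: list[str], boilerplate: set[str],
                  boilerplate_weight: float = 0.1) -> list[tuple[str, float]]:
    """Give boilerplate blocks a low (assumed) weight and everything else full weight."""
    return [(b, boilerplate_weight if normalize(b) in boilerplate else 1.0)
            for b in blocks]

# Three hypothetical pages from one site, already split into text blocks.
pages = [
    ["Home | Shop | Contact", "Dog food buying guide", "© Example Ltd"],
    ["Home | Shop | Contact", "Puppy food tips", "© Example Ltd"],
    ["Home | Shop | Contact", "Senior dog diets", "© Example Ltd"],
]
boilerplate = find_boilerplate(pages)
print(weight_blocks(pages[0], boilerplate))
# [('Home | Shop | Contact', 0.1), ('Dog food buying guide', 1.0), ('© Example Ltd', 0.1)]
```

The menu and footer appear on every page, so they end up with the low weight, while each page's unique text keeps full weight.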
Off-topic Content Given Less Consideration
Martin next explains that once Google has established what a web page is about, any off-topic section is not given as much consideration, presumably for ranking purposes.
Martin explains:
“So if you happen to have content on a page that is not related to the main topic of the rest of the content, we might not give it as much of a consideration as you think.
We still use that information for the link discovery and figuring out your site structure and all of that.
But if a page has 10,000 words on dog food and then 3000 or 2000 or 1000 words on bikes, then probably this is not good content for bikes.”
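Martin's dog food example is easy to put into numbers. The sketch below is a hypothetical illustration, not Google's method: it assumes the words on the page have already been labeled by topic (the natural language processing Martin mentions is far more involved), treats the topic with the most words as the centerpiece, and prints how small a share of the page the bike content represents.

```python
def topic_shares(word_counts: dict[str, int]) -> dict[str, float]:
    """Fraction of the page's words devoted to each topic."""
    total = sum(word_counts.values())
    return {topic: count / total for topic, count in word_counts.items()}

def centerpiece_topic(word_counts: dict[str, int]) -> str:
    """Naive stand-in for the centerpiece: the topic with the most words."""
    return max(word_counts, key=word_counts.get)

# Martin's example: 10,000 words on dog food plus 3,000, 2,000 or 1,000 on bikes.
for bike_words in (3_000, 2_000, 1_000):
    page = {"dog food": 10_000, "bikes": bike_words}
    share = topic_shares(page)["bikes"]
    print(f"main topic: {centerpiece_topic(page)}, bikes share: {share:.0%}")
# main topic: dog food, bikes share: 23%
# main topic: dog food, bikes share: 17%
# main topic: dog food, bikes share: 9%
```

Even in the 3,000-word case, bikes make up less than a quarter of the page, which is consistent with Martin's point that such a page is unlikely to be treated as good content for bikes.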
That's interesting because it suggests that once Google determines what a page is about, off-topic content on that page may have little chance of ranking or, as Martin puts it, is not given “as much of a consideration.”
Jason Barnard asked:
“So that sounds to me like you’re guessing the semantic HTML5. Does semantic HTML5 give you any help or do you just not care? There’s no point?”
Jason was referring to the HTML5 markup that defines the different sections of a web page, such as the header, navigation, and footer.
Earlier in the discussion, Martin had been talking about analyzing the content structure and the actual text, so the topic drifts a little here toward HTML5 semantic structure.
Martin answered:
“It does help us, but it’s not the only thing that we look for. Yes.”
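Martin confirms that semantic HTML5 helps without being the only signal. To show what that markup exposes, the sketch below uses Python's standard `html.parser` module to group a page's words by the innermost HTML5 landmark element (`header`, `nav`, `main`, `article`, `aside`, `footer`) that contains them. The `LandmarkText` class and the choice of landmarks are assumptions for illustration; it is not a description of how Google processes pages.

```python
from collections import defaultdict
from html.parser import HTMLParser

# HTML5 landmark elements this sketch tracks (an illustrative choice).
LANDMARKS = {"header", "nav", "main", "article", "aside", "footer"}

class LandmarkText(HTMLParser):
    """Collect a page's words grouped by their innermost landmark element."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []  # landmarks that are currently open
        self.text: defaultdict[str, list[str]] = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag in LANDMARKS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to the matching landmark, tolerating unclosed inner ones.
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        words = data.split()
        if words:  # skip whitespace-only text between tags
            region = self.stack[-1] if self.stack else "unlabeled"
            self.text[region].extend(words)

page_html = """<header><nav>Home Shop Contact</nav></header>
<main><article>Choosing the right dog food for puppies</article>
<aside>Related: bike helmets</aside></main>
<footer>Copyright Example</footer>"""

parser = LandmarkText()
parser.feed(page_html)
parser.close()
print({region: len(words) for region, words in parser.text.items()})
# {'nav': 3, 'article': 7, 'aside': 3, 'footer': 2}
```

Markup like this makes the navigation, main article, related links, and footer easy to tell apart, which is the kind of help Martin acknowledges, even though he notes Google also looks at other things.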
Centerpiece Annotation
An annotation is a note that explains something. A centerpiece is something that is intended as the center of attention.
A centerpiece annotation, then, appears to be a summary of the main content's topic.
Martin explains that Google splits the page into different sections and weights the parts outside the centerpiece differently.
He also says that parts of a page that differ from the main topic aren't given much consideration, which suggests that such content may not be able to rank.
Citation
Duda Webinar on Essential Rendering
Martin Splitt's explanation of how Google analyzes a web page starts at the 28:42 mark of the webinar.
Google Warns About Misuse of Its Indexing API
Google has updated its Indexing API documentation with a clear warning about spam detection and the possible consequences of misuse.
Warning Against API Misuse
The new message in the guide says:
“All submissions through the Indexing API are checked for spam. Any misuse, like using multiple accounts or going over the usage limits, could lead to access being taken away.”
This warning is aimed at people trying to abuse the system by exceeding the API’s limits or breaking Google’s rules.
What Is the Indexing API?
The Indexing API allows websites to notify Google when pages containing job postings or livestream videos are added, updated, or removed. It helps sites with fast-changing content get those pages crawled and indexed quickly.
However, some users appear to have been abusing it by using multiple accounts to get around the usage limits.
Impact of the Update
Google is now closely watching how people use the Indexing API. Anyone who breaks the rules may lose access to the tool, which could make it harder to keep time-sensitive pages up to date in search results.
How To Stay Compliant
To use the Indexing API properly, follow these rules:
- Don’t go over the usage limits, and if you need more, ask Google instead of using multiple accounts.
- Use the API only for job postings or livestream videos, and make sure your data is correct.
- Follow all of Google’s API guidelines and spam policies.
- Keep submitting sitemaps alongside the API; the API is not a replacement for them.
Remember, the Indexing API isn’t a shortcut to faster indexing. Follow the rules to keep your access.
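To make the quota rules concrete, here is a minimal Python sketch of the client side. It follows the request format in Google's Indexing API documentation: each notification is a JSON body with a `url` and a `type` of `URL_UPDATED` or `URL_DELETED`, POSTed to the `urlNotifications:publish` endpoint. The `NotificationBatch` helper and the 200-per-day `DAILY_QUOTA` value are assumptions for illustration; check the actual quota for your project in the Google Cloud console. The sketch only builds request bodies and does not send them, because real calls also need an OAuth 2.0 access token for the `https://www.googleapis.com/auth/indexing` scope.

```python
import json
from dataclasses import dataclass, field

# Publish endpoint documented for the Indexing API (requests need OAuth 2.0 auth).
PUBLISH_ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"
DAILY_QUOTA = 200  # assumption: default daily publish quota; check your project's limit

@dataclass
class NotificationBatch:
    """Hypothetical helper: queue notifications without exceeding one day's quota."""
    quota: int = DAILY_QUOTA
    payloads: list[str] = field(default_factory=list)  # JSON bodies to POST today
    deferred: list[str] = field(default_factory=list)  # URLs held back for later

    def add(self, url: str, deleted: bool = False) -> None:
        body = {"url": url, "type": "URL_DELETED" if deleted else "URL_UPDATED"}
        if len(self.payloads) < self.quota:
            self.payloads.append(json.dumps(body))
        else:
            # Over quota: wait for the next day or request more quota,
            # rather than spreading requests across extra accounts.
            self.deferred.append(url)

batch = NotificationBatch(quota=2)
for n in (1, 2, 3):
    batch.add(f"https://example.com/jobs/{n}")
print(len(batch.payloads), batch.deferred)
# 2 ['https://example.com/jobs/3']
```

URLs that don't fit into the day's quota are held in `deferred` instead of being pushed through a second account, which is the behavior the new warning targets.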
This Week in Search News: Simple and Easy-to-Read Update
Here’s what happened in the world of Google and search engines this week:
1. Google’s June 2024 Spam Update
Google finished rolling out its June 2024 spam update over a period of seven days. This update aims to reduce spammy content in search results.
2. Changes to Google Search Interface
Google has removed the continuous scroll feature for search results. Instead, it’s back to the old system of pages.
3. New Features and Tests
- Link Cards: Google is testing link cards at the top of AI-generated overviews.
- Health Overviews: There are more AI-generated health overviews showing up in search results.
- Local Panels: Google is testing AI overviews in local information panels.
4. Search Rankings and Quality
- Improving Rankings: Google said it can improve its search ranking system but will only do so on a large scale.
- Measuring Quality: Google’s Elizabeth Tucker shared how they measure search quality.
5. Advice for Content Creators
- Brand Names in Reviews: Google says there's no need to avoid mentioning brand names in review content.
- Fixing 404 Pages: Google explained when it’s important to fix 404 error pages.
6. New Search Features in Google Chrome
Google Chrome for mobile devices has added several new search features to enhance user experience.
7. New Tests and Features in Google Search
- Credit Card Widget: Google is testing a new widget for credit card information in search results.
- Sliding Search Results: When making a new search query, the results might slide to the right.
8. Bing’s New Feature
Bing is now using AI to write “People Also Ask” questions in search results.
9. Local Search Ranking Factors
Menu items and popular times might be factors that influence local search rankings on Google.
10. Google Ads Updates
- Query Matching and Brand Controls: Google Ads updated its query matching and brand controls, and advertisers are happy with these changes.
- Lead Credits: Google will automate lead credits for Local Service Ads. Google says this is a good change, but some advertisers are worried.
- tROAS Insights Box: Google Ads is testing a new insights box for tROAS (Target Return on Ad Spend) in Performance Max and Standard Shopping campaigns.
- WordPress Tag Code: There is a new conversion code for Google Ads on WordPress sites.
These updates highlight how Google and other search engines are continuously evolving to improve user experience and provide better advertising tools.
Exploring the Evolution of Language Translation: A Comparative Analysis of AI Chatbots and Google Translate
According to an article on PCMag, while Google Translate makes translating sentences into over 100 languages easy, regular users acknowledge that there’s still room for improvement.
In theory, large language models (LLMs) such as ChatGPT are expected to bring about a new era in language translation. These models consume vast amounts of text-based training data and real-time feedback from users worldwide, enabling them to quickly learn to generate coherent, human-like sentences in a wide range of languages.
However, despite the anticipation that ChatGPT would revolutionize translation, earlier experience has shown that such expectations are often overstated, and translation accuracy remains a challenge. To put these claims to the test, PCMag ran a blind test in which fluent speakers of eight non-English languages evaluated translations from various AI services.
The test compared ChatGPT (both the free and paid versions) to Google Translate, as well as to other competing chatbots such as Microsoft Copilot and Google Gemini. The evaluation involved comparing the translation quality for two test paragraphs across different languages, including Polish, French, Korean, Spanish, Arabic, Tagalog, and Amharic.
In the first test conducted in June 2023, participants consistently favored AI chatbots over Google Translate. ChatGPT, Google Bard (now Gemini), and Microsoft Bing outperformed Google Translate, with ChatGPT receiving the highest praise. ChatGPT demonstrated superior performance in converting colloquialisms, while Google Translate often provided literal translations that lacked cultural nuance.
For instance, ChatGPT accurately translated colloquial expressions like “blow off steam,” whereas Google Translate produced more literal translations that failed to resonate across cultures. Participants appreciated ChatGPT’s ability to maintain consistent levels of formality and its consideration of gender options in translations.
The success of AI chatbots like ChatGPT can be attributed to reinforcement learning with human feedback (RLHF), which allows these models to learn from human preferences and produce culturally appropriate translations, particularly for non-native speakers. However, it’s essential to note that while AI chatbots outperformed Google Translate, they still had limitations and occasional inaccuracies.
In a subsequent test, PCMag evaluated different versions of ChatGPT, including the free and paid versions, as well as language-specific AI agents from OpenAI’s GPT Store. The paid version of ChatGPT, known as ChatGPT Plus, consistently delivered the best translations across various languages. However, Google Translate also showed improvement, performing surprisingly well compared to previous tests.
Overall, while ChatGPT Plus emerged as the preferred choice for translation, Google Translate demonstrated notable improvement, challenging the notion that AI chatbots are always superior to traditional translation tools.
Source: https://www.pcmag.com/articles/google-translate-vs-chatgpt-which-is-the-best-language-translator