GOOGLE

How Google Analyzes Web Page Content and Weights It

Published

2 years ago

October 20, 2021

In a Duda webinar, Google’s Martin Splitt explained a concept called Centerpiece Annotation, which describes how Google analyzes the content of a web page.

I won’t reproduce the question because it’s somewhat off topic and long.

But what Martin discusses is how Google separates out the boilerplate of a web page and then summarizes from the text content structure what the web page is about.

He mentions what’s called the Centerpiece Annotation.

Martin Splitt explained:

“That’s just us analyzing the content and, I don’t know what we have publicly said about this, but I think I brought it up in one of the podcasts episodes.
So I can probably say that we have a thing called the Centerpiece Annotation, for instance, and there’s a few other annotations that we have where we look at the semantic content, as well as potentially the layout tree.

But fundamentally we can read that from the content structure in HTML already and figure out so “Oh! This looks like from all the natural language processing that we did on this entire text content here that we got, it looks like this is primarily about topic A, dog food.”

Screenshot of Martin Splitt Discussing Centerpiece Annotation

Google's Martin Splitt

Next Martin talks about how the page analysis separates the web page into component parts, some of which aren’t relevant to the Centerpiece.

The parts of the page, he explains, are weighted differently. Weighting refers to how important a page element is: a section that receives a low weight is treated as less important than one that receives a higher weight.

Martin continued:

“And then there’s this other thing here, which seems to be like links to related products but it’s not really part of the centerpiece. It’s not really main content here. This seems to be additional stuff.
And then there’s like a bunch of boilerplate or, “Hey, we figured out that the menu looks pretty much the same on all these pages and lists. This looks pretty much like that menu that we have on all the other pages of this domain,” for instance, or we’ve seen this before. We don’t even actually go by domain or like, “Oh, this looks like a menu.”
We figure out what looks like boilerplate and then, that gets weighted differently as well.”
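The idea of spotting boilerplate because it repeats across pages can be illustrated with a toy sketch. This is not Google's method, which Martin does not detail. The function names, the 80% repetition cutoff, and the weight values below are all assumptions made for illustration.

```python
from collections import Counter


def find_boilerplate(pages, min_share=0.8):
    """Return text blocks that appear on at least `min_share` of the pages.

    `pages` is a list of pages, each given as a list of text blocks.
    The 0.8 cutoff is an arbitrary illustrative value.
    """
    counts = Counter()
    for blocks in pages:
        counts.update(set(blocks))  # count each block at most once per page
    threshold = min_share * len(pages)
    return {block for block, n in counts.items() if n >= threshold}


def weight_blocks(blocks, boilerplate, low=0.1, high=1.0):
    """Pair each block with a weight: low for boilerplate, high otherwise."""
    return [(block, low if block in boilerplate else high) for block in blocks]


pages = [
    ["Home | Shop | Contact", "Dog food review"],
    ["Home | Shop | Contact", "Bike guide"],
    ["Home | Shop | Contact", "Cat toys"],
]
boilerplate = find_boilerplate(pages)
weighted = weight_blocks(pages[0], boilerplate)
```

Here the menu text repeats on every page, so it is flagged and given the lower weight, while the page-specific block keeps the higher one.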

Off-topic Content Given Less Consideration

Martin next explains that once Google establishes what a web page is about, a section that is off-topic is not given as much consideration, presumably for ranking purposes.

Martin explains:

“So if you happen to have content on a page that is not related to the main topic of the rest of the content, we might not give it as much of a consideration as you think.
We still use that information for the link discovery and figuring out your site structure and all of that.
But if a page has 10,000 words on dog food and then 3000 or 2000 or 1000 words on bikes, then probably this is not good content for bikes.”

That’s really interesting because it seems to show that when Google determines what a page is about, then the off-topic content might not have a chance for ranking or as Martin says, is not given “as much of a consideration.”

Jason Barnard asked:

“So that sounds to me like you’re guessing the semantic HTML5. Does semantic HTML5 give you any help or do you just not care? There’s no point?”

What Jason was referencing was the HTML5 markup that defines the different sections of a web page, like the header, navigation, footer, etc.
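To make the role of semantic HTML5 concrete, here is a minimal sketch that uses Python's standard `html.parser` module to separate text inside sectioning elements such as `<header>`, `<nav>`, and `<footer>` from the rest of the page. The class name, the helper function, and the choice to treat `<aside>` as boilerplate are assumptions for illustration. It says nothing about how Google actually parses pages.

```python
from html.parser import HTMLParser

# Elements treated as boilerplate in this sketch (an assumption, not a standard).
BOILERPLATE_TAGS = {"header", "nav", "footer", "aside"}


class SectionSplitter(HTMLParser):
    """Collect text separately for boilerplate elements and everything else."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # how many boilerplate elements we are currently inside
        self.main_text = []
        self.boilerplate_text = []

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in BOILERPLATE_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        target = self.boilerplate_text if self.depth else self.main_text
        target.append(text)


def split_sections(html):
    """Return (main_text, boilerplate_text) as two lists of strings."""
    parser = SectionSplitter()
    parser.feed(html)
    parser.close()
    return parser.main_text, parser.boilerplate_text


sample = (
    "<header><nav><a href='/'>Home</a></nav></header>"
    "<main><article><h1>Dog food</h1><p>Kibble guide.</p></article></main>"
    "<footer>Copyright</footer>"
)
main_text, boilerplate_text = split_sections(sample)
```

With markup like this, the split is trivial. Martin's answer suggests Google treats such markup as one helpful signal among several.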

At the beginning of Martin’s discussion he was making reference to analyzing the content structure and the actual text. So now the topic is kind of drifting a little here into the HTML5 semantic structure.

Martin answered:

“It does help us, but it’s not the only thing that we look for. Yes.”

Centerpiece Annotation

An annotation is a note that explains something. A centerpiece is something that is intended as the center of attention.

A centerpiece annotation seems to be like a summary of the topic of the main content.

Martin explains how Google splits the page out into different sections and weights the parts outside of the centerpiece annotation differently.

He also mentions that parts of a page that differ from the main topic aren’t given much consideration, which seems to mean that they might not be content that can rank.

Citation

Duda Webinar on Essential Rendering

Watch Martin Splitt explain how Google analyzes a web page at the 28:42 minute mark:

Searchenginejournal.com


AI

Exploring the Evolution of Language Translation: A Comparative Analysis of AI Chatbots and Google Translate

Published

2 months ago

February 26, 2024

Max


According to an article on PCMag, while Google Translate makes translating sentences into over 100 languages easy, regular users acknowledge that there’s still room for improvement.

In theory, large language models (LLMs) such as ChatGPT are expected to bring about a new era in language translation. These models consume vast amounts of text-based training data and real-time feedback from users worldwide, enabling them to quickly learn to generate coherent, human-like sentences in a wide range of languages.

However, despite the anticipation that ChatGPT would revolutionize translation, past experience shows that such expectations often go unmet. To put these claims to the test, PCMag conducted a blind test, asking fluent speakers of eight non-English languages to evaluate the translation results from various AI services.

The test compared ChatGPT (both the free and paid versions) to Google Translate, as well as to other competing chatbots such as Microsoft Copilot and Google Gemini. The evaluation involved comparing the translation quality for two test paragraphs across different languages, including Polish, French, Korean, Spanish, Arabic, Tagalog, and Amharic.

In the first test conducted in June 2023, participants consistently favored AI chatbots over Google Translate. ChatGPT, Google Bard (now Gemini), and Microsoft Bing outperformed Google Translate, with ChatGPT receiving the highest praise. ChatGPT demonstrated superior performance in converting colloquialisms, while Google Translate often provided literal translations that lacked cultural nuance.

For instance, ChatGPT accurately translated colloquial expressions like “blow off steam,” whereas Google Translate produced more literal translations that failed to resonate across cultures. Participants appreciated ChatGPT’s ability to maintain consistent levels of formality and its consideration of gender options in translations.

The success of AI chatbots like ChatGPT can be attributed to reinforcement learning with human feedback (RLHF), which allows these models to learn from human preferences and produce culturally appropriate translations, particularly for non-native speakers. However, it’s essential to note that while AI chatbots outperformed Google Translate, they still had limitations and occasional inaccuracies.

In a subsequent test, PCMag evaluated different versions of ChatGPT, including the free and paid versions, as well as language-specific AI agents from OpenAI’s GPTStore. The paid version of ChatGPT, known as ChatGPT Plus, consistently delivered the best translations across various languages. However, Google Translate also showed improvement, performing surprisingly well compared to previous tests.

Overall, while ChatGPT Plus emerged as the preferred choice for translation, Google Translate demonstrated notable improvement, challenging the notion that AI chatbots are always superior to traditional translation tools.

Source: https://www.pcmag.com/articles/google-translate-vs-chatgpt-which-is-the-best-language-translator


GOOGLE

Google Implements Stricter Guidelines for Mass Email Senders to Gmail Users

Published

2 months ago

February 13, 2024

Entireweb News Bot

Beginning in April, bulk senders who flood Gmail users with unwanted mass emails will see a growing share of their messages rejected unless they comply with Gmail’s new email sender guidelines, Google cautions.

Fresh Guidelines for Dispatching Mass Emails to Gmail Inboxes

An explanatory article in Forbes reported that new rules are being introduced to protect Gmail users from a flood of unsolicited mass emails. Initially, reports surfaced of some marketers receiving error notifications for messages sent to Gmail accounts. However, a Google representative clarified that these specific errors, denoted as 550-5.7.56, weren’t new but rather stemmed from existing authentication requirements.

Moreover, Google has verified that commencing from April, they will initiate “the rejection of a portion of non-compliant email traffic, progressively escalating the rejection rate over time.” Google elaborates that, for instance, if 75% of the traffic adheres to the new email sender authentication criteria, then a portion of the remaining non-conforming 25% will face rejection. The exact proportion remains undisclosed. Google does assert that the implementation of the new regulations will be executed in a “step-by-step fashion.”

This cautious and methodical strategy seems to have already kicked off, with transient errors affecting a “fraction of their non-compliant email traffic” coming into play this month. Additionally, Google stipulates that bulk senders will be granted until June 1 to integrate “one-click unsubscribe” in all commercial or promotional correspondence.
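The article does not say how one-click unsubscribe should be implemented. One widely used mechanism is the header pair defined in RFC 8058, and the sketch below assumes it. The function name, addresses, subject line, and unsubscribe URL are hypothetical placeholders.

```python
from email.message import EmailMessage


def build_promo_message(sender, recipient, unsubscribe_url):
    """Build a promotional email carrying RFC 8058 one-click unsubscribe headers."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Monthly offers"  # placeholder subject
    # RFC 8058: an HTTPS URI in List-Unsubscribe, plus the fixed
    # List-Unsubscribe-Post value that signals one-click support.
    msg["List-Unsubscribe"] = f"<{unsubscribe_url}>"
    msg["List-Unsubscribe-Post"] = "List-Unsubscribe=One-Click"
    msg.set_content("This month's offers are listed below.")
    return msg


message = build_promo_message(
    "news@example.com",
    "user@gmail.com",
    "https://example.com/unsubscribe?id=123",
)
```

A receiving mail client that supports RFC 8058 can then send an HTTP POST to the listed URL when the user clicks unsubscribe, so the sender's server must handle that request without further confirmation steps.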

Exclusively Personal Gmail Accounts Subject to Rejection

These alterations exclusively affect bulk emails dispatched to personal Gmail accounts. Entities sending out mass emails, specifically those transmitting a minimum of 5,000 messages daily to Gmail accounts, will be mandated to authenticate outgoing emails and “refrain from dispatching unsolicited emails.” The 5,000 message threshold is tabulated based on emails transmitted from the same principal domain, irrespective of the employment of subdomains. Once the threshold is met, the domain is categorized as a permanent bulk sender.
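The way the 5,000-message threshold aggregates subdomains can be sketched as follows. This is only an illustration of the counting rule described above, not Google's implementation. In particular, `primary_domain` naively takes the last two labels of the host name. A real system would need a public suffix list to handle domains such as `example.co.uk`.

```python
from collections import Counter

BULK_THRESHOLD = 5000  # daily messages to Gmail accounts, per the article


def primary_domain(address):
    """Naive primary-domain extraction: keep the last two host labels.

    Assumption for illustration only; this is wrong for multi-part
    suffixes such as .co.uk.
    """
    host = address.rsplit("@", 1)[-1].lower()
    return ".".join(host.split(".")[-2:])


def bulk_sender_domains(sender_addresses, threshold=BULK_THRESHOLD):
    """Return primary domains whose one-day message count meets the threshold."""
    counts = Counter(primary_domain(addr) for addr in sender_addresses)
    return {domain for domain, n in counts.items() if n >= threshold}


# One day's sends: two subdomains of example.com plus an unrelated low-volume sender.
sent_today = (
    ["a@mail.example.com"] * 3000
    + ["b@promo.example.com"] * 2000
    + ["c@other.org"] * 10
)
flagged = bulk_sender_domains(sent_today)
```

Neither subdomain reaches 5,000 on its own, but their combined total does, so `example.com` is the domain that gets flagged.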

These guidelines do not extend to communications directed at Google Workspace accounts, although all senders, including those utilizing Google Workspace, are required to adhere to the updated criteria.

Augmented Security and Enhanced Oversight for Gmail Users

A Google spokesperson emphasized that these requisites are being rolled out to “fortify sender-side security and augment user control over inbox contents even further.” For the recipient, this translates to heightened trust in the authenticity of the email sender, thus mitigating the risk of falling prey to phishing attempts, a tactic frequently exploited by malevolent entities capitalizing on authentication vulnerabilities. “If anything,” the spokesperson concludes, “meeting these stipulations should facilitate senders in reaching their intended recipients more efficiently, with reduced risks of spoofing and hijacking by malicious actors.”


GOOGLE

Google’s Next-Gen AI Chatbot, Gemini, Faces Delays: What to Expect When It Finally Launches

Published

5 months ago

December 4, 2023

Max

In an unexpected turn of events, Google has chosen to postpone the much-anticipated debut of its revolutionary generative AI model, Gemini. Initially poised to make waves this week, the unveiling has now been rescheduled for early next year, specifically in January.

Gemini is set to redefine the landscape of conversational AI, representing Google’s most potent endeavor in this domain to date. Positioned as a multimodal AI chatbot, Gemini boasts the capability to process diverse data types. This includes a unique proficiency in comprehending and generating text, images, and various content formats, even going so far as to create an entire website based on a combination of sketches and written descriptions.

Originally, Google had planned an elaborate series of launch events spanning California, New York, and Washington. Regrettably, these events have been canceled due to concerns about Gemini’s responsiveness to non-English prompts. According to anonymous sources cited by The Information, Google’s Chief Executive, Sundar Pichai, personally decided to postpone the launch, acknowledging the importance of global support as a key feature of Gemini’s capabilities.

Gemini is expected to surpass ChatGPT, which is powered by OpenAI’s GPT-4 model, and preliminary private tests have shown promising results. Gemini reportedly exceeds GPT-4 in available computing power, measured in FLOPS (floating-point operations per second), owing to its access to a large number of high-end AI accelerators through the Google Cloud platform.

SemiAnalysis, a research firm affiliated with Substack Inc., expressed in an August blog post that Gemini appears poised to “blow OpenAI’s model out of the water.” The extensive compute power at Google’s disposal has evidently contributed to Gemini’s superior performance.

Google’s Vice President and Manager of Bard and Google Assistant, Sissie Hsiao, offered insights into Gemini’s capabilities, citing examples like generating novel images in response to specific requests, such as illustrating the steps to ice a three-layer cake.

While Google’s current generative AI offering, Bard, has showcased noteworthy accomplishments, it has struggled to achieve the same level of consumer awareness as ChatGPT. Gemini, with its unparalleled capabilities, is expected to be a game-changer, demonstrating impressive multimodal functionalities never seen before.

During the initial announcement at Google’s I/O developer conference in May, the company emphasized Gemini’s multimodal prowess and its developer-friendly nature. An application programming interface (API) is under development, allowing developers to seamlessly integrate Gemini into third-party applications.

As the world awaits the delayed unveiling of Gemini, the stakes are high, with Google aiming to revolutionize the AI landscape and solidify its position as a leader in generative artificial intelligence. The postponed launch only adds to the anticipation surrounding Gemini’s eventual debut in the coming year.
