

The Past, Present and Future Of Structured Data With Google Search




Lizzi Sassman and Martin Splitt brought a special Google guest onto their Search Off the Record podcast to discuss structured data. The guest is Ryan Levering, who has been with Google for over 11 years working on structured data.

Structured Data Past At Google

In short, Ryan Levering explained that when he first started on the structured data project, he worked on the legacy Data Highlighter tool in Search Console. But early on, Google seemed to move away from requiring site owners to highlight or mark up their content, wanting instead to use machine learning to figure it all out. Google's Gary Illyes said as much back in 2017, but somewhat retracted it in 2018. So Google poured a lot of effort into machine learning to figure it out.

Structured Data Present At Google

But over time, Ryan said, it was "much easier to just ask people to give us their data rather than to pull it off of the web pages." "It's surprisingly more accurate," he added. So Google then moved more resources into building out structured data features and support documents that site owners could use to hand over the data.
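To make the "give us their data" side concrete, here is a minimal sketch (my own illustration, not from the podcast) of the kind of schema.org JSON-LD snippet a site owner embeds so search engines can ingest the data directly instead of extracting it from the page; all the article values are hypothetical:

```python
import json

# Hypothetical structured data a site owner might declare for a page.
# Property names follow schema.org's Article type; values are examples.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "The Past, Present and Future Of Structured Data",
    "author": {"@type": "Person", "name": "Example Author"},
    "datePublished": "2023-02-03",
}

# Serialize into the <script type="application/ld+json"> tag that would
# sit in the page's <head> alongside the visible content.
snippet = '<script type="application/ld+json">{}</script>'.format(
    json.dumps(article, indent=2)
)
print(snippet)
```

This is exactly the trade Ryan describes: the site owner states the facts once in a machine-readable block, and the search engine no longer has to reverse engineer them from the page template.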

But machine learning is not thrown out the window. Ryan said Google still uses it a lot (1) for sites that do not use structured data, where Google still wants to show rich results, and (2) for catching mistakes or abuse, so Google can verify what the page is really saying compared to the structured data. Ryan called it a "multiple pronged approach" of using structured data and machine learning together to understand it all.

So that is how Google uses it all today, but what about the future?

Structured Data Future At Google

In the "medium term future," Ryan said, Google plans on using structured data for "not just visual treatments but actually help with more understanding on the page." Google has mentioned this before, that structured data can help Google understand the page, but it is not a ranking factor. I guess Google will be working more on that. Plus, in that medium-term future, Ryan said Google wants to figure out "how to use structured data more universally in a lot of our features rather than just like here and there, scattered around."

Long term, Ryan spoke about how Google can use structured data with how Google "interprets it in general into our internal graph." Ryan said he "would like to move to where we are adjusting more and more data through structured data-specific channels rather than necessarily conveying all of our information on the web page itself." Basically, figuring out a "cleaner way to do data transfer between data providers and Google." How would Google do this? He said maybe by working with the large CMS platforms so they can build it into their platforms directly.


Here are parts of the transcript:

Ryan Levering: So, my introduction, when I started at Google, we were working on extraction from web pages. So like doing it via ML. So we came in, and the first thing I worked on was the data highlighter product, which is external-facing. We were looking at web pages and pulling structured data from unstructured text, and my whole team was very into the actual ML aspects of it. So how do we extract data, which in academic circles is often called "wrapper induction"? So when you take the– you build a wrapper that can pull the data out of a template. So reverse engineer the database. But after several years of working on it, there was another project that was side by side that was extracting structured data, which became the core of what we use now.

And I became convinced, after talking to people for a long period of time, that it was much easier to just ask people to give us their data rather than to pull it off of the web pages. It’s surprisingly more accurate. There’s other problems that can happen because of that, but it’s generally an easier thing to do. And it’s a lot less work for us, and it’s a lot better for the provider. So I came to it from ML and seeing structured data as the enemy at first. And then I was won over as a good mechanism.

So machine learning is– I see as like multiple prongs in our approach for how we get stuff. We want to use machine learning for cases where either we don’t have more information or it’s not provided for us. But it’s always going to be easier to just have the data shown to us, I think. So we will try– I think it’s like a multi-tiered approach, where you have machine learning for cases where we don’t have that data specifically. But then providers always have the option of giving us data, which usually improves accuracy, which usually gives better benefit for the actual provider. So I always see them as working side by side in an ideal world.

Most of our features over time migrate to that approach where we ingest it. Maybe we start with one approach where we’re just using ML. And then we eventually add markups so people have control. Or it’s the opposite way around. And we start– we bootstrap with markup in an ecosystem approach where people are giving us data. And then we enhance coverage of the feature by adding ML long run. So, I see them as very compatible. But it’s always good to empower people who are giving you data, to have control over that. So I think it’s really important that structured data in general is part of the overall strategy so the people can actually have some control over the content that we show.

The primary challenge is that we then have to figure out a way to verify that the structured data is accurate. And sometimes this is from actual abuse. And sometimes this is just because there’s a problem with synchronicity. Sometimes people generate structured data for their websites and it becomes out of sync with the actual stuff that’s being shown visually. We see a lot of both. So there needs to be other mechanisms to figure out some balancing act where those things are enforced. So that’s the cost of structured data, I guess, is that extra checking.

Lizzi Sassman: Yeah, speaking of the work that has been done, what about the work that’s to come, the next couple of years for structured data? If you were to give us a peek into the future, what is next for structured data?

Ryan Levering: In the medium-term, I think we’re… I mean we continue to flesh out the structured data usage in terms of adding more features and looking into more ways we can use it in cooler things that are not just visual treatments but actually help with more understanding on the page, I think. And figuring out how to use structured data more universally in a lot of our features rather than just like here and there, scattered around. I think that’s what we’re looking at in a medium-term.

Long-term, I think that it’s going to play a really interesting role at interacting with the way that we interpret it in general into our internal graph. So I would like to see more machine learning, figuring out– I would like to move to where we are adjusting more and more data through structured data-specific channels rather than necessarily conveying all of our information on the web page itself. So I think that’s a much cleaner approach, particularly for some of our structured data ingestion paths. So figuring out a way to get around the actual visual representation and figuring out ways to link the structured data with the web page but not necessarily embed it on the web page. So I think there’s a cleaner way to do data transfer between data providers and Google.

I think that it will make it easier for plug-ins and CMSs to create that information particularly. Because I feel like a lot of the ecosystem has moved in that direction where people aren’t implementing the structured data themselves but rather are using content creation tools. I think it’s becoming more important that we have mechanisms to work directly with those content creation tools to ingest the data in a programmatic way in order to make it fresher and easier.

Forum discussion at Twitter.



Google Revamps The Canonicalization Search Help Documentation




Google updated its search help documentation around canonicalization this morning. The Google Search Relations team split the documentation into three distinct sections and updated a lot of the content to provide clearer details on how canonicalization works in Google Search.

The three sections include:

All of this used to be on a single help page, which you can review on the Wayback Machine over here to compare.

With this, Gary Illyes from Google dropped another LinkedIn tip on the topic of canonicalization. He wrote:

Friday ramble: you can stack canonicalization signals to strengthen that hint.

You have a rel=canonical pointing from A to B, but A is HTTPS, it’s in your hreflang clusters, all your links are pointing to A, and A is included in your sitemaps instead of B. Which one should search engines pick as canonical, A or B?

If you just change the URLs from A to B in your sitemaps and hreflang clusters, combined with that rel=canonical it might already be enough to tip over canonicalization to B. Change the links also, and you have an even greater chance to convince search engines about your canonical preference.
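Gary's stacking idea can be sketched as a weighted vote across signals. To be clear, the signal names and weights below are invented purely for illustration; Google's actual canonicalization logic is not public:

```python
# Illustrative sketch (not Google's actual algorithm): canonicalization
# weighs many signals at once, so pointing more of them at the same URL
# strengthens the hint. Weights here are invented for illustration only.
SIGNALS = {
    "rel_canonical": 3,   # the rel=canonical annotation on the page
    "sitemap": 2,         # which URL the sitemap lists
    "hreflang": 2,        # which URL the hreflang cluster references
    "internal_links": 1,  # where internal links point
}

def pick_canonical(votes):
    """votes maps a signal name to the URL that signal favors."""
    scores = {}
    for signal, url in votes.items():
        scores[url] = scores.get(url, 0) + SIGNALS.get(signal, 0)
    return max(scores, key=scores.get)

# Only rel=canonical points to B; every other signal still favors A,
# so A keeps winning the vote (5 vs 3).
print(pick_canonical({
    "rel_canonical": "B", "sitemap": "A",
    "hreflang": "A", "internal_links": "A",
}))

# Update the sitemap and hreflang cluster to B as well, and B takes
# over (7 vs 1), matching Gary's "stack the signals" advice.
print(pick_canonical({
    "rel_canonical": "B", "sitemap": "B",
    "hreflang": "B", "internal_links": "A",
}))
```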

Recently, Gary also mentioned using absolute URLs for rel=canonical.

So check out these new docs and learn a bit more on canonicalization and Google Search.

Forum discussion at LinkedIn.



Microsoft Bing’s New BingBot Now Fully Live Today




As a reminder, since April, Bing has been slowly testing a new BingBot user agent, rolling it out to a larger percentage of crawls over the year. It should have rolled out to 100% of all crawls last month. But Fabrice Canel said this week that it is near 100%, and an announcement is coming sometime today from Microsoft.

Fabrice Canel from Microsoft Bing wrote on Twitter, “We are near 100% and are proactively monitoring and rolling back any website having issues. Stay tuned for more communication Friday.”

Initially, Bing said it would be rolled out by Fall 2022, then January 2023. Maybe today is the day?

Here is the rollout so far:

  • April 2022: Less than 5% of crawls
  • July 2022: 5% of all crawls
  • September 2022: 20% of all crawls
  • October 2022: 50% of all crawls
  • February 2023: 100% of all crawls

Let’s see what Fabrice announces today.

Forum discussion at Twitter.

Update: Confirmed, it is fully live according to Fabrice on LinkedIn.



Google’s CEO Sundar Pichai Confirms New Chat Based Search Feature




Last night I reported on Search Engine Land, based on the Google earnings report and earnings call, that Google's CEO Sundar Pichai confirmed the search company will release a chat-based search feature built on its own AI, LaMDA, in the coming weeks and months. Plus, there is a big new search event this Wednesday, so maybe we will hear about it there?

Yes, the other day we reported about Apprentice Bard, reported to be Google's answer to OpenAI's ChatGPT and the new Microsoft Bing.

Here is what Sundar said on the call:

Sundar Pichai said, “In the coming weeks and months, we’ll make these language models available, starting with LaMDA, so that people can engage directly with them. This will help us continue to get feedback, test, and safely improve them. These models are particularly amazing for composing, constructing, and summarizing. They will become even more useful for people as they provide up-to-date more factual information.”

Sundar Pichai said that he "first spoke about Google being an AI-first company" more than "six years ago." "We have been preparing for this moment since early last year, and you're going to see a lot from us in the coming few months across three big areas of opportunity; first, large models. We published extensively about LaMDA and PaLM, the industry's largest, most sophisticated model plus extensive work at DeepMind," he continued.

During the question and answer period, Sundar added “We’ll be launching — we’ll — more as labs products in certain cases, beta features in certain cases and just slowly scaling up from there. Obviously, we need to make sure we’re iterating in public, these models will keep getting better, so the field is fast changing. The serving costs will need to be improved.”

“So I view it as very, very early days, but we are committed to putting our experiences, both in terms of new products and experiences, actually bringing direct LLM experiences in Search, making APIs available for developers and enterprises and learn from there and iterate like we’ve always done. So I’m looking forward to it,” he added.

Sundar goes on to add, “In terms of Search too, now that we can integrate more direct LLM type experiences in Search, I think it will help us expand and serve new types of use cases, generative use cases. And so I think I see this as a chance to rethink and re-imagine and drive Search to solve more use cases for our users as well. So again, early days, you will see us be bold, put things out, get feedback and iterate and make things better.”


Everyone is so ChatGPT crazed!

Forum discussion at Twitter.
