Lizzi Sassman and Martin Splitt brought a special Google guest onto their Search Off the Record podcast to discuss structured data. The guest, Ryan Levering, has been with Google for over 11 years working on structured data.
Structured Data Past At Google
In short, Ryan Levering explained that when he first started working on the structured data project, he worked on the legacy Data Highlighter tool in Search Console. But early on, Google seemed to try to move away from requiring us to highlight or mark up our content and wanted to use machine learning to figure it all out, which Google’s Gary Illyes said back in 2017 but kind of retracted in 2018. So Google poured a lot of effort into machine learning to figure it out.
Structured Data Present At Google
But over time, Ryan said, it was “much easier to just ask people to give us their data rather than to pull it off of the web pages.” “It’s surprisingly more accurate,” he added. So they then moved more resources into building out structured data and the support documents site owners use to hand over that data.
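To make "giving Google the data" concrete, here is a minimal sketch of reading JSON-LD markup out of a page using only the Python standard library. The sample page, the `JsonLdParser` class and the `extract_json_ld` helper are hypothetical names made up for illustration; this is not how Google ingests structured data, just one simple way to see what the markup carries.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collects the contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed markup is simply skipped in this sketch


def extract_json_ld(html):
    """Return every JSON-LD object found in the given HTML string."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical page, used only for illustration.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product",
 "name": "Example Widget",
 "offers": {"@type": "Offer", "price": "19.99", "priceCurrency": "USD"}}
</script>
</head><body><h1>Example Widget</h1><p>Price: $19.99</p></body></html>
"""

items = extract_json_ld(SAMPLE_HTML)
print(items[0]["name"], items[0]["offers"]["price"])
```

The point of the sketch is that the name and price arrive already labeled, so nothing has to be inferred from the page layout.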
But machine learning has not been thrown out the window. Ryan said they still use it a lot (1) for sites that do not use structured data, where Google still wants to show rich results, and (2) to catch mistakes or abuse, so Google can verify what the page really says compared to the structured data. So Ryan said it is a “multiple pronged approach” to using structured data and machine learning for understanding it all.
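The two uses described above can be sketched roughly as follows. This is an illustrative toy, not Google's method: `visible_price` is a hypothetical regex stand-in for machine-learning extraction, `resolve_price` is a made-up helper, and the choice to trust the visible page on a mismatch is an assumption of this sketch.

```python
import re


def visible_price(text):
    """Rough stand-in for ML extraction: the first dollar amount in the text."""
    match = re.search(r"\$(\d+(?:\.\d{2})?)", text)
    return match.group(1) if match else None


def resolve_price(structured, visible_text):
    """Return (price, source), preferring markup that the visible page confirms.

    Assumes any declared price is a plain numeric string such as "19.99".
    """
    declared = (structured or {}).get("offers", {}).get("price")
    extracted = visible_price(visible_text)
    if declared is None:
        # No markup provided: fall back to extraction from the page.
        return extracted, "extracted"
    if extracted is not None and float(declared) != float(extracted):
        # Markup is out of sync with what the page shows.
        return extracted, "mismatch"
    return declared, "structured"


print(resolve_price({"offers": {"price": "19.99"}}, "Price: $19.99"))
print(resolve_price(None, "Price: $19.99"))
print(resolve_price({"offers": {"price": "9.99"}}, "Price: $19.99"))
```

The three calls cover the cases Ryan mentions: markup that matches the page, a page with no markup at all, and markup that disagrees with the page, whether through abuse or stale data.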
So that is how Google uses it all today, but what about the future?
Structured Data Future At Google
For the “medium term future,” Ryan said they plan on using structured data for “not just visual treatments but actually help with more understanding on the page.” Google has mentioned this before, that structured data can help Google understand the page but it is not a ranking factor. I guess Google will be working more on that. Also in the medium term, Ryan said Google wants to figure out “how to use structured data more universally in a lot of our features rather than just like here and there, scattered around.”
Long term, Ryan talked about how Google can use structured data in how it “interprets it in general into our internal graph.” Ryan said he “would like to move to where we are adjusting more and more data through structured data-specific channels rather than necessarily conveying all of our information on the web page itself.” Basically, figuring out a “cleaner way to do data transfer between data providers and Google.” How would Google do this? He said maybe by working with the large CMS platforms so they can build it into their platforms directly.
Here are parts of the transcript:
Ryan Levering: So, my introduction, when I started at Google, we were working on extraction from web pages. So like doing it via ML. So we came in, and the first thing I worked on was the data highlighter product, which is externally. We were looking at web pages and pulling structured data from unstructured text, and my whole team was very into the actual ML aspects of it. So how do we extract data, which in academic circles is often called “wrapper induction”? So when you take the– you build a wrapper that can pull the data out of a template. So reverse engineer the database. But after several years of working on it, there was another project that was side by side that was extracting structured data, which became the core of what we use now.
And I became convinced, after talking to people for a long period of time, that it was much easier to just ask people to give us their data rather than to pull it off of the web pages. It’s surprisingly more accurate. There’s other problems that can happen because of that, but it’s generally an easier thing to do. And it’s a lot less work for us, and it’s a lot better for the provider. So I came to it from ML and seeing structured data as the enemy at first. And then I was won over as a good mechanism.
So machine learning is– I see as like multiple prongs in our approach for how we get stuff. We want to use machine learning for cases where either we don’t have more information where it’s not provided for us. But it’s always going to be easier to just have the data shown to us, I think. So we will try– I think it’s like a multi-tiered approach, where you have machine learning for cases where we don’t have that data specifically. But then providers always have the option of giving us data, which usually improves accuracy, which usually gives better benefit for the actual provider. So I always see them as working side by side in an ideal world.
Most of our features over time migrate to that approach where we ingest it. Maybe we start with one approach where we’re just using ML. And then we eventually add markups so people have control. Or it’s the opposite way around. And we start– we bootstrap with markup in an eco-system approach where people are giving us data. And then we enhance coverage of the feature by adding ML long run. So, I see them as very compatible. But it’s always good to empower people who are giving you data, to have control over that. So I think it’s really important that structured data in general is part of the overall strategy so the people can actually have some control over the content that we show.
The primary challenge is that we then have to figure out a way to verify that the structured data is accurate. And sometimes this is from actual abuse. And sometimes this is just because there’s a problem with synchronicity. Sometimes people generate structured data for their websites and it becomes out of sync with the actual stuff that’s being shown visually. We see a lot of both. So there needs to be other mechanisms to figure out some balancing act where those things are enforced. So that’s the cost of structured data, I guess, is that extra checking.
Lizzi Sassman: Yeah, speaking of the work that has been done, what about the work that’s to come, the next couple of years for structured data? If you were to give us a peek into the future, what is next for structured data?
Ryan Levering: In the medium-term, I think we’re… I mean we continue to flesh out the structured data usage in terms of adding more features and looking into more ways we can use it in cooler things that are not just visual treatments but actually help with more understanding on the page, I think. And figuring out how to use structured data more universally in a lot of our features rather than just like here and there, scattered around. I think that’s what we’re looking at in a medium-term.
Long-term, I think that it’s going to play a really interesting role at interacting with the way that we interpret it in general into our internal graph. So I would like to see more machine learning, figuring out– I would like to move to where we are adjusting more and more data through structured data-specific channels rather than necessarily conveying all of our information on the web page itself. So I think that’s a much cleaner approach, particularly for some of our structured data ingestion paths. So figuring out a way to get around the actual visual representation and figuring out ways to link the structured data with the web page but not necessarily embed it on the web page. So I think there’s a cleaner way to do data transfer between data providers and Google.
I think that it will make it easier for plug-ins and CMSs to create that information particularly. Because I feel like a lot of the eco-system has moved in that direction where people aren’t implementing the structured data themselves but rather are using content creation tools. I think it’s becoming more important that we have mechanisms to work directly with those content creation tools to ingest the data in a programmatic way in order to make it fresher and easier.
Forum discussion at Twitter.
Daily Search Forum Recap: September 30, 2022
Here is a recap of what happened in the search forums today, through the eyes of the Search Engine Roundtable and other search forums on the web.
Google is testing a “more like this” star feature, the things to know box on the right side and the product panels with shaded backgrounds. Google said sometimes a brand becomes so popular it will rank above the general meaning of the word in Google Search. Page speed issues won’t lead to your site being removed from Google. I also posted my weekly SEO video recap – with a cold and cough.
Search Engine Roundtable Stories:
- Search News Buzz Video Recap: Google Core & Product Reviews Update Done, Local Search Ranking Bug Fixed, Search On Event Recap & More
Google has finished rolling out both the September 2022 core update and product reviews update on Monday, September 26th – yes, there is a lot of confusion. Google fixed a bug with the local search rankings and service area businesses. Google had its big Search On event…
- Google “More Like This” Star Search Snippet Feature
Google is testing a new feature that places a large and smaller star next to the search result snippet. When you click on the star icons, you are presented with a box beneath that search result snippet that shows a “more like this” section.
- Google Search: When A Brand Becomes More Popular Than The Meaning Of The Word
There are some brands that have generic names that have become more popular than the actual meaning of the word. In those cases, Google Search may rank or show information about the brand over the meaning of the word.
- Google Testing Things To Know On Right Search Panel
Google is testing the placement of the Things To Know section on the right-side panel. Typically, you find these within the main search results in the middle portion of the search results.
- Google Tests Shady Design For Product Panels
Google is testing another design for the product panels in Google Search. This new design shades the boxes for some of the products and reviews, etc. It also moves some of the filters around.
- Google: Page Speed Issues Wouldn’t Lead To Your Site Being Removed From Google Search
Google’s John Mueller said that your site or page would not be removed from the Google Search results over page speed issues. He was asked about this on Twitter and said no, page speed issues won’t lead to your site being removed.
- 3D Chrome Google Sign
Here is an interesting looking sign at the Google Netherlands office. It looks like a 3D chrome sign of sorts, with the Google logo. This is hanging on a wall at that office.
Other Great Search Threads:
- It might be that the currency is just not supported by our systems. Unfortunately, that’s sometimes not a technical issue which we can resolve., John Mueller on Twitter
- If they’re not showing up instead of your “real” pages, I wouldn’t worry about it. These technical quirks come & go. Technically they can be indexed, but if nobody sees, John Mueller on Twitter
- A redirect is not a 404 🙂, John Mueller on Twitter
- Curious about how SEO has evolved across a giant part of the web? Check out the @HTTPArchive ‘s Web Almanac chapter on SEO at https://t.co/PKQAn9Vc05 . It’s based on a giant treasure of, John Mueller on Twitter
- Google’s push to get merchants to add product information to their Business Profiles continues. A new section about adding products was just added to the help doc “Edit your Business Profile on Google”, Stefan Somborac on Twitter
- It depends on how you have your paywall set up, and what you want it to do. A paywal doesn’t have to be 100%; you might choose to show Google 30%, random, John Mueller on Twitter
- There’s nothing defined about the order of results in a site:-query, I wouldn’t read anything into it 🙂, John Mueller on Twitter
- When is cloaking not penalisable?, WebmasterWorld
Have feedback on this daily recap? Let me know on Twitter @rustybrick or @seroundtable. You can also follow us on Facebook and make sure to subscribe to the YouTube channel, Apple Podcasts, Spotify, Google Podcasts, or just contact us the old-fashioned way.