Information Retrieval: An Introduction For SEOs
When we talk about information retrieval, as SEO pros, we tend to focus heavily on the information collection stage – the crawling.
During this phase, a search engine would discover and crawl URLs that it has access to (the volume and breadth depending on other factors we colloquially refer to as a crawl budget).
The crawl phase isn’t something we’re going to focus on in this article, nor am I going to go in-depth on how indexing works.
If you want to read more on crawl and indexing, you can do so here.
In this article, I will cover some of the basics of information retrieval, which, when understood, could help you better optimize web pages for ranking performance.
It can also help you better analyze algorithm changes and search engine results page (SERP) updates.
To understand and appreciate how modern search engines handle information retrieval in practice, we need to understand the history of information retrieval on the internet – particularly how it relates to search engine processes.
The foundational technologies for digital information retrieval adopted by search engines go back to the 1960s and Cornell University, where Gerard Salton led a team that developed the SMART Information Retrieval System.
Salton is credited with developing and using vector space modeling for information retrieval.
Vector Space Models
Vector space models are accepted in the data science community as a key mechanism in how search engines “search” and platforms such as Amazon provide recommendations.
This method allows a processor, such as Google, to compare documents against queries when both are represented as vectors.
Google has referred to this in its documents as vector similarity search, or “nearest neighbor search,” defined by Donald Knuth in 1973.
In a traditional keyword search, the processor would use keywords, tags, labels, etc., within the database to find relevant content.
This is quite limited, as it narrows the search field within the database to binary yes/no matches, and it also struggles with synonyms and related entities.
To combat this and provide results for queries with multiple common interpretations, Google uses vector similarity to tie various meanings, synonyms, and entities together.
The closer two entities sit in vector space, the smaller the distance between their vectors, and the more similar (and relevant) they are deemed to be.
A good example of this is when you Google my name.
To Google, [dan taylor] can be:
- I, the SEO person.
- A British sports journalist.
- A local news reporter.
- Lt Dan Taylor from Forrest Gump.
- A photographer.
- A model-maker.
Using traditional keyword search with binary yes/no criteria, you wouldn’t get this spread of results on page one.
With vector search, the processor can produce a search results page based on similarity and relationships between different entities and vectors within the database.
You can read the company’s blog here to learn more about how Google uses this across multiple products.
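To make this more concrete, below is a minimal Python sketch of vector similarity search. The three-dimensional vectors and the query vector are invented purely for illustration (real systems use high-dimensional embeddings and approximate nearest-neighbor indexes), but ranking documents by cosine similarity to a query vector is the core idea.

```python
# A toy sketch of vector similarity ("nearest neighbor") search.
# The 3-dimensional vectors below are invented for illustration; real systems
# use high-dimensional embeddings and approximate nearest-neighbor indexes.
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors (1.0 = pointing the same way)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical vectors for three of the "Dan Taylor" entities above.
documents = {
    "Dan Taylor, the SEO person":        np.array([0.9, 0.1, 0.2]),
    "Dan Taylor, the sports journalist": np.array([0.2, 0.8, 0.1]),
    "Lt. Dan Taylor from Forrest Gump":  np.array([0.1, 0.2, 0.9]),
}

# A hypothetical vector for the query [dan taylor seo].
query = np.array([0.8, 0.2, 0.3])

# Rank the documents by similarity to the query: nearest neighbors first.
for name, vector in sorted(documents.items(),
                           key=lambda item: cosine_similarity(query, item[1]),
                           reverse=True):
    print(f"{cosine_similarity(query, vector):.2f}  {name}")
```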
Similarity Matching
When comparing documents in this way, search engines likely use a combination of Query Term Weighting (QTW) and the Similarity Coefficient.
QTW applies a weight to specific terms in the query; those weighted query vectors are then compared against document vectors in the vector space model, with the similarity coefficient typically calculated using the cosine coefficient.
The cosine similarity measures the similarity between two vectors and, in text analysis, is used to measure document similarity.
This is a likely mechanism in how search engines determine duplicate content and value propositions across a website.
Cosine similarity is measured between -1 and 1. In text analysis, where term vectors have no negative components, it typically falls between 0 and 1, with 0 meaning the vectors are orthogonal (maximally dissimilar) and 1 meaning maximum similarity.
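As a rough illustration of how cosine similarity can surface near-duplicate documents, here is a short Python sketch using plain term counts. The example pages and the whitespace tokenizer are assumptions for illustration only; production systems work with far richer, weighted representations.

```python
# A toy sketch of cosine similarity between documents represented as term-count vectors.
import math
from collections import Counter


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    a, b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    terms = set(a) | set(b)
    dot = sum(a[t] * b[t] for t in terms)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


page_1 = "blue widgets for sale buy blue widgets online"
page_2 = "buy blue widgets online blue widgets for sale"
page_3 = "guide to repairing vintage typewriters at home"

print(cosine_similarity(page_1, page_2))  # 1.0 - near-duplicate pages
print(cosine_similarity(page_1, page_3))  # 0.0 - unrelated pages
```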
The Role Of An Index
In SEO, we talk a lot about the index, indexing, and indexing problems – but we don’t actively talk about the role of the index in search engines.
The purpose of an index is to store information and act as a data reservoir, which Google does through tiered indexing systems and shards.
That’s because it’s unrealistic, unprofitable, and a poor end-user experience to remotely access (crawl) webpages, parse their content, score it, and then present a SERP in real time.
Typically, a modern search engine index wouldn’t contain a complete copy of each document but is more of a database of key points and data that has been tokenized. The document itself will then live in a different cache.
While we don't know exactly which processes search engines such as Google go through as part of their information retrieval systems, they likely include stages of:
- Structural analysis – Text format and structure, lists, tables, images, etc.
- Stemming – Reducing variations of a word to its root. For example, “searched” and “searching” would be reduced to “search.”
- Lexical analysis – Conversion of the document into a list of words and then parsing to identify important factors such as dates, authors, and term frequency. Note that this is not the same as TF*IDF.
We'd also expect that, during this phase, other considerations and data points are taken into account, such as backlinks, source type, whether or not the document meets the quality threshold, internal linking, and main content/supporting content.
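To visualize roughly what tokenization, stemming, and an inverted index look like, here is a toy Python sketch. The naive suffix-stripping "stemmer" and the sample documents are simplifications for illustration only; real systems use proper stemmers (such as Porter's) and store far more data per term.

```python
# A toy sketch of lexical analysis, crude stemming, and an inverted index.
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token: str) -> str:
    # A deliberately crude suffix-stripper, standing in for a real stemmer.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


documents = {
    1: "Searching for flights to London",
    2: "She searched cheap London hotels",
    3: "A guide to technical SEO audits",
}

# Inverted index: stemmed term -> {document ID: term frequency}.
index: dict[str, dict[int, int]] = defaultdict(lambda: defaultdict(int))
for doc_id, text in documents.items():
    for token in tokenize(text):
        index[stem(token)][doc_id] += 1

print(dict(index["search"]))  # {1: 1, 2: 1} - "searching" and "searched" both reduce to "search"
print(dict(index["london"]))  # {1: 1, 2: 1}
```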
Accuracy & Post-Retrieval
In 2016, Paul Haahr gave great insight into how Google measures the “success” of its process and also how it applies post-retrieval adjustments.
You can watch his presentation here.
In most information retrieval systems, there are two primary measures of how successful the system is in returning a good results set.
These are precision and recall.
Precision
The number of documents returned that are relevant versus the total number of documents returned.
Many websites have seen drops in the total number of keywords they rank for in recent months (such as odd, edge-case keywords they probably had no business ranking for). We can speculate that search engines are refining their information retrieval systems for greater precision.
Recall
The number of relevant documents returned versus the total number of relevant documents that exist for the query.
Search engines lean toward precision over recall, as precision leads to better search results pages and greater user satisfaction. It is also less system-intensive, since it avoids returning and processing more documents than required.
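As a simple worked example of the two measures, here is a short Python sketch. The document IDs and the set of "relevant" documents are invented for illustration; in practice, relevance judgments come from large-scale evaluation data.

```python
# A toy sketch of precision and recall for a single results set.
def precision(returned: set, relevant: set) -> float:
    """Share of returned documents that are actually relevant."""
    return len(returned & relevant) / len(returned)


def recall(returned: set, relevant: set) -> float:
    """Share of all relevant documents that were returned."""
    return len(returned & relevant) / len(relevant)


relevant_docs = {"doc1", "doc2", "doc3", "doc4"}  # everything relevant to the query
returned_docs = {"doc1", "doc2", "doc7", "doc9"}  # what the system actually returned

print(precision(returned_docs, relevant_docs))  # 0.5 - half the results are relevant
print(recall(returned_docs, relevant_docs))     # 0.5 - half the relevant docs were found
```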
Conclusion
The practice of information retrieval can be complex due to the different formulas and mechanisms used.
As we don't fully know or understand how this process works in search engines, we should focus on the basics and the guidelines provided, rather than trying to game metrics like TF*IDF that may or may not be used (and that vary in how much they weigh on the overall outcome).
Featured Image: BRO.vector/Shutterstock
Reddit Makes Game-Changing Updates to Keyword Targeting
In a big move for digital advertisers, Reddit has just introduced a new Keyword Targeting feature, changing the game for how marketers reach their target audiences.
This addition brings fresh potential for PPC marketers looking to tap into Reddit’s highly engaged user base.
With millions of communities and conversations happening every day, Reddit is now offering advertisers a more precise way to get in front of users at the perfect moment.
The best part? They’re leveraging AI to make the process even more powerful.
Let’s break down why this is such an exciting development for digital advertisers.
Keyword Targeting for Conversation and Feed Placements
Reddit has always been about its vibrant communities, or “subreddits,” where users connect over shared interests and discuss a wide range of topics.
Until now, keyword targeting has only been available on conversation placements. Starting today, advertisers can use keyword targeting in both feed and conversation placements.
The targeting update allows advertisers to place ads directly within these conversations, ensuring they reach people when they’re actively engaged with content that’s related to their products or services.
For PPC marketers, this level of targeting means a higher chance of delivering ads to users who are in the right mindset.
Instead of serving ads to users scrolling passively through a general feed, Reddit is giving you the tools to place your ads into specific conversations, where users are already discussing topics related to your industry.
According to Reddit, advertisers who use keyword targeting have seen a 30% increase in conversion volumes. This is a significant lift for marketers focused on performance metrics, such as conversion rates and cost per acquisition.
Scaling Performance with AI-Powered Optimization
While precision is key, Reddit knows that advertisers also need scale.
Reddit mentioned two AI-powered solutions to help balance keyword targeting and scalability within the platform:
- Dynamic Audience Expansion
- Placement Expansion
Dynamic Audience Expansion
This feature works in tandem with keyword targeting to help advertisers broaden their reach, without sacrificing relevance.
Reddit’s AI does the heavy lifting by analyzing signals like user behavior and ad creative performance to identify additional users who are likely to engage with your ad. In essence, it’s expanding your audience in a smart, data-driven way.
For PPC marketers, this means more exposure without having to rely solely on manually selecting every keyword or interest.
You set the initial parameters, and Reddit’s AI expands from there. This not only saves time but also ensures that your ads reach a broader audience that’s still relevant to your goals.
Reddit claims campaigns using Dynamic Audience Expansion have seen a 30% reduction in cost per action (CPA), making it a must-have for marketers focused on efficiency and budget optimization.
Placement Expansion
Another standout feature is Reddit’s multi-placement optimization. This feature uses machine learning to determine the most effective places to show your ads, whether in the feed or within specific conversation threads.
This multi-placement strategy ensures your ads are delivered in the right context to maximize user engagement and conversions.
For PPC marketers, ad placement is a critical factor in campaign success. With Reddit’s AI optimizing these placements, you can trust that your ads will appear where they have the highest likelihood of driving action—whether that’s getting users to click, convert, or engage.
Introducing AI Keyword Suggestions
Finding the right keywords to target can be a challenge. Reddit's new AI Keyword Suggestions tool helps by analyzing Reddit's vast conversation data to recommend keywords you might not have thought of.
It allows you to discover new, high-performing keywords related to your campaign, expanding your reach to conversations you might not have considered. And because it’s powered by AI, the suggestions are always based on real-time data and trends happening within Reddit’s communities.
This can be particularly helpful for marketers trying to stay ahead of trending topics or those who want to ensure they’re tapping into conversations with high engagement potential.
As conversations on Reddit shift, so do the keywords that drive those discussions. Reddit’s AI Keyword Suggestions help keep your targeting fresh and relevant, ensuring you don’t miss out on key opportunities.
New Streamlined Campaign Management
Reddit has also made strides in simplifying the campaign setup and management process. They’ve introduced a unified flow that allows advertisers to combine multiple targeting options within a single ad group.
You can now mix keywords, communities, and interests in one campaign, expanding your reach without overcomplicating your structure.
From a PPC perspective, this is huge. Simplifying campaign structure means you can test more variations, optimize faster, and reduce time spent on manual adjustments.
In addition, Reddit has enhanced its reporting capabilities with keyword-level insights, allowing you to drill down into what’s working and what’s not, giving you more control over your campaigns.
The Takeaway for PPC Marketers
If you're a marketer working with Google Ads, Facebook, or Microsoft Advertising, this new update from Reddit should be on your radar.
The combination of keyword targeting, AI-driven audience expansion, and multi-placement optimization makes Reddit a serious contender in the digital advertising space.
If you’re looking to diversify your PPC campaigns, drive higher conversions, and optimize costs, Reddit’s new offerings provide a unique opportunity.
You can read the full announcement from Reddit here.
What The Google Antitrust Verdict Could Mean For The Future Of SEO
In August 2024, Google lost its first major antitrust case in the U.S. Department of Justice vs. Google.
While we all gained some interesting insights about how Google’s algorithm works (hello, NavBoost!), understanding the implications of this loss for Google as a business is not the easiest to unravel. Hence, this article.
There’s still plenty we don’t know about Google’s future as a result of this trial, but it’s clear there will be consequences ahead.
Even though Google representatives have said they will appeal the decision, both sides are already working on proposals for how to restore competition, which will be decided by August 2025.
My significant other is a corporate lawyer, and this trial has been a frequent topic at the dinner table over the course of the last year.
We come from different professional backgrounds, but we have been equally invested in the outcome – both for our respective careers and industries. His perspective has helped me better grasp the potential legal and business outcomes that could be ahead for Google.
I will break that down for you in this article, along with what it could mean for the SEO industry and search at large.
Background: The Case Against Google
In August 2024, Federal Judge Amit Mehta ruled that Google violated the U.S. antitrust law by maintaining an illegal monopoly through exclusive agreements it had with companies like Apple to be the world’s default search engine on smartphones and web browsers.
During the case, we learned that Google paid Apple $20 billion in 2022 to be the default search engine on its Safari browser, thus making it impossible for other search engines like DuckDuckGo or Bing to compete.
This case ruling also found Google guilty of monopolizing general search text advertising because Google was able to raise prices on ad products higher than what would have been possible in a free market.
Those ads are sold via Google Ads (formerly AdWords) and allow marketers to run ads against search keywords related to their business.
Note: There is a second antitrust case still underway over whether Google has also created illegal monopolies with open web display ad technology. Closing arguments for that case will be heard in November 2024, with a verdict to follow.
Remedies Proposed By The DOJ
On Oct. 8, 2024, the DOJ filed proposed antitrust remedies for Google. Until this point, there has been plenty of speculation about potential solutions.
Now, we know that the DOJ will be seeking remedies in four “categories of harm”:
- Search Distribution and Revenue Sharing.
- Accumulation and Use of Data.
- Generation and Display of Search Results.
- Advertising Scale and Monetization.
The following sections highlight potential remedies the DOJ proposed in that filing.
Ban On Exclusive Contracts
In order to address Google’s search distribution and revenue sharing, it is likely that we will see a ban on exclusive contracts going forward for Google.
In the Oct. 8 filing, the DOJ outlined exploring limiting or prohibiting default agreements, pre-installation agreements, and other revenue-sharing agreements related to search and search-related products.
Given this is what the case was centered around, it seems most likely that we will see some flavor of this outcome, and that could provide new incentives for innovation around search at Apple.
Apple Search Engine?
Judge Mehta noted in his judgment that Apple had periodically considered building its own search technology but decided against it when a 2018 analysis concluded Apple would lose more than $12 billion in revenue during the first five years if it broke up with Google.
If Google were no longer able to have agreements of this nature, we may finally see Apple emerge with a search engine of its own.
According to a Bloomberg report in October 2023, Apple has been “tinkering” with search technology for years.
It has a large search team dedicated to a next-generation search engine for Apple’s apps called “Pegasus,” which has already rolled out in some apps.
And through its development of Spotlight, which helps users find things across their devices, Apple has started adding web results that point users to sites answering their search queries.
Apple already has a web crawler called Applebot that finds sites it can provide users in Siri and Spotlight. It has also built its own search engines for some of its services like the App Store, Maps, Apple TV, and News.
Apple purchased a company called Laserlike in 2019, which is an AI-based search engine founded by former Google employees. Apple’s machine learning team has been seeking new engineers to work on search technologies as well.
All of these could be important infrastructure for a new search engine.
Implications For SEO
If users are given more choices in their default search engine, some may stray away from Google, which could cut its market share.
However, as of now, Google is still thought of as the leader in search quality, so it’s hard to gauge how much would realistically change if exclusive contracts were banned.
A new search engine from Apple would obviously be an interesting development. It would be a new algorithm to test, understand, and optimize for.
Knowing that users are hungry for another quality option, people would likely embrace Apple in this space, and it could attract a significant number of users, provided the results are of high enough quality. Quality is really key.
Search is the most used tool on smartphones, tablets, and computers. Apple has the users that Google needs.
Without Apple’s partnership with Google, Apple has the potential to disrupt this space. It can offer a more integrated search experience than any other company out there. And its commitment to privacy is appealing to many long-time Google users.
The DOJ would likely view this as a win as well because Apple is one of the few companies large enough to fully compete across the search space with Google.
Required Sharing Of Data To Competitors
Related to the accumulation and use of data harm Google has caused, the DOJ is considering a remedy that forces Google to license its data to competitors like Bing or DuckDuckGo.
The antitrust ruling found that Google’s contracts ensure that Google gets the most user data, and that data streams also keep its competitors from improving their search results to compete better.
In the Oct. 8 filing, the DOJ said it is considering forcing Google to make available via API: 1) the indexes, data, feeds, and models used for Google Search, including those used in AI-assisted search features, and 2) Google search results, features, and ads, including the underlying ranking signals.
Believe it or not, this solution has precedent, although certainly not at the same scale as what is being proposed for Google.
The DOJ required AT&T to provide royalty-free licenses to its patents in 1956, and required Microsoft to make some of its APIs available to third parties for free after it lost an antitrust case in 1999.
Google has argued that there are user privacy concerns related to data sharing. The DOJ’s response is that it is considering prohibiting Google from using or retaining data that cannot be shared with others because of privacy concerns.
Implications For SEO
Should Google be required to do any of this, it would be an unprecedented victory for the open web. It is overwhelming to think of the possibilities if any of these repercussions were to come to fruition.
We would finally be able to see behind the curtain of the algorithm and ranking signals at play. There would be a true open competition to build rival search engines.
If Google were no longer able to use personalized data, we might see the end of personalized search results based on your search history, which has pros and cons.
I would also be curious what would happen to Google Discover since that product provides content based on your browsing history.
The flip side of this potential outcome is that it could become easier than ever to gamify search results again, at least in the short term.
If everyone knew what makes pages rank in Google, we would be back in the early days of SEO, when we could easily manipulate rank.
But if others take the search algorithm and build upon it in different ways, maybe that wouldn’t be as big of a concern in the long term.
Opting Out Of SERP Features
The DOJ filing briefly touched on one intriguing remedy for the harm Google has caused regarding the generation and display of search results.
The DOJ lawyers are proposing that website publishers receive the ability to opt out of any Google features or products they wish.
This would include Google’s AI Overviews, which they give as an example, but it could also include all other SERP features where Google relies on websites and other content created by third parties – in other words, all of them.
Because Google has held this monopoly, publishers have had virtually no bargaining power when it comes to opting out of SERP features without risking complete exclusion from Google's results.
This solution would help publishers have more control over how they show up in the search results.
Implications For SEO
This could be huge for SEO if the DOJ does indeed move forward with requiring Google to allow publishers to opt out of any and all features and products they wish, without being excluded from Google's results altogether.
There are plenty of website publishers who do not want Google to be able to use their content to train its AI products, and wish to opt out of AI Overviews.
When featured snippets first came about, there was a similar reaction to those.
Based on the query, featured snippets and AI Overviews have the ability to help or harm website traffic numbers, but it’s intriguing to think there could be a choice in the matter of inclusion.
Licensing Of Ad Feeds
To address advertising scale and monetization harm caused by Google, the DOJ filing provided a few half-baked solutions related to search text advertising.
Because Google holds a 91% market share of search in the U.S., other search engines have struggled to monetize through advertising.
One solution is to require Google to license or syndicate its ad feed independent of its search results. This way, other search engines could better monetize by utilizing Google’s advertising feed.
The DOJ is also looking at remedies to provide more transparent and detailed reporting to advertisers about search text ad auctions and monetization, as well as the ability to opt out of Google search features, such as keyword expansion and broad match, that advertisers don't want to partake in.
Implications For SEO
I don’t see obvious implications for SEO, but there are plenty for our friends in PPC.
While licensing the Google ad feed is intriguing in order to help other search engines monetize, it doesn’t get at the issue of Google overcharging advertisers in their auctions.
More thought and creativity might be needed here to find a solution that would make sense for both creating more competition in search and fairness for advertisers.
They are certainly on the right track with more transparency in reporting and allowing advertisers to opt out of programs they don’t want to be part of.
Breaking Up Of Google
The DOJ lawyers are also considering “structural remedies” like forcing Google to sell off parts of its business, like the Chrome browser or the Android operating system.
Divesting Android is the remedy that has been discussed the most. It would be another way to prevent Google from having a position of power over device makers and requiring them to enter into agreements for access to other Google product apps like Gmail or Google Play.
If the DOJ forced Google to sell Chrome, that would simply be another way to stop Google from using Chrome data to inform the search algorithm.
There are behavioral remedies already mentioned that could arguably accomplish the same thing, and without the stock market-shattering impact of a forced breakup.
That said, depending on the outcome of the U.S. election, we could see a DOJ that feels empowered to take bigger swings, so this may still be on the table.
The primary issue with this remedy is that Google’s revenue largely comes from search advertising. So, if the goal is to reduce its market share, would breaking up smaller areas of the business really accomplish that?
Implications For SEO
If Android became a stand-alone business, I don’t see implications for SEO because it isn’t directly related to search.
Also, Apple controls so much of the relevant mobile market that spinning Android off would have little to no effect in regards to addressing monopolistic practices.
If Chrome were sold, Google would lose the valuable user signals that inform NavBoost in the algorithm.
That would have some larger implications for the quality of its results since we know, through trial testimony, that those Chrome user signals are heavily weighted in the algorithm.
How much of an impact that would have on the results may only be known inside Google, or maybe not even there, but it could be material.
Final Thoughts
There is so much to be decided in the year (potentially years) to come regarding Google’s fate.
While all of the recent headlines focus on the possibility of Google being broken up, I think this is a less likely outcome.
While divesting Chrome may be on the table, it seems like there are easier ways to accomplish the government’s goals.
And Android and Google Play are both free to customers and rely on open-source code, so mandating changes to them doesn’t seem the most logical way to solve monopolistic practices.
I suspect we’ll see some creative behavioral remedies instead. The banning of exclusive contracts feels like a no-brainer.
Of all the solutions out there, requiring Google to provide APIs of Google search results, ranking signals, etc. is by far the most intriguing idea.
I cannot even imagine a world where we have access to that information right now. And I can only hope that we do see the emergence of an Apple search engine. It feels long overdue for it to enter this space and start disrupting.
Even with Google appealing Mehta’s decision, the remedy proposals will continue ahead.
In November, the DOJ will file a more refined framework, and then Google will propose its own remedies in December.
Featured Image: David Gyung/Shutterstock
Snapchat Is Testing 2 New Advertising Placements
The Snapchat ad ecosystem just expanded with two new placement options.
On Tuesday, Snap announced it has started testing two new placements:
- Sponsored Snaps
- Promoted Places
While the placements aren't available to the general public yet, Snap provided information on the test, including its launch partners and more details about each format.
The goal of these placements is to let brands expand their reach across some of the most widely adopted parts of the platform.
Sponsored Snaps Ad Placement
According to the October 8 announcement, Snapchat is testing the new Sponsored Snaps placement with Disney.
The Sponsored Snaps placement shows a full-screen vertical video to users on Snapchat.
Users can then opt in to opening the Snap, with options to engage with the advertiser in one of two ways:
- Sending a direct message to the advertiser by replying.
- Using the call-to-action to open the link chosen by the advertiser.
Sponsored Snaps aren’t delivered via a push notification and will appear differently than other Snaps in a user’s inbox.
After a certain amount of time, any unopened Sponsored Snaps disappear from a user’s inbox.
Promoted Places Ad Placement
Snap partnered with two other brands for its Promoted Places ad placement test: McDonald's and Taco Bell.
This new ad placement shows on the Snap Map, which is meant to help users discover new places they may want to visit.
Promoted Places will highlight sponsored placements of interest within the Snap Map.
In early testing, Snap said they’ve found adding places as “Top Picks” drives a typical visitation lift of 17.6% for frequent Snapchat users.
They also mentioned the possibility of exploring ideas around customer loyalty on the Snap Map in future phases.
Summary
Snap hasn’t yet announced how long these ad placement tests will run, or when they’ll be available for broader advertisers.
Snap said the Sponsored Snaps and Promoted Places placements will evolve based on feedback from the Snapchat community and the brands partnered with it at launch.
In the future, there's the possibility of integrating features like CRM systems and AI chatbot support to streamline communication between brands and Snapchat users.