SEARCHENGINES
Yandex Search Ranking Factors Leaked & Exposed

A large amount of Yandex source code, spanning all of its technology, was allegedly leaked by a disgruntled employee, and part of the leak was the source code for Yandex Search, Russia’s largest search engine. As you can imagine, SEOs and others are diving in to see what they can learn from it.
I did not download the source code and have not gone through it myself, but I wanted to share what people who did investigate it have posted on Twitter.
Here’s the alpha version of an explorer tool for the leaked #Yandex Search code.
It lets you browse through the ranking factors, view by tags, etc, and start to find connections.
Easy to add new features if there’s anything you want to see! https://t.co/AjbYnrDl9P pic.twitter.com/pQ4scOkP6w
— Rob Ousbey (@RobOusbey) January 28, 2023
I downloaded the code, analyzed it and there is a lot of useful information for Google SEO as well. pic.twitter.com/RWrgnnlpj6
— Alex Buraks (@alex_buraks) January 27, 2023
Theoretically, what is the difference between algorithms used in Google and in Yandex?
They are quite similar:
– there is RankBrain analogue – MatrixNet;
– they are using PageRank (almost the same as in Google);
– a lot of text algorithms are the same. pic.twitter.com/Djjl8Bmjwn
— Alex Buraks (@alex_buraks) January 27, 2023
According to Statcounter Yandex is close to Yahoo and Bing by market share: pic.twitter.com/5GKIvKIvAo
— Alex Buraks (@alex_buraks) January 27, 2023
Main insights after analysing this list:
#1 Age of links is a ranking factor. pic.twitter.com/U47uWvEq9w
— Alex Buraks (@alex_buraks) January 27, 2023
#3 Numbers in URLs is bad for rankings pic.twitter.com/ECgwGeGUfb
— Alex Buraks (@alex_buraks) January 27, 2023
#5 Hard pessimization equal PR=0 pic.twitter.com/RRbhuJyZr1
— Alex Buraks (@alex_buraks) January 27, 2023
#7 Fun fact – there is a separate ranking factor for uplifting Wikipedia pic.twitter.com/799F8KFpkE
— Alex Buraks (@alex_buraks) January 27, 2023
#9 Document age and last update both are ranking factors. pic.twitter.com/ay1GTMVEtJ
— Alex Buraks (@alex_buraks) January 27, 2023
Right now I checked ~40% of the list, there are a lot more (about text relevancy, behaivor factors, page rank, internal links,etc).
Will continue this thread after some time.
— Alex Buraks (@alex_buraks) January 27, 2023
The first thread got a lot of impressions (500k views for the moment, thanks for you retweets and likes!), so I decided to finalize. https://t.co/UQiQsnpWd2
— Alex Buraks (@alex_buraks) January 28, 2023
#2 Additionnaly: ranking factor for orphan pages.
You can easy find them via Screming Frog or other crawlers. pic.twitter.com/zIPwAelpD0
— Alex Buraks (@alex_buraks) January 28, 2023
#4 Number of search queries of your site/url is a ranking factor.
Obviously more = better. pic.twitter.com/xXQ6FMDghP
— Alex Buraks (@alex_buraks) January 28, 2023
#6 If your url whould be the last for search session (user will find what he needs) – it whould impact rankings.
There are strict factors for this and predictible factors as well. pic.twitter.com/Zx3sBZORCs
— Alex Buraks (@alex_buraks) January 28, 2023
#8 Special ranking factors for short videos (tiktok, shorts, reels) pic.twitter.com/oKPzL09MID
— Alex Buraks (@alex_buraks) January 28, 2023
#10 Keywords in URL is a ranking factors.
As we can see from the description – the optimal would be include up to 3 words from the search query. pic.twitter.com/Q1euKWSiST
— Alex Buraks (@alex_buraks) January 28, 2023
#14 One more ranking factor for content quality – broken embedded video on the page.
Embed videos – good for rankings.
Broken embed videos – bad. pic.twitter.com/2SUys65PHp
— Alex Buraks (@alex_buraks) January 28, 2023
#16 If you backlinks anchors contain all words from the keywords – it’s good for SEO.
If it is in a one link – it’s more beneficial. Especially if the order of words is the same. pic.twitter.com/WrbESJ8Da5
— Alex Buraks (@alex_buraks) January 28, 2023
#18 The quality rank of texts on the domain is a ranking factor.
Pages with low quality content affect the entire domain. pic.twitter.com/MJUCTVB9CH
— Alex Buraks (@alex_buraks) January 28, 2023
#20 Funny, there is a random as a separate ranking factor.
When you don’t understant why some of page is on top – it could be just random (to test behaivor factors). pic.twitter.com/TGtzFrmBOV
— Alex Buraks (@alex_buraks) January 28, 2023
#22 Backlinks from the top 100 best websites by PageRank impacts on rankings.
That’s not news. pic.twitter.com/ikxldWLJqy
— Alex Buraks (@alex_buraks) January 28, 2023
Wow, I just found the list with initial weights of Yandex ranking factors.
Do you need one more thread? 😁
P.S. final weights calculated by AI (matrixnet), but initial values are useful as well. pic.twitter.com/WeroYQy7Yu
— Alex Buraks (@alex_buraks) January 28, 2023
That said, I’ve been digging into the codebase myself to find things of interest.
I’m doing this live, so I don’t know how long it will take between tweets.
— Mic King (@iPullRank) January 27, 2023
A lot of the code related to Yandex Search lives in the Kernel, ExtSearch, Search, and Robot archives, but again I won’t be able to be comprehensive here until I’ve looked through everything.
— Mic King (@iPullRank) January 27, 2023
Some really interesting things in the web_meta_factors_info/factors_gen.in file as it relates to content features and factors.
For instance, some things that we’d expect like a minimum expectation of the proximity of words in a title to the words in the query. pic.twitter.com/YRsrCpVsqU
— Mic King (@iPullRank) January 27, 2023
Interestingly, there are a lot of scrapers in here Google News, Shopping, YouTube and even other Yandex services.
— Mic King (@iPullRank) January 27, 2023
Hmm…this might be the structure of how Yandex stores documents in their version of a doc server.
Still looking for an idea of how they structure their inverted index. pic.twitter.com/1lwTbOirnx
— Mic King (@iPullRank) January 27, 2023
Here’s a protobuf of link factors. pic.twitter.com/1RM6o1xzRg
— Mic King (@iPullRank) January 27, 2023
In the “link prioritizer code” they talk about decreasing the priority of links with the same text from the same host. In other words, don’t count the links from duplicate content. pic.twitter.com/dQTUnScCUy
— Mic King (@iPullRank) January 27, 2023
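To make the “same text from the same host” idea concrete, here is a minimal sketch in Python. It is not taken from the leaked code: the function name `prioritize_links`, the `repeat_penalty` value, and the halving formula are all assumptions made for illustration. It only shows the general technique of lowering the priority of repeated anchor text coming from one host.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def prioritize_links(links, repeat_penalty=0.5):
    """Return (source_url, anchor_text, priority) tuples.

    Hypothetical scheme: the first link with a given (host, anchor text)
    pair keeps priority 1.0, and every repeat of that pair is multiplied
    by repeat_penalty once more. Neither the penalty value nor the
    formula comes from the leaked code.
    """
    seen = defaultdict(int)  # (host, normalized anchor) -> times seen so far
    scored = []
    for source_url, anchor_text in links:
        host = urlsplit(source_url).hostname or ""
        key = (host, anchor_text.strip().lower())
        priority = repeat_penalty ** seen[key]
        seen[key] += 1
        scored.append((source_url, anchor_text, priority))
    return scored


if __name__ == "__main__":
    sample = [
        ("https://a.example/page-1", "Best widgets"),
        ("https://a.example/page-2", "best widgets"),
        ("https://b.example/review", "Best widgets"),
    ]
    for row in prioritize_links(sample):
        print(row)
```

In this sketch, the second link from `a.example` is discounted because it repeats the same anchor text from the same host, while the link from `b.example` keeps full priority.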
How did y’all come up with that number of ranking factors?
I see 481 factors just related to “Rapid Clicks” pic.twitter.com/sw5A3ia3Bk
— Mic King (@iPullRank) January 28, 2023
Similar to the Googs, Yandex has multiple ranking models to choose from.
In this select_ranking_models.cpp file, they talk about having different models for different languages and locations. pic.twitter.com/m210tpOUDb
— Mic King (@iPullRank) January 28, 2023
I’m gonna go watch TV, but I obviously have to add this to my book so I’m gonna add more over the next couple days
— Mic King (@iPullRank) January 28, 2023
Been digging into how this robot archive is structured.
It looks like the Zora directory is where a lot of interesting things are happening. There’s a limits.pb.txt file that stores the requests per second rate for the host and the IP address for 204k hosts. pic.twitter.com/0oulKm58dx
— Mic King (@iPullRank) January 28, 2023
Here’s where the Document and Query factors are collected and scored.
Looks like it goes to storage after this tho. pic.twitter.com/qJAiLfSrsU
— Mic King (@iPullRank) January 29, 2023
Ok, real quick, top 5 most positively and negatively weighted ranking factors and their coefficients in the initial weighting in Yandex’s document relevance calculation. Negatives first
#1 FI_ADV: -0.2509284637
This factor determines that there is advertising on the site.
— Mic King (@iPullRank) January 29, 2023
#3 FI_QURL_STAT_POWER: -0.1943768768
Factor is the number of URL impressions for the request
— Mic King (@iPullRank) January 29, 2023
#5 FI_GEO_CITY_URL_REGION_COUNTRY: -0.168645758
Factor is the geographical coincidence of the document and the country that the user searched from.
Ok, now for the top 5 positively weighted factors.
— Mic King (@iPullRank) January 29, 2023
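To show how negative initial weights like these would pull a score down, here is a small Python sketch. The three coefficients are the ones quoted in the tweets above. Everything else is an assumption: the tweets do not say how the initial weights are combined, so the plain weighted sum, the function name `initial_score`, and the example factor values are hypothetical.

```python
# Coefficients as quoted in the tweets above.
INITIAL_WEIGHTS = {
    "FI_ADV": -0.2509284637,
    "FI_QURL_STAT_POWER": -0.1943768768,
    "FI_GEO_CITY_URL_REGION_COUNTRY": -0.168645758,
}


def initial_score(factor_values, weights=INITIAL_WEIGHTS):
    """Weighted sum of factor values (an assumed, simplified model).

    factor_values maps factor names to numbers; missing factors count as 0.
    """
    return sum(weights[name] * factor_values.get(name, 0.0) for name in weights)


if __name__ == "__main__":
    # Hypothetical document: advertising detected, other factors absent.
    print(initial_score({"FI_ADV": 1.0}))
```

Under this assumed model, a document where only the advertising factor fires would start with a contribution equal to the FI_ADV coefficient. As noted in the earlier tweet, the final weights are calculated by MatrixNet, so these values are starting points only.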
Here is a starting point for link related factors. https://t.co/fwP8TxuOrM
— Christoph C. Cemper 🇺🇦 🧡 SEO (@cemper) January 30, 2023
Will this help you do SEO on Google? Probably not but hey, it is super interesting.
Ah, but once they find the optimal word count …
BOOM
— John Mueller is watching out for Google+ 🐀 (@JohnMu) January 29, 2023
Forum discussion at WebmasterWorld.
SEARCHENGINES
Microsoft Advertising Target Shoppers By Browsing Categories With Keyword Boosters

The Microsoft Advertising team announced that its PromoteIQ has launched a new way to target your ads: you target shoppers based on the categories they browse, and you can also use keywords as a booster for campaign bids.
Nicole Farley explained on Search Engine Land, “this latest development in category-based targeting with keyword leveraging is supposed to maximize revenue and sales for both retailers and advertisers, while also delivering an exceptional experience for shoppers. Interested advertisers should test the new.”
This differs from traditional keyword targeting, “which requires advertisers to research and build an exhaustive list of keywords per campaign,” Microsoft said. With this new approach of targeting shoppers by what they browse, “advertisers only need to test and retain a few high-performing keywords,” Microsoft added.
Microsoft said that in their tests, “campaigns that boost bids by keyword whilst targeting by category exhibit 320% higher click-through-rate (CTR) than the campaigns without boosting bids by keyword.” “Meanwhile, retailers saw benefits from this solution by achieving 8x higher revenue per thousand impressions (RPM),” Microsoft added.
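As a rough illustration of category targeting with a keyword booster, here is a minimal Python sketch. It is not PromoteIQ’s API or bidding logic. The function `effective_bid`, its parameters, and the 1.5x boost are all hypothetical. It only shows the general shape: the category decides whether the campaign is eligible, and a matching keyword raises the bid.

```python
def effective_bid(base_bid, browsed_category, target_categories,
                  query, booster_keywords, boost=1.5):
    """Return the bid for one impression (hypothetical model).

    - The campaign is only eligible when the shopper is browsing one of
      the targeted categories; otherwise the bid is 0.0.
    - If the shopper's query contains any booster keyword, the base bid
      is multiplied by boost (an assumed value).
    """
    if browsed_category not in target_categories:
        return 0.0
    query_terms = set(query.lower().split())
    boosters = {keyword.lower() for keyword in booster_keywords}
    if query_terms & boosters:
        return base_bid * boost
    return base_bid


if __name__ == "__main__":
    print(effective_bid(1.0, "shoes", {"shoes"}, "red running shoes", {"running"}))
    print(effective_bid(1.0, "shoes", {"shoes"}, "leather boots", {"running"}))
```

The point of the design is that the keyword list stays short, since keywords only adjust bids inside categories that are already targeted.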
Forum discussion at Twitter.
SEARCHENGINES
Google Search Console Shows If embedURL Page Uses indexifembedded

Google Search Console’s URL Inspection tool can now report if the embedURL page for a video uses the newish indexifembedded robots tag. The indexifembedded rule tells Google that it is allowed to index the content of a page when that page is embedded in another page through iframes or similar HTML tags, in spite of a noindex rule.
This was spotted by Jon Henshaw and posted on LinkedIn. He explained that he asked Google to make the URL Inspection Tool show if “indexifembedded” is being used, “and through the stars and moons aligning and perhaps other miracles, they told me they added it today,” he said.
Here is his screenshot:
You can see in the “indexing allowed” section it says “No: ‘noindex’ detected in ‘robots’ meta tag, ‘indexifembedded’ detected in ‘robots’ meta tag.”
Jon explained what this means:
If you use YouTube and make your video Unlisted, and then embed the video on your site, Google won’t index it. Why? Because they add a “noindex” directive to the page that serves the video on your page. Bummer!
However, if you use Vimeo, make your video Unlisted, and then embed it on your site, Google can still index it! Why? Because unlike YouTube, Vimeo adds “noindex” *and* a special directive created by Google called “indexifembedded.” That tells Google to index the video on any page that has an iframe embedded video.
Coupled with Vimeo automatically generating and inserting VideoObject Schema structured data for all embedded videos (including Unlisted videos), businesses now have the best chance they’ve ever had to get their pages to rank for videos instead of competing with their video hosting provider.
Jon knows this because, well, he is the Senior Director, SEO at Vimeo, and Vimeo is a massive video site.
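To see how the noindex and indexifembedded combination described above plays out, here is a minimal Python sketch that reads the robots meta tag of a page and reports which case it falls into. This is an illustration only, not how Google evaluates pages: the class `RobotsMetaParser`, the function `indexing_status`, and the returned labels are made up for this example, and the sketch looks only at meta tags, not at the X-Robots-Tag HTTP header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects rules from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.rules = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() in ("robots", "googlebot"):
            content = attr.get("content") or ""
            self.rules.update(
                rule.strip().lower() for rule in content.split(",") if rule.strip()
            )


def indexing_status(html):
    """Classify a page by its robots meta rules (labels are hypothetical)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    if "noindex" not in parser.rules:
        return "indexable"
    if "indexifembedded" in parser.rules:
        return "indexable only when embedded"
    return "not indexable"


if __name__ == "__main__":
    page = '<meta name="robots" content="noindex, indexifembedded">'
    print(indexing_status(page))
```

A page with only “noindex” lands in the “not indexable” case, which matches the unlisted YouTube example in the quote, while “noindex” together with “indexifembedded” matches the Vimeo example.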
Forum discussion at LinkedIn.
SEARCHENGINES
Google Bard Won’t Link To Sources Too Often

As you know, we’ve been playing with Google Bard, which started to roll out a couple of days ago. So far, we have been disappointed with how limited it seems and, more so, how rarely it links to sources and content creators. Now Google has gotten back to us on why this is the case.
Google added a few topics to the Bard FAQs, including “How and when does Bard cite sources in its responses?” Let me quote what it says:
Bard, like some other standalone LLM experiences, is intended to generate original content and not replicate existing content at length. We’ve designed our systems to limit the chances of this occurring, and we will continue to improve how these systems function. If Bard does directly quote at length from a webpage, it cites that page.
Bard was built to be a creative and helpful collaborator—it works well in creative tasks like helping you write an email or brainstorm ideas for a birthday party. We see it as a complementary experience to Google Search. That’s why we added the “Google It” button to Bard, so people can easily move from Bard to explore information from across the web.
Bard is an experiment, and we’ll use its launch as an opportunity to learn, iterate, and improve the experience as we get feedback from a range of stakeholders including people like you, publishers, creators, and more.
So since Bard is “intended to generate original content and not replicate existing content at length,” Google does not feel the need to cite sources? Bard will, however, cite sources and link to them if it does “directly quote at length from a webpage.”
Instead, Google wants you to go from Bard to Google with the “Google It” button, “so people can easily move from Bard to explore information from across the web.” So you are meant to click on links from Google Search, and not on links from Bard too often.
But things with Bard are early and may change: “Bard is an experiment, and we’ll use its launch as an opportunity to learn, iterate, and improve the experience as we get feedback from a range of stakeholders including people like you, publishers, creators, and more.”
Honestly, I am shocked. I did not think Google would launch Bard without citing and linking to sources as much as, and as well as, Bing Chat does. Even Gary Illyes from Google hinted publishers would be okay with it.
Let me show some examples (click on the images to enlarge).
Google Bard on “Who is Barry Schwartz?” – this is not me, this is the famous Barry Schwartz, by the way:
No citations with the default response from Google Bard.
But Bing gives 15 links to 15 different sources:
To be fair, if I work hard, and go to draft two, I get some citations from Google Bard:
I posted about this on Twitter and here are some of the responses and reactions to Google’s FAQ statement on the citation bit:
What a joke. Absolutely brazen content theft.
— Don Caldwell 🦑 (@DonCald) March 22, 2023
Meanwhile, Google could care less: https://t.co/QQmZ1jA8WK
— Rutledge Daugette (@TheRealRutledge) March 22, 2023
A positive perspective: Bard is bound to say weird things and give inaccurate information. If that’s the case, you won’t necessarily want your brand up there co-signing certain conversations or answers.
— dog excited to meet pluto (@dogmeetpluto) March 22, 2023
That’s not great for site owners.
I’ve also seen a number of people share Bard responses that are questionable or outright wrong. Responses should be treated like discussing a topic with a questionably-informed internet rando, rather than a factual response if there’s no source.
— Peggy K (@PeggyKTC) March 22, 2023
Uggh. No/Minimal citations is a big negative for me. (both as a creator, and potential user of Bard)
— ElizabethH (@ElizabethH15) March 22, 2023
IMHO it’s impossible to overstate what an enormous problem this is for publishers. If citations are not prevalent and prominent, publishers should be able to opt out of being used in training data without it having any affect on SEO. And every publisher should opt out.
— Michael Magnuson (@mdmagnuson) March 22, 2023
To be honest, the user in me prefers Bard’s UI/UX compared to Bing Chat.
The SEO in me hates the lack of sources, but the way Bing Chat has them incorporated just looks a bit naff.
— Chloe Ivy Rose (@chloeivyroseseo) March 22, 2023
That’s a massive miscalculation on their side, it’s the wrong result that they will need to address
— @davidiwanow March 22, 2023
I mean this section is *interesting*…
“For now, Google Bard likely won’t be sending a lot of traffic to the web or websites.”
And likely a challenge for anyone trying to do research.
— Crystal Carter (she/her) (@CrystalontheWeb) March 22, 2023
I actually think #Bard could work very well for local if Google was willing to include URLs, use more its local knowledge graph and offer Maps links. pic.twitter.com/YZLB1DrY3u
— Greg Sterling 🇺🇦 (@gsterling) March 22, 2023
The same thought I had when started playing with it https://t.co/RllWsaQ9KQ
— Gianluca Fiorelli (@gfiorelli1) March 22, 2023
One glimmer of hope is that if and when Bard is somehow integrated into Google Search, those integrations will show more prominent links to content creators. Via the WSJ: “Sissie Hsiao, a vice president in charge of Google Assistant, said the company ‘is deeply committed in supporting a healthy and vibrant content ecosystem’ and ‘will be welcoming conversations with stakeholders.’ She said when AI tools are integrated into search the company will give priority to sending valuable traffic to content creators.”
Good to hear from Google’s Sissie Hsiao about Bard for Search + Citations -> “She said when AI tools are integrated into search, the company will give priority to sending valuable traffic to content creators.” https://t.co/K3U82vtAu6 pic.twitter.com/xWbRl7SLRs
— Glenn Gabe (@glenngabe) March 22, 2023
So we will see. Until then, prepare to be disappointed with how little traffic you might see from Google Bard.
Forum discussion at Twitter.