SEARCHENGINES
Yandex Search Ranking Factors Leaked & Exposed
A large batch of Yandex source code, spanning much of its technology, was allegedly leaked by a disgruntled employee, and part of it was the source code for Yandex Search, Russia’s largest search engine. As you can imagine, SEOs and others are diving in to see what they can learn from it.
I did not download the source code, so I have not gone through it myself, but I wanted to share what people found in their investigations and posted on Twitter.
Here’s the alpha version of an explorer tool for the leaked #Yandex Search code.
It lets you browse through the ranking factors, view by tags, etc, and start to find connections.
Easy to add new features if there’s anything you want to see! https://t.co/AjbYnrDl9P pic.twitter.com/pQ4scOkP6w
— Rob Ousbey (@RobOusbey) January 28, 2023
I downloaded the code, analyzed it and there is a lot of useful information for Google SEO as well. pic.twitter.com/RWrgnnlpj6
— Alex Buraks (@alex_buraks) January 27, 2023
Theoretically, what is the difference between algorithms used in Google and in Yandex?
They are quite similar:
– there is RankBrain analogue – MatrixNet;
– they are using PageRank (almost the same as in Google);
– a lot of text algorithms are the same. pic.twitter.com/Djjl8Bmjwn
— Alex Buraks (@alex_buraks) January 27, 2023
According to Statcounter Yandex is close to Yahoo and Bing by market share: pic.twitter.com/5GKIvKIvAo
— Alex Buraks (@alex_buraks) January 27, 2023
Main insights after analysing this list:
#1 Age of links is a ranking factor. pic.twitter.com/U47uWvEq9w
— Alex Buraks (@alex_buraks) January 27, 2023
#3 Numbers in URLs is bad for rankings pic.twitter.com/ECgwGeGUfb
— Alex Buraks (@alex_buraks) January 27, 2023
#5 Hard pessimization equal PR=0 pic.twitter.com/RRbhuJyZr1
— Alex Buraks (@alex_buraks) January 27, 2023
#7 Fun fact – there is a separate ranking factor for uplifting Wikipedia pic.twitter.com/799F8KFpkE
— Alex Buraks (@alex_buraks) January 27, 2023
#9 Document age and last update both are ranking factors. pic.twitter.com/ay1GTMVEtJ
— Alex Buraks (@alex_buraks) January 27, 2023
Right now I checked ~40% of the list, there are a lot more (about text relevancy, behaivor factors, page rank, internal links,etc).
Will continue this thread after some time.
— Alex Buraks (@alex_buraks) January 27, 2023
The first thread got a lot of impressions (500k views for the moment, thanks for you retweets and likes!), so I decided to finalize. https://t.co/UQiQsnpWd2
— Alex Buraks (@alex_buraks) January 28, 2023
#2 Additionnaly: ranking factor for orphan pages.
You can easy find them via Screming Frog or other crawlers. pic.twitter.com/zIPwAelpD0
— Alex Buraks (@alex_buraks) January 28, 2023
#4 Number of search queries of your site/url is a ranking factor.
Obviously more = better. pic.twitter.com/xXQ6FMDghP
— Alex Buraks (@alex_buraks) January 28, 2023
#6 If your url whould be the last for search session (user will find what he needs) – it whould impact rankings.
There are strict factors for this and predictible factors as well. pic.twitter.com/Zx3sBZORCs
— Alex Buraks (@alex_buraks) January 28, 2023
#8 Special ranking factors for short videos (tiktok, shorts, reels) pic.twitter.com/oKPzL09MID
— Alex Buraks (@alex_buraks) January 28, 2023
#10 Keywords in URL is a ranking factors.
As we can see from the description – the optimal would be include up to 3 words from the search query. pic.twitter.com/Q1euKWSiST
— Alex Buraks (@alex_buraks) January 28, 2023
#14 One more ranking factor for content quality – broken embedded video on the page.
Embed videos – good for rankings.
Broken embed videos – bad. pic.twitter.com/2SUys65PHp
— Alex Buraks (@alex_buraks) January 28, 2023
#16 If you backlinks anchors contain all words from the keywords – it’s good for SEO.
If it is in a one link – it’s more beneficial. Especially if the order of words is the same. pic.twitter.com/WrbESJ8Da5
— Alex Buraks (@alex_buraks) January 28, 2023
#18 The quality rank of texts on the domain is a ranking factor.
Pages with low quality content affect the entire domain. pic.twitter.com/MJUCTVB9CH
— Alex Buraks (@alex_buraks) January 28, 2023
#20 Funny, there is a random as a separate ranking factor.
When you don’t understant why some of page is on top – it could be just random (to test behaivor factors). pic.twitter.com/TGtzFrmBOV
— Alex Buraks (@alex_buraks) January 28, 2023
#22 Backlinks from the top 100 best websites by PageRank impacts on rankings.
That’s not news. pic.twitter.com/ikxldWLJqy
— Alex Buraks (@alex_buraks) January 28, 2023
Wow, I just found the list with initial weights of Yandex ranking factors.
Do you need one more thread? 😁
P.S. final weights calculated by AI (matrixnet), but initial values are useful as well. pic.twitter.com/WeroYQy7Yu
— Alex Buraks (@alex_buraks) January 28, 2023
That said, I’ve been digging into the codebase myself to find things of interest.
I’m doing this live, so I don’t know how long it will take between tweets.
— Mic King (@iPullRank) January 27, 2023
A lot of the code related to Yandex Search lives in the Kernel, ExtSearch, Search, and Robot archives, but again I won’t be able to be comprehensive here until I’ve looked through everything.
— Mic King (@iPullRank) January 27, 2023
Some really interesting things in the web_meta_factors_info/factors_gen.in file as it relates to content features and factors.
For instance, some things that we’d expect like a minimum expectation of the proximity of words in a title to the words in the query. pic.twitter.com/YRsrCpVsqU
— Mic King (@iPullRank) January 27, 2023
Interestingly, there are a lot of scrapers in here Google News, Shopping, YouTube and even other Yandex services.
— Mic King (@iPullRank) January 27, 2023
Hmm…this might be the structure of how Yandex stores documents in their version of a doc server.
Still looking for an idea of how they structure their inverted index. pic.twitter.com/1lwTbOirnx
— Mic King (@iPullRank) January 27, 2023
Here’s a protobuf of link factors. pic.twitter.com/1RM6o1xzRg
— Mic King (@iPullRank) January 27, 2023
In the “link prioritizer code” they talk about decreasing the priority of links with the same text from the same host. In other words, don’t count the links from duplicate content. pic.twitter.com/dQTUnScCUy
— Mic King (@iPullRank) January 27, 2023
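To make the idea in that tweet concrete, here is a minimal Python sketch of lowering the priority of links that repeat the same anchor text from the same host. It is not taken from the leaked code: the function name, the tuple layout and the halving penalty are all assumptions chosen for illustration.

```python
from collections import defaultdict


def deprioritize_duplicate_links(links, penalty=0.5):
    """Return one weight per link, in input order.

    `links` is a list of (source_host, anchor_text) tuples. The first link
    with a given host and anchor text gets weight 1.0; each repeat of the
    same pair is multiplied by `penalty` again (a hypothetical value).
    """
    seen = defaultdict(int)
    weights = []
    for host, text in links:
        # Normalise case and whitespace so trivial variations count as duplicates.
        key = (host.lower(), " ".join(text.lower().split()))
        weights.append(penalty ** seen[key])
        seen[key] += 1
    return weights


links = [
    ("example.com", "best pizza"),
    ("example.com", "Best  Pizza"),
    ("other.org", "best pizza"),
    ("example.com", "best pizza"),
]
weights = deprioritize_duplicate_links(links)
print(weights)
```

The same anchor text from a different host keeps its full weight in this sketch; only repeats from one host are discounted.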
How did y’all come up with that number of ranking factors?
I see 481 factors just related to “Rapid Clicks” pic.twitter.com/sw5A3ia3Bk
— Mic King (@iPullRank) January 28, 2023
Similar to the Googs, Yandex has multiple ranking models to choose from.
In this select_ranking_models.cpp file, they talk about having different models for different languages and locations. pic.twitter.com/m210tpOUDb
— Mic King (@iPullRank) January 28, 2023
I’m gonna go watch TV, but I obviously have to add this to my book so I’m gonna add more over the next couple days
— Mic King (@iPullRank) January 28, 2023
Been digging into how this robot archive is structured.
It looks like the Zora directory is where a lot of interesting things are happening. There’s a limits.pb.txt file that stores the requests per second rate for the host and the IP address for 204k hosts. pic.twitter.com/0oulKm58dx
— Mic King (@iPullRank) January 28, 2023
Here’s where the Document and Query factors are collected and scored.
Looks like it goes to storage after this tho. pic.twitter.com/qJAiLfSrsU
— Mic King (@iPullRank) January 29, 2023
Ok, real quick, top 5 most positively and negatively weighted ranking factors and their coefficients in the initial weighting in Yandex’s document relevance calculation. Negatives first
#1 FI_ADV: -0.2509284637
This factor determines that there is advertising on the site.
— Mic King (@iPullRank) January 29, 2023
#3 FI_QURL_STAT_POWER: -0.1943768768
Factor is the number of URL impressions for the request
— Mic King (@iPullRank) January 29, 2023
#5 FI_GEO_CITY_URL_REGION_COUNTRY: -0.168645758
Factor is the geographical coincidence of the document and the country that the user searched from.
Ok, now for the top 5 positively weighted factors.
— Mic King (@iPullRank) January 29, 2023
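As a rough illustration of what an initial weighting like this could mean, the Python sketch below combines the three negative coefficients quoted above into a simple weighted sum. The linear form, the function name and the factor values in the sample document are assumptions for illustration only; per the tweets above, the final weights are calculated by MatrixNet, so this is not how Yandex actually scores a page.

```python
# Coefficients as quoted in the tweets above (initial weights only).
INITIAL_WEIGHTS = {
    "FI_ADV": -0.2509284637,
    "FI_QURL_STAT_POWER": -0.1943768768,
    "FI_GEO_CITY_URL_REGION_COUNTRY": -0.168645758,
}


def linear_score(factors, weights=INITIAL_WEIGHTS):
    """Sum weight * value over the factors that have a known weight.

    `factors` maps factor names to values; this sketch assumes values
    are already normalised to the range 0..1.
    """
    return sum(
        weights[name] * value
        for name, value in factors.items()
        if name in weights
    )


# Hypothetical document: ads present, no URL impressions, partial geo signal.
doc = {
    "FI_ADV": 1.0,
    "FI_QURL_STAT_POWER": 0.0,
    "FI_GEO_CITY_URL_REGION_COUNTRY": 0.5,
}
score = linear_score(doc)
print(round(score, 6))
```

Factors without a listed weight are skipped, so the sketch only reflects the coefficients actually quoted here.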
Here is a starting point for link related factors. https://t.co/fwP8TxuOrM
— Christoph C. Cemper 🇺🇦 🧡 SEO (@cemper) January 30, 2023
Will this help you do SEO on Google? Probably not but hey, it is super interesting.
Ah, but once they find the optimal word count …
BOOM
— John Mueller is watching out for Google+ 🐀 (@JohnMu) January 29, 2023
Forum discussion at WebmasterWorld.
SEARCHENGINES
Google Again Says Ignore Link Spam Especially To 404 Pages
I am not sure how many times Google has said that you do not need to disavow spammy links, that you can ignore link spam attacks and that links pointing to pages that 404/410 are links that do not count – but John Mueller from Google said it again.
In a thread on X, John Mueller from Google wrote, “if the links are going to URLs that 404 on your site, they’re already dropped.” “They do nothing,” he added, “If there’s no indexable destination URL, there’s no link.”
John then added, “I’d generally ignore link-spam, and definitely ignore link-spam to 404s.”
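If you want to see which spammy links even have a live destination before spending time on them, a small Python sketch like the one below can separate them. It only models the rule John describes (no indexable destination, no link); the function name, the data layout and the sample URLs are assumptions, and the status codes would come from your own crawl or server logs.

```python
# Destination statuses that mean there is no page for a link to point at.
DROPPED_STATUSES = {404, 410}


def links_that_count(backlinks, status_by_url):
    """Keep only backlinks whose destination URL is not a 404 or 410.

    `backlinks` is a list of (source_url, destination_url) tuples and
    `status_by_url` maps destination URLs to HTTP status codes.
    Destinations with no recorded status are kept.
    """
    return [
        (src, dst)
        for src, dst in backlinks
        if status_by_url.get(dst) not in DROPPED_STATUSES
    ]


backlinks = [
    ("https://spam.example/a", "https://mysite.example/old-page"),
    ("https://blog.example/post", "https://mysite.example/guide"),
    ("https://spam.example/b", "https://mysite.example/removed"),
]
status_by_url = {
    "https://mysite.example/old-page": 404,
    "https://mysite.example/guide": 200,
    "https://mysite.example/removed": 410,
}
remaining = links_that_count(backlinks, status_by_url)
print(remaining)
```

Anything filtered out here already falls into the "they do nothing" bucket from the quotes above, so it needs no disavow work.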
When asked, after the responses above, whether it would hurt to disavow anyway, John wrote:
It will do absolutely nothing. I would take the time to rework a holistic & forward-looking strategy for the site overall instead of working on incremental tweaks (other tweaks might do something, but you probably need real change, not tweaks).
Earlier this year, many SEOs noticed spammy links pointing to 404 error pages, and John said to ignore them. In 2021, Google said links to 404 pages do not count; Google also said that in 2012 and many other times.
Plus, outside of links to 404 pages, Google has said to ignore spammy links, time and time again, even the toxic ones. The messaging around this changed in 2016 when Penguin 4.0 was released and Google began devaluing links rather than demoting them.
Here are those new posts in context:
I’d say add both. Lol
— Jeremy Rivera (@JeremyRiveraSEO) April 11, 2024
Sure. But also, save yourself the work completely :-).
— John 🧀 … 🧀 (@JohnMu) April 11, 2024
Re-reading your initial post – if the links are going to URLs that 404 on your site, they’re already dropped. They do nothing. If there’s no indexable destination URL, there’s no link. I’d generally ignore link-spam, and definitely ignore link-spam to 404s.
— John 🧀 … 🧀 (@JohnMu) April 11, 2024
… but still… is this a dumb idea?
— Rebekah Edwards (@rebekah_creates) April 11, 2024
It will do absolutely nothing. I would take the time to rework a holistic & forward-looking strategy for the site overall instead of working on incremental tweaks (other tweaks might do something, but you probably need real change, not tweaks).
— John 🧀 … 🧀 (@JohnMu) April 11, 2024
And in general, Google says it ignores spammy links, so you should too. That is not new, but this post from John Mueller is:
I would just ignore them, Google ignores them too. Sometimes they’re just more visible in tools, but that doesn’t mean they’re a problem.
— John 🧀 … 🧀 (@JohnMu) April 18, 2024
John also wrote about a similar situation on Mastodon: “Google has 2 decades of practice of ignoring spammy links. There’s no need to do anything for those links.”
Forum discussion at X.
Note: This was pre-written and scheduled to be posted today, I am currently offline for Passover.
SEARCHENGINES
Google Needs Very Few Links To Rank Pages; Links Are Less Important
Gary Illyes from Google spoke at the SERP Conf on Friday and repeated what he has said numerous times before: Google values links a lot less today than it did in the past. He added that Google Search “needs very few links to rank pages.”
Gary reportedly said, “We need very few links to rank pages… Over the years we’ve made links less important.”
I am quoting Patrick Stox, who is quoting what he heard Gary say on stage at the event. Here is Gary’s rare reply to Patrick’s post:
I shouldn’t have said that… I definitely shouldn’t have said that
— Gary 鯨理/경리 Illyes (so official, trust me) (@methode) April 19, 2024
Gary said this a year ago, also in 2022 and other times as well. We previously covered that Google said links would likely become even less important in the future. Even Matt Cutts, the former Googler, said something similar about eight years ago, and the truth is that links are weighted a lot less than they were eight years ago and that trend continues. A couple of years ago, Google said links are not the most important Google search ranking factor.
Of course, many SEOs think Google lies about this.
Judith Lewis interviewed Gary Illyes at the SERP Conf this past Friday.
SEARCHENGINES
Google Core Update Flux, AdSense Ad Intent, California Link Tax & More
For the original iTunes version, click here.
The Google March 2024 core update is still rolling out, almost 6 weeks in, and we saw two shifts in ranking volatility, one mid-week and one the weekend before. Google’s Danny Sullivan went on the defensive on search quality and forum listings in the search results. Google’s site reputation abuse spam policy will be fought both algorithmically and through manual actions. Google responded to The Verge mocking its search rankings for “best printer.” Google Search Console has a new unused ownership tokens page. Some sites may see the Google Indexing API work for a limited time on unsupported content types. And having two sites won’t cause your sites’ search rankings to decline. BingBot now fully supports Brotli compression and will test Zstd compression soon. Google Search is testing thumbs-up and thumbs-down buttons for product carousels. Google is testing new sitelinks designs. Google Notes on Search may not go away in May. Google Maps no longer supports draft reviews. Google Maps released a bunch of new maps, directions, travel and EV features. Google Ads Demand Gen campaigns now support AI image generation. Google Ads is testing a similar product carousel. Google Ads reminds advertisers that ad customizers are going away. Google Ads is testing a new horizontal ad card format. Google AdSense has new ad intent formats. Google AdSense publishers are reporting lower RPM earnings since mid-February. Google threatens to drop links to California news publishers amid a link tax bill. That was the search news this week at the Search Engine Roundtable.
SPONSOR: Wix Studio lets digital marketing agencies get all of the benefits Wix has to offer from best-in-class SEO capabilities to 99% up-time with the added value of an extensive client and team management system baked right into the platform.
Make sure to subscribe to our video feed or subscribe directly on iTunes, Apple Podcasts, Spotify, Google Podcasts or your favorite podcast player to be notified of these updates and download the video in the background. Here is the YouTube version of the feed:
Search Topics of Discussion:
Please do subscribe on YouTube or subscribe via iTunes or on your favorite RSS reader. Don’t forget to comment below with the right answer and good luck!