Yandex is the search engine with the majority of market share in Russia and the fourth-largest search engine in the world.
On January 27, 2023, it suffered what is arguably one of the largest data leaks a modern tech company has endured in years – and its second in less than a decade.
In 2015, a former Yandex employee attempted to sell Yandex’s search engine code on the black market for around $30,000.
The initial leak in January this year revealed 1,922 ranking factors, of which more than 64% were listed as unused or deprecated (superseded and best avoided).
That initial figure came from just the file labeled kernel, but as the SEO community and I delved deeper, more files were found that, combined, contain approximately 17,800 ranking factors.
When it comes to practicing SEO for Yandex, the guide I wrote two years ago, for the most part, still applies.
Yandex, like Google, has always been public about its algorithm updates and changes and, in recent years, about how it has adopted machine learning.
Notable updates from the past two to three years include:
A fresh rollout and assumed update of the PF filter.
On a personal note, this data leak is like a second Christmas.
Since January 2020, I’ve run an SEO news website as a hobby dedicated to covering Yandex SEO and search news in Russia with 600+ articles, so this is probably the peak event of the hobby site.
I’ve also spoken twice at the Optimization conference – the largest SEO conference in Russia.
This is also a good test to see how closely Yandex’s public statements match the codebase secrets.
In 2019, working with Yandex’s PR team, I was able to interview engineers in their Search team and ask a number of questions sourced from the wider Western SEO community.
Whilst Yandex is primarily known for its presence in Russia, the search engine also has a presence in Turkey, Kazakhstan, and Georgia.
The data leak is believed to have been politically motivated and the work of a rogue employee, and contains a number of code fragments from Yandex’s monolithic repository, Arcadia.
Within the 44GB of leaked data, there’s information relating to a number of Yandex products, including Search, Maps, Mail, Metrika, Disk, and Cloud.
“the contents of the archive (leaked code base) correspond to the outdated version of the repository – it differs from the current version used by our services”
And:
“It is important to note that the published code fragments also contain test algorithms that were used only within Yandex to verify the correct operation of the services.”
So, how much of this code base is actively used is questionable.
Yandex has also revealed that during its investigation and audit, it found a number of errors that violate its own internal principles, so it is likely that portions of this leaked code (that are in current use) may be changing in the near future.
Factor Classification
Yandex classifies its ranking factors into three categories.
This has been outlined in Yandex’s public documentation for some time, but I feel it is worth including here, as it helps us better understand the ranking factor leak.
Static factors – Factors that are related directly to the website (e.g. inbound backlinks, inbound internal links, headers, and ads ratio).
Dynamic factors – Factors that are related to both the website and the search query (e.g. text relevance, keyword inclusions, TF*IDF).
User search-related factors – Factors relating to the user query (e.g. where is the user located, query language, and intent modifiers).
The ranking factors in the document are tagged to match the corresponding category, with TG_STATIC and TG_DYNAMIC, and then TG_QUERY_ONLY, TG_QUERY, TG_USER_SEARCH, and TG_USER_SEARCH_ONLY.
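To make the distinction more concrete, here is a minimal Python sketch – my own illustration, not code from the leak – showing how a static, a dynamic, and a user-related factor differ in their inputs. Only the TG_* tag names come from the leaked documents; the function names and logic are hypothetical.

```python
import math
from collections import Counter

# Hypothetical illustration of the three factor categories.
# Only the TG_* tag names come from the leak; the functions are a sketch.

def static_factor_ads_ratio(ad_blocks: int, total_blocks: int) -> float:
    """TG_STATIC: depends on the document alone (e.g., ads-to-content ratio)."""
    return ad_blocks / total_blocks if total_blocks else 0.0

def dynamic_factor_tf_idf(query: str, doc: str, corpus: list[str]) -> float:
    """TG_DYNAMIC: depends on both the document and the query (e.g., TF*IDF)."""
    doc_terms = doc.lower().split()
    tf = Counter(doc_terms)
    score = 0.0
    for term in query.lower().split():
        term_tf = tf[term] / len(doc_terms) if doc_terms else 0.0
        docs_with_term = sum(1 for d in corpus if term in d.lower().split())
        idf = math.log((1 + len(corpus)) / (1 + docs_with_term)) + 1
        score += term_tf * idf
    return score

def user_factor_same_region(user_region: str, doc_region: str) -> float:
    """TG_USER_SEARCH: depends on the searcher (e.g., user location vs. page region)."""
    return 1.0 if user_region == doc_region else 0.0
```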
Yandex Leak Learnings So Far
From the data thus far, below are some of the affirmations and learnings we’ve been able to make.
There is so much data in this leak, it is very likely that we will be finding new things and making new connections in the next few weeks.
Below, I’ve expanded on some other affirmations and learnings from the leak.
Where possible, I’ve also tied these leaked ranking factors to the algorithm updates and announcements that relate to them, or where we were told about them being impactful.
MatrixNet
MatrixNet is mentioned in a few of the ranking factors and was announced in 2009, and then superseded in 2017 by Catboost, which was rolled out across the Yandex product sphere.
This further adds validity to comments directly from Yandex, and one of the factor authors DenPlusPlus (Den Raskovalov), that this is, in fact, an outdated code repository.
MatrixNet was originally introduced as a new, core algorithm that took into consideration thousands of ranking factors and assigned weights based on the user location, the actual search query, and perceived search intent.
It is typically seen as an early equivalent of Google’s RankBrain, though they are indeed two very different systems; MatrixNet was launched six years before RankBrain was announced.
MatrixNet has also been built upon, which isn’t surprising, given it is now 14 years old.
In 2016, Yandex introduced the Palekh algorithm that used deep neural networks to better match documents (webpages) and queries, even if they didn’t contain the right “levels” of common keywords, but satisfied the user intents.
Palekh was capable of processing 150 pages at a time, and in 2017 was updated with the Korolyov update, which took into account more depth of page content, and could work off 200,000 pages at once.
URL & Page-Level Factors
From the leak, we have learned that Yandex takes into consideration URL construction, specifically:
The presence of numbers in the URL.
The number of trailing slashes in the URL (and if they are excessive).
The number of capital letters in the URL.
Screenshot from author, January 2023
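As a rough illustration of how such URL-level signals could be computed, here is a short sketch of my own – not the leaked implementation – and the threshold for “excessive” trailing slashes is a made-up placeholder.

```python
import re

def url_features(url: str) -> dict:
    """Hypothetical sketch of the URL-level signals described in the leak."""
    path = url.split("://", 1)[-1]                           # drop the scheme
    trailing_slashes = len(path) - len(path.rstrip("/"))
    return {
        "has_digits": bool(re.search(r"\d", path)),          # presence of numbers
        "trailing_slashes": trailing_slashes,                 # count of trailing slashes
        "excessive_trailing_slashes": trailing_slashes > 1,   # placeholder threshold
        "capital_letters": sum(c.isupper() for c in path),    # count of capitals
    }

print(url_features("https://example.com/Category/Page-1//"))
# {'has_digits': True, 'trailing_slashes': 2, 'excessive_trailing_slashes': True, 'capital_letters': 2}
```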
The age of a page (document age) and the last updated date are also important, and this makes sense.
As well as document age and last update, a number of factors in the data relate to freshness – particularly for news-related queries.
Yandex formerly used timestamps, not for ranking but for “reordering” purposes; this factor is now classified as unused.
Also in the deprecated column is the use of keywords in the URL. Yandex had previously measured that three keywords from the search query in the URL was an “optimal” result.
Internal Links & Crawl Depth
Whilst Google has gone on the record to say that for its purposes, crawl depth isn’t explicitly a ranking factor, Yandex appears to have an active piece of code that dictates that URLs that are reachable from the homepage have a “higher” level of importance.
Screenshot from author, January 2023
This mirrors John Mueller’s 2018 comments that Google gives “a little more weight” to pages found within one click of the homepage.
The ranking factors also highlight a specific token weighting for webpages that are “orphans” within the website linking structure.
Clicks & CTR
In 2011, Yandex released a blog post talking about how the search engine uses clicks as part of its rankings and also addresses the desires of the SEO pros to manipulate the metric for ranking gain.
Specific click factors in the leak look at things like:
The ratio of the number of clicks on the URL, relative to all clicks on the search.
The same as above, but broken down by region.
How often users click on the URL for that search.
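These click factors read like straightforward share calculations. Below is a minimal sketch assuming a hypothetical log of (query, region, clicked URL) events; the data and field layout are my own invention, not Yandex’s.

```python
# Hypothetical click log: (query, region, clicked_url) tuples. Field names are my own.
clicks = [
    ("buy shoes", "moscow", "https://shop.example/shoes"),
    ("buy shoes", "moscow", "https://other.example/shoes"),
    ("buy shoes", "spb", "https://shop.example/shoes"),
]

def click_share(url: str, query: str, region: str | None = None) -> float:
    """Share of clicks a URL receives for a query, optionally within one region."""
    pool = [c for c in clicks if c[0] == query and (region is None or c[1] == region)]
    if not pool:
        return 0.0
    return sum(1 for _, _, u in pool if u == url) / len(pool)

print(click_share("https://shop.example/shoes", "buy shoes"))            # 2/3 of all clicks
print(click_share("https://shop.example/shoes", "buy shoes", "moscow"))  # 1/2 within Moscow
```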
Manipulating Clicks
Manipulating user behavior, specifically “click-jacking”, is a known tactic within Yandex.
Yandex has a filter, known as the PF filter, that actively seeks out and penalizes websites engaging in this activity. It uses scripts that monitor IP similarities and the “user actions” of those clicks – and the impact can be significant.
The below screenshot shows the impact on organic sessions (сессии) after being penalized for imitating user clicks.
Image from Russian Search News, January 2023
User Behavior
The user behavior takeaways from the leak are some of the more interesting findings.
User behavior manipulation is a common SEO violation that Yandex has been combating for years. At the 2020 Optimization conference, then Head of Yandex Webmaster Tools Mikhail Slevinsky said the company is making good progress in detecting and penalizing this type of behavior.
Yandex penalizes user behavior manipulation with the same PF filter used to combat CTR manipulation.
102 of the ranking factors contain the tag TG_USERFEAT_SEARCH_DWELL_TIME, and reference the device, user duration, and average page dwell time.
All but 39 of these factors are deprecated.
Screenshot from author, January 2023
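A simple aggregation along those lines might look like the sketch below; the session records and the device breakdown are my own assumptions, not the leaked implementation.

```python
from statistics import mean

# Hypothetical session records: (url, device, dwell_seconds). My own illustration.
sessions = [
    ("https://example.com/guide", "desktop", 95),
    ("https://example.com/guide", "mobile", 40),
    ("https://example.com/guide", "mobile", 55),
]

def avg_dwell_time(url: str, device: str | None = None) -> float:
    """Average dwell time on a URL, optionally restricted to one device type."""
    times = [t for u, d, t in sessions
             if u == url and (device is None or d == device)]
    return mean(times) if times else 0.0

print(avg_dwell_time("https://example.com/guide"))            # ~63.3s overall
print(avg_dwell_time("https://example.com/guide", "mobile"))  # 47.5s on mobile
```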
Bing first used the term “dwell time” in a 2011 blog post, and in recent years Google has made it clear that it doesn’t use dwell time (or similar user interaction signals) as ranking factors.
YMYL
YMYL (Your Money, Your Life) is a concept well-known within Google and is not a new concept to Yandex.
Within the data leak, there are specific ranking factors for medical, legal, and financial content – but this was notably revealed in 2019 at the Yandex Webmaster conference, when the company announced the Proxima Search Quality Metric.
Metrika Data Usage
Six of the ranking factors relate to the usage of Metrika data for the purposes of ranking. However, one of them is tagged as deprecated:
The number of similar visitors from the YandexBar (YaBar/Ябар).
The average time spent on URLs from those same similar visitors.
The “core audience” of pages on which there is a Metrika counter [deprecated].
The average time a user spends on a host when accessed externally (from another non-search site) from a specific URL.
Average ‘depth’ (number of hits within the host) of a user’s stay on the host when accessed externally (from another non-search site) from a particular URL.
Whether or not the domain has Metrika installed.
In Metrika, user data is handled differently.
Unlike Google Analytics, there are a number of reports focused on user “loyalty” combining site engagement metrics with return frequency, duration between visits, and source of the visit.
For example, in one click I can see a report breaking down individual site visitors:
Screenshot from Metrika, January 2023
Metrika also comes “out of the box” with heatmap tools and user session recording, and in recent years the Metrika team has made good progress in being able to identify and filter bot traffic.
With Google Analytics, there is an argument that Google doesn’t use UA/GA4 data for ranking purposes because of how easy it is to modify or break the tracking code – but Metrika counters are far more rigid, and many of the reports are fixed in terms of how the data is collected.
Impact Of Traffic On Rankings
Following on from Metrika data as a ranking factor, these factors effectively confirm that direct traffic and paid traffic (buying ads via Yandex Direct) can impact organic search performance:
Share of direct visits among all incoming traffic.
Green traffic share (aka direct visits) – Desktop.
Green traffic share (aka direct visits) – Mobile.
Search traffic – transitions from search engines to the site.
Share of visits to the site not by links (set by hand or from bookmarks).
The number of unique visitors.
Share of traffic from search engines.
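A rough sketch of how such shares could be derived from visit records follows; the referrer-based classification is my own simplification, not Yandex’s definition of “green” or direct traffic.

```python
def classify_visit(referrer: str) -> str:
    """Crude source classification, for illustration only."""
    if not referrer:
        return "direct"            # typed by hand or from bookmarks
    if any(se in referrer for se in ("yandex.", "google.", "bing.")):
        return "search"            # transition from a search engine
    return "other"

# Hypothetical visit log: one referrer string per visit (empty string = direct).
visits = ["", "https://yandex.ru/search/?text=example", "", "https://blog.example/post"]
sources = [classify_visit(r) for r in visits]

direct_share = sources.count("direct") / len(sources)   # 0.5
search_share = sources.count("search") / len(sources)   # 0.25
print(direct_share, search_share)
```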
News Factors
There are a number of factors relating to “News”, including two that mention Yandex.News directly.
Yandex.News was an equivalent of Google News, but was sold to the Russian social network VKontakte in August 2022, along with another Yandex product “Zen”.
So, it’s not clear whether these factors relate to a product no longer owned or operated by Yandex, or to how news websites are ranked in “regular” search.
Backlink Importance
Yandex has algorithms to combat link manipulation similar to Google’s – and has had them since the Nepot filter in 2005.
From reviewing the backlink ranking factors and some of the specifics in the descriptions, we can assume that the best practices for building links for Yandex SEO would be to:
Build links with a more natural frequency and varying amounts.
Build links with branded anchor texts as well as use commercial keywords.
When buying links, avoid websites that have mixed topics.
Below is a list of link-related factors that can be considered affirmations of best practices:
The age of the backlink is a factor.
Link relevance based on topics.
Backlinks built from homepages carry more weight than internal pages.
Links from the top 100 websites by PageRank (PR) can impact rankings.
Link relevance based on the quality of each link.
Link relevance, taking into account the quality of each link, and the topic of each link.
Link relevance, taking into account the non-commercial nature of each link.
Percentage of inbound links with query words.
Percentage of query words in links (up to a synonym).
The links contain all the words of the query (up to a synonym).
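The last three factors above are essentially word-overlap calculations between a query and a link’s anchor text. Here is a minimal sketch, with a toy synonym table standing in for whatever normalization Yandex actually applies.

```python
# Toy synonym table – a stand-in for Yandex's real normalization, which is unknown.
SYNONYMS = {"purchase": "buy", "footwear": "shoes"}

def normalize(word: str) -> str:
    return SYNONYMS.get(word.lower(), word.lower())

def query_word_coverage(query: str, anchor_text: str) -> float:
    """Fraction of query words present in the anchor text (up to a synonym)."""
    query_words = {normalize(w) for w in query.split()}
    anchor_words = {normalize(w) for w in anchor_text.split()}
    if not query_words:
        return 0.0
    return len(query_words & anchor_words) / len(query_words)

print(query_word_coverage("buy shoes", "purchase footwear online"))  # 1.0 – all words covered
print(query_word_coverage("buy shoes", "great shoes"))               # 0.5 – one of two words
```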
However, there are some link-related factors that are additional considerations when planning, monitoring, and analyzing backlinks:
The ratio of “good” versus “bad” backlinks to a website.
The frequency of links to the site.
The number of incoming SEO trash links between hosts.
The data leak also revealed that the link spam calculator has around 80 active factors that are taken into consideration, with a number of deprecated factors.
This raises the question of how well Yandex is able to recognize negative SEO attacks, given that it looks at the ratio of good versus bad links, and how it determines what a bad link is.
A negative SEO attack is also likely to be a short burst (high frequency) link event in which a site will unwittingly gain a high number of poor quality, non-topical, and potentially over-optimized links.
Yandex uses machine learning models to identify Private Blog Networks (PBNs) and paid links, and it draws the same connection between link velocity and the period over which links are acquired.
Typically, paid-for links are generated over a longer period of time, and these patterns (including link origin site analysis) are what the Minusinsk update (2015) was introduced to combat.
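One way to picture the velocity signal is a sudden spike in new referring links relative to a site’s normal acquisition rate. The sketch below is purely illustrative – the weekly window and the multiplier are invented, and we do not know how Yandex actually models this.

```python
from statistics import mean

def link_burst_weeks(new_links_per_week: list[int], multiplier: float = 4.0) -> list[int]:
    """Return the indexes of weeks where link acquisition spikes far above the
    historical average – a crude stand-in for 'unnatural link velocity'."""
    flagged = []
    for i, count in enumerate(new_links_per_week):
        history = new_links_per_week[:i]
        baseline = mean(history) if history else count
        if baseline and count > multiplier * baseline:
            flagged.append(i)
    return flagged

weekly_new_links = [3, 4, 2, 5, 120, 6]    # hypothetical data: week 4 is a burst
print(link_burst_weeks(weekly_new_links))  # [4]
```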
Yandex Penalties
There are two ranking factors, both deprecated, named SpamKarma and Pessimization.
Pessimization refers to reducing PageRank to zero and aligns with the expectations of severe Yandex penalties.
SpamKarma also aligns with assumptions made around Yandex penalizing hosts and individuals, as well as individual domains.
On-Page Advertising
There are a number of factors relating to on-page advertising, some of which have been deprecated (like the example in the screenshot below).
Screenshot from author, January 2023
It is not known from the description exactly what the thought process behind this factor was, but it can be assumed that a high ratio of ads to visible screen was a negative factor – much like how Google takes a dim view of ads that obscure a page’s main content or are intrusive.
Tying this back to known Yandex mechanisms, the Proxima update also took into account the ratio of useful content to advertising content on a page.
Because of this battle for talent, we can conclude that some of these builders and engineers will have built things in a similar way (though not direct copies), applying learnings from previous iterations of their builds with their new employers.
What Russian SEO Pros Are Saying About The Leak
Just like in the West, SEO professionals in Russia have been having their say on the leak across the various Runet forums.
The reaction in these forums has differed from SEO Twitter and Mastodon, focusing more on Yandex’s filters and on other Yandex products that are optimized as part of wider Yandex optimization campaigns.
It is also worth noting that a number of the conclusions and findings from the data match what the Western SEO world is finding as well.
Common themes in the Russian search forums:
Webmasters asking for insights into recent filters, such as Mimicry and the updated PF filter.
The age and relevance of some of the factors, given that the named authors no longer work at Yandex and that long-retired Yandex products are mentioned.
The most interesting learnings concern the use of Metrika data and information about the Crawler & Indexer.
A number of factors describe the use of DSSM, which in theory was superseded by the rollout of Palekh – a machine learning search algorithm announced by Yandex in 2016.
A debate around ICS scores in Yandex, and whether Yandex can send more traffic to a website and thereby influence its own factors.
The leaked factors, especially those around how Yandex evaluates site quality, have also come under scrutiny.
There is a long-standing feeling in the Russian SEO community that Yandex often favors its own products and services in search results over other websites, and webmasters are asking questions like:
Why does it bother going to all this trouble when it just pins its own services to the top of the page anyway?
In loosely translated documents, these are referred to as Sorcerers or Yandex Sorcerers. In Google, we would call these search engine results page (SERP) features – like Google Hotels, etc.
In October 2022, Kassir (a Russian ticket portal) demanded ₽328 million in compensation from Yandex due to lost revenue caused by the “discriminatory conditions” under which Yandex Sorcerers took the customer base away from the private company.
This comes on the back of a 2020 class action in which several companies filed a case with the Federal Antimonopoly Service (FAS) over the anticompetitive promotion of its own services.
Microsoft Advertising details several important updates and expansions in its June product roundup.
The new tools and features aim to enhance website performance analytics, improve cross-device conversion tracking, expand into new global markets, and integrate more seamlessly with other platforms.
Introducing Universal Event Tracking Insights
This month’s standout news is the introduction of Universal Event Tracking (UET) insights, a feature that gives advertisers a deeper understanding of their website’s performance.
The new feature requires no additional coding and will enhance the capabilities of existing UET tags.
“We’re introducing UET insights, a valuable new feature that we’ll add to your existing UET tags with no additional coding required from you. You’ll get a deeper understanding of your website’s performance and also enable Microsoft Advertising to optimize your ad performance more effectively via improved targeting, fraud detection, and reduced conversion loss.”
The new insights tool will roll out automatically starting July 3.
Cross-Device Conversion Attribution Update
Microsoft Advertising is introducing a cross-device attribution model later this month.
This update will enable advertisers to track and connect customers’ conversion journeys across multiple devices and sessions.
Microsoft explains the new feature in a blog article: “For example, if a user clicks on an ad using their laptop but converts on their phone, we’ll now credit that conversion to the initial ad click on the laptop.”
While the update doesn’t introduce new features or settings, advertisers may notice a slight increase in the number of conversions due to improved accuracy.
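Conceptually, the model joins click and conversion events belonging to the same user and credits the conversion back to the earliest ad click, whichever device it happened on. The sketch below is my own simplification of that idea, not Microsoft’s implementation.

```python
# Hypothetical event logs – my own simplification, not Microsoft's data model.
clicks = [
    {"user": "u1", "device": "laptop", "time": 100, "ad": "summer_sale"},
    {"user": "u1", "device": "phone",  "time": 250, "ad": "summer_sale"},
]
conversions = [{"user": "u1", "device": "phone", "time": 300}]

def attribute(conversions, clicks):
    """Credit each conversion to the user's earliest prior ad click, across devices."""
    credited = []
    for conv in conversions:
        prior = [c for c in clicks
                 if c["user"] == conv["user"] and c["time"] <= conv["time"]]
        if prior:
            first = min(prior, key=lambda c: c["time"])
            credited.append((conv, first))
    return credited

for conv, click in attribute(conversions, clicks):
    # The phone conversion is credited to the earlier laptop click.
    print(f"conversion on {conv['device']} credited to {click['ad']} click on {click['device']}")
```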
Expanding to New Markets
In line with its expansion push throughout 2022, Microsoft Advertising announces it’s expanding its advertising reach to 23 new markets.
The new additions include diverse locations ranging from Antigua and Barbuda to Wallis and Futuna.
This expansion allows advertisers to reach their audiences in more parts of the world.
Seamless Integration With Pinterest & Dynamic Remarketing
Microsoft Advertising is releasing Pinterest Import in all markets via the Microsoft Audience Network (MSAN), allowing advertisers to import campaigns from Pinterest Ads.
Further, Dynamic remarketing on MSAN for Autos, Events & Travel is now available in the US, Canada, and the UK.
The remarketing tool enables advertisers to use their feeds to create rich ad experiences on the Microsoft Audience Network and match their target audience with items in their feed where they’ve shown interest.
In Summary
Key takeaways from the June product roundup include the automatic rollout of UET Insights starting July 3, introducing a new cross-device attribution model, expanding into 23 new global markets, and enhanced integration with Pinterest via the Microsoft Audience Network.
These developments collectively offer advertisers increased insight into campaign performance, improved accuracy in conversion tracking, and more opportunities to reach audiences worldwide.
Apple’s recently announced Safari 17 brings several key updates that promise to enhance user experience and web page loading times.
Unveiled at the annual Worldwide Developers Conference (WWDC23), two new features of Safari 17 worth paying attention to are JPEG XL support and expanded capabilities of font-size-adjust.
As Safari continues to evolve, these updates highlight the ever-changing landscape of web development and the importance of adaptability.
JPEG XL: A Game Changer For Page Speed Optimization
One of the most noteworthy features of Safari 17 is its support for JPEG XL, a new image format that balances image quality and file size.
JPEG XL allows for the recompression of existing JPEG files without any data loss while significantly reducing their size—by up to 60%.
Page loading speed is a crucial factor that search engines consider when ranking websites. With JPEG XL, publishers can drastically reduce the file size of images on their sites, potentially leading to faster page loads.
Additionally, the support for progressive loading in JPEG XL means users can start viewing images before the entire file is downloaded, improving the user experience on slower connections.
This benefits websites targeting regions with slower internet speeds, enhancing user experience and potentially reducing bounce rates.
Font Size Adjust: Improving User Experience & Consistency
Safari 17 expands the capabilities of font-size-adjust, a CSS property that ensures the visual size of different fonts remains consistent across all possible combinations of fallback fonts.
By allowing developers to pull the sizing metric from the main font and apply it to all fonts, the from-font value can help websites maintain a consistent visual aesthetic, which is critical for user experience.
Conversely, the two-value syntax provides more flexibility in adjusting different font metrics, supporting a broader range of languages and design choices.
Websites with consistent and clear text display, irrespective of the font in use, will likely provide a better user experience. A better experience could lead to longer visits and higher engagement.
Reimagining SEO Strategies With Safari 17
Given these developments, SEO professionals may need to adjust their strategies to leverage the capabilities of Safari 17 fully.
This could involve:
Image Optimization: With support for JPEG XL, SEO professionals might need to consider reformatting their website images to this new format.
Website Design: The expanded capabilities of font-size-adjust could require rethinking design strategies. Consistent font sizes across different languages and devices can improve CLS, one of Google’s core web vitals.
Performance Tracking: SEO professionals will need to closely monitor the impact of these changes on website performance metrics once the new version of Safari rolls out.
Google Search Advocate, John Mueller, has turned to Twitter seeking advice on what to consider when hiring an SEO consultant.
He posed the question on behalf of someone running an animal park, indicating the need for a local SEO consultant to enhance their online presence.
“Does anyone have recommendations in terms of people/companies & things to watch out for when picking one?” Mueller asked in his tweet, “How do you separate the serious from the less-serious folks?”
Here’s a look into the ensuing conversation, including a range of insights into what you should look for when hiring SEO professionals.
Cost Considerations In SEO Consultancy
One of the replies to Mueller’s tweet highlighted the cost-sensitive nature of SEO consultancy.
Kris Roadruck emphasized that most low-budget local SEO services, charging below $1000 per month, are “chop-shops” offering minimal service.
He claims these services follow a generic formula and warned that’s the level of quality businesses can expect unless they expand their budget.
“This is unfortunately budget-sensitive. Nearly all the folks that do low-budget local SEO (Sub $1000/mo) are chop-shops doing very little indeed and just following a generic formula performed by underpaid folks. Sadly this is the price range most local companies can afford unless they are attorneys, dentists, or some other high value service based biz.”
“This is painfully true,” reads a reply to his tweet.
Look For Experienced SEO Agencies
Isaline Muelhauser responded to Mueller’s tweet by suggesting that a local agency could be a helpful resource.
She highlights the advantages of working with an agency that operates in the same area as the business requiring local SEO services:
“Chose someone who’s experimented in local SEO and understands the local market. E.g. for Local SEO we manage multilingualism and often low [monthly search volume]. The animal park owner should feel that their business is understood not receive a general pitch deck.”
To that end, it helps to ensure the agency you hire is familiar with the tools of the trade.
An individual who goes by the handle SEOGoddess on Twitter states:
“If the local SEO company knows what @yext or @RenderSeo is then that’s the first step in trusting they know what they’re doing. From there it’s a matter of drafting up a good strategy and knowing how to help execute on it.”
Insights From The Twitter Exchange
The Twitter thread sparked a variety of responses. However, these key topics were left unexplored:
Specific criteria or red flags to watch out for when evaluating potential SEO consultants.
Concrete examples or case studies demonstrating successful SEO strategies (bonus points for businesses like an animal park).
To address these gaps, let’s look at a successful SEO strategy for a local business dealing with animals.
The most relevant case study I could find is from Step 5 Creative, who performed local SEO services for Deer Grove Animal Hospital in Illinois.
This case study shows what’s involved in a comprehensive SEO approach and what you should expect from an experienced agency.
Takeaways From A Successful SEO Strategy
The first step in Step 5’s SEO campaign was a thorough onboarding process, which included the following:
The above steps aim to improve a site’s visibility to human visitors and search engine bots.
Step 5’s strategy also involved ensuring the business was listed in local online directories and had a complete Google Business Profile.
Keyword reassessment was another crucial step. The team monitored clickthroughs and positions for targeted keywords and adjusted their strategy accordingly.
The team conducted extensive keyword research, experimenting with different variations of keywords to see which yielded better results.
Reviewing the animal hospital’s traffic acquisition report and bounce rate helped the Step 5 Creative team understand where visitors came from and how they interacted with the site. This information allowed them to improve the user experience and encourage better traffic flow.
Lastly, the team underscores the need for continual updates for long-term success. This refers to the earlier point of working with an SEO consultant committed to staying current with best practices.
Criteria & Red Flags in SEO Consultancy
In the complex world of SEO, it’s essential to differentiate between “serious” and “less-serious” consultants.
Serious consultants are invested in understanding your business, industry, and specific needs.
In contrast, less-serious consultants might offer generic services not catering to your unique requirements.
Here are additional criteria to consider and red flags to watch for when hiring an SEO professional:
Customized Strategy: A serious SEO consultant will take the time to understand your business, its goals, and its challenges and design a tailored SEO strategy accordingly.
A consultant offering a one-size-fits-all solution without understanding your needs could be a red flag.
Transparency: Reputable SEO consultants will be open about their methods and strategies and able to explain them in terms you understand.
Beware of consultants who promise immediate results or use jargon to confuse you.
Track Record: Look for consultants with a proven track record of success. They should be able to provide case studies or references from previous clients.
Consider this a red flag if a consultant is unwilling or unable to provide evidence of their success.
Continual Learning: SEO is a constantly changing field. Serious consultants will stay up-to-date with the latest SEO trends and algorithm changes.
Be cautious of consultants who rely on outdated practices.
Ethical Practices: A serious SEO consultant adheres to ethical SEO practices, often called “White Hat SEO.”
Avoid consultants who suggest or employ unethical practices, known as “Black Hat SEO,” such as keyword stuffing or hidden text.
The Twitter conversation initiated by Google’s John Mueller provided valuable insights into what to consider when choosing an SEO professional.
Cost shouldn’t be the sole factor, as low-budget services often deliver minimal results.
Instead, prioritize experienced SEO agencies that understand the local market and have a track record of success.
Look for consultants knowledgeable about industry tools who can develop a customized strategy tailored to your business’s needs.
Other essential factors include transparency, a proven track record, continual learning, and adherence to ethical practices.
By following these guidelines, you can make an informed decision when hiring an SEO consultant that will help your business thrive in the digital landscape.
Featured image generated by the author with Midjourney.