

Top 5 Chinese Search Engines & How They Work



In 2021, China surpassed one billion internet users, making it the biggest online market in the world.

But as global businesses seek to gain a foothold in this rapidly growing digital economy, they face a unique set of challenges, including optimizing their websites for the world’s most populous nation.

Unlike in the rest of the world, where Google is the undisputed king of search engines, it held just 3.56% of the Chinese market in June 2022.

Bing, its main global competitor, fared slightly better, with an 11.47% market share.

But Chinese internet users still need a means of finding products and information on the web.

If they’re not using the search engines popular in the rest of the world, what are they using?

Domestic search engines, designed in China for use in China, of course.

To help you enter the Chinese online market or attract new customers from the east, this piece will cover the top five search engines in the People’s Republic of China (PRC) and give you tips for using them to reach your goals.

But before we get to that, it’s important to get some background information.

The Internet And Censorship In China

Though it had supported simplified and traditional Chinese searches since 2000, Google did not officially join the Chinese mainland market until 2006.

At this time, the country had just 137 million internet users.

Just four years later, the search engine giant abandoned the country in favor of Hong Kong to avoid Chinese censors.

In response, the Chinese government banned Google search sites in all languages.

This was all set in motion by an internet explosion in 2009.

Worried about the impacts of unrestricted access to information, the Chinese Ministry of Industry and Information Technology issued the Circular on Computer Pre-Loaded Green Internet Filter Software.

This required a web filter on all devices made or sold in China to block access to certain sites, including news, streaming, and social media sites, among others. More sites are added to this list every year and if you’re worried your domain is included, you can check here.

But where there’s a will, there’s always a way, and enterprising, tech-savvy Chinese citizens have turned to virtual private networks (VPNs) to access restricted sites.

Surprisingly, while there are numerous VPNs on the list of blocked sites, their use is not illegal.

And while this workaround exists, it’s simply not a significant means of driving traffic to western sites from the Asian country.

Google To Return To China?

It’s hard to imagine the world’s biggest search engine would completely forgo the world’s biggest online market and there have been hints it intends to return at some point.

In a 2018 letter obtained by the New York Times, hundreds of Google employees objected to working on a censored Chinese version of the search engine, which was being built in secret.

However, just one year later, Karan Bhatia, a vice president of government affairs and public policy at Google, testified before a Senate Judiciary Committee that the project had been terminated.

But that hasn’t silenced murmuring that the company plans to get back into the Chinese market. If it does, it will be in for stiff competition from homegrown search engines that are already well-entrenched.

But more on that in a bit.

First, let’s talk about the Chinese online marketplace and the Asian giant’s unique customer journey.

How Chinese Consumers Shop Online

The first thing every ecommerce company that wants to do business in China needs to understand is that the way Chinese consumers use the internet is very different from what most non-Chinese companies are used to.

For one thing, while mobile internet surpasses usage on computers in most countries, it does not dominate search traffic in the way it does in China.

In 2020, almost every Chinese internet user (99.7%) accessed the web via their smartphone.

By comparison, 32.8% accessed the internet via desktop and only 28.2% on laptop computers. Thus, any company entering the online marketplace in the country would be wise to focus its efforts on the mobile market.

Chinese consumers also rarely visit company or brand websites, preferring instead single-entry points where numerous brands are represented. Instead of searching for specific products, they tend to perform extensive research and read (often automated) recommendations before making a purchase.

Social media and influencers also have a strong influence on purchasing decisions.

On- and off-line sales channels tend to be more integrated in the PRC, while the line between entertainment and shopping is fuzzy.

Chinese users can often click on items they like in social media posts and buy them in a linked online store.

Additionally, Chinese merchants place an emphasis on customer service, which contributes to high levels of purchasing loyalty.

What Search Engines Is China Using?

While cultural differences exist from country to country, and sometimes region to region everywhere in the world, Chinese norms are often quite unfamiliar to Western companies. And that includes the search engines used.

So, what sites are the Chinese using to find things on the internet? Here are the top five:

1. Baidu – China’s Answer To Google

Much like “to Google” has become a standard verb meaning to look something up online, in China people “Baidu” something.

It controls more than 75% of the search engine market in the PRC and even brings in some users from other countries, including the U.S. and Japan.

Baidu got its start with funding from Silicon Valley in 2000, initially as just a homepage that allowed companies to bid on ad space. Since then, it has expanded not just into search, but also artificial intelligence and a number of internet-related products and services.

What You Need To Know

Baidu only indexes sites that use simplified Chinese characters.

That means if you don’t have a Mandarin website, you won’t show up.

It also prefers websites that are hosted on Chinese servers.

To host a website in the PRC, you must have an Internet Content Provider License.

Search engine position is determined by homepage and Baidu’s rankings still include meta keywords, partially due to image AI that is not as advanced as Google’s.

That means image alt texts and metadata are important to ensuring its image understanding.

HTTPS is also included as a ranking signal and it seems to take loading speed, content quality, and content prominence into account as well.

It’s also important to note that Baidu does not handle JavaScript well, so all content and links should be in plain HTML on both mobile and desktop versions of your site.
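
The points above about alt text and plain HTML links can be checked with a short script. This is only a rough sketch using Python's standard library; the class name `PlainHtmlAudit`, the function `audit_html`, and the sample markup are illustrative assumptions, not part of any Baidu tooling, and the checks are a simplification of what a crawler may actually do.

```python
from html.parser import HTMLParser


class PlainHtmlAudit(HTMLParser):
    """Collects images without alt text and links that depend on JavaScript."""

    def __init__(self):
        super().__init__()
        self.images_missing_alt = []
        self.script_only_links = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # An image with no alt attribute (or an empty one) gives the crawler no text.
        if tag == "img" and not (attr.get("alt") or "").strip():
            self.images_missing_alt.append(attr.get("src") or "")
        # A link with no real href only works when JavaScript runs.
        if tag == "a":
            href = (attr.get("href") or "").strip()
            if not href or href == "#" or href.lower().startswith("javascript:"):
                self.script_only_links.append(href)


def audit_html(html):
    """Return (image sources missing alt text, hrefs that are not plain links)."""
    parser = PlainHtmlAudit()
    parser.feed(html)
    parser.close()
    return parser.images_missing_alt, parser.script_only_links


# Hypothetical page fragment used only for illustration.
sample = """
<a href="/products.html">Products</a>
<a href="javascript:void(0)" onclick="openMenu()">Menu</a>
<img src="/logo.png" alt="Company logo">
<img src="/banner.jpg">
"""
missing_alt, script_links = audit_html(sample)
print("Images missing alt text:", missing_alt)
print("Links that need JavaScript:", script_links)
```

Running a check like this against both the mobile and desktop templates is one way to spot content that a crawler with weak JavaScript support may never see.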

2. Sogou – Search-Dog

Initially launched in 2004, Sogou (literally “search dog”) holds the second spot in the domestic Chinese search market, claiming a 4.83% share.

In September 2021, it completed a $3.5 billion merger to become a subsidiary of Tencent, a technology company with gaming, social media, and entertainment interests.

What You Need To Know

Sogou’s search algorithm places a high value on original content and site authority.

Like Baidu, it favors websites that use simplified Chinese and those hosted on Chinese servers.

Backlinks are an important ranking factor, with the emphasis seemingly on quantity rather than quality. Meta descriptions don’t seem to be as important, but title tags are vital.

Because of its connection to Tencent, Sogou is the default search engine for QQ Browser, QQ Messenger, and WeChat, all major apps in the Chinese market.

3. Haosuo – Secure Search

Also known as Qihoo 360 Search, Haosuo comes in at #3 in the Chinese domestic search engine market. Launched in 2012, it went through a series of domain changes.

Backed by one of China’s largest internet providers (Qihoo 360), it became Haosuo in 2016.

This change came with a simplified interface and an increased focus on mobile experiences.

What You Need To Know

The Qihoo 360 browser comes preinstalled on most Chinese computers, making it the Internet Explorer of China. No word on whether it’s used primarily by technologically challenged seniors, though.

Because Haosuo is known for its security features, many Chinese businesses recommend their employees use it, making it a powerful player in B2B marketing.

Also, this cybersecurity emphasis appears to be reflected in rankings, with sites with higher authority and trustworthiness seeming to be ranked higher.

There is less competition on 360 Search, which often means a lower cost-per-click on paid ads.

A unit based in Hong Kong may also make it easier for foreign companies to advertise on this platform.

4. Shenma – The First Name In Mobile

A venture between ecommerce giant Alibaba and UC Web, Shenma claims 1.74% of the Chinese market.

It is the default search engine on the UC web browser, which is one of the most used browsers.

What differentiates Shenma from the competition, and most search engines for that matter, is that it is mobile-only.

Calling itself the “experts in mobile search,” Shenma is a combination search engine and app store.

What You Need To Know

Shenma’s link with Alibaba allows it to include direct links to product pages.

It’s widely used for home goods, clothing, and books, as well as apps.

Products that are listed on Taobao or Tmall (Alibaba shopping properties) are given priority, which improves placement in search results.

5. Youdao – The Translation Search Engine

A division of Chinese internet technology company NetEase, Youdao operates more like an online education platform than a traditional search engine.

It allows users to search websites, images, news, and perhaps most importantly to foreign users, Chinese-to-English entries.

What You Need To Know

Youdao can translate Mandarin into more than 20 languages.

It is the biggest translation tool and online dictionary in the PRC, providing example sentences and word usage help.

More than half of Youdao’s users are 24 or younger.

Primarily used by students and high-income individuals, it offers opportunities for foreign companies looking to sell international products in China.

Getting Started In SEO In China

Getting a foot in the door in China’s search engine rankings can be tricky.

And if you don’t have a site in Mandarin, preferably hosted within the PRC, it can be very tough.

But in a country of more than 1 billion internet users, it’s worth the effort.

Baidu is the big dog on the block, but it doesn’t dominate the Chinese market in the same way Google dominates the American one.

Competitors are finding new ways to carve out their own niches.

And this provides opportunities for international companies.

International SEO requires some extra work, but by doing your research, becoming familiar with Chinese search habits, and working within the confines of the PRC’s internet environment, you can claim your spot in the rankings and expand into new markets.


Featured Image: Shayli/Shutterstock



How to Block ChatGPT From Using Your Website Content




There is concern about the lack of an easy way to opt out of having one’s content used to train large language models (LLMs) like ChatGPT. There is a way to do it, but it’s neither straightforward nor guaranteed to work.

How AIs Learn From Your Content

Large Language Models (LLMs) are trained on data that originates from multiple sources. Many of these datasets are open source and are freely used for training AIs.

Some of the sources used are:

  • Wikipedia
  • Government court records
  • Books
  • Emails
  • Crawled websites

There are actually portals and websites offering datasets that are giving away vast amounts of information.

One of the portals is hosted by Amazon, offering thousands of datasets at the Registry of Open Data on AWS.

Screenshot from Amazon, January 2023

The Amazon portal with thousands of datasets is just one portal out of many others that contain more datasets.

Wikipedia lists 28 portals for downloading datasets, including the Google Dataset and the Hugging Face portals for finding thousands of datasets.

Datasets of Web Content


A popular dataset of web content is called OpenWebText. OpenWebText consists of URLs found on Reddit posts that had at least three upvotes.

The idea is that these URLs are trustworthy and will contain quality content. I couldn’t find information about a user agent for their crawler. It may simply be identified as Python, but I’m not sure.

Nevertheless, we do know that if your site is linked from Reddit with at least three upvotes then there’s a good chance that your site is in the OpenWebText dataset.

More information about OpenWebText is here.

Common Crawl

One of the most commonly used datasets for Internet content is offered by a non-profit organization called Common Crawl.

Common Crawl data comes from a bot that crawls the entire Internet.

The data is downloaded by organizations wishing to use the data and then cleaned of spammy sites, etc.

The name of the Common Crawl bot is CCBot.

CCBot obeys the robots.txt protocol so it is possible to block Common Crawl with Robots.txt and prevent your website data from making it into another dataset.

However, if your site has already been crawled then it’s likely already included in multiple datasets.

Nevertheless, by blocking Common Crawl it’s possible to opt out your website content from being included in new datasets sourced from newer Common Crawl data.

The CCBot User-Agent string is:


Add the following to your robots.txt file to block the Common Crawl bot:

User-agent: CCBot
Disallow: /
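
Before deploying the rule, it can be worth confirming that it blocks CCBot and nothing else. The sketch below uses Python's standard `urllib.robotparser`; the helper name `is_allowed`, the `SomeOtherBot` agent, and the example.com URL are placeholders chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# The same two lines recommended above, held in a string for testing.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder URL and second bot name, used only to show the contrast.
ccbot_allowed = is_allowed(ROBOTS_TXT, "CCBot", "https://example.com/article")
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/article")
print("CCBot allowed:", ccbot_allowed)
print("Other bots allowed:", other_allowed)
```

This only tests how a standards-following parser reads the file. It says nothing about crawlers that ignore robots.txt.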

An additional way to confirm if a CCBot user agent is legit is that it crawls from Amazon AWS IP addresses.

CCBot also obeys the nofollow robots meta tag directives.

Use this in your robots meta tag:

<meta name="robots" content="nofollow">
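
To confirm that a page actually carries the tag, the robots meta directives can be read back out of the HTML. This is a small sketch with Python's standard `html.parser`; the names `RobotsMetaParser` and `robots_directives` and the sample page are assumptions made for the example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # The content attribute is a comma-separated list of directives.
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the set of robots meta directives present in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical page used only for illustration.
page = '<html><head><meta name="robots" content="nofollow"></head><body></body></html>'
directives = robots_directives(page)
print("nofollow present:", "nofollow" in directives)
```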

Blocking AI From Using Your Content

Search engines allow websites to opt out of being crawled. Common Crawl also allows opting out. But there is currently no way to remove one’s website content from existing datasets.

Furthermore, research scientists don’t seem to offer website publishers a way to opt out of being crawled.

The article, Is ChatGPT Use Of Web Content Fair? explores the topic of whether it’s even ethical to use website data without permission or a way to opt out.

Many publishers may appreciate it if in the near future, they are given more say on how their content is used, especially by AI products like ChatGPT.

Whether that will happen is unknown at this time.


Featured image by Shutterstock/ViDI Studio



Google’s Mueller Criticizes Negative SEO & Link Disavow Companies




John Mueller recently made strong statements against SEO companies that provide negative SEO and other agencies that provide link disavow services outside of the tool’s intended purpose, saying that they are “cashing in” on clients who don’t know better.

Many frequently say that Mueller and other Googlers are ambiguous, even on the topic of link disavows.

The fact, however, is that Mueller and other Googlers have consistently recommended against using the link disavow tool.

This may be the first time Mueller actually portrayed SEOs who liberally recommend link disavows in a negative light.

What Led to John Mueller’s Rebuke

The context of Mueller’s comments about negative SEO and link disavow companies started with a tweet by Ryan Jones (@RyanJones).

Ryan tweeted that he was shocked at how many SEOs regularly disavow links.

He tweeted:

“I’m still shocked at how many seos regularly disavow links. Why? Unless you spammed them or have a manual action you’re probably doing more harm than good.”

Ryan is shocked because Google has consistently recommended the tool only for disavowing paid/spammy links that the sites (or their SEOs) are responsible for.

And yet, here we are, eleven years later, and SEOs are still misusing the tool for removing other kinds of links.

Here’s the background information about that.

Link Disavow Tool

From the mid-2000s until the Penguin Update in April 2012, there was a thriving open market for paid links. The commerce in paid links was staggering.

I knew of one publisher with around fifty websites who received a $30,000 check every month for hosting paid links on his site.

Even though I advised my clients against it, some of them still purchased links because they saw everyone else was buying them and getting away with it.

The Penguin Update caused the link selling boom to collapse.

Thousands of websites lost rankings.

SEOs and affected websites strained under the burden of having to contact all the sites from which they purchased paid links to ask to have them removed.

So some in the SEO community asked Google for a more convenient way to disavow the links.

Months went by and after resisting the requests, Google relented and released a disavow tool.

Google cautioned from the very beginning to only use the tool for disavowing links that the site publishers (or their SEOs) are responsible for.

The first paragraph of Google’s October 2012 announcement of the link disavow tool leaves no doubt on when to use the tool:

“Today we’re introducing a tool that enables you to disavow links to your site.

If you’ve been notified of a manual spam action based on ‘unnatural links’ pointing to your site, this tool can help you address the issue.

If you haven’t gotten this notification, this tool generally isn’t something you need to worry about.”

The message couldn’t be clearer.

But at some point in time, link disavowing became a service applied to random and “spammy looking” links, which is not what the tool is for.

Link Disavow Takes Months To Work

There are many anecdotes about link disavows that helped sites regain rankings.

They aren’t lying; I know credible and honest people who have made this claim.

But here’s the thing: John Mueller has confirmed that the link disavow process takes months to work its way through Google’s algorithm.

Sometimes two things happen around the same time without being related. It just looks that way.

John shared how long it takes for a link disavow to work in a Webmaster Hangout:

“With regards to this particular case, where you’re saying you submitted a disavow file and then the ranking dropped or the visibility dropped, especially a few days later, I would assume that that is not related.

So in particular with the disavow file, what happens is we take that file into account when we reprocess the links kind of pointing to your website.

And this is a process that happens incrementally over a period of time where I would expect it would have an effect over the course of… I don’t know… maybe three, four, five, six months …kind of step by step going in that direction.

So if you’re saying that you saw an effect within a couple of days and it was a really strong effect then I would assume that this effect is completely unrelated to the disavow file. …it sounds like you still haven’t figured out what might be causing this.”

John Mueller: Negative SEO and Link Disavow Companies are Making Stuff Up

Context is important to understand what was said.

So here’s the context for John Mueller’s remark.

An SEO responded to Ryan’s tweet about being shocked at how many SEOs regularly disavow links.

The person responding to Ryan tweeted that disavowing links was still important, that agencies provide negative SEO services to take down websites and that link disavow is a way to combat the negative links.

The SEO (SEOGuruJaipur) tweeted:

“Google still gives penalties for backlinks (for example, 14 Dec update, so disavowing links is still important.”

SEOGuruJaipur next began tweeting about negative SEO companies.

Negative SEO companies are those that will build spammy links to a client’s competitor in order to make the competitor’s rankings drop.

SEOGuruJaipur tweeted:

“There are so many agencies that provide services to down competitors; they create backlinks for competitors such as comments, bookmarking, directory, and article submission on low quality sites.”

SEOGuruJaipur continued discussing negative SEO link builders, saying that only high trust sites are immune to the negative SEO links.

He tweeted:

“Agencies know what kind of links hurt the website because they have been doing this for a long time.

It’s only hard to down for very trusted sites. Even some agencies provide a money back guarantee as well.

They will provide you examples as well with proper insights.”

John Mueller tweeted his response to the above tweets:

“That’s all made up & irrelevant.

These agencies (both those creating, and those disavowing) are just making stuff up, and cashing in from those who don’t know better.”

Then someone else joined the discussion:

Mueller tweeted a response:

“Don’t waste your time on it; do things that build up your site instead.”

Unambiguous Statement on Negative SEO and Link Disavow Services

A statement by John Mueller (or anyone) can appear to conflict with prior statements when taken out of context.

That’s why I placed his statements not only in their original context but also in the eleven-year history that is part of that discussion.

It’s clear that John Mueller feels that those selling negative SEO services and those providing disavow services outside of the intended use are “making stuff up” and “cashing in” on clients who might not “know better.”

Featured image by Shutterstock/Asier Romero



Source Code Leak Shows New Ranking Factors to Consider




January 25, 2023, was the day that Yandex, Russia’s search engine, was hacked.

Its complete source code was leaked online. And, it might not be the first time we’ve seen hacking happen in this industry, but it is one of the most intriguing, groundbreaking events in years.

But Yandex isn’t Google, so why should we care? Here’s why we do: these two search engines are very similar in how they process technical elements of a website, and this leak just showed us the 1,922 ranking factors Yandex uses in its algorithm. 

Simply put, this information is something that we can use to our advantage to get more traffic from Google.

Yandex vs Google

As I said, a lot of these ranking factors are possibly quite similar to the signals that Google uses for search.

Yandex’s algorithm shows a RankBrain analog: MatrixNext. It also seems that they are using PageRank (almost the same way as Google does), and a lot of their text algorithms are the same. Interestingly, there are also a lot of ex-Googlers working in Yandex. 

So, reviewing these factors and understanding how they play into search rankings and traffic will provide some very useful insights into how search engines like Google work. No doubt, this new trove of information will greatly influence the SEO market in the months to come. 

That said, Yandex isn’t Google. The chances of Google having the exact same list of ranking factors are low, and even where a signal is shared, Google may not give it the same amount of weight that Yandex does.

Still, it’s information that potentially will be useful for driving traffic, so make sure to take a look at them here (before it’s scrubbed from the internet forever).

An early analysis of ranking factors

Many of their ranking factors are as expected. These include:

  • Many link-related factors (e.g., age, relevancy, etc.).
  • Content relevance, age, and freshness.
  • Host reliability
  • End-user behavior signals.

Some sites also get preference (such as Wikipedia). FI_VISITS_FROM_WIKI even shows that sites that are referenced by Wikipedia get plus points. 

These are all things that we already know.

But something interesting: there were several factors that I and other SEOs found unusual, such as PageRank being the 17th highest weighted factor in Yandex, and the 19th highest weighted factor being query-document relevance (in other words, how close they match thematically). There’s also karma for likely spam hosts, based on Whois information.

Other interesting factors are the average domain ranking across queries, percent of organic traffic, and the number of unique visitors.

You can also use this Yandex Search Ranking Factor Explorer, created by Rob Ousbey, to search through the various ranking factors.

The possible negative ranking factors:

Here are my thoughts on Yandex’s factors that I found interesting:

FI_ADV: -0.2509284637 — this factor means having tons of adverts scattered around your page and buying PPC can affect rankings. 

FI_DATER_AGE: -0.2074373667 — this one evaluates content age, and whether your article is more than 10 years old, or if there’s no determinable date. Date metadata is important. 

FI_COMM_LINKS_SEO_HOSTS: -0.1809636391 — this can be a negative factor if you have too much commercial anchor text, particularly if the proportion of such links goes above 50%. Pay attention to anchor text distribution. I’ve written a guide on how to effectively use anchor texts if you need some help on this. 
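
To get a rough feel for where a backlink profile sits relative to that 50% mark, the share of commercial anchors can be computed from an exported anchor list. The sketch below is an assumption-heavy illustration: the `COMMERCIAL_TERMS` list, the function name, and the sample anchors are made up for the example, and nothing here reflects how Yandex itself classifies an anchor as commercial.

```python
# Hypothetical list of words treated as "commercial" for this illustration.
COMMERCIAL_TERMS = ("buy", "cheap", "price", "discount", "order")


def commercial_anchor_share(anchors, terms=COMMERCIAL_TERMS):
    """Return the fraction of anchor texts containing any commercial term."""
    if not anchors:
        return 0.0
    commercial = sum(
        1 for text in anchors if any(term in text.lower() for term in terms)
    )
    return commercial / len(anchors)


# Made-up anchor texts standing in for a backlink export.
anchors = [
    "buy running shoes",
    "Acme Shoes",
    "cheap sneakers online",
    "this guide",
    "https://example.com",
    "best price trainers",
]
share = commercial_anchor_share(anchors)
print(f"Commercial anchor share: {share:.0%}")
print("Above the 50% mark:", share > 0.5)
```

A simple substring match like this will misclassify some anchors, so treat the result as a prompt for a manual review rather than a measurement.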

FI_RANK_ARTROZ: outdated, poorly written text will bring your rankings down. Go through your site and give your content a refresh. FI_WORD_COUNT also shows that the number of words matters, so avoid having low-content pages.

FI_URL_HAS_NO_DIGITS, FI_NUM_SLASHES, FI_FULL_URL_FRACTION: URLs shouldn’t have digits or too many slashes (too much hierarchy), and of course they should contain your targeted keyword.
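
These URL guidelines are easy to turn into a quick check. The following is a minimal sketch, assuming that only the path portion of the URL matters and that keywords appear hyphenated in URLs; the function name `url_signals` and the example URLs are hypothetical and do not reproduce how Yandex computes these factors.

```python
from urllib.parse import urlparse


def url_signals(url, keyword):
    """Summarize digits, folder depth, and keyword presence in a URL path."""
    path = urlparse(url).path
    segments = [segment for segment in path.split("/") if segment]
    return {
        "has_digits": any(ch.isdigit() for ch in path),
        "depth": len(segments),  # number of non-empty path segments
        # Assumption: spaces in the keyword become hyphens in the URL.
        "has_keyword": keyword.lower().replace(" ", "-") in path.lower(),
    }


# Hypothetical URLs for comparison.
good = url_signals(
    "https://example.com/blog/chinese-search-engines", "chinese search engines"
)
bad = url_signals(
    "https://example.com/2023/01/25/cat/seo/post-4821", "chinese search engines"
)
print(good)
print(bad)
```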

FI_NUM_LINKS_FROM_MP — always interlink your main pages (such as your homepage or landing pages) to any other important content you want to rank. Otherwise, it can hurt your content.

FI_HOPS — reduce the crawl depth for any pages that matter to you. No important pages should be more than a few clicks away from your homepage. I recommend keeping it to two clicks, at most. 

FI_IS_UNREACHABLE — likewise, avoid making any important page an orphan page. If it’s unreachable from your homepage, it’s as good as dead in the eyes of the search engine.
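
Click depth and orphan pages can both be found with a breadth-first walk of the internal link graph, starting at the homepage. This is a sketch under simple assumptions: the `site` dictionary is a made-up link map (in practice it would come from a crawl export), and the two-click limit is the recommendation given above, not a value taken from the leak.

```python
from collections import deque


def click_depths(links, home):
    """Breadth-first search: minimum number of clicks from home to each page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical internal link map: page -> pages it links to.
site = {
    "/": ["/products", "/blog"],
    "/products": ["/products/widget"],
    "/blog": ["/blog/post-a"],
    "/blog/post-a": ["/blog/archive/old-post"],
    "/landing/orphan": ["/"],  # links out, but nothing links to it
}

depths = click_depths(site, "/")
all_pages = set(site) | {t for targets in site.values() for t in targets}
too_deep = sorted(page for page, depth in depths.items() if depth > 2)
orphans = sorted(all_pages - set(depths))
print("More than two clicks from home:", too_deep)
print("Unreachable from home:", orphans)
```

Pages in the first list are candidates for a link from the homepage or a main category page. Pages in the second list need at least one internal link pointing to them.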

The possible positive ranking factors:

FI_IS_COM: +0.2762504972 — .com domains get a boost in rankings.

FI_YABAR_HOST_VISITORS — the more traffic you get, the more ranking power your site has. The strategy of targeting smaller, easier keywords first to build up an audience before targeting harder keywords can help you build traffic.

FI_BEAST_HOST_MEAN_POS — the average position of the host for keywords affects your overall ranking. This factor and the previous one clearly show that being smart with your keyword and content planning matters. If you need help with that, check out these 5 ways to build a solid SEO strategy.

FI_YABAR_HOST_SEARCH_TRAFFIC — this might look bad but shows that having other traffic sources (such as social media, direct search, and PPC) is good for your site. Yandex uses this to determine if a real site is being run, not just some spammy SEO project.

This one includes a whole host of CTR-related factors. 

CTR ranking factors from Yandex

It’s clear that having searchable and interesting titles that drive users to check your content out is something that positively affects your rankings.

Google is rewarding sites that help end a user’s search journey (as we know from the latest mobile search updates and even the Helpful Content update). Do what you can to answer the query early on in your article. The factor “FI_VISITORS_RETURN_MONTH_SHARE” also shows that it helps to encourage users to return to your site for more information on the topics they’re interested in. Email marketing is a handy tool here.

FI_GOOD_RATIO and FI_MANY_BAD — the percentage of “good” and “bad” backlinks on your site. Getting your backlinks from high-quality websites with traffic is important for your rankings. The factor FI_LINK_AGE also shows that adding a link-building strategy to your SEO as early as possible can help with your rankings.

FI_SOCIAL_URL_IS_VERIFIED — that little blue check has actual benefits now. Links from verified accounts have more weight.

Key Takeaway

Because Yandex and Google are so similar to each other in theory, this data leak is something we must pay attention to.

Several of these factors may already be common knowledge amongst SEOs, but having them confirmed by another search engine reinforces how important they are for your strategy.

These initial findings, and an understanding of what they might mean for your website, can help you identify what to improve, what to scrap, and what to focus on when it comes to your SEO strategy.
