11 Privacy-Focused, Alternative Search Engines to Google


Google alternatives are everywhere, but are they any good?

To answer this question, I poked around a few popular alternative search engines for a day or two and used them for my daily work.

My main judging criterion was how each search engine fared in its commitment to protecting user data and privacy. Here are the 11 that got my approval:

  1. Startpage
  2. DuckDuckGo
  3. Brave Search
  4. Swisscows
  5. Search Encrypt
  6. OneSearch
  7. MetaGer
  8. Mojeek
  9. Qwant
  10. Ecosia
  11. You.com

Startpage's homepage. Search term "ahrefs" in text field

Startpage is effectively Google without the tracking. And for this reason, it takes our top spot.

According to the Netherlands-based search engine, your queries are anonymized before search results are pulled from Google. This means all identifying information is blanketed, including your IP address. No tracking cookies are used either.

Startpage also complies with GDPR, a European Union regulation that protects users’ data. Still, you should know that the Netherlands is part of the Nine Eyes intelligence alliance, which means mass surveillance data can be shared with eight other countries.

Feature spotlight

"Anonymized View" message in search queries

The Anonymous View feature lets you visit websites from your search results anonymously for both desktop and mobile. It essentially behaves like a (free) VPN.

DuckDuckGo's homepage. Search term "ahrefs" in text field

DuckDuckGo is easily the most well-known private search engine around—and the antithesis to Google when it comes to favoring user privacy. (Though, it does monetize from user searches.)

Your search history is saved in a non-identifiable manner, meaning tracking cookies and personal identifiers, such as IP addresses, are not stored.

This is a huge plus for us, despite the fact that results are pulled from over 400 sources, including its own crawler (DuckDuckBot), crowdsourced sites like Wikipedia, and partners like Bing and Yahoo.

To test its efficacy, I plugged the same keywords into Google and DuckDuckGo—from “covid-19 updates” to “kaws marina bay.” The results were often similar. I found this to be true of breaking news stories too.

A possible downside is that DuckDuckGo is based in the U.S. and, by extension, part of the Five Eyes intelligence alliance—which frequently collects mass surveillance data from internet companies.

Feature spotlight

Gif showing Bangs feature

DuckDuckGo’s “Bangs” feature takes you directly to search results on other sites. For example, typing “!w” and a keyword (e.g., !w singapore) takes you directly to Wikipedia’s page for Singapore.

Brave's homepage. Search term "Ahrefs" in text field

I like Brave Search for its focus on unbiased results, which it pulls from its own index. The company made the bold move of relinquishing its reliance on Google SERPs in October 2021.

Upon searching for both evergreen content and time-sensitive news, I found the results to be similar enough to Google’s—so long as you allow for anonymous local results.

For more privacy, you can choose to turn this toggle off and conduct manual searches, e.g., “things to do in [location].”

Toggle button for "Anonymous local results"

Just like the other search engines here, there is no user profiling and no personalized or targeted ads.

The search engine is part of Brave Software, whose co-founders include Brendan Eich (creator of JavaScript and co-founder of Mozilla) and Brian Bondy (former senior platform engineer at Mozilla).

Feature spotlight

Search results for "marketing." Knowledge panel on the right

Brave has knowledge panels for quick answers to your burning questions, just like Google.

Swisscows' homepage. Search term "Ahrefs" in text field

Swisscows is a Switzerland-based search engine that has its own index for German-language queries. For all other languages, results are pulled from Bing. But this isn’t an issue, given that all search queries are stripped of personal identifiers.

The search engine also omits the use of tracking cookies and geo-targeting.

While Swisscows’ search results certainly aren’t on par with Google’s, I like how family friendly the search engine is: It automatically filters out violent and pornographic search results by way of an enforced feature.

As far as privacy goes, Switzerland is not part of an intelligence alliance, but it does have a Mutual Legal Assistance Treaty with the U.S.

Feature spotlight

Search results for "digital marketing." Semantic map on the right

Swisscows has “semantic maps” to help you refine your searches.

Search Encrypt's homepage. Search term "ahrefs" in text field

Search Encrypt sources its results from its content partners and search engines (including Google, Bing, and Yahoo), albeit in a privacy-safe manner.

Like the other search engines on this list, Search Encrypt anonymizes search queries, doesn’t retain server logs or IP addresses, and doesn’t store tracking cookies.

When it comes to searches, your terms are encrypted locally before being sent to the servers. After 30 minutes of inactivity, your browsing history will be erased.

However, Search Encrypt does store your search data (albeit without any of your personal identifiers) to improve its product performance.

I found its search results comparable to Ecosia’s. So while it’s not quite up there with DuckDuckGo and Startpage, Search Encrypt is reliable enough to be used for all kinds of queries—whether navigational, transactional, or informational.

Feature spotlight

Options to retry search on Yahoo, Bing, and Google

If you’re dissatisfied with the results or want a quick comparison, you can easily do so using the “retry this search” function located at the top of the page.

OneSearch's homepage. Search term "ahrefs" in text field

Owned by Yahoo’s parent company, Verizon Media, OneSearch claims to be a privacy-oriented search engine—with search results pulled from Bing.

No cookies are stored, and there is no sharing of identifiable personal data with advertisers. Having also perused its privacy policy, I found the search engine to be pretty safe.

On the downside, OneSearch profiles you based on query search terms and your imprecise location at the time of your search. So you may get contextual ads—or educated guesses about your interests based on your search keywords. There is no personal profiling or retargeting, though.

What I like are the little extras: the SafeSearch function, the ability to set a more specific location, and the “Advanced Privacy Mode” option (more below).

Feature spotlight

Toggle button to turn Advanced Privacy Mode on or off. Short write-up about privacy below

Enable “Advanced Privacy Mode” to encrypt search terms and search URLs, which will mask your search content from third parties.

MetaGer's homepage. Search term "ahrefs" in text field

MetaGer is a Germany-based, open-source metasearch engine. Like Ecosia, its servers run on renewable energy.

Results are pulled from Scopia, Bing, OneNewspage, and OneNewspage (Video), so they’re pretty timely. You can also deselect the search engines used or create a blacklist of websites in the settings.

I appreciate how transparent MetaGer is in its handling of user information, from queries to maps. Still, it’s not without caveats: The search engine stores your full IP address for 96 hours, and your name and email address are kept if you fill out its contact form.

However, it does use an anonymizing proxy that ensures you retain full control over your data.

Feature spotlight

Example of "Did you know" box

The “Did you know” box (to the right of search results) offers tidbits about MetaGer and how to refine your searches. You can also click on the text to view the full list of tips.

Mojeek's homepage. Search term "ahrefs" in text field

Mojeek is a crawler-based search engine with its own search index of over 4 billion pages. This makes it excellent for unbiased information. But it also means there may be limited results, as it doesn’t pull results from other search engines.

Still, I like the search engine for its straightforward no-tracking policy. Your personal data will also never be sold or distributed, which is a huge plus in our books. If you’ve filled out its contact form, you can request to have the information deleted too, as per GDPR.

All that said, Mojeek is based in the U.K., which is part of the Five Eyes intelligence alliance—just like DuckDuckGo.

Feature spotlight

Emoticons below text field

The emotion-based search classification feature allows you to enter a keyword and search by emotion.

Qwant's homepage. Search term "ahrefs" in text field

Qwant is a Paris-based search engine whose search results are powered by Bing and its own web crawler. It’s fully accessible in over 30 countries but, alas, not Singapore.

At last check, my colleague, SQ, found the accuracy of search results to be decent enough. But don’t expect location-specific answers, as Qwant doesn’t track your geolocation.

It also doesn’t collect data or use tracking cookies. But it keeps your IP address for fraud detection purposes. If you prefer complete anonymity, Qwant suggests using a VPN or the Tor network.

Like Startpage, Qwant offers GDPR protection. France is, however, part of the Nine Eyes intelligence alliance.

Feature spotlight

Example of search shortcut. Typing in a specific query leads to Amazon's page

There are “search shortcuts” that offer you quick access to specific websites. For instance, using the term “&a books” yields results from Amazon’s books category.

View the list of shortcuts here.

Ecosia's logo. Next to it is a text field to enter search queries

Did you know every action you take on your digital device emits carbon dioxide? And Google plays a big part in it: It’s accountable for ~40% of the internet’s carbon footprint.

To counteract this, private search engine Ecosia donates 80% of its profits to tree-planting projects, or roughly one tree for every 45 searches made. It has also built a solar plant so that its servers can run on clean power.

On the whole, I found its search results to be close enough to Google’s.

Ecosia isn’t fully private. It collects search data and personally identifiable information, both of which are only anonymized after seven days. But we reckon using the search engine is a worthy trade-off, as it seeks to tackle climate change.

Feature spotlight

Search results page. On top right-hand corner, a small scoreboard with a tree next to it

Your number of searches is shown on a scoreboard (unless you choose to clear your browser cookies), allowing you to keep track of your impact on the environment.

You.com's logo. Next to it is a text field containing search term "ahrefs"

Rounding off the list is You.com. The beta-stage search engine offers a highly customizable experience: Your search results appear on one webpage but are split into several sections that you can rearrange according to your preferences.

"Web results," "Images," and "Videos" sections on search results page

To further customize your results, sign up for an account to add and save apps to your dashboard (find out more under Feature Spotlight below). These apps are essentially your preferred sources of information and will show up along with your search results.

Saved app "Reddit" on search results page

While it’s a neat feature, it also means your first-party cookies will be stored for personalization purposes. Alternatively, you’re able to eliminate first-party cookie-tracking entirely by browsing privately or via VPN.

I found the search results for various keywords (such as “ahrefs” and “covid-19 singapore daily cases”) to be comparable to Google’s. And while some results are drawn from Microsoft, You.com maintains that user data is kept safe and not sold to advertisers. The company is carbon neutral too.

Feature spotlight

Section to add apps

Personalize your search results by adding apps to your feed; there are scores of categories to choose from. You can even add developer apps, such as GitHub, to your dashboard.

Final thoughts

While Google dominates the global search engine market, it’s had its fair share of criticism: antitrust issues, creating a filter bubble, violating user privacy, and more.

For users, alternative search engines may be one workaround. But don’t take our word for it; try out the ones on this list to decide what works best for you.

Got questions or comments? Ping me on Twitter.






How to Block ChatGPT From Using Your Website Content

There is concern about the lack of an easy way to opt out of having one’s content used to train large language models (LLMs) like ChatGPT. There is a way to do it, but it’s neither straightforward nor guaranteed to work.

How AIs Learn From Your Content

Large Language Models (LLMs) are trained on data that originates from multiple sources. Many of these datasets are open source and are freely used for training AIs.

Some of the sources used are:

  • Wikipedia
  • Government court records
  • Books
  • Emails
  • Crawled websites

Some portals and websites give away vast amounts of information as downloadable datasets.

One of the portals is hosted by Amazon, offering thousands of datasets at the Registry of Open Data on AWS.

Screenshot from Amazon, January 2023

The Amazon portal, with its thousands of datasets, is just one of many portals that host them.

Wikipedia lists 28 portals for downloading datasets, including the Google Dataset and the Hugging Face portals for finding thousands of datasets.

Datasets of Web Content

OpenWebText

A popular dataset of web content is called OpenWebText. OpenWebText consists of URLs found on Reddit posts that had at least three upvotes.

The idea is that these URLs are trustworthy and will contain quality content. I couldn’t find information about a user agent for their crawler; it may simply identify itself as Python, but I’m not sure.

Nevertheless, we do know that if your site is linked from Reddit with at least three upvotes then there’s a good chance that your site is in the OpenWebText dataset.
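
The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the "at least three upvotes" filter: the `submissions` records and the `select_urls` helper are hypothetical, and the de-duplication step is an assumption rather than something taken from the OpenWebText code.

```python
# Hypothetical Reddit submissions: (linked URL, upvote count).
submissions = [
    ("https://example.com/guide", 12),
    ("https://example.org/post", 2),
    ("https://example.com/guide", 5),  # same URL linked a second time
    ("https://example.net/news", 3),
]

MIN_UPVOTES = 3  # threshold named in the article


def select_urls(items, min_upvotes=MIN_UPVOTES):
    """Return unique URLs whose submission reached the upvote threshold."""
    seen = set()
    selected = []
    for url, upvotes in items:
        # Keep a URL only once, and only if it met the threshold (assumed rule).
        if upvotes >= min_upvotes and url not in seen:
            seen.add(url)
            selected.append(url)
    return selected


selected_urls = select_urls(submissions)
print(selected_urls)
```

In this toy data, the link with two upvotes is dropped, which is why a site that has only ever been shared in low-scoring posts is less likely to appear in such a dataset.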

More information about OpenWebText is here.

Common Crawl

One of the most commonly used datasets for Internet content is offered by a non-profit organization called Common Crawl.

Common Crawl data comes from a bot that crawls the entire Internet.

Organizations that want to use the data download it and then clean it of spammy sites and similar noise.

The name of the Common Crawl bot is CCBot.

CCBot obeys the robots.txt protocol, so it is possible to block Common Crawl with robots.txt and prevent your website data from making it into another dataset.

However, if your site has already been crawled then it’s likely already included in multiple datasets.

Nevertheless, by blocking Common Crawl it’s possible to keep your website content out of new datasets sourced from newer Common Crawl data.

The CCBot User-Agent string is:

CCBot/2.0

Add the following to your robots.txt file to block the Common Crawl bot:

User-agent: CCBot
Disallow: /
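
Before deploying the rule, it can be sanity-checked locally. The sketch below uses Python's standard `urllib.robotparser` to confirm that the two lines above block CCBot while leaving other crawlers unaffected. The domain `example.com` and the name `SomeOtherBot` are placeholders, and this only shows how a standards-following parser reads the rule, not how CCBot itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# The same two lines shown above, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

# example.com and SomeOtherBot are placeholders for illustration.
ccbot_allowed = robots.can_fetch("CCBot/2.0", "https://example.com/any-page")
other_allowed = robots.can_fetch("SomeOtherBot", "https://example.com/any-page")

print("CCBot allowed:", ccbot_allowed)
print("Other bot allowed:", other_allowed)
```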

An additional way to confirm that a CCBot user agent is legitimate is to check that it crawls from Amazon AWS IP addresses.
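
One way to run that check against a server log is to test each requesting IP against a list of CIDR ranges. The following is a minimal sketch using Python's standard `ipaddress` module. The ranges in `EXAMPLE_RANGES` are reserved documentation blocks, not real AWS ranges, and `ip_in_ranges` is a hypothetical helper; you would substitute the ranges that Amazon publishes.

```python
import ipaddress

# Placeholder CIDR blocks from the reserved documentation ranges.
# They are NOT real AWS ranges; substitute the ranges Amazon publishes.
EXAMPLE_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]


def ip_in_ranges(ip, cidr_ranges):
    """Return True if the IP address falls inside any of the CIDR ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in cidr_ranges)


inside = ip_in_ranges("192.0.2.55", EXAMPLE_RANGES)
outside = ip_in_ranges("203.0.113.9", EXAMPLE_RANGES)
print(inside, outside)
```

A request that claims to be CCBot but comes from an address outside the published ranges would be a candidate for closer inspection.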

CCBot also obeys the nofollow robots meta tag directives.

Use this in your robots meta tag:

<meta name="robots" content="nofollow">
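
To confirm that a page actually serves this tag, the markup can be parsed and the robots directives read back. This is a small sketch with Python's standard `html.parser`. The `RobotsMetaParser` class and the sample `page` string are illustrative assumptions, and the check says nothing about how any particular crawler interprets the directive.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, nofollow".
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


# Hypothetical page markup containing the tag shown above.
page = '<html><head><meta name="robots" content="nofollow"></head><body></body></html>'

meta_parser = RobotsMetaParser()
meta_parser.feed(page)
has_nofollow = "nofollow" in meta_parser.directives
print(has_nofollow)
```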

Blocking AI From Using Your Content

Search engines allow websites to opt out of being crawled. Common Crawl also allows opting out. But there is currently no way to remove one’s website content from existing datasets.

Furthermore, research scientists don’t seem to offer website publishers a way to opt out of being crawled.

The article, Is ChatGPT Use Of Web Content Fair? explores the topic of whether it’s even ethical to use website data without permission or a way to opt out.

Many publishers may appreciate it if in the near future, they are given more say on how their content is used, especially by AI products like ChatGPT.

Whether that will happen is unknown at this time.


Featured image by Shutterstock/ViDI Studio




Google’s Mueller Criticizes Negative SEO & Link Disavow Companies

John Mueller recently made strong statements against SEO companies that provide negative SEO and other agencies that provide link disavow services outside of the tool’s intended purpose, saying that they are “cashing in” on clients who don’t know better.

Many frequently say that Mueller and other Googlers are ambiguous, even on the topic of link disavows.

The fact, however, is that Mueller and other Googlers have consistently recommended against using the link disavow tool.

This may be the first time Mueller actually portrayed SEOs who liberally recommend link disavows in a negative light.

What Led to John Mueller’s Rebuke

Mueller’s comments about negative SEO and link disavow companies were prompted by a tweet by Ryan Jones (@RyanJones).

Ryan tweeted that he was shocked at how many SEOs regularly disavow links.

He tweeted:

“I’m still shocked at how many seos regularly disavow links. Why? Unless you spammed them or have a manual action you’re probably doing more harm than good.”

Ryan is shocked because Google has consistently recommended the tool only for disavowing paid/spammy links that the sites (or their SEOs) are responsible for.

And yet, here we are, eleven years later, and SEOs are still misusing the tool to remove other kinds of links.

Here’s the background information about that.

Link Disavow Tool

In the mid-2000s, before the Penguin Update in April 2012, there was a thriving open market for paid links. The commerce in paid links was staggering.

I knew of one publisher with around fifty websites who received a $30,000 check every month for hosting paid links on his site.

Even though I advised my clients against it, some of them still purchased links because they saw everyone else was buying them and getting away with it.

The Penguin Update caused the link selling boom to collapse.

Thousands of websites lost rankings.

SEOs and affected websites strained under the burden of having to contact all the sites from which they purchased paid links to ask to have them removed.

So some in the SEO community asked Google for a more convenient way to disavow the links.

Months went by and after resisting the requests, Google relented and released a disavow tool.

Google cautioned from the very beginning to only use the tool for disavowing links that the site publishers (or their SEOs) are responsible for.

The first paragraph of Google’s October 2012 announcement of the link disavow tool leaves no doubt on when to use the tool:

“Today we’re introducing a tool that enables you to disavow links to your site.

If you’ve been notified of a manual spam action based on ‘unnatural links’ pointing to your site, this tool can help you address the issue.

If you haven’t gotten this notification, this tool generally isn’t something you need to worry about.”

The message couldn’t be clearer.

But at some point in time, link disavowing became a service applied to random and “spammy looking” links, which is not what the tool is for.

Link Disavow Takes Months To Work

There are many anecdotes about link disavows that helped sites regain rankings.

They aren’t lying; I know credible and honest people who have made this claim.

But here’s the thing: John Mueller has confirmed that the link disavow process takes months to work its way through Google’s algorithm.

Sometimes two things happen around the same time without being related. It just looks that way.

John shared how long it takes for a link disavow to work in a Webmaster Hangout:

“With regards to this particular case, where you’re saying you submitted a disavow file and then the ranking dropped or the visibility dropped, especially a few days later, I would assume that that is not related.

So in particular with the disavow file, what happens is we take that file into account when we reprocess the links kind of pointing to your website.

And this is a process that happens incrementally over a period of time where I would expect it would have an effect over the course of… I don’t know… maybe three, four, five, six months …kind of step by step going in that direction.

So if you’re saying that you saw an effect within a couple of days and it was a really strong effect then I would assume that this effect is completely unrelated to the disavow file. …it sounds like you still haven’t figured out what might be causing this.”

John Mueller: Negative SEO and Link Disavow Companies are Making Stuff Up

Context is important to understand what was said.

So here’s the context for John Mueller’s remark.

An SEO responded to Ryan’s tweet about being shocked at how many SEOs regularly disavow links.

The person responding to Ryan tweeted that disavowing links was still important, that agencies provide negative SEO services to take down websites and that link disavow is a way to combat the negative links.

The SEO (SEOGuruJaipur) tweeted:

“Google still gives penalties for backlinks (for example, 14 Dec update, so disavowing links is still important.”

SEOGuruJaipur next began tweeting about negative SEO companies.

Negative SEO companies are those that will build spammy links to a client’s competitor in order to make the competitor’s rankings drop.

SEOGuruJaipur tweeted:

“There are so many agencies that provide services to down competitors; they create backlinks for competitors such as comments, bookmarking, directory, and article submission on low quality sites.”

SEOGuruJaipur continued discussing negative SEO link builders, saying that only high trust sites are immune to the negative SEO links.

He tweeted:

“Agencies know what kind of links hurt the website because they have been doing this for a long time.

It’s only hard to down for very trusted sites. Even some agencies provide a money back guarantee as well.

They will provide you examples as well with proper insights.”

John Mueller tweeted his response to the above tweets:

“That’s all made up & irrelevant.

These agencies (both those creating, and those disavowing) are just making stuff up, and cashing in from those who don’t know better.”

Then someone else joined the discussion:

Mueller tweeted a response:

“Don’t waste your time on it; do things that build up your site instead.”

Unambiguous Statement on Negative SEO and Link Disavow Services

A statement by John Mueller (or anyone) can appear to conflict with prior statements when taken out of context.

That’s why I placed his statements not only into their original context but also into the eleven-year history that is part of that discussion.

It’s clear that John Mueller feels that those selling negative SEO services and those providing disavow services outside of the intended use are “making stuff up” and “cashing in” on clients who might not “know better.”

Featured image by Shutterstock/Asier Romero




Source Code Leak Shows New Ranking Factors to Consider

January 25, 2023 was the day that Yandex, Russia’s search engine, was hacked.

Its complete source code was leaked online. It might not be the first time we’ve seen hacking happen in this industry, but it is one of the most intriguing, groundbreaking events in years.

But Yandex isn’t Google, so why should we care? Here’s why we do: these two search engines are very similar in how they process technical elements of a website, and this leak just showed us the 1,922 ranking factors Yandex uses in its algorithm. 

Simply put, this information is something that we can use to our advantage to get more traffic from Google.

Yandex vs Google

As I said, a lot of these ranking factors are possibly quite similar to the signals that Google uses for search.

Yandex’s algorithm shows a RankBrain analog: MatrixNext. It also seems that they are using PageRank (almost the same way as Google does), and a lot of their text algorithms are the same. Interestingly, there are also a lot of ex-Googlers working in Yandex. 

So, reviewing these factors and understanding how they play into search rankings and traffic will provide some very useful insights into how search engines like Google work. No doubt, this new trove of information will greatly influence the SEO market in the months to come. 

That said, Yandex isn’t Google. The chances of Google having the exact same list of ranking factors are low, and Google may not give any given signal the same amount of weight that Yandex does.

Still, it’s information that will potentially be useful for driving traffic, so make sure to take a look at the factors here (before they’re scrubbed from the internet forever).

An early analysis of ranking factors

Many of their ranking factors are as expected. These include:

  • Many link-related factors (e.g., age and relevancy).
  • Content relevance, age, and freshness.
  • Host reliability.
  • End-user behavior signals.

Some sites also get preference (such as Wikipedia). FI_VISITS_FROM_WIKI even shows that sites that are referenced by Wikipedia get plus points. 

These are all things that we already know.

But here’s something interesting: there were several factors that I and other SEOs found unusual, such as PageRank being the 17th highest weighted factor in Yandex, and the 19th highest weighted factor being query-document relevance (in other words, how closely they match thematically). There’s also karma for likely spam hosts, based on Whois information.

Other interesting factors are the average domain ranking across queries, percent of organic traffic, and the number of unique visitors.

You can also use this Yandex Search Ranking Factor Explorer, created by Rob Ousbey, to search through the various ranking factors.

The possible negative ranking factors:

Here are my thoughts on the Yandex factors that I found interesting:

FI_ADV: -0.2509284637 — this factor means having tons of adverts scattered around your page and buying PPC can affect rankings. 

FI_DATER_AGE: -0.2074373667 — this one evaluates content age, and whether your article is more than 10 years old, or if there’s no determinable date. Date metadata is important. 

FI_COMM_LINKS_SEO_HOSTS: -0.1809636391 — this can be a negative factor if you have too much commercial anchor text, particularly if the proportion of such links goes above 50%. Pay attention to anchor text distribution. I’ve written a guide on how to effectively use anchor texts if you need some help on this. 
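
To get a rough feel for your own anchor text distribution, you can measure what share of your backlink anchors look commercial and compare it with the 50% mark mentioned above. The sketch below is only an approximation: the `COMMERCIAL_TERMS` word list, the sample `anchors`, and the `commercial_share` helper are all hypothetical, and nothing in the source says how Yandex actually classifies an anchor as commercial.

```python
# Hypothetical word list and anchors, for illustration only.
COMMERCIAL_TERMS = {"buy", "cheap", "price", "discount", "order"}

anchors = [
    "buy running shoes",
    "Acme Shoes",
    "cheap running shoes",
    "this review",
    "https://example.com",
]


def commercial_share(anchor_texts, terms=COMMERCIAL_TERMS):
    """Return the fraction of anchors containing at least one commercial term."""
    if not anchor_texts:
        return 0.0
    commercial = sum(
        1 for text in anchor_texts if terms & set(text.lower().split())
    )
    return commercial / len(anchor_texts)


share = commercial_share(anchors)
print(f"{share:.0%} commercial anchors")
if share > 0.5:
    print("More than half of the anchors look commercial")
```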

FI_RANK_ARTROZ — outdated, poorly written text will bring your rankings down. Go through your site and give your content a refresh. FI_WORD_COUNT also shows that the number of words matters, so avoid having low-content pages.

FI_URL_HAS_NO_DIGITS, FI_NUM_SLASHES, FI_FULL_URL_FRACTION — URLs shouldn’t have digits or too many slashes (too much hierarchy), and should of course contain your targeted keyword.
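
These URL checks are easy to approximate when auditing your own pages. The sketch below is an assumption-laden reading of the advice above, not a reimplementation of the leaked factors: the `url_signals` helper and its dictionary keys are hypothetical, it looks only at the URL path, and it treats a keyword as present when its hyphenated form appears in that path.

```python
from urllib.parse import urlparse


def url_signals(url, keyword):
    """Return simple, assumed approximations of the URL checks described above."""
    path = urlparse(url).path
    return {
        # True when the path contains no digit characters.
        "has_no_digits": not any(ch.isdigit() for ch in path),
        # Number of slashes in the path, as a rough measure of hierarchy.
        "num_slashes": path.count("/"),
        # True when the hyphenated keyword appears in the path.
        "contains_keyword": keyword.lower().replace(" ", "-") in path.lower(),
    }


clean = url_signals("https://example.com/blog/running-shoes", "running shoes")
messy = url_signals("https://example.com/2023/01/25/cat/sub/p123", "running shoes")
print(clean)
print(messy)
```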

FI_NUM_LINKS_FROM_MP — always interlink your main pages (such as your homepage or landing pages) to any other important content you want to rank. Otherwise, it can hurt your content.

FI_HOPS — reduce the crawl depth for any pages that matter to you. No important pages should be more than a few clicks away from your homepage. I recommend keeping it to two clicks, at most. 

FI_IS_UNREACHABLE — likewise, avoid making any important page an orphan page. If it’s unreachable from your homepage, it’s as good as dead in the eyes of the search engine.
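
Both of the last two points, crawl depth and orphan pages, can be checked with a breadth-first search over your internal link graph. The sketch below assumes you already have a mapping of each page to the pages it links to (for example, from a site crawl). The `site_links` data and the `click_depths` helper are hypothetical, and the two-click limit is the recommendation above rather than a threshold found in the leak.

```python
from collections import deque

# Hypothetical internal link graph: page -> pages it links to.
site_links = {
    "/": ["/blog", "/pricing"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": ["/blog/post-2"],
    "/blog/post-2": [],
    "/pricing": [],
    "/old-landing-page": ["/pricing"],  # no page links to this one
}


def click_depths(links, start="/"):
    """Breadth-first search: minimum number of clicks from start to each page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


depths = click_depths(site_links)
too_deep = sorted(page for page, depth in depths.items() if depth > 2)
orphans = sorted(page for page in site_links if page not in depths)
print("More than two clicks deep:", too_deep)
print("Unreachable from homepage:", orphans)
```

Pages in the first list are candidates for a link from the homepage or another main page; pages in the second list need at least one internal link before a crawler starting at the homepage can reach them.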

The possible positive ranking factors:

FI_IS_COM: +0.2762504972 — .com domains get a boost in rankings.

FI_YABAR_HOST_VISITORS — the more traffic you get, the more ranking power your site has. The strategy of targeting smaller, easier keywords first to build up an audience before targeting harder keywords can help you build traffic.

FI_BEAST_HOST_MEAN_POS — the average position of the host for keywords affects your overall ranking. This factor and the previous one clearly show that being smart with your keyword and content planning matters. If you need help with that, check out these 5 ways to build a solid SEO strategy.

FI_YABAR_HOST_SEARCH_TRAFFIC — this might look bad but shows that having other traffic sources (such as social media, direct search, and PPC) is good for your site. Yandex uses this to determine if a real site is being run, not just some spammy SEO project.

Yandex’s list also includes a whole host of CTR-related factors.

CTR ranking factors from Yandex

It’s clear that having searchable and interesting titles that drive users to check your content out is something that positively affects your rankings.

Google is rewarding sites that help end a user’s search journey (as we know from the latest mobile search updates and even the Helpful Content update). Do what you can to answer the query early on in your article. The factor “FI_VISITORS_RETURN_MONTH_SHARE” also shows that it helps to encourage users to return to your site for more information on the topics they’re interested in. Email marketing is a handy tool here.

FI_GOOD_RATIO and FI_MANY_BAD — the percentage of “good” and “bad” backlinks pointing to your site. Getting your backlinks from high-quality websites with traffic is important for your rankings. The factor FI_LINK_AGE also shows that adding a link-building strategy to your SEO as early as possible can help with your rankings.

FI_SOCIAL_URL_IS_VERIFIED — that little blue check has actual benefits now. Links from verified accounts have more weight.

Key Takeaway

Because Yandex and Google are, in theory, so similar to each other, this data leak is something we must pay attention to.

Several of these factors may already be common knowledge amongst SEOs, but having them confirmed by another search engine reinforces how important they are for your strategy.

Understanding these initial findings, and what they might mean for your website, can help you identify what to improve, what to scrap, and what to focus on when it comes to your SEO strategy.
