J. R. Swab

YaCy, A Decentralized Search Alternative

Categories: [Technology]
Tags: [open-source], [privacy], [decentralization], [p2p]

Today we are getting more in-depth with YaCy and why it is important. We, as internet users, need to be mindful of who is using our data and how. Does it benefit us to give up our information and privacy?

Convenience is often the only benefit internet users get from today's mass data collection. Search engines are a part of this privacy war playing out in front of us. They collect data to sell ads tailored to our search history at a higher price than they could charge for random ads.

They can do this because there is a level of certainty when estimating clicks on an ad targeted to an individual. Even if they don't raise the cost of the ads and still use this targeted data, they win. A better-targeted ad will bring more clicks. More clicks mean the company that bought the ad has to pay more.

Without targeting the ads, the number of clicks would be much lower, and therefore the search engine would make less money. That is how it all started. These search engines needed money to pay for the servers they used, so they sold ads. Once they saw it was profitable, they tracked search history to better target the ads.
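To put rough numbers on that argument, here is a minimal sketch of pay-per-click arithmetic. The function name and every figure in it are made up for illustration; they are not real rates from any search engine.

```python
def ad_revenue(impressions, click_through_rate, cost_per_click):
    """Money the search engine earns when advertisers pay per click."""
    clicks = impressions * click_through_rate
    return clicks * cost_per_click


# Hypothetical numbers: same audience, same price per click,
# only the share of people who click changes with targeting.
untargeted = ad_revenue(impressions=100_000, click_through_rate=0.01, cost_per_click=0.50)
targeted = ad_revenue(impressions=100_000, click_through_rate=0.03, cost_per_click=0.50)

print(f"untargeted: {untargeted:.2f}")
print(f"targeted:   {targeted:.2f}")
```

Even with the price per click held flat, the targeted ad earns more simply because more people click on it.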

The companies paying for the ads do not mind paying more each day because they get more clicks. More clicks on their ads mean more people on their site. More visitors mean more sales and money. There was no backlash when the tracking started because the people who noticed it first enjoyed the change. Those who did not notice kept searching without a clue.

This is the nature of the centralized model for the sites and online tools we use every day. The providers have bills and employees to pay, so money is a necessity. Asking people who are used to a free service to start paying for it would be a death wish. It is much smoother for these search giants to track, log, and use our data to make the money they need.

What Is YaCy

YaCy is a search engine that uses the decentralized peer-to-peer model. When you become a peer, you crawl the web just like the big players. As you gather data you share it with the rest of the network to improve the speed and coverage of YaCy.

This search engine is free to use and there are no ads. YaCy can do this because they do not have to pay a server bill since you become one of the many servers on the network. They also do not have to pay anyone to keep the search engine running because there is no single point of failure.

The term for this is "distributed search engine" and there are a few others out there. Some include InfraSearch, Opencola, and FAROO. All of them started in the early 2000s, with Michael Christen starting YaCy at the end of 2003. Each of these search engines has a structure similar to YaCy's. All rely on the nodes to crawl, mine, and query information to display to the users.
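The idea of nodes crawling and answering queries together can be sketched with a toy model. This is not YaCy's actual protocol or code; the Peer class, network_search function, and example URLs are hypothetical and only show how partial indexes held by different peers can be merged into one answer.

```python
from collections import defaultdict


class Peer:
    """A hypothetical peer holding an index of only the pages it crawled."""

    def __init__(self, name):
        self.name = name
        self.index = defaultdict(set)  # word -> set of URLs containing it

    def crawl(self, url, text):
        # "Crawling" here just means indexing text we were handed.
        for word in text.lower().split():
            self.index[word].add(url)

    def query(self, word):
        # Return a copy so callers cannot modify this peer's index.
        return set(self.index.get(word.lower(), set()))


def network_search(peers, word):
    """Ask every peer for matches and merge the answers."""
    results = set()
    for peer in peers:
        results |= peer.query(word)
    return sorted(results)


alice = Peer("alice")
bob = Peer("bob")
alice.crawl("https://example.org/a", "decentralized search engines")
bob.crawl("https://example.org/b", "peer to peer search")

hits = network_search([alice, bob], "search")
print(hits)
```

Neither peer knows about both pages, yet a search across the network finds both. That is the whole trick: the more peers crawl, the more the combined index covers.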


YaCy has a web interface, but unlike the well-known search engines, you can't just type in yacy.net and look up your question. There is a download for Windows, Mac, and Linux on their site. Once you have the file, all you need to do is decompress the archive and run the script.

There is no install to use YaCy!

After you run the script, the YaCy search page will appear in your default web browser. From there you search as you would on Google, Bing, or DuckDuckGo. Some searches will take more time to complete than others. This all depends on how often the phrase gets a request, as far as I can tell.
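If you would rather query your local peer from a script than from the browser, something like the sketch below should work. Treat the details as assumptions: I am assuming the peer listens on http://localhost:8090 and offers a JSON search endpoint at /yacysearch.json with a query parameter, so check your own setup before relying on it. The helper names are my own.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the local peer listens here; adjust to match your setup.
BASE_URL = "http://localhost:8090"


def build_search_url(query, base_url=BASE_URL):
    # Assumption: the peer exposes a JSON search endpoint at /yacysearch.json.
    return f"{base_url}/yacysearch.json?{urlencode({'query': query})}"


def search(query, base_url=BASE_URL):
    """Send the query to the local peer and return the parsed JSON reply."""
    with urlopen(build_search_url(query, base_url), timeout=30) as response:
        return json.load(response)


url = build_search_url("decentralized search")
print(url)
```

Calling search("decentralized search") only works while your peer is running. The shape of the reply is not assumed here, so inspect what comes back before you build anything on top of it.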

You may also run into a few searches with few or no results. This is the nature of the beast, but with more users and nodes this problem will go away. However, you will only run into this if you are searching for something obscure. For over a decade, YaCy has run in some form, picking up more data to serve to its users.

More Than Websites

This data is not information on what you are searching for but what the internet has to offer you. There are over one billion documents indexed by YaCy peers. This includes HTML documents (websites), PDF files, images, and any other data that can be stored on a web server. When the search results load, you may even see files that the big engines won't show.

The biggest reason many people use YaCy is that it will not censor data because it does not have to. Sites like Google, Bing, and DuckDuckGo may have to remove a result because some government forces them to do so. These centralized sources must remove the data or risk fines and even a shutdown.

Due to the decentralized, peer-to-peer nature of YaCy, this is not an issue. There is no company to strong-arm into taking results offline. The best a government or company can do is force the site where people download the program offline. This won't stop it because there are already numerous people with a copy.

The YaCy script will end up on file sharing platforms like BitTorrent in less than a day's time if needed. This happens all the time, even when centralized platforms get taken down. Gab grew much faster after Google removed its app from the Play Store. Now the site teaches people how to download and install the app without the Google Play Store.

When traditional hackers are silenced, we find ways to scream louder and make it even harder to shut down our projects.
