05 Jul 2018

The IndieWeb, Discovery and Web Search

First, let me say that I understand the IndieWeb movement already has a lot on its plate and has already accomplished a lot.

It seems to me that, at some point, the problem of commercial web search silos must be addressed, for two reasons: 1. Google holds a near monopoly, 2. both spidering engines (Google and Bing) are oriented towards brands, data mining user profiles, advertising and the commercial.

How we build websites today is largely controlled by what Google likes and dislikes. If you don’t build the site Google’s way, you don’t rank in Google. If you don’t rank in Google, you might as well be dead. It has happened slowly over time, but Google has warped the Web into its own image. We don’t build websites the same way we did Before Google (BG).

But there are alternative search engines, and I like and use several of them.

DuckDuckGo - protects user privacy. Draws search results from many sources, with Bing as the backbone. They do have their own spider and index, but I’m not sure how large that is. If Bing should cease operating, DDG will be in trouble.

StartPage - protects user privacy. It is basically a Google feed stripped of geolocation and personal search history data. But if Google ever turns off access to the search feed, StartPage is gone.

HotBot - claims to protect privacy. Appears to be a straight-up Bing feed, stripped of advertising and tracking customization. Clean, minimalist results pages. But if Bing should cease operating, HotBot will go down.

As much as I like DuckDuckGo and StartPage, both depend on the search indexes of large siloed companies. With only two major search engines (Google and Bing) that spider the web, it seems like an unhealthy state of affairs.

What can the IndieWeb do?

Off the shelf solutions

There are a couple of smaller search engines that are trying to stay alive.

Mojeek.com - privacy protecting. UK based. Active spidering as money permits. Has potential.

Gigablast.com - Open Source code. Active spidering as money permits. Can index URLs very quickly. Has potential.

Both need larger databases. Both need funding. Both need some R&D to improve search results and ranking algorithms.

One option might be for the IndieWeb to campaign for private donations for one or both of these independent search engines. Publicity within the movement would help.

Build Our Own IndieWeb Search Engine(s)?

Here are a few resources available “off the shelf.”

Searx - (running instance). Open Source meta search script. At best this would be a stopgap solution. One major problem is that it scrapes results from other engines without permission, and sooner or later that will get shut down.
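To make the meta search idea concrete, here is a minimal sketch of how ranked result lists from several engines could be merged into one list. It uses reciprocal rank fusion, a well known merging technique that I am swapping in for illustration; I am not claiming this is how Searx does it. The function name merge_results and the constant k=60 are my own assumptions.

```python
from collections import defaultdict


def merge_results(result_lists, k=60):
    """Merge ranked URL lists with reciprocal rank fusion.

    result_lists: a list of lists of URLs, each ordered best-first,
    one list per upstream engine (hypothetical input shape).
    k: damping constant; 60 is a commonly used default, assumed here.
    """
    scores = defaultdict(float)
    for results in result_lists:
        seen = set()
        for rank, url in enumerate(results, start=1):
            if url in seen:
                continue  # count each URL once per engine
            seen.add(url)
            scores[url] += 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda u: (-scores[u], u))


engine_a = ["https://a.example/", "https://b.example/", "https://c.example/"]
engine_b = ["https://b.example/", "https://d.example/"]
merged = merge_results([engine_a, engine_b])
```

A URL that several engines agree on rises to the top, which is the main thing a meta search gives you over any single feed.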

Gigablast - The script is Open Source. Gigablast, as is, is quite impressively capable if one can afford to keep it indexing and provide the coding to fork it and improve it.

More Open Source and P2P Search Engines.

Curlie - is the revival/continuation of the Open Directory Project. Large meta directories have had their day but Curlie could provide two things: 1. a big index for a starting crawl for a new search engine, 2. A ranking indicator of quality for web pages. Human reviewed collections like ODP were used by Google in the early days as a quality indicator in ranking and can be used by new engines.

Wikipedia - 1. Using Wikipedia itself to answer queries, 2. Wikipedia contains a lot of outside links, so it would be another place to use as a starting crawl.
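As a rough sketch of what "a starting crawl" could look like, the snippet below pulls the outside links out of one already-downloaded page (a directory category or a Wikipedia article, say) so they can be queued as crawl seeds. It uses only the Python standard library. The names LinkCollector and seed_urls are hypothetical, and fetching, robots.txt handling and politeness delays are all left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href=...> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page's own URL.
            self.links.append(urljoin(self.base_url, href))


def seed_urls(html, base_url):
    """Return external http(s) links in page order, without duplicates."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    base_host = urlparse(base_url).netloc
    seeds = []
    for link in collector.links:
        parts = urlparse(link)
        is_web = parts.scheme in ("http", "https")
        if is_web and parts.netloc != base_host and link not in seeds:
            seeds.append(link)
    return seeds
```

Links back into the same host are dropped on purpose, since the point is to find the rest of the web, not to re-crawl the directory itself.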

Guerrilla Discovery Options

Webrings and Blogrolls - Webringo is free and seems underutilized. Why not use it to create web rings for discovery? Blogrolls, hey, why not?
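Blogrolls can also be read by machines. A minimal sketch, assuming the blogroll is published as an OPML file (a common export format for blogrolls and feed lists, though plenty of blogrolls are just HTML links): the hypothetical function blogroll_sites below lists the sites a blogroll points to, which a discovery tool could then follow from blog to blog.

```python
import xml.etree.ElementTree as ET


def blogroll_sites(opml_text):
    """Return (name, url) pairs for every linked entry in an OPML blogroll.

    Prefers the site address (htmlUrl) and falls back to the feed
    address (xmlUrl). Folder outlines without a URL are skipped.
    """
    root = ET.fromstring(opml_text)
    sites = []
    for outline in root.iter("outline"):
        url = outline.get("htmlUrl") or outline.get("xmlUrl")
        if url:
            name = outline.get("text") or outline.get("title") or url
            sites.append((name, url))
    return sites
```

Run over the blogrolls of a handful of trusted blogs, this gives a small human-curated link graph without any search engine in the loop.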

Directories and Filter Blogs - niche directories can still work when tied to an interest community (just ask almost any local Chamber of Commerce). Filter blogs might work too. Boingboing is an example of a filter blog that leads you to things posted on the web. Most bloggers already do some filter blogging.

I’m sure this has already been discussed within the IndieWeb community. Coming up with a full-fledged search engine would be a monumental and expensive task. But I’ve also tried to lay out smaller interim steps that will gain experience or help break the corporate silos.

Agree? Disagree? What am I missing? Feel free to comment.

Brad Enslen
