First, let me say that I understand the IndieWeb movement already has a lot on its plate, and that it has already accomplished a great deal.

It seems to me that, at some point, the problem of commercial search-engine silos must be addressed, since 1. Google holds a near monopoly, and 2. both of the major spidering engines (Google and Bing) are oriented toward brands, data-mining user profiles, advertising, and commerce.

How we build websites today is largely controlled by what Google likes and dislikes. If you don’t build the site Google’s way, you don’t rank in Google. If you don’t rank in Google, you might as well be dead. It has happened slowly over time, but Google has warped the Web into its own image. We don’t build websites the same way we did Before Google (BG).

But there are alternative search engines, and I like and use several of them.

DuckDuckGo – protects user privacy. Draws search results from many sources, with Bing as the backbone. It does have its own spider and index, but I’m not sure how large that is. If Bing should cease operating, DDG will be in trouble.

StartPage – protects user privacy. It is basically the Google feed stripped of geolocation and personal search-history data. But if Google ever turns off access to that feed, StartPage is gone.

HotBot – claims to protect privacy. Appears to be a straight Bing feed, stripped of advertising and tracking customization, with clean, minimalist results pages. But if Bing should cease operating, HotBot will go down.

As much as I like DuckDuckGo and StartPage, both depend on the search indexes of large siloed companies. With only two major search engines (Google and Bing) spidering the web, it seems like an unhealthy state of affairs.

What can the IndieWeb do?

Off-the-Shelf Solutions

There are a couple of smaller search engines that are trying to stay alive.

Mojeek.com – privacy-protecting. UK-based. Actively spidering as money permits. Has potential.

Gigablast.com – Open Source code. Actively spidering as money permits. Can index URLs very quickly. Has potential.

Both need larger databases. Both need funding. Both need some R&D to improve their search results and ranking algorithms.

One option might be for the IndieWeb to campaign for private donations for one or both of these independent search engines. Publicity within the movement would help.

Build Our Own IndieWeb Search Engine(s)?

Here are a few resources available “off the shelf.”

Searx – (running instance). An Open Source meta-search script. At best this would be a stopgap solution. One major problem is that it scrapes results from other engines without permission; sooner or later that will get shut down. (A quick sketch of querying a Searx instance appears after this list.)

Gigablast – the code is Open Source. Gigablast, as is, is impressively capable, if one can afford to keep it indexing and can provide the developers to fork and improve it.
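
To make the Searx stopgap concrete, here is a minimal sketch of querying a Searx instance over its JSON API. The instance URL is a placeholder, and the sketch assumes the instance has the JSON output format enabled (many public instances turn it off):

```python
import requests

def searx_search(query, instance="https://searx.example.org"):
    """Query a Searx instance and return its result list.

    Assumes the instance enables the JSON output format
    (format=json); many public instances do not.
    """
    resp = requests.get(
        f"{instance}/search",
        params={"q": query, "format": "json"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json().get("results", [])

if __name__ == "__main__":
    for result in searx_search("indieweb")[:5]:
        print(result.get("title"), "-", result.get("url"))
```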

More Open Source and P2P Search Engines

Curlie – is the revival/continuation of the Open Directory Project (ODP). Large meta directories have had their day, but Curlie could provide two things: 1. a big seed list for a new engine’s starting crawl, and 2. a quality signal for ranking web pages. Human-reviewed collections like the ODP were used by Google in the early days as a quality indicator in ranking, and new engines can use them the same way.

Wikipedia – 1. Using Wikipedia itself to answer queries, and 2. Wikipedia contains a lot of outside links, so it would be another place to seed a starting crawl. (A toy seed-crawl sketch follows this list.)
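
To show what a “starting crawl” means in practice, here is a toy breadth-first crawler seeded from a list of URLs, which you can imagine being harvested from Curlie categories or Wikipedia’s external links. The seed URL is a placeholder, and a real crawler would also need robots.txt handling and politeness delays:

```python
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect href values from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seeds, max_pages=100):
    """Breadth-first crawl starting from directory-derived seed URLs."""
    frontier, seen, index = deque(seeds), set(seeds), {}
    while frontier and len(index) < max_pages:
        url = frontier.popleft()
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except OSError:
            continue  # skip unreachable pages
        index[url] = html  # hand the page off to the real indexer here
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return index

# Hypothetical seeds, e.g. harvested from Curlie categories:
pages = crawl(["https://example.com/"], max_pages=10)
```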

Guerrilla Discovery Options

Webrings and Blogrolls – Webringo is free and seems underutilized; why not use it to create webrings for discovery? Blogrolls – hey, why not? (A minimal ring-hub sketch appears after this list.)

Directories and Filter Blogs – niche directories can still work when tied to an interest community (just ask almost any local Chamber of Commerce). Filter blogs might work too: Boing Boing is an example of a filter blog that leads you to things posted on the web, and most bloggers already do some filter blogging.
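
For anyone who has never looked under the hood, a webring is just a shared member list with next/previous/random navigation. A toy ring hub, with made-up member sites, might look like this:

```python
import random

# Hypothetical ring members, in ring order.
MEMBERS = [
    "https://alice.example",
    "https://bob.example",
    "https://carol.example",
]

def neighbor(current, step):
    """Return the next (step=1) or previous (step=-1) site in the ring."""
    i = MEMBERS.index(current)
    return MEMBERS[(i + step) % len(MEMBERS)]

def random_member(current):
    """Return a random ring member other than the current site."""
    return random.choice([m for m in MEMBERS if m != current])

print(neighbor("https://alice.example", 1))   # bob
print(neighbor("https://alice.example", -1))  # carol
```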

I’m sure this has already been discussed within the IndieWeb community. Coming up with a full-fledged search engine would be a monumental and expensive task. But I’ve also tried to lay out smaller interim steps that would build experience or help break the corporate silos.

Agree? Disagree? What am I missing? Feel free to comment.


I am looking at Curlie.org, which is the continuation of the human-edited Open Directory Project (dmoz.org), and my conclusion remains the same: unfortunately, big static meta directories are obsolete for daily web search. It pains me to say that because I was a directory owner for many years.

Google has the Curlies of the web beat on every point save one big issue: quality. A good human editor can say, “this site is quality on the subject, and it deserves to be listed and found.” It’s a bit like a librarian recommending a book: you know it’s going to be on topic and probably pretty good.

The Limits of Google

Google can’t do that. Google can really only measure popularity, and popularity is not quality. Despite the 200-plus ranking factors Google claims, it still can’t distinguish quality. You can have AuthorRank, PageRank, domain authority, encryption, mobile-friendliness, meta tags, title tags, and on-page factors, but none of them can tell you that one page is brilliantly written and another is not. (The toy PageRank calculation below shows how purely link-based the popularity signal is.)
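
To underline the point, here is a toy PageRank computation over a made-up link graph. Notice that the only input is who links to whom; nothing about the writing on the pages ever enters the calculation:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a {page: [outlinks]} graph.

    The score depends only on the link structure; page content
    never enters the calculation.
    """
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                continue
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                if target in new_rank:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical graph: "popular" attracts many links, "brilliant" few;
# PageRank cannot tell which one is better written.
graph = {
    "popular": ["brilliant"],
    "brilliant": ["popular"],
    "fan1": ["popular"],
    "fan2": ["popular"],
    "fan3": ["popular"],
}
print(pagerank(graph))
```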

Theoretically, the brilliantly written page will eventually gain more links back and more notice, but how long do you have to wait: months? Years?

We first discussed this back in 2000 on the old SearchKing.com forums, and even though Google has improved in so many ways, that central fact remains true.

Does it Matter?

Well, actually, it can. When Google was young, it used listing in dmoz.org as a factor in its ranking calculations. Pages listed in the directory got into Google’s index a bit quicker as well. There is no reason a new search engine cannot do the same. I’m making these numbers up, but let’s say Curlie has 4 million URLs listed. While 4 million is small in search-engine terms, it still helps identify the cream and in effect say – “These are the ‘warhorses.’” (A sketch of such a directory boost follows.)
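
As a concrete illustration, a new engine’s ranker could apply a simple multiplier when a result’s URL appears in a human-edited directory. The directory set and the boost factor here are invented for the example:

```python
# Hypothetical set of URLs listed in a human-edited directory like Curlie.
DIRECTORY_LISTED = {
    "https://quality-site.example/",
    "https://another-gem.example/",
}

DIRECTORY_BOOST = 1.25  # invented multiplier for the example

def final_score(url, relevance):
    """Boost a base relevance score if the URL is directory-listed."""
    if url in DIRECTORY_LISTED:
        return relevance * DIRECTORY_BOOST
    return relevance

results = [
    ("https://quality-site.example/", 0.70),
    ("https://content-farm.example/", 0.72),
]
ranked = sorted(results, key=lambda r: final_score(*r), reverse=True)
print(ranked)  # the directory-listed site now outranks the content farm
```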

Old School

Of course there is the old school (Web 1.0) way of incorporating a large meta directory into your Search Engine Results Pages (SERPs): include the directory category breadcrumb links that match the search query at either the top or the bottom of your SERP. It’s not very satisfying to the searcher, but it can help as a stopgap. Yahoo did this with its directory, NBCi did it with theirs, and (old) HotBot and Lycos did it with dmoz.org. A user-preference switch to turn the feature on and off might work. (A toy category-matching sketch follows.)
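
Here is a minimal sketch of that idea: match the query’s terms against directory category names and emit the breadcrumb paths to append to the SERP. The categories and paths are made up:

```python
# Hypothetical directory categories mapped to their breadcrumb paths.
CATEGORIES = {
    "fountain pens": "Shopping > Writing > Fountain Pens",
    "search engines": "Computers > Internet > Search Engines",
    "webrings": "Computers > Internet > Webrings",
}

def matching_breadcrumbs(query):
    """Return breadcrumb paths for categories sharing a term with the query."""
    terms = set(query.lower().split())
    return [
        path
        for name, path in CATEGORIES.items()
        if terms & set(name.split())
    ]

# Appended to the top or bottom of the SERP as category links:
print(matching_breadcrumbs("independent search engines"))
```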

Anyway, the answer is yes: human-edited directories can still have value if they are used right. A fountain pen is a perfectly good writing instrument; we just don’t use one much for everyday writing.

Feel free to tell me what uses I’m not seeing.