The IndieWeb, Discovery and Web Search

First, let me say that I understand the IndieWeb movement already has a lot on its plate and has already accomplished a lot.

It seems to me that, at some point, the problem of commercially siloed web search engines must be addressed, for two reasons: 1. Google holds a near monopoly, and 2. both spidering engines (Google and Bing) are oriented towards brands, data mining of user profiles, advertising and the commercial.

How we build websites today is largely controlled by what Google likes and dislikes.  If you don’t build the site Google’s way, you don’t rank in Google.  If you don’t rank in Google, you might as well be dead.  It has happened slowly over time, but Google has warped the Web into its own image. We don’t build websites the same way we did Before Google (BG).

But there are alternative search engines, and I like and use several of them.

Duckduckgo – protects user privacy. Draws search results from many sources, with Bing as the backbone.  They do have their own spider and index, but I’m not sure how large that is.  If Bing should cease operating, DDG will be in trouble.

StartPage – protects user privacy. Is basically Google feed stripped of geolocation and personal search history data. But if Google ever turns off access to the search feed, StartPage is gone.

Hotbot –  claims to protect privacy.  Appears to be straight up Bing feed, stripped of advertising and tracking customization.  Clean, minimalist results pages. But if Bing should cease operating, HotBot will go down.

As much as I like Duckduckgo and StartPage, both depend on the search indexes of large siloed companies.  Having only two major search engines (Google and Bing) that spider the web seems like an unhealthy state of affairs.

What can the IndieWeb do?

Off the shelf solutions

There are a couple of smaller search engines that are trying to stay alive. – privacy protecting. UK based. Active spidering as money permits.  Has potential. – Open Source code. Active spidering as money permits.  Can index URL’s very quickly. Has potential.

Both need larger databases. Both need funding. Both need some R&D to improve search results and ranking algorithms.

One option might be for the IndieWeb to campaign for private donations for one or both of these independent search engines. Publicity within the movement would help.

Build Our Own IndieWeb Search Engine(s)?

Here are a few resources available “off the shelf.”

Searx – (running instance). Open Source Meta search script.  At best this would be a stopgap solution.  One major problem is that it scrapes results from other engines without permission; sooner or later that will get shut down.

Gigablast – The script is Open Source.  Gigablast, as is, is quite impressively capable if one can afford to keep it indexing and provide the coding to fork it and improve it.

More Open Source and P2P Search Engines.

Curlie – is the revival/continuation of the Open Directory Project.  Large meta directories have had their day but Curlie could provide two things: 1. a big index for a starting crawl for a new search engine, 2. A ranking indicator of quality for web pages. Human reviewed collections like ODP were used by Google in the early days as a quality indicator in ranking and can be used by new engines.

Wikipedia – 1. Using Wikipedia itself to answer queries, 2. Wikipedia contains a lot of outside links, so it would be another place to use as a starting crawl.
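As a rough illustration of the "starting crawl" idea, here is a minimal Python sketch that merges several seed lists into one de-duplicated crawl queue and remembers which source vouched for each URL, so a human-reviewed listing could later feed into ranking. The function names, the source labels and the example URLs are all made up for this sketch; it is not how any of the engines named above actually work.

```python
from collections import deque
from urllib.parse import urldefrag, urlparse


def normalize(url):
    """Drop the #fragment and lowercase the host so duplicates collapse."""
    url, _fragment = urldefrag(url.strip())
    parts = urlparse(url)
    return parts._replace(netloc=parts.netloc.lower()).geturl()


def build_frontier(seed_lists):
    """Merge labeled seed lists into a FIFO crawl queue.

    seed_lists maps a hypothetical source label (e.g. "directory",
    "wikipedia") to a list of URLs. Returns the queue plus a dict that
    records which sources listed each URL.
    """
    frontier = deque()
    sources = {}
    for label, urls in seed_lists.items():
        for raw in urls:
            url = normalize(raw)
            # Only web pages are crawlable; skip mailto:, ftp:, etc.
            if urlparse(url).scheme not in ("http", "https"):
                continue
            if url not in sources:
                sources[url] = set()
                frontier.append(url)
            sources[url].add(label)
    return frontier, sources


if __name__ == "__main__":
    queue, seen_in = build_frontier({
        "directory": ["https://Example.org/a", "https://example.net/"],
        "wikipedia": ["https://example.org/a#history", "mailto:x@example.org"],
    })
    print(list(queue))
    print(seen_in)
```

A URL that appears in both a human-reviewed directory and an encyclopedia's outbound links ends up with two labels, which is one simple way a new engine could treat curated listings as a quality hint.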

Guerrilla Discovery Options

Webrings and Blogrolls – Webringo is free and seems underutilized. Why not use it to create web rings for discovery?  Blogrolls, hey why not?

Directories and Filter Blogs –  niche directories can still work when tied to an interest community (just ask almost any local Chamber of Commerce).  Filter blogs might work too. Boingboing is an example of a filter blog that leads you to things posted on the web.  Most bloggers already do some filter blogging.

I’m sure this has already been discussed within the IndieWeb community.  Coming up with a full-fledged search engine would be a monumental and expensive task.  But I’ve also tried to lay out smaller interim steps that will gain experience or help break the corporate silos.

Agree? Disagree? What am I missing? Feel free to comment.





11 responses on “The IndieWeb, Discovery and Web Search”

  1. A major missing feature from the Big search engines is that they don’t bother understanding the richer microformats2 content on many IndieWeb enabled sites. Data like “this is a like of that post” and “this is a reply to some other post”.
    Ryan Barrett did some great exploratory work last year on a microformats2-enabled crawler and on the resulting data set:
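To make the microformats2 point concrete, here is a small Python sketch (not the crawler mentioned above) that scans a page for the `u-like-of`, `u-in-reply-to` and `u-repost-of` properties and reports what the post is responding to. It uses only the standard library; the class and function names are invented for illustration, and a real microformats2 parser also handles nesting, relative URLs and implied properties, which this does not.

```python
from html.parser import HTMLParser

# microformats2 h-entry properties that mark a post as a response.
RESPONSE_CLASSES = {
    "u-like-of": "like",
    "u-in-reply-to": "reply",
    "u-repost-of": "repost",
}


class ResponseFinder(HTMLParser):
    """Collects (kind, target_url) pairs from response-marked links."""

    def __init__(self):
        super().__init__()
        self.responses = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        for name in (attributes.get("class") or "").split():
            kind = RESPONSE_CLASSES.get(name)
            if kind:
                self.responses.append((kind, href))


def find_responses(html):
    """Return every like/reply/repost relationship found in the markup."""
    finder = ResponseFinder()
    finder.feed(html)
    return finder.responses


if __name__ == "__main__":
    page = (
        '<article class="h-entry">'
        '<a class="u-like-of" href="https://example.com/post">Liked</a>'
        '<a class="u-in-reply-to" href="https://example.org/note">Re:</a>'
        '</article>'
    )
    print(find_responses(page))
```

An index that stored these pairs could answer questions such as "who replied to this post", which is the kind of relationship data the big engines currently ignore.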

  2. Fantastic. Yes! This is exactly the problem. It’s very difficult for a new
    participant to make headway because they simply get lost in the pile.

    I think all of these approaches need to be reconsidered just like you’re doing,
    but I want to focus on directories—which are still very much in vogue, but
    are unrecognizable when compared to the old Yahoo. The big directories these
    days are Reddit and Pinterest. (And Wikipedia, like you say.) Delicious was one
    of these too. They are directories because they are topic-based catalogs of thought.
    Niche directories exist in the form of stuff like Pinboard, Hacker News, and so on. So, Hacker News acts basically like a directory of thought
    for its community. And the users there spend their time pruning and curating
    this directory.
    All of these directories struggle with a sort of memory failure—no one really
    plumbs the archives of these sites—but that makes perfect sense given that
    the focus is on absolute recency. Part of the spectacular failure of Yahoo-style
    directories was due to no sense of recency (a heartbeat) on all those links.
    The nice thing about Yahoo was that you could categorize yourself.
    Reddit, Pinterest, Pinboard—you have to wait for someone else to find you.
    To your point about filter blogs: I think there used to be an answer here.
    It used to be that for topic-based technology blogs, much of their grassroots
    content came from mailing lists. Mailing lists used to be the primary
    announcement system for software. (If you look at technology blogs, they are
    much more commercial now.) So the mailing lists acted like a completely open
    submission system where you could safely self-promote. And then blogs skimmed
    their favorite stuff out of these. (IRC and web forums also acted as support
    systems here.)
    So here’s what I’d like to see in a directory:

    Allows self-promotion. No one wants to leave a software (or fan art,
    essays, open question) announcement to their lonely blog. You want to push it
    out to the relevant community yourself.

    Sorting isn’t driven by upvotes or algorithms. You shouldn’t need to have to
    figure out how to game the system. Yes, these algorithms help prevent spam. But
    they enable clickbait!

    Drives users to the blogs themselves. Reddit isn’t just a directory—writers
    post their thoughts directly on Reddit. But those writers aren’t given all the
    tools. First off, the posts are limited in their layout. But also, they aren’t
    given syndication and identity tools like blogs have.

    Personal views of the directory. Right now it seems that we have this idea
    that our moderators should have the power to decide who’s in and who’s out. But
    this was never true of mailing lists. I still like the idea of killfiles. You
    block the specific users and blogs that drive you nuts.
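The last two wishes (no upvote-driven sorting, and a personal killfile instead of moderator gatekeeping) could be sketched roughly as follows in Python. This is only an illustration under assumed names: `Listing`, `personal_view` and the sample data are hypothetical and do not describe any existing directory.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Listing:
    title: str
    blog: str      # site the announcement links back to
    author: str
    posted: date


def personal_view(listings, killfile):
    """Return one reader's view of the directory.

    Entries whose author or blog is in that reader's killfile are hidden;
    everything else is shown newest first, with no votes or scoring.
    """
    visible = [
        item for item in listings
        if item.author not in killfile and item.blog not in killfile
    ]
    return sorted(visible, key=lambda item: item.posted, reverse=True)


if __name__ == "__main__":
    directory = [
        Listing("New release", "alpha.example", "ann", date(2018, 6, 1)),
        Listing("Hot take", "noisy.example", "bob", date(2018, 6, 3)),
        Listing("Fan art", "gamma.example", "cy", date(2018, 6, 2)),
    ]
    for item in personal_view(directory, killfile={"noisy.example"}):
        print(item.posted, item.title)
```

Because the killfile belongs to the reader rather than to a moderator, two people can look at the same directory and see different listings, much as they would on a mailing list.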

    I am trying to accomplish some of this with my directory,
    but I’m also not sure it will all work due to self-promotion having problems of
    its own—in addition to being a dirty word on its own.
    At any rate, wonderful post. Thank you!



