Once upon a time, when there were actually a bunch of search engines to compare results from, we had Parallel Search Forms. I won’t call them “engines” because all they did was display results from several search engines side by side.

You had one form to enter your search terms, and the site would query two or three search engines simultaneously, displaying their results in adjacent columns for comparison.

Example: you can see the remains of one here as an illustration.

I would love to have one of these today, set up like this:

Privacy search engine #1: Duckduckgo – DDG uses Bing as the backbone of its search results.

Privacy search engine #2: Startpage – SP uses non-geolocated, non-personalized Google results.

Privacy search engine #3: Mojeek – uses its own index.

This would make it easy to compare the first couple of pages of results from each engine AND how many ads appear on those pages, all in real time.
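
Lacking a real side-by-side form, the closest cheap substitute is a script that fires the same query at all three engines at once. Here is a minimal sketch in Python; the result-page URL patterns are my assumptions from casual use and may change at any time.

```python
import webbrowser
from urllib.parse import quote_plus

# Result-page URL patterns for each engine. Treat these as assumptions.
ENGINES = {
    "Duckduckgo": "https://duckduckgo.com/?q={}",
    "Startpage": "https://www.startpage.com/do/search?query={}",
    "Mojeek": "https://www.mojeek.com/search?q={}",
}

def parallel_search(terms: str) -> None:
    """Open the same query on all three engines for side-by-side comparison."""
    for pattern in ENGINES.values():
        webbrowser.open_new_tab(pattern.format(quote_plus(terms)))

parallel_search("parallel search forms")
```

It only opens tabs rather than merging pages into one view, but it makes the page-by-page and ad-count comparison quick.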


If I were building a search engine…

  1. You need to start building your own index of the web.  That means you need a crawler (robot) and it needs to be good.  It takes time to build an index and it is not cheap.
  2. I would gather in information from other providers: Wikipedia, Wolfram Alpha, whatever I could cobble together.
  3. Until your own index is ready, you need to buy search results from a big search engine. In English there are only two left: Google and Bing.
  4. What happens if Google or Bing refuse to sell you a search feed, or refuse to renew because you are getting too big? You could probably hammer together a good blended backfill feed by combining Mojeek, Gigablast and Yandex using your own algorithm. You either lead with your own results and use the other three as backfill, or you blend your own results in with the three others in a sort of seamless metasearch; a rough sketch of that blending follows this list. (You do need to plan for “what if” scenarios.)
  5. All the above is to buy you time while you learn and refine how to crawl the web on your own.
  6. Eventually you roll out your own search index as the backbone of the search results, with the others maybe on standby for queries you are weak on.
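
As promised in item 4, here is a minimal sketch of that kind of blending, in Python. Everything in it is an assumption for illustration: the result shape, the round-robin rule, and the idea that deduplicating by URL is enough.

```python
from itertools import zip_longest

def blend_results(own, backfills, limit=10):
    """Blend our own results with backfill feeds, deduplicating by URL.

    own       -- list of result dicts from our own index
    backfills -- list of result lists from other engines
                 (e.g. Mojeek, Gigablast, Yandex)
    Result dicts are assumed to look like {"url": ..., "title": ...}.
    """
    seen = set()
    blended = []
    # Round-robin one result at a time from each source, our own feed
    # first, so our results lead each "tier" of the blended page.
    for tier in zip_longest(own, *backfills):
        for result in tier:
            if result is None or result["url"] in seen:
                continue  # exhausted feed, or same page from two engines
            seen.add(result["url"])
            blended.append(result)
            if len(blended) >= limit:
                return blended
    return blended
```

In practice you would weight by your own relevance scores rather than naive round-robin, but the dedupe-and-interleave skeleton is the same.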

Is Duckduckgo using Bing as its backbone search provider while it builds its own index? We know DDG has its own crawler, but we don’t know if it is building its own index or just spam hunting. I certainly do not know. If it were me, I would not build up a search engine user base and then expect somebody else to provide the search feed forever. Not when there are only two big indexes. If I were DDG I would have a Big Plan, plus contingency plans three or four layers deep. DDG is pretty quiet about all this, which is probably wise.

Qwant seems to be the other player. Their strategy is more open: they are actively building an index by crawling the web, with Bing providing backfill. It is not obvious where Qwant’s index ends and Bing’s begins; the results are good and seamless. With a little luck, marketing, and money, Qwant will eventually need Bing less and less. This too is a good plan.

Fortunately, Bing is willing to sell its search feed to just about anyone who can pay the fee. For now. I do not think Microsoft really knows what to do with Bing, except to milk it for all the dollars it can until it decides how Bing fits into the company strategy.

Search engines are important, but they are not as important as they were before the social media silos of Facebook and Twitter. You don’t need Google to find a company’s Facebook page.

If I were trying to build a search engine today, those are some of the things I would try. All this could change in a year.


UK-based Mojeek.com is the next privacy search engine I am going to make my default and use for a few weeks. (There is a UK-specialized version at Mojeek.co.uk, plus German and French versions.)

I gave it a quick look over before, but this test will be daily use as my browser’s default search engine. I will try my best to use it exclusively, without resorting to a backup search engine.

I really don’t know what to expect. Mojeek has its own unique database that it is building by crawling the web with a robot. But unlike Duckduckgo or Qwant, it has no backfill from a massively larger search engine to back up the Mojeek index. They are out there with no training wheels. That takes guts.

So we will just have to see how I get on.

I don’t have to do this alone.  You can join me!  Just make Mojeek your default search engine for a few weeks and use it as your daily driver.  I would love to hear how you do and what you think.


My experiment with the Qwant search engine is ending early. On July 20th, 2018 I started using it as my default search engine for a month-long test. Today, August 7th, I am terminating further testing.

The test ended when I got yet another captcha because Qwant detected unusual activity from me and thought I was a robot. That never happens to me elsewhere; I just don’t type that fast. When I’m trying to find something, I do keep refining my search query. I guess that is too much for Qwant: apparently they only want dilettantes doing an occasional search. This is not a robust search engine for somebody who seriously works and searches.

The truth is, I was already getting irritated with Qwant. Not for the quality of its search results, but because the search results page was taking too long to load. The page is pretty but slow, and I want lean, fast results. The actual search results were pretty darn good.

But I can’t use a search engine that craps out on me right in the middle of a project.

Pros:

  • Trying to build their own unique index.
  • Decent search results.
  • Privacy.

Cons:

  • Slow.
  • Not robust: can’t handle multiple searches typed by a slow human typist.


Yesterday I noticed that there is an Add URL form at the Gigablast search engine. So I took a few seconds and added both of my current domains. Why? Because it’s free, it’s there, and it puts my domains, both newish, on Gigablast’s eventual crawl list. At least they know about them.

Back in the day, all the engines and directories had an add-URL form. When a new directory or search engine launched, all the webmasters would race over to add their sites. Especially with directories, it was easier to get listed for free when they were new and needed to build out their listings. The longer you waited, the harder it became to get listed.

The whole idea behind having a website on the Web is to be found. To do that you want to be listed in as many legit places as possible. You never know where a faithful reader is going to come from. We’re living in a unique time, where social networks like Facebook and Twitter have loosened Google’s throttling monopoly grip on web discovery. We could go back to that overnight, so get yourself listed in as many search engines, large and small, as you can.

Gigablast has been around for years. It used to supply almost all the meta-search engines, and that brought traffic. I think most metas have died because there are so few general search engines to get feeds from that it is not worth running a meta-search. But Gigablast still has its own index, it still crawls the web, and if adding my URLs aids them and me, then it’s worth doing.

This is so simple, I don’t know why I didn’t think of it before.  You can make a decent blog search engine, for free, in minutes with Duckduckgo.

  1. Go to the Duckduckgo Search Box page.
  2. Do whatever customization you like.
  3. In the field “Site Search” type in: blogspot.com,wordpress.com,medium.com


Medium.com is optional. Copy the searchbox code, paste it into an HTML page, and you are done.
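
If you’d rather generate that snippet yourself, here is a minimal sketch in Python. The iframe shape and the sites/prefill parameters are modeled on what DDG’s search box generator produced for me; treat the exact details as assumptions that may change.

```python
from urllib.parse import urlencode

def ddg_searchbox_html(sites: str, prefill: str = "Search blogs") -> str:
    """Build an embeddable Duckduckgo site-search box.

    `sites` is the same comma-separated host list you would type into
    the generator's Site Search field. The iframe attributes here are
    assumptions based on the generator's output.
    """
    params = urlencode({"sites": sites, "prefill": prefill})
    return (
        f'<iframe src="https://duckduckgo.com/search.html?{params}" '
        'style="overflow:hidden;margin:0;padding:0;width:408px;height:40px;" '
        'frameborder="0"></iframe>'
    )

# The blog-host list from step 3 above.
print(ddg_searchbox_html("blogspot.com,wordpress.com,medium.com"))
```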

Pros:

  • Because it’s DDG all searches are private.
  • Blogspot and WordPress are the two largest blog hosts on the web, at least in the Western World.
  • Fast results with very few ads.
  • DDG does not insist on having their branding all over the searchbox.
  • You can do multi-word, complex search queries.


Cons:

You are not searching all blogs. Only blogs hosted on a subdomain, like “blogname.blogspot.com”, will be searched. Blogs on these same hosts that pay to use their own domain will be overlooked, as will blogs self-hosted on their own servers.

Have fun. Adapt this to your own needs.  If you find this useful on your own site, come back and let me know how you use it.  Thanks.

In a previous conversation, I made a rough list of types of blog directories and search engines.

Blog Discovery: I’m sure directories are not the best solution for blog discovery, but like blogrolls they have a place at the table because they are low-tech and cheap. Here’s a rough hierarchy:

  • Do In A Pinch: Blog acting as a directory.

  • Minimum: A proper directory script (i.e. phplinkdirectory or similar). This allows a blog owner to submit their blog, write a description, etc.

  • Better: A directory script that lists not only the blog’s URL but also its RSS feed.

  • Better x 2: The directory described above, which also generates its own RSS feed for each category and subcategory.

  • Better x 3: Some sort of fusion of the x2 directory above and Indieweb stuff to some degree. Maybe a fusion of a standard directory with Kicks Condor’s Indieweb.xyz. This is just brainstorming.

  • Better x 4: Probably an RSS search engine like the old, defunct Daypop or Icerocket, because this leads the searcher to individual posts about a topic in close to real time. Such an engine could use Post Kinds as filters for the searcher to refine a search. There used to be a lot of RSS blog search engines; today I could find only RSSMicro. (A rough sketch of the core loop follows this list.)

  • Best: Some sort of hybrid directory/RSS/crawler engine listing only blogs. The search crawler digs deep into a blog for those posts from 2015 or before that are buried and won’t appear in feeds; the RSS search engine covers the newest posts.
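
To make the “Better x 4” idea concrete, here is a minimal sketch of that core loop in Python. The feed list is hypothetical, the matching is a naive substring test, and a real engine would add scheduling, storage, and ranking; it assumes the third-party feedparser library.

```python
import feedparser  # third-party: pip install feedparser

# Hypothetical feed list; a real engine would pull this from a directory.
FEEDS = [
    "https://example-blog-one.com/feed/",
    "https://example-blog-two.com/rss.xml",
]

def build_post_index(feed_urls):
    """Fetch each feed and collect individual posts for searching."""
    posts = []
    for url in feed_urls:
        for entry in feedparser.parse(url).entries:
            posts.append({
                "title": entry.get("title", ""),
                "link": entry.get("link", ""),
                "summary": entry.get("summary", ""),
            })
    return posts

def search_posts(posts, term):
    """Naive substring match over title and summary."""
    term = term.lower()
    return [
        p for p in posts
        if term in p["title"].lower() or term in p["summary"].lower()
    ]

# Re-run build_post_index on a schedule to stay close to real time.
index = build_post_index(FEEDS)
for post in search_posts(index, "indieweb"):
    print(post["title"], post["link"])
```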

I think I need to expand on this.

The more advanced the solution, the greater the technology bar to entry. Just about anyone can start a human-edited directory, but creating an RSS search engine requires more programming skill. Moreover, the more advanced solutions require more money to run, which again raises the bar to entry.

We see this now with web search: the cost to build a search engine that will rival Google is more than many countries can afford.

But if we only follow the technology route, we create another set of silos like Google and Bing.

So in the blog world we need lots of different ways to discover blogs and blog posts. We need ways for new blogs to be found by readers before their bloggers lose heart at talking to themselves. We need many high-tech search engines and many low-tech directories so that the blogosphere does not become consolidated once again into a few silos. No one source is going to index the entire blogosphere, and that’s okay, because each index is a collection.

The Indieweb plays an important role in this. Webmentions will become the new backbone of making the Web itself the social network, and they will also help blogs get discovered. But right now Indieweb adoption is relatively small. We also need to reach audiences who are just readers and do not blog, because even siloed social networks have their own search functions.

  • If there are 100,000 bloggers, then we need 100,000 blogrolls.
  • We need site searches, like phinde, that can index multiple domains.
  • We need many dozens of blog directories of all sorts: human-reviewed and, somehow, Indieweb-automated.
  • We need a dozen blog and RSS feed search engines.

All at the same time.  Let people choose their trusted sources.

Other reading:

Search Engine History

Bomis Webrings had Important Differences.

Indieweb, Discovery and Search.
