If I were building a search engine…

  1. You need to start building your own index of the web.  That means you need a crawler (robot) and it needs to be good.  It takes time to build an index and it is not cheap.
  2. I would gather info from other providers: Wikipedia, Wolfram Alpha, whatever I could cobble together.
  3. Until your own index is ready, you need search results from a big search engine.  There are only two left in English: Google and Bing.
  4. What happens if Google or Bing refuses to sell you a search feed, or refuses to renew because you are getting too big?  You could probably hammer together a good blended backfill feed by combining Mojeek, Gigablast and Yandex using your own algorithm.  You either lead with your own results and use the other three as backfill, or you blend your own results in with the other three in a sort of seamless metasearch.  (You do need to plan for “what if” scenarios.)
  5. All the above is to buy you time while you learn and refine how to crawl the web on your own.
  6. Eventually you roll out your own search index as the backbone of the search results with the others maybe on standby for queries you are weak on.
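
The “lead with your own results, then backfill” idea from step 4 could be sketched roughly like this.  This is only an illustration: the function name `lead_with_own` and the idea that each feed is just a ranked list of URLs are assumptions, and a real blend would need scoring that is much smarter than taking turns.

```python
from itertools import zip_longest


def lead_with_own(own_results, backfill_feeds, limit=10):
    """Put our own results first, then fill the remaining slots from backfill.

    own_results    -- ranked list of URLs from our own index
    backfill_feeds -- list of ranked URL lists, one per outside provider
    (Both shapes are assumptions made for this sketch.)
    """
    seen = set()
    blended = []

    def add(url):
        # Skip duplicates and stop adding once the page is full.
        if url not in seen and len(blended) < limit:
            seen.add(url)
            blended.append(url)

    for url in own_results:
        add(url)

    # Take turns across the backfill feeds: every feed's #1 result,
    # then every feed's #2 result, and so on.
    for tier in zip_longest(*backfill_feeds):
        for url in tier:
            if url is not None:
                add(url)

    return blended


page = lead_with_own(
    ["a", "b"],
    [["b", "c", "d"], ["e", "c"], ["f"]],
    limit=5,
)
print(page)
```

The “seamless metasearch” variant is the same loop with your own index treated as just one more feed, for example `lead_with_own([], [own_results, *backfill_feeds])`.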

Is DuckDuckGo using Bing as its backbone search provider while it builds its own index?  We know DDG has its own crawler, but we don’t know if it is building its own index or just spam hunting.  I certainly do not know.  If it were me, I would not successfully build a search engine user base and then expect somebody else to provide the search feed forever.  Not when there are only two big indexes.  If I were DDG, I would have a Big Plan, plus contingency plans about 3 or 4 layers deep.  DDG is pretty quiet about all this, which is probably wise.

Qwant seems to be the other player.  Their strategy is more open:  they are actively building an index by crawling the web.  Bing is providing backfill.  It is not obvious where Qwant’s index ends and Bing’s begins.  The results are good and seamless.  With a little luck, marketing and money, Qwant will eventually need to use Bing less and less.  This too is a good plan.

Fortunately, Bing is willing to sell its search feed to just about anyone who can pay the fee.  For now.  I do not think Microsoft really knows what to do with Bing, except to milk it for all the dollars they can get until they decide how it fits in with the company strategy.

Search engines are important but they are not as important as they were before the social media silos of Facebook and Twitter.  You don’t need Google to find a company’s Facebook page.

If I were trying to build a search engine today, those are some of the things I would try.  All this could change in a year.

I have always thought web directories should have backfill from a search engine when you search them.  Especially general directories.

Yahoo (dubbed “Mighty Yahoo”) was the first directory to add a backfill provider to its search results.  First AltaVista, then Inktomi, then Google, then Inktomi again provided backfill results for Y!.  The Snap/NBCi directory had backfill and so did LookSmart.

Here is how backfill worked: somebody would search the directory, and when the directory ran out of listings that matched the query, they would see search results from the backfill provider.  (Being Mighty Yahoo’s backfill provider cemented Google’s reputation, and letting that happen proved to be one of the biggest mistakes Yahoo ever made.)  What was important is that the search portal needed to provide something in a search that would satisfy the person searching, otherwise they would start going elsewhere.  And they could go elsewhere because there was genuine competition in web search back then.
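
A rough sketch of that fallback, under some loud assumptions: listings are plain dicts with `title`, `description` and `url` keys, matching is a naive “every term appears somewhere” test, and `backfill_search` is a hypothetical stand-in for whatever search feed you buy (any function that takes a query and returns a ranked list of URLs).  None of this is how Yahoo actually did it.

```python
def matches(listing, terms):
    """True if every search term appears in the listing's title or description."""
    text = f"{listing['title']} {listing['description']}".lower()
    return all(term in text for term in terms)


def search_with_backfill(query, listings, backfill_search, page_size=10):
    """Show directory listings first; top up from the backfill feed if short."""
    terms = query.lower().split()
    results = [
        {"url": listing["url"], "source": "directory"}
        for listing in listings
        if matches(listing, terms)
    ][:page_size]

    # Only call the (paid) backfill feed when the directory ran out of listings.
    if len(results) < page_size:
        seen = {result["url"] for result in results}
        for url in backfill_search(query):
            if len(results) >= page_size:
                break
            if url not in seen:
                seen.add(url)
                results.append({"url": url, "source": "backfill"})

    return results


# Tiny made-up directory and a fake feed, just to exercise the function.
listings = [
    {"title": "Trek Uniform Guide", "description": "Starfleet uniforms by era",
     "url": "https://example.org/guide"},
    {"title": "Ship Registry", "description": "List of starships",
     "url": "https://example.org/ships"},
]


def fake_feed(query):
    return ["https://example.org/guide", "https://example.com/a", "https://example.com/b"]


for hit in search_with_backfill("uniforms", listings, fake_feed, page_size=3):
    print(hit["source"], hit["url"])
```

Tagging each result with its source keeps the option open of showing directory listings and backfill results in separate blocks on the page.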

It gets harder for niche directories.  For example: if you go to a Star Trek niche directory and search for “uniforms”, it is assumed you are only going to get uniform sites that are related to Star Trek.  In this instance you don’t want a general backfill giving results for police uniforms or nurses’ uniforms.

For backfill to work on a niche subject directory, you almost need a place to put in an extra search “slug” to bring it into context.  So in our example, you need to have a place in your admin panel to add “star trek” as a term to the backfill, so that somebody searching for uniforms on your Star Trek directory is automatically searching for “star trek uniforms” on the backfill.  The “star trek” is always added to the backfill query.
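
The slug idea is simple enough to sketch.  The function name `backfill_query` is made up, and the “don’t add it twice” check is my own assumption about what a directory owner would want, not something any particular feed requires.

```python
def backfill_query(user_query, context_slug=""):
    """Build the query sent to the backfill feed by prepending the admin's slug."""
    query = " ".join(user_query.split())   # tidy up stray whitespace
    slug = " ".join(context_slug.split())

    if not slug:
        return query                       # general directory: nothing to add
    if slug.lower() in query.lower():
        return query                       # simple substring check: already in context
    return f"{slug} {query}"


print(backfill_query("uniforms", "star trek"))
print(backfill_query("Star Trek uniforms", "star trek"))
```

Only the backfill gets the longer query.  The directory’s own search still runs on just “uniforms”, since everything in the directory is already about Star Trek.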

It gets even harder for a local or geographic directory.  Frankly, I think backfill should not be used with a local directory.

I did some looking around.  Google and Bing are way too expensive to get a search feed from.  Smaller engines like Mojeek.com (or Mojeek.co.uk) and Gigablast.com provide search results APIs.  Both are affordable, but Gigablast’s is really, really affordable.

Of course, I don’t know how to actually code this.  That’s what coders are for.  😀

But the trick here is this:

  • The more listings you get in your directory the more users will use the search function.
  • You have to have a search function.
  • No matter how sophisticated your directory search is, you have a finite number of listings.
  • Mobile devices make the search function even more important.
  • This is why most will need backfill.