If I were building a search engine…
- You need to start building your own index of the web. That means you need a crawler (robot) and it needs to be good. It takes time to build an index and it is not cheap.
- I would gather info from other providers: Wikipedia, Wolfram Alpha, whatever I could cobble together.
- Until your own index is ready, you need search results from a big search engine. In English there are only two left: Google and Bing.
- What happens if Google or Bing refuses to sell you a search feed, or refuses to renew because you are getting too big? You could probably hammer together a good blended backfill feed by combining Mojeek, Gigablast and Yandex using your own algorithm. You either lead with your own results and use the other three as backfill, or you blend your own results in with the three others in a sort of seamless metasearch. (You do need to plan for “what if” scenarios.)
- All the above is to buy you time while you learn and refine how to crawl the web on your own.
- Eventually you roll out your own search index as the backbone of the search results with the others maybe on standby for queries you are weak on.
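
To make the two backfill options above concrete, here is a rough Python sketch. It is only an illustration under my own assumptions: each feed is just a ranked list of URLs, the function names `backfill` and `blend` are made up for this post, and the blending uses reciprocal rank fusion, which is one common way to merge ranked lists and not something any of these providers prescribe. A real engine would want its own scoring.

```python
from collections import defaultdict


def backfill(own, others, limit=10):
    """Option 1: lead with our own results, then fill the remaining
    slots from the other feeds in order, skipping duplicate URLs."""
    merged = []
    seen = set()
    for feed in [own, *others]:
        for url in feed:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged[:limit]


def blend(feeds, weights=None, k=60, limit=10):
    """Option 2: seamless metasearch via reciprocal rank fusion.

    feeds   -- dict mapping a feed name to its ranked list of URLs
    weights -- optional dict of per-feed weights (assumed default 1.0),
               e.g. to favour our own index as it improves
    k       -- damping constant; 60 is a conventional choice, not a rule
    """
    weights = weights or {}
    scores = defaultdict(float)
    for name, urls in feeds.items():
        w = weights.get(name, 1.0)
        for rank, url in enumerate(urls, start=1):
            # A URL earns more score the higher it ranks in each feed.
            scores[url] += w / (k + rank)
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda u: (-scores[u], u))[:limit]


# Hypothetical feeds; the URLs are placeholders, not real results.
feeds = {
    "own": ["a.example", "b.example"],
    "mojeek": ["b.example", "c.example"],
    "yandex": ["b.example", "a.example"],
}
led = backfill(feeds["own"], [feeds["mojeek"], feeds["yandex"]], limit=3)
mixed = blend(feeds, limit=3)
```

The per-feed weights are where the "buy time" plan shows up: you would start with a low weight on your own index and raise it as your crawl gets better.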
Qwant seems to be the other player. Their strategy is more open: they are actively building an index by crawling the web, with Bing providing backfill. It is not obvious where Qwant’s index ends and Bing’s begins. The results are good and seamless. With a little luck, marketing and money, Qwant will eventually need to use Bing less and less. This too is a good plan.
Fortunately, Bing is willing to sell its search feed to just about anyone who can pay the fee. For now. I do not think Microsoft really knows what to do with Bing, except to milk it for all the dollars they can get until they decide how it fits in with the company strategy.
Search engines are important but they are not as important as they were before the social media silos of Facebook and Twitter. You don’t need Google to find a company’s Facebook page.
If I were trying to build a search engine today those are some of the things I would try. All this could change in a year.