A couple weeks ago Jason Kottke posted a complaint about Technorati. Its search is slow, non-comprehensive, of mediocre relevance, and can’t even manage one nine of reliability. Technorati’s competitors all share the second problem, and they have, or will likely develop, the others as they grow.
Kevin Burton would prefer blog search to aim lower:
“I’d rather have a Technorati that was fast and always worked even if that meant only indexing 1M blogs. Even 500k blogs as long as they are the top 500k blogs.”
Sounds like a reasonable tradeoff, but it’s completely unacceptable. What if Google had decided to index only 100M web pages in order to stay fast and reliable? Google would no longer exist. (Also pretend you read something about the long tail of the blogosphere here.)
Only one of thirty trackbacks to Kottke’s post states the obvious:
“When I first encountered RSS search engines a few years ago while at Yahoo! I wondered how they could survive. The difficult part of RSS search isn’t the RSS, it’s the search. Search is hard. For Google or Yahoo!, adding RSS to search is trivial. It’s just another data source. And yes, setting up a ping server is different from crawling links, but not any harder and once you get the content, it’s indexed in basically the same fashion. But for Technorati, adding world class relevance, freshness, comprehensiveness and scalability to RSS is an almost insurmountable effort.”
(Possibly two, but this one is mostly in Chinese. Google’s beta Chinese-English translation says in part “very many people anticipates Google/Yahoo can provide the even better function.”)
I hope Technorati, PubSub, IceRocket, BlogPulse, Feedster, et al. do well, but I expect one or more of Google, Yahoo!, or Microsoft to introduce a superior blog search service. I also expect blog search eventually to become an anachronism, subsumed by web search (though I want every site and page to have a feed, so web search should become a bit more like blog search). I want to comprehensively track a webversation starting at any URL, and that requires something that can pass for a comprehensive web index.
Here’s a graph from Alexa showing the “reach” of Technorati and (clearly less popular) competitors:
For comparison, Alexa says that Google is used by (only?) a little over one in five browsers a day (over 200,000 per million):