The end of the index size wars (we hope)
By Charlene Li
Speaking of birthdays, Google used the occasion of its 7th anniversary to announce that its index has grown 1,000-fold since it launched back in September 1998. It also dropped an intriguing tidbit: it believes its index is three times bigger than that of its nearest competitor, but it declined to say exactly how big it is. Here's its explanation of how big the index is:
So how big is Google's index?
Search engines' published metrics for index size measurement vary greatly and are no longer easily comparable. Often, for instance, web crawlers retrieve duplicate entries for one page or links to documents that they haven't crawled, and whose content thus isn't in the index. At Google we believe the essential quality of an index isn't the total number of documents, but its comprehensiveness – which unique documents are in the index. So we don't count duplicate or uncrawled pages. According to our internal testing, our newly expanded search index is more than three times larger than that of any other search engine.
Now that's what I call a pretty clear non-answer. And if you look at the home page, the index size indicator is now gone (the page also sports seven slices of birthday cake).
Now, I applaud Google for stepping back from the brinksmanship it engaged in back in August, when Yahoo announced that its index was north of 20 billion documents. As I've written before, index size is only one of many factors that contribute to relevancy. In fact, in a world where personalized search will be the norm, search engine usage, loyalty, and even tagging will matter more than index size.
And so ends an era of index envy and "mine is bigger than yours" comparisons. Thank goodness: we can now just focus on the real issue of which sites are actually serving users. I just find it ironic that the company that made its name on index size finally had to concede that verifying index size is impossible. And yet, even in this announcement, Google can't help but mention that it *still* thinks its search engine is three times bigger. So rather than put out a "number", Google will keep everyone guessing. How long do you think it will be before some journalist does the math and writes that Google's index has 60 billion documents? Hopefully, never.

Charlene,
I agree: it's not size that matters, it's relevance! Still, it's tempting to say that Google reveals ALL of the Web, at least as an idea, but as a user, it's quality that I judge.
Posted by: Nina | December 11, 2005 at 04:27 AM