Screw Size! I Dare Google and Yahoo! to Report on Relevancy

Ah, summer. Time to play on the beach, head out on vacation, and, if you’re a search engine, announce to the world that you’ve got the largest index. –“Search Engine Size Wars & Google’s Supplemental Results,” Search Engine Watch, Sept. 3, 2003

The quote above is from an article I wrote after Google and AlltheWeb played a game of “who’s biggest” in August 2003. They’d done the same thing in August 2002. Now here we are in August 2005, and it’s a spat over size once again, this time between Yahoo and Google.

I can’t believe we’re going through this again. This is Search Engine Size Wars VI, by my count. It’s absurd. It’s annoying. It’s a friggin’ waste of time. Instead of advancing to a commonly accepted relevancy figure, the search engines keep us mired in the mud of who’s biggest.

Who’s biggest really doesn’t matter, as I and others have written so many times before. There are many reasons:

  • You don't need the whole haystack! If I dump it all on your head, can you find the needle?
  • If I have lots of documents but they’re all near duplicates of each other, is that good?
  • How much of a document is indexed — 101K, 500K, 1MB?

Pick your metaphor, explanation, qualification (Gary Price gives you even more here). We’ve been through it before.

Nothing’s changed. Size hasn’t suddenly gotten more important. But for the first time, one search engine is strongly disputing the claims of another. Google doesn’t believe the figures Yahoo is bandying about. Yahoo has steadfastly stated it’s not lying.

Let’s do some testing, then! Let’s come up with some standards! Let’s audit the figures! After all, it’s been discussed since 1999, when Northern Light wanted to say definitively it was biggest. Surely it’s time for that to happen, right?

No, it’s not. If the search engines are going to come together to figure out a standard on something, they should move forward. Come up with a way to test relevancy. That’s what matters, not this squawking over size.

I wrote about relevancy back in 2002. I looked at the need for a relevancy figure and why, without it, we’ll continue to have search engines use surrogates such as size for relevancy:

A relevancy figure would free us from search engines playing the size or freshness card to quantify themselves. Yes, a large index is generally good. Yes, having a fresh index is desirable. Neither indicates how relevant a search engine is.

Now it’s 2005, and size is again pushed in our faces. No, Yahoo didn’t issue a press release on it. But it knew the reaction it would get by announcing on its blog that it’s twice as big as Google. Google, of course, has pulled out all the stops in lobbying Search Engine Watch and analysts to poke hard at the Yahoo numbers. It doesn’t want to be viewed as second best in any area.

The irony is deep. Google never provided any proof when it trumped others on the size front. When MSN said last November that it was at 5 billion, Google magically announced on its home page that it was at 8.1 billion. Though MSN didn’t seriously question whether Google was larger, plenty of other rumblings were heard about whether the count was correct. But since it had trumped everyone else, Google apparently didn’t feel the burning concern it now has that size should somehow be verified. Maybe Yahoo isn’t at 19 billion. But maybe Google isn’t at 8 billion, either.

This game will go on until someone’s brave enough to change the rules. I’m daring either leader, Google or Yahoo, to do just that. Both say size is one of many factors to consider. Both tell you relevancy matters most. Prove it!

Ideally, I want to see the major search engines come together to develop a unified, accepted way to measure relevancy in various ways: Web search, local search, advanced queries, whatever. Establish a research center, a consortium or something, and a methodology all agree upon. Then, test every four to six months and pledge you’ll publicly accept the results. Someone wins? Kudos! Didn’t win? Do better next time.

That’s the challenge. Let’s see if someone steps up. As for size, Gary and I will revisit the various claims and counter-claims in more depth later. In the meantime, some past reading on the subject of size and the complications in measuring it:

  • “New Estimate Puts Web Size At 11.5 Billion Pages & Compares Search Engine Coverage” has an estimate of what search engines cover compared to self-reported claims. Despite the Ask Jeeves connection, that service doesn’t come out on top in terms of size.
  • “Search Engine Size Wars V Erupts” covers the self-reported figures and the battle we had between Google and MSN last November, along with such issues as how much of a page is actually indexed.
  • “Search Engine Size Wars & Google’s Supplemental Results” covers more on deconstructing index size claims from 2003.
  • “Search Engine Sizes” has more articles than you can imagine covering size issues over the years.
  • “How to count URLs” is an archived page showing how Excite, back in 1996 (1996!), explained how it thought counting should be done. Others and I have written about how, in many ways, it feels like we’ve gone right back around the 1990s’ big circle of portal features being rolled out, land grabs, and inflated valuations. The size dispute is just another spin of that wheel.

Want more search information? ClickZ SEM Archives contain all our search columns, organized by topic.
