Part 1 of this two-part series on the search engine size wars discussed why bigger is not a good indicator of better when it comes to the major search engines.
What’s a good way to measure comprehensiveness? Look at actual search queries that come back with zero or relatively few results. This is better than “rare word” testing because those searches are artificial.
Real queries reflect actual needle-in-the-haystack hunts. Look at these failed or low-count queries at Yahoo and see if Google returns matches. Do the same with Google queries on Yahoo.
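For anyone who wants to try this kind of comparison, here is a minimal sketch in Python. It assumes you have already collected result counts for the same real queries from two engines by hand. The function name, the threshold of three results and the sample counts are all made up for illustration; nothing here calls a real search engine.

```python
from typing import Dict, List


def find_gaps(counts_a: Dict[str, int], counts_b: Dict[str, int], low: int = 3) -> List[str]:
    """Return queries where engine A has `low` or fewer results but engine B has more.

    The threshold `low` is an arbitrary assumption, not a standard.
    """
    gaps = []
    for query, a_count in counts_a.items():
        b_count = counts_b.get(query)
        if b_count is None:
            continue  # only compare queries run on both engines
        if a_count <= low and b_count > low:
            gaps.append(query)
    return sorted(gaps)


# Hypothetical, hand-collected counts (placeholders, not real measurements).
engine_a = {"query one": 0, "query two": 2, "query three": 40}
engine_b = {"query one": 12, "query two": 1, "query three": 35}

# Run the check in both directions, as suggested above.
a_misses = find_gaps(engine_a, engine_b)
b_misses = find_gaps(engine_b, engine_a)
print("A came up short where B did not:", a_misses)
print("B came up short where A did not:", b_misses)
```

A count gap is only a starting point. You still have to look at whether the extra results actually contain the needle, which is what the examples below are about.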
A real life example: A few weeks ago, the kids and I were playing “Lego Star Wars” on our Xbox. We needed one last “minikit” piece.
I turned to search. I honestly can’t remember at this point exactly which search engine I used, nor the exact search terms. As best as I can reconstruct, I ended up with something like [lego star wars defence of kashyyyk minikit]. That’s the U.K. spelling of defense, as we’re playing the U.K. version. I seem to recall getting practically no Google or Yahoo results, but one did get me what I was looking for.
Here’s the search on Yahoo. Only three results, and the first page has the answer. That’s finding the needle in the haystack. Here’s the Google search. Over 100 matches, though some duplicates are counted toward the end.
Google did find the page with the right answer, on the first page of results. Needle in the haystack found, at both Google and Yahoo.
Another real life example: My mother-in-law was visiting today. Her glasses were bent. She’d sat on them accidentally and fixed them as best she could, but no U.K. eyeglasses place could repair them properly. That’s because she’d bought them on vacation in Northern Cyprus a couple of years ago.
I suggested she send them back for repair, but she couldn’t remember the place’s name. So my wife decided it was time to introduce her to Web search (my mother-in-law has yet to get a computer herself).
They tried a couple of searches for [northern cyprus optician] and more specifically, on Google, for [kyrenia northern cyprus optician]. It returned 27 results. The second listing had the name, Akay Optik, which rang a bell with my mother-in-law. Success!
The listed page no longer existed. Yet they could still see the optician’s name in Google’s listing. Using the name, “akay optik” got them to the right place. (When I looked, it became clear why they didn’t find the site more easily. It’s a graphical site, largely invisible to search engines).
An initial search with 27 results got the needle in the haystack. Over at Yahoo, the same search came up with six results, missing the needle we needed. In this instance, Google was more comprehensive.
Proof Google is more comprehensive, both in numbers and in quality of actual results? Hold on. If I vary the search, Yahoo pulls through. A search for [opticians in kyrenia] brought up this page as number two, an excellent overview of services in which Akay Optik was easily found (yes, Google has it too).
How Do You Prove Comprehensiveness?
In the end, I hope some of the above helps illustrate why counts alone aren’t proof of comprehensiveness. They’re prone to all types of errors in terms of how you define a page or duplicate page, the depth of the page indexed, not to mention whether the page really is of enough quality to produce expanded comprehensiveness, rather than just a larger count.
If you can’t rely on counting, how does any search engine definitively prove it’s more comprehensive? I asked Yahoo, in response to the AP article cited above. The response was that the claim was made in the context of counting and self-reported figures. If you believe counts equal comprehensiveness, and you believed the counts both Yahoo and Google provided at that time, they were the most comprehensive by that measure.
That aside, they could simply say they were bigger than they were before and felt they were more comprehensive than they were before. Whether others found them more comprehensive remained to be seen.
Google’s Recent Claim
That leads to Google’s recent claim. First, it’s making the claim as part of its seventh birthday celebration.
Google says it’s now 1,000 times larger than it was when launched. I’m not going to do the math, as part of my “get away from the counts” attitude. But Google isn’t providing a figure, either. It’s only saying it’s three times larger than the closest competitor (though Google’s not naming Yahoo).
Counts Can’t Be Compared
Google’s not releasing a count. Why not? It feels coming up with count comparisons is too difficult. It’s not apples-to-apples. Google doesn’t know exactly how its competitors count documents. For the record, that’s exactly what Yahoo has told me and other analysts.
Yahoo VP of Products, Eckart Walther, told me last month, “We cannot deduce the basic documents they have in their index, and they cannot deduce the number of documents in our index.”
Even the NCSA students’ study covered this: “Although there is no direct way to verify the size of each search engine’s respective index, the standard method to measure relative size was developed by Krishna Bharat and Andrei Broder in 1998.”
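For the curious, the general idea behind overlap-based relative size estimates can be sketched in a few lines of Python. This is a simplified illustration, not the procedure used in the study quoted above. It assumes you have sampled pages from each engine and checked how many of them the other engine also has. The function name and parameters are hypothetical.

```python
def relative_size(a_sample_found_in_b: int, a_sample_total: int,
                  b_sample_found_in_a: int, b_sample_total: int) -> float:
    """Estimate size(A) / size(B) from overlap checks on two samples.

    If a fraction p of A's sample is also in B, the overlap is about p * size(A).
    If a fraction q of B's sample is also in A, the overlap is about q * size(B).
    Both describe the same overlap, so size(A) / size(B) is about q / p.
    """
    if a_sample_total <= 0 or b_sample_total <= 0:
        raise ValueError("sample totals must be positive")
    frac_a_in_b = a_sample_found_in_b / a_sample_total
    frac_b_in_a = b_sample_found_in_a / b_sample_total
    if frac_a_in_b == 0:
        raise ValueError("no overlap found from A's sample; ratio is undefined")
    return frac_b_in_a / frac_a_in_b


# Made-up numbers: 25 of 100 pages sampled from A are in B,
# and 50 of 100 pages sampled from B are in A.
print(relative_size(25, 100, 50, 100))
```

Note what this does and doesn't give you: a ratio, not a count, and one that depends heavily on how the samples were drawn and how "the same page" is defined.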
No player can tell exactly how big the other is in terms of counts, but they feel they can make some guesses at relative size. Google feels it’s now three times larger than Yahoo in relative terms. But it’s not saying it’s three times more comprehensive.
Reprise: Counts Don’t Equal Comprehensiveness
Even if Google did trot out count numbers, it wouldn’t convince me, nor should it convince others. If Google was upset over Yahoo’s earlier claim, how can it then claim to be most comprehensive without backing?
“We believe the margin of difference is large enough that users should do a few queries themselves and check it out,” Mayer said. “If it’s not a commonly occurring term, chances are they’ll be able to see a difference themselves.”
I agree. The proof is in the pudding. Rather than another round of figures and third parties trying to see if rare word lists bring up more or less than expected, let’s return the focus to the quality of the results. Quality includes comprehensiveness. So if someone devises a test of real queries, things that don’t involve rare words but instead rare information on the Web, that’s of interest.
Here’s one more example from me. My wife and I love the watercolors of a Welsh artist, Annie Williams. Two or three years ago, I tried to learn more about her on the Web. I tried all the major search engines. There was nothing. Believe me, nothing. My search skills came to naught.
I checked today. Here’s a crafted query at Yahoo, where I added and eliminated things to narrow in on Annie Williams, artist. Seven matches, but useful. The first is a gallery I may want to follow up with about where I might find an exhibition of her works. The third listing for spotjockey.com led me to a short but nice bio.
At Google, the same query returns 20 matches. A few are promising prospects; some are dead ends and blank pages. The bio page is right there on the second page of results, and there are other interesting things to explore.
Over at Ask Jeeves and MSN, I got three matches: the page for the exhibition Web site I found at Yahoo and Google, but that’s it.
Gut feeling for the query? Google is slightly more comprehensive than Yahoo, but Yahoo isn’t bad and is ahead of Ask and MSN. For this query only! For others, or other wordings, things may significantly change. That’s the challenge of declaring an overall comprehensiveness winner.
In the end, it comes back to what any long-time writer of search engine advice has always told you. The major search engines are all great resources. They find lots of things. But they may be better for some things than others because they don’t have the same listings. Use different search engines and see what fits best.
A Hearty Goodbye to Counts
Dropping the count from Google’s home page is to be applauded. It’s not been accurate. More important, it takes counts out of the equation and puts more focus on quality. I certainly would like to review any serious study on comprehensiveness. Meanwhile, I won’t miss the time spent counting pages — rather than measuring comprehensiveness. Neither should you!