The ‘Perfect Page’ Test

BY Chris Sherman and Danny Sullivan

How effective are search engines at finding “ideal” search result pages? We asked the readers of SearchDay to suggest their ideal pages. Then, Chris Sherman and I tested the major engines to find out.

We all have our own ideas of what constitutes a perfect search result for favorite queries. In the best of all possible worlds, these “obvious” pages would appear in the top 10 results for the “obvious” search terms.

We asked readers to submit their ideal search result pages by sending URLs only, leaving it up to us to guess the queries that should bring these pages into the top 10 results. From dozens of reader submissions, we selected 10 Web pages to use as test cases.

Initially, suggestions were screened for quality and reputable information. We selected well-known sites and relatively obscure pages to see how the engines handled different types of Web content. Then, before testing, we decided which query terms a “typical” searcher would use when seeking information related to each page.

Does this test reveal which search engine is best? Not at all. Search engines can be tested in numerous ways. This test does not measure results’ freshness. It doesn’t examine the overall quality of all results. A search engine can find one of our “perfect pages,” yet deliver nine other matches of poor quality.

So what good is this test? We believe it provides a rough idea of how search engines measure up in relevance. Going forward, we plan to run further perfect-page tests along with other methods of rating the search engines. Over time, we hope this will all add up to a battery of measurements that will assist you in making your searching choices.

The Engines Tested

We tested popular, well-known search engines, including Ask Jeeves, AllTheWeb, AltaVista, Google, Inktomi, Lycos, MSN Search, and Yahoo. Google scores are also relevant to AOL Search and Netscape Search.

HotBot was not tested because its owner, Terra Lycos, has said the search engine is about to undergo a major change. Thus, we felt it was not worth spending the time to test it now. For the same reason, we omitted Wisenut, after owner LookSmart informed us it was about to undergo a major overhaul. We also opted not to include any meta search engines in this test.

We ran the test on Thursday, October 17. Results may not be replicable today, due to recrawling and reindexing by the search engines. That may change the relevance rankings of our selected pages.

The Perfect-Page Test Results

Our goal was to evaluate the main or primary editorial results provided by search engines themselves. We excluded paid-placement listings.

Scoring was simple. We awarded one point if an engine returned the perfect page in the top 10 results for our query. A half point was awarded for a related page from the same site in the top 10 result list. We also awarded half points for a few unique results.

In one case, Lycos returned a “no results found” message, then later returned the expected page in the number two position, so we awarded a half point to accommodate the glitch. In another instance, both Ask Jeeves and AltaVista failed to find the ideal page but gave the top result to a closely related companion Web site operated by the same government agency as the ideal page.
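As a rough illustration of the arithmetic, the scoring rule described above could be sketched in Python as follows. The outcome labels ("exact", "related", "special", "miss") and the sample list of outcomes are hypothetical names chosen for this sketch, not data from the test itself.

```python
# Points per outcome, following the scoring rule described above.
# The labels are hypothetical; only the point values come from the article.
POINTS = {
    "exact": 1.0,    # the perfect page itself appeared in the top 10
    "related": 0.5,  # a related page from the same site appeared in the top 10
    "special": 0.5,  # one of the few unique cases awarded a half point
    "miss": 0.0,     # nothing qualifying appeared in the top 10
}


def score_engine(outcomes):
    """Sum the points for one engine across all test queries."""
    return sum(POINTS[outcome] for outcome in outcomes)


# Hypothetical outcomes for one engine over the 10 test pages (not real data).
example_outcomes = ["exact"] * 9 + ["related"]
print(score_engine(example_outcomes))  # 9.5
```

With 10 test pages, the maximum possible score is 10, and each half-point case lowers an otherwise perfect score by 0.5.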

Google, MSN Search, and Yahoo were the top performers, each with 9.5 out of a possible 10. AllTheWeb was next, with 9. Inktomi and Lycos tied, with 8.5. Ask Jeeves scored 8, and AltaVista scored 6.5.

Based on these results, we assigned the following letter grades to the engines for the perfect page test:

  • A: Google, Yahoo, and MSN Search
  • A-: AllTheWeb
  • B: Inktomi and Lycos
  • B-: Ask Jeeves
  • D: AltaVista

(Test criteria and detailed results can be found here.)

Overall, the search engines did quite well at finding the ideal pages suggested by readers. We were pleasantly surprised that such a high degree of correlation existed between the queries we invented before trying the test searches and the selected perfect pages themselves.

In rechecking our work, we noticed significant improvements in Ask Jeeves results — enough to boost the search engine’s score firmly into the “A” range. The Jeeves/Teoma crew had no knowledge of our tests. Improvements were the result of a quiet upgrade we learned of later. As all scores were calculated from tests run on the same day, we couldn’t change Jeeves’ score, despite observing improved results.

Out of curiosity, we also ran our queries at Overture to see how the test would work for purely paid listings. None of our perfect pages came up in the paid results. Of course, Overture also provides unpaid listings after its paid results. If these are counted, then four of our pages made it into the top 10, which would still have given Overture an F grade.

Of course, Overture doesn’t really intend for users to perform searches at its site, where there is no attempt to ensure a strong balance between editorial and paid listings. That’s the reason we didn’t measure it against other search engines on our test scorecard. Overture is really a “search provider,” sending its paid listings to search destination sites such as Yahoo, which blend the paid listings with editorial picks.

In short, while paid listings are useful, the testing in this case shows why a good search engine will want a careful blend of editorial results as well.

For those of you who have settled on a favorite search engine, this test illustrates that other search engines are viable choices. No engine consistently returned the target pages as the number one result.

We plan to run much more rigorous and methodical tests in the future. We’ll also publish comments we receive from search engines included in the test, should they offer any feedback.
