Inktomi just rolled out the latest version of the search engine it provides to partners such as MSN. Is "Web Search 9" better than previous incarnations? Better than its major crawler-based competitors Google, AllTheWeb, Teoma, and AltaVista?
"Inktomi meets or exceeds all other search engines in the key metrics of search performance: relevance, freshness and index size," says Inktomi of the changes.
Inktomi cites figures for two metrics. For freshness, it claims to revisit all documents in the index at least once every two weeks. For size, it claims to index 3 billion Web documents, a new record. Relevance? Inktomi has no figures.
Inktomi's not alone. AltaVista recently had a major relaunch. A press release says AltaVista provides "a new index that delivers unique, highly relevant results." Figures to back this up? Nope.
Where are the relevancy figures? Relevancy is the most important "feature" a search engine can offer, but there's no accepted measure of how relevant search engines are. Turning relevancy into an easily digested figure is a huge challenge, but one the industry must overcome for its own good and that of consumers.
Measuring Relevancy: Informal Tests
There are many ways search engines can be tested, some good, some bad, and some that work in particular situations.
"Anecdotal evidence" is when someone reports a general impression of a search engine or the results of an isolated search. Many praise Google for result quality. Anecdotally, Google is great. That doesn't mean it's best. When some complained last October that Google's relevancy had "dropped," they gave anecdotal evidence. Defenders used anecdotal evidence to dispute the claim.
"Mega search" is a test style I find particularly bad. A search is performed, and the matches each search engine found for the query are totaled. The engine with the most matches "wins." Quality isn't a factor.
"Ego search" is often performed by journalists. You look for your name. If you fail to come up number 1, you conclude the search engine's relevancy is poor.
True in some cases. If I search "bill gates," it's reasonable to expect to find Microsoft Chairman Bill Gates's official Web site. What if you aren't as well known or have a common name? What if your site is in a free hosting service used by spammers? These issues push you down, for good reason. Condemn a search engine based on one search? I've seen it happen.
Search Engines' Tests
Overture performs an internally run relevancy test I describe as "binary search." Users are shown listings for a query, then asked whether they like each result and consider it relevant. The answer is a binary choice: yes or no. If 95 of 100 results are considered OK, you claim 95 percent relevancy.
Sounds great, but there's no nuance, no sense of whether pages more relevant to the topic are missing. It's like having people taste different cakes and asking whether each is edible. Edible is fine. You want to know who serves the best cake, consistently.
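The cake problem can be made concrete. Here's a minimal sketch, with invented judgments and hypothetical function names, contrasting yes/no scoring with a graded alternative: results that are all merely "edible" score a perfect 100 under binary scoring, while a graded scale exposes how few are excellent.

```python
# Hypothetical sketch: Overture-style "binary" scoring versus graded scoring.
# The judgment data below is invented for illustration.

def binary_relevancy(judgments):
    """Percentage of results judged simply 'relevant' (any grade above zero)."""
    return 100 * sum(1 for grade in judgments if grade > 0) / len(judgments)

def graded_relevancy(judgments, max_grade=3):
    """Average grade on a 0..max_grade scale, expressed as a percentage."""
    return 100 * sum(judgments) / (max_grade * len(judgments))

# Ten results: all minimally relevant, but only one is excellent (grade 3).
grades = [1, 1, 1, 2, 1, 1, 3, 1, 1, 1]

print(binary_relevancy(grades))  # 100.0 -- every cake is "edible"
print(graded_relevancy(grades))  # about 43.3 -- few are the best cake
```

The same ten results claim "100 percent relevancy" under the binary measure while scoring well under half on a graded one, which is exactly the nuance the yes/no test throws away.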
AltaVista runs internal relevancy tests in which its listings are compared to competitors'; branding and formatting are stripped away. Users don't know which search engines are involved. They rate which results are better or whether the sets are equal.
There's a sense of "binary" in this testing. Is the user knowledgeable in the subject she's searching? If not, she may be unaware of important sites missing in both result sets. So both might be deemed relevant, although an expert in the subject might consider the relevancy to be poor.
Searcher subjectivity is one of the biggest challenges in relevancy testing. Two people could search for "dvd players," one looking to buy a player, the other wanting to learn about them. The first person may be pleased to get listings dominated by commercial sites selling DVD players; the second might prefer editorial-style listings of reviews and explanatory pages. What's relevant depends on mindset. It's important to test relevancy for both commercial and noncommercial intentions.
In what I call "goal oriented testing," subjectivity is more controlled. A query is done for which a particular page ought to appear, according to most people. "Company name tests" are like this. A "microsoft" search ought to deliver a link to the Microsoft Web site. Few would dispute that.
The problem with company names as goals is they really only test "navigational" aspects. Most people searching for a company by name probably want to reach that company, to navigate to its site. An important function for a search engine to fulfill, but only one type of relevancy.
The "Perfect Page Test" we performed recently is a different type of goal-oriented searching. We came up with a list of Web sites most people knowledgeable about different topics would agree should be present for certain queries. The problem with the test (as we pointed out) is it doesn't measure whether the other listed sites are good. A search engine failing to bring up the target page for a particular query could get a bad score yet still return nine other highly relevant listings.
Limitations of this test are one reason why we didn't trumpet the winners. The fact Google, Yahoo, and MSN Search got As doesn't mean all their results are A quality, any more than AltaVista getting a D means all its results are D quality. It only means in a limited, particular, narrow test, that's what was found.
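A goal-oriented test of this kind reduces to a simple check: does the agreed-upon target page appear in the top results for its query? The sketch below (queries, targets, and result lists are all hypothetical) shows both the test and its blind spot; it says nothing about the quality of the other listings.

```python
# Sketch of a goal-oriented "Perfect Page" style test. For each query there
# is one target page most knowledgeable people agree should appear.
# All queries, targets, and result lists here are made up for illustration.

def perfect_page_score(results_by_query, targets, top_n=10):
    """Fraction of queries whose agreed-upon target appears in the top N."""
    hits = sum(
        1
        for query, target in targets.items()
        if target in results_by_query.get(query, [])[:top_n]
    )
    return hits / len(targets)

targets = {"microsoft": "microsoft.com"}

# One engine lists the target; the other misses it entirely.
engine_a = {"microsoft": ["msdn.microsoft.com", "microsoft.com", "office.com"]}
engine_b = {"microsoft": ["example-spam-host.net", "another-site.org"]}

print(perfect_page_score(engine_a, targets))  # 1.0 -- target found
print(perfect_page_score(engine_b, targets))  # 0.0 -- target missed
```

Note that engine B scores zero even if its other listings happened to be excellent, which is the limitation described above.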
Needed: Many Tests and Public Results
Ideally, you need a battery of tests in which tens, hundreds, or thousands of queries are run and examined, tested in different aspects. How did the query perform for a search in product mode? How does this search engine handle a navigational query? What does an ego search bring up for prominent people? And so on.
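Structurally, such a battery is just many queries grouped by intent, each category scored and reported separately rather than collapsed into one noisy number. A minimal sketch, with placeholder categories, queries, and a stand-in scoring function:

```python
# Hypothetical battery of relevancy tests: many queries per category, with a
# per-category average rather than a single overall figure. The categories,
# queries, and scorer below are placeholders, not a real methodology.

def run_battery(queries_by_category, score_fn):
    """Average a per-query score (0.0 to 1.0) within each test category."""
    return {
        category: sum(score_fn(query) for query in queries) / len(queries)
        for category, queries in queries_by_category.items()
    }

battery = {
    "navigational": ["microsoft", "bbc news"],
    "product": ["dvd players", "digital cameras"],
    "ego": ["bill gates"],
}

# A stand-in scorer; a real one would judge actual search engine results.
report = run_battery(battery, score_fn=lambda query: 0.8)
print(report)  # one score per category, e.g. {"navigational": 0.8, ...}
```

Reporting per category matters because, as noted above, an engine can excel at navigational queries while handling product or ego searches poorly.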
The solution is for search engines to agree on testing standards, contract to have them regularly performed, and agree to publish the findings -- no matter what. The job should be assumed by a testing organization.
Publishing the data is important. Some companies that contracted for testing in the past refused to allow results to be made public if they did poorly. If search engines want us to take their claims of relevancy seriously, they must agree to release the good and the bad.
Spare Us the Noisy Stats
Why is a relevancy figure so important? First and foremost, it would bring greater awareness of the choices in search.
I love Google's relevancy and sing the praises of its work to improve search standards. It's a driving force in raising search quality and deserves its success. However, Google has very good competitors (AllTheWeb, Yahoo, and MSN Search). Some may -- gasp -- have a search algorithm that works better for some users, or search assistance features Google lacks, or an interface some might prefer.
Some consumers never bother trying other search engines. They've been told or convinced Google's best. Published relevancy figures could change that. If relevancy testing finds Google and its competitors are at roughly the same level, users might be willing to experiment. They may choose a search engine for certain functions, such as search term refinement options, the ability to see more results at once, or a spell checker.
A relevancy figure would free us from search engines playing the size or freshness card to quantify themselves. Yes, a large index is generally good. Yes, a fresh index is desirable. Neither indicates how relevant a search engine is.
Do It, or Have It Done to You
Ultimately, if search engines fail to come up with a means of measuring relevancy, they'll continue to be measured by one-off ego searchers or rated anecdotally.
Danny Sullivan left Search Engine Watch as of Dec. 1, 2006.
March 19, 2014