Two researchers from the NEC Research Institute in Princeton, NJ, Steve Lawrence and C. Lee Giles, have published a paper in the journal Nature that tells the world something you already know: search engines don’t find that much.
Not only do most of the popular search engines index less than 10 percent of the web (Yahoo, the index created by human hands, indexes just 7.4 percent), but the researchers say the engines are also biased, because they are most likely to index popular U.S. commercial sites. (As my teenage daughter might say, “Like, DUUH!”)
Given the huge growth of the web and the concerted efforts of “Spamdexers” to make sure client pages are not only indexed but found first, this is as surprising as news that the sun comes up in the east. But the wire services are spinning this one as another web horror story: most of it (presumably the good stuff, most likely your stuff) can’t be found.
I don’t use Spamdexers (although I did once review “WebPosition Gold,” a Spamdex software package), so I decided to conduct a little experiment. I searched for my own newsletter, a-clue.com, on all the major search engines (and a few minor ones).
I wasn’t interested in seeing whether I was indexed under a relevant keyword like “e-commerce newsletters” or “great web stuff,” and I avoided the keywords I used in the WebPosition review last year. Instead I just searched for the main part of the URL, a-clue.com. Here’s what I found:
Despite 30 months of weekly publication, and my 1997 submission of the site, Yahoo still hasn’t found a-clue.com. But my fans should not despair. The Inktomi search engine, which finds web pages using a computer, did have it. It was even the number three listing. This makes me wonder why Yahoo hasn’t used Inktomi to improve its indexing capabilities, but their profits beat street estimates this week. The stock is up. So who cares?
Next, Excite. I like Excite. They not only found a-clue.com, but the home page was the number one hit. And I’ve never submitted a-clue.com to Excite.
Some engines couldn’t find a-clue.com at all; not on the first page of listings anyway. By my reckoning, Alta Vista, Dogpile, Direct Hit and LookSmart all failed this test. But overall results were pretty good. I’m number one with Lycos and Northern Light, third with Google, and fifth with Thunderstone.
Perhaps the most surprising finding was that a page of feedback from Chris Tyler (previously online November 30 of last year) is apparently easier for search engines to find than my main page. Hotbot and GoTo.Com both had that page as their number three hit, and MSN had it at number two. Infoseek’s first hit was a feedback item from September 14; it listed my home page sixth.
What did I learn from all this? If you’re in e-commerce, it doesn’t really matter whether search engines have indexed the whole web, so long as they can find your site. Check your own site in this way and let us know what you find…you may be surprised.
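If you want to keep score while you run the same check, here is a minimal Python sketch. It assumes you have copied the first page of result URLs from each engine by hand; it does not query any search engine itself. The function names `rank_of_domain` and `summarize`, the engine labels, and the sample URLs are all made up for illustration.

```python
from urllib.parse import urlparse


def rank_of_domain(result_urls, domain):
    """Return the 1-based position of the first result hosted on `domain`.

    Returns None if no result on the page belongs to the domain.
    """
    domain = domain.lower()
    for position, url in enumerate(result_urls, start=1):
        # urlparse only recognizes the host when a scheme is present.
        if "://" not in url:
            url = "http://" + url
        host = (urlparse(url).hostname or "").lower()
        # Match the bare domain or any subdomain such as www.
        if host == domain or host.endswith("." + domain):
            return position
    return None


def summarize(results_by_engine, domain):
    """Map each engine name to the rank of `domain`, or None if not found."""
    return {
        engine: rank_of_domain(urls, domain)
        for engine, urls in results_by_engine.items()
    }


if __name__ == "__main__":
    # Hypothetical, hand-copied first pages of results (not real data).
    sample = {
        "engine_a": ["http://www.a-clue.com/", "http://example.com/"],
        "engine_b": ["http://example.org/", "http://example.net/"],
    }
    for engine, rank in summarize(sample, "a-clue.com").items():
        print(engine, "not on first page" if rank is None else f"hit number {rank}")
```

Matching on the host name rather than the full URL means a hit on any page of the site counts, which is how a feedback page can outrank a home page in the tally.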