Make Room for Teoma

After the loss of Go earlier this year and the expected departure of NBCi, you might have thought it was all over in the search engine game. However, just as consolidation seemed inevitable, new player Teoma has stepped up with an impressive debut of its new search service.

Opened to the public last month, Teoma leverages link structures from across the Web to not only provide relevant results but also automatically present different views of information.

Currently in beta, the site is primarily intended to demonstrate Teoma’s technology to potential partners or buyers.

“We’re in discussions with many of the major portals and also the major technology companies,” said Paul Gardi, Teoma’s president and chief operating officer.

The idea of running Teoma as a standalone search engine for the public hasn’t been ruled out. Even though the current site is designed as a demonstration, it’s already powerful enough that searchers may want to add it to their research arsenal.

Teoma is a crawler-based service with a collection of about 100 million URLs. To be a serious contender in the search engine space, though, Teoma will need to grow, and it plans to do so.

Database size is an important factor, of course, but without good relevancy ranking, a large index isn’t necessarily useful. Teoma hopes its own style of link analysis will give it the ability to take on the widely acknowledged relevancy leader, Google.

To understand what Teoma is doing, it makes sense to summarize the Google system first. Google examines link structures all over the Web. By doing so, it can give every page a popularity rating known as “PageRank” (named after Google cofounder Larry Page). When you do a search, URLs with high PageRanks are more likely to be listed first. However, this will only happen if the pages also match other criteria, such as containing your search terms or being relevant to your search terms (determined by analyzing the context of links).
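For readers who want to see the mechanics, here is a minimal sketch of the global, query-independent approach described above: a basic power-iteration PageRank over a toy link graph, followed by a step that orders only the query-matching pages by their scores. It assumes the 0.85 damping factor suggested in the original PageRank paper and a made-up four-page Web; the function names are hypothetical, and Google's real ranking blends many more signals than this.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Global PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Pages with no outgoing links spread their rank evenly over the Web.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Every other page splits its rank equally among the pages it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def search(matching_pages, ranks):
    # Only pages that already match the query are eligible; PageRank orders them.
    return sorted(matching_pages, key=lambda p: ranks.get(p, 0.0), reverse=True)


# Hypothetical four-page Web.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
print(search(["a", "c", "d"], ranks))  # ['c', 'a', 'd']
```

Note that the scores are computed once for the whole graph, before any query arrives; the query only decides which pages are eligible to be sorted.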

Teoma operates in an opposite fashion. When you do a search, Teoma looks across the entire Web to find pages that contain your search terms or that are considered relevant to those terms based on link context. After finding a matching set of documents, which it calls a “community,” Teoma then examines the links just within this set to determine which are the most popular.
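Teoma's order of operations can be sketched the same way. In this minimal illustration, the "community" is simply every page whose text contains all the query terms (an assumption; Teoma also uses link context, which is omitted here), and a page's score is the number of distinct community members linking to it. Links from pages outside the community are ignored. The function name and sample pages are hypothetical; Teoma has not published its actual scoring.

```python
def community_rank(pages, links, query):
    """Rank a query's matching pages using only links from within that set.

    pages maps page -> text; links maps page -> list of pages it links to.
    """
    terms = query.lower().split()
    # Step 1: the community is every page whose text contains all query terms.
    community = {
        page for page, text in pages.items()
        if all(term in text.lower().split() for term in terms)
    }
    # Step 2: count only links that start and end inside the community.
    votes = {page: 0 for page in community}
    for source in community:
        for target in set(links.get(source, [])):
            if target in votes and target != source:
                votes[target] += 1
    # Most-linked first; ties broken alphabetically for a stable order.
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical mini-Web for the query "ice hockey".
pages = {
    "rink": "ice hockey rink schedules",
    "nhl": "ice hockey league news",
    "fan": "ice hockey fan page",
    "portal": "news portal hockey ice weather",
    "recipes": "cooking recipes",
}
links = {
    "rink": ["nhl"],
    "fan": ["nhl", "rink"],
    "portal": ["nhl"],
    "recipes": ["fan"],  # off-topic page: its link is not counted
}
print(community_rank(pages, links, "ice hockey"))
```

Because "recipes" never enters the community, its link to "fan" carries no weight, which is the behavior Gardi describes.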

“At the end of the day, we are ranking sites based on other sites that are on the subject,” Gardi said. “We don’t only use all the sites that are pointing at a site, we also use [those] that are on the subject.”

The implication is that Teoma’s community-generated results will be more relevant than those from Google or others that use a global system that examines the entire Web, because links from irrelevant pages are excluded.

However, this description understates what Google does. Yes, PageRanks at Google are computed from examining the entire Web, but link context and the content of Web pages are also taken into account and ostensibly reduce the impact of “irrelevant” pages in Google’s system. “Topic-specific PageRank versus general PageRank, I’m not sure how much of a difference there is,” said Urs Hölzle, a Google Fellow and the company’s former vice president of engineering. “Suppose you search for something about ice hockey. The sites that come up, where are they getting their PageRank from? Most likely, other ice hockey sites.”

Teoma also uses its link analysis system to create its unique “Experts’ Links” and the autoclassification of pages into topics.

Let’s take Experts’ Links first. When you search at Teoma, a list of Experts’ Links appears along the right side of the page. These listings are pages that provide links to a wide range of resources on particular topics. In other words, these are “link lists” or Weblogs for a particular subject.

Here’s another way to think of it. If you go to Yahoo and search for something, you’ll usually be led to a matching category that lists a variety of Web sites on your search topic. Other people create these types of topic-specific lists, and Teoma’s Experts’ Links area is designed to help you easily find such resources from across the Web.
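Pages that point to many good resources on a topic are what link-analysis researchers call “hubs.” Teoma hasn’t said exactly how it picks Experts’ Links, but a well-known technique for finding hubs inside a topical set of pages is Jon Kleinberg’s HITS algorithm, sketched below purely as a stand-in. Hub and authority scores reinforce each other: a good hub links to good authorities, and a good authority is linked from good hubs. The toy community and page names are invented for illustration.

```python
import math


def hits(links, iterations=20):
    """Kleinberg-style hub and authority scores for a small link graph."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in set(targets):
                auth[target] += hub[source]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # A page's hub score is the sum of the authority scores of pages it links to.
        hub = {p: sum(auth[t] for t in set(links.get(p, []))) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


# Hypothetical ice hockey community: "guide" is a broad resource list.
community = {
    "guide": ["nhl", "rink", "fan"],
    "blog": ["nhl", "rink"],
    "fan": ["nhl"],
}
hub, auth = hits(community)
experts = sorted(hub, key=hub.get, reverse=True)[:2]
print(experts)  # ['guide', 'blog']
```

In this reading, the top hubs would play the role of Experts’ Links, while the top authorities resemble the answer pages in the main results.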

Teoma’s other special feature is the autoclassification of Web pages. At the top of Teoma’s results page is a section called “Web Pages Grouped By Topic.” Underneath, all the pages found that match your query have been grouped into broad categories. You can click on a category link to narrow your focus, and you can drill down further, as desired.

Fans of Northern Light will see similarities between this and Northern Light’s “Custom Search Folders” feature, which also groups results into categories in real time. A key difference is presentation. Northern Light’s folders, which have consistently been a useful alternative way to scan results, have always been tucked off to the side of the main results. Teoma’s categories are front and center, which will likely increase their use.

To perform the categorization, Teoma looks at the results set, then seeks out “clusters” or communities of pages that link to each other. When these clusters emerge, the link text is analyzed to find the most common words, which are then used to describe the category. This use of link analysis is also different from the pure text analysis that Northern Light does, Teoma says.
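The clustering step can also be approximated in a few lines. This sketch assumes that a cluster is a connected group of result pages linked to one another (treating links as two-way), and that its label is the two most frequent words in the link text exchanged inside the cluster, after dropping a small, hypothetical stopword list. Teoma’s real clustering is surely more sophisticated; this only illustrates the idea of naming groups from link text rather than page text.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "an", "and", "of", "for", "on", "to", "in"}  # hypothetical


def cluster_and_label(results, anchors, label_words=2):
    """results: set of page ids; anchors: list of (source, target, link_text)."""
    # Treat links between result pages as undirected edges.
    neighbors = defaultdict(set)
    for src, dst, _ in anchors:
        if src in results and dst in results and src != dst:
            neighbors[src].add(dst)
            neighbors[dst].add(src)
    # Find connected components with a depth-first search.
    cluster_of, clusters = {}, []
    for start in sorted(results):
        if start in cluster_of:
            continue
        cluster_of[start] = len(clusters)
        stack, members = [start], []
        while stack:
            page = stack.pop()
            members.append(page)
            for nxt in neighbors[page]:
                if nxt not in cluster_of:
                    cluster_of[nxt] = len(clusters)
                    stack.append(nxt)
        clusters.append(sorted(members))
    # Label each cluster with the most common words in its internal link text.
    words = [Counter() for _ in clusters]
    for src, dst, text in anchors:
        if src in cluster_of and dst in cluster_of and cluster_of[src] == cluster_of[dst]:
            words[cluster_of[src]].update(
                w for w in text.lower().split() if w not in STOPWORDS)
    return [(" ".join(w for w, _ in counter.most_common(label_words)), members)
            for counter, members in zip(words, clusters)]


# Hypothetical result set for an ice hockey query.
results = {"p1", "p2", "p3", "p4", "p5"}
anchors = [
    ("p1", "p2", "hockey equipment reviews"),
    ("p3", "p2", "cheap hockey equipment"),
    ("p4", "p5", "history of the NHL"),
    ("p5", "p4", "NHL history archive"),
    ("x9", "p1", "unrelated link"),  # source outside the result set: ignored
]
for label, members in cluster_and_label(results, anchors):
    print(label, members)
```

A page with no links to other results ends up as a one-page cluster with an empty label; a real system would presumably fold such pages into a catch-all group.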

How about Teoma’s main results, the “Web Page” section — what’s there? These are the pages that are more likely to answer your questions, in contrast to the Experts’ Links pages, which don’t provide answers but may lead you to pages that do.

Teoma grew out of a federally funded project in 1998 at Rutgers University. The Teoma technology team is led by Professor Apostolos Gerasoulis, who now serves as Teoma’s chief technology officer, and Associate Professor Tao Yang, from the University of California, Santa Barbara, who is chief scientist and vice president of research and development. Now a private company with funding from Hawk Holdings, Teoma hopes that it will establish some portal partnerships within the coming months. If not, then the Teoma site itself is likely to be expanded beyond the current demo.

The company is also considering providing enterprise and site search services in the future as well as licensing its categorization tools to those who want to create their own directories or vertical portals.
