Most people like Google because it’s easy to use, it’s fast, it has a huge database, and, most important, it works. Remember when Google hit the scene? It was 1998. Stanford computer science Ph.D. candidates Sergey Brin and Larry Page were working on a class project to identify meaningful patterns in Web link structure. They became fascinated with analyzing “backlinks” (pages that link back to a site) and realized these backlinks could help build a better mousetrap.
What’s in a Name?
“Google” is a play on the word “googol,” which was coined by Milton Sirotta, nephew of American mathematician Edward Kasner, to refer to the number represented by one followed by 100 zeros. Google’s use of the term reflects the company’s mission to organize the immense amount of information available on the Web. When Brin and Page presented their idea to the first angel investor, this investor wrote the check to “Google Inc.” After thinking about it for weeks, they figured they’d better open an account in the name of Google Inc. to be able to cash the check. So the legend goes…
The PageRank Phenom
It’s been said that Google changed the face of Internet search with an algorithm known as PageRank. The PageRank algorithm was definitely a technological breakthrough, as most major search engines now use link popularity as part of their relevancy algorithm. So how does it work?
“Google’s PageRank search technology works by first identifying the link structure of the entire Web, then ranking individual pages based on the number and importance of pages linked to them,” said Google software engineer Matt Cutts. My perception when talking to Cutts was that importance (the popularity and relevance of the backlink) counts more than the number of backlinks.
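The idea Cutts describes can be sketched in a few lines of Python. This is a minimal illustration of the publicly described form of PageRank, not Google's production code. The damping value of 0.85, the iteration count, and the tiny sample Web are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank for a small link graph.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative defaults (assumptions).
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with rank spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical sample Web: "portal" is heavily linked and links only to "x".
# "y" has two backlinks, but both come from pages nobody links to.
sample_web = {
    "f1": ["portal"], "f2": ["portal"], "f3": ["portal"], "f4": ["portal"],
    "portal": ["x"],
    "x": ["portal"],
    "p1": ["y"], "p2": ["y"],
    "y": ["portal"],
}

ranks = pagerank(sample_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page}: {ranks[page]:.3f}")
```

In this toy Web, “x” has a single backlink, from the heavily linked “portal,” while “y” has two backlinks from obscure pages. “x” still ends up with the higher score, which matches the point that importance counts for more than the raw number of backlinks.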
Is There a Weak Spot?
If there is one, it is that Google works better on searches for specific information (such as “rainfall in Hong Kong”) than on searches for general information (such as “Bible”). Search results aren’t categorized, which makes them a bit unwieldy for broad search terms. The Google directory helps, as directory results appear above all other search results.
Newer search tools, such as AllTheWeb, Teoma, and WiseNut, classify their results by category. For instance, Teoma divides search results for “Bible” into the following folders: Bible Study; King James Version; Holy Bible; Virginia Textbook; Bible Prophecy; Versions, Search; Biblical Resource; and First Letter. Most searchers, as a rule, don’t narrow down their queries properly because they’re not used to conducting research.
Can you get greater relevance by categorizing results, and, if so, will Google follow the trend toward categorization? “Google is in its second generation of experimenting with category-based results,” explained Cutts. “Users apparently do not like having too many category options, but presenting clear and concise categories is important to users.”
The Road to Success
Google achieved its success and profitability through two sources of revenue: advertising and search services. The AdWords program is targeted and effective, currently yielding up to five times the average click-through rate (CTR) for traditional banner ads. Cutts reiterated the Google mantra: We do not offer paid inclusion.
For additional revenue, Google provides search services to major Web portals and corporate Web sites. It has over 130 customers in more than 30 countries, including Yahoo and its international properties, Sony and its global affiliates, AOL/Netscape, and Cisco Systems. These partners pay Google an upfront search service fee plus a fee per 1,000 results delivered to power search on their respective portals or corporate Web sites, so Google earns money on every search conducted on partner sites.
The Enhanced Google Toolbar
Since Google released a beta version a few months ago, several million people have downloaded the Google Toolbar. The toolbar allows users to vote on site popularity. This could give Google a reading on site popularity based on opinion rather than link structure alone. However, selection bias is a problem: only people who choose to install the toolbar and vote are counted.
You can download the beta version, which allows you to rank search results with a voting button. When asked about incorporating this info into the algorithm, Cutts said, “Rather than using the votes to tinker with the specific rankings of particular pages or sites, the feature would most likely be used to bolster the relevance of overall results.” Cutts indicated that data collected so far is promising, but it would take months before the preliminary data could be of conclusive value.
How Does Google Rank Web Sites?
Basically, it ranks sites by the words listed on each page and the key phrases used in the page’s title and description. The spider looks at about 25 factors, including the keyword and description meta tags. It also ranks the page’s popularity, which is determined by the number and importance of sites linked to the page.
When asked how to gain high rankings, Cutts replied, “The guidelines are pretty simple: Stay away from hidden text, hidden links, cloaking, sneaky redirects, lots of duplicate content on different domains, and doorway pages. Webmasters should also stay away from programs that send automatic queries to Google. The worst thing you can do is try to cheat: Shortcuts to boost PageRank or rankings usually do more harm than good. Even if an SEO [search engine optimizer] does think he’s found a shortcut, about two-thirds of the time it may be a sting operation. Don’t bother with link exchanges, signing guest books, or other tricks — the best use of a Webmaster’s time is building good content — and honestly promoting their [sic] site. When Google punishes spam like cloaking, we sometimes take out not only the cloaked domain but the SEO’s client as well.”
A Look Into the Future
Google is working toward providing a deeper, fresher, and more personalized index. “The future will be less about features and more about the overall usefulness of an engine,” said Cutts. “We believe users want relevancy, but they also want quick, clean results with proven integrity,” he added. When asked about XML, Cutts replied, “Not any time soon. The main benefit of HTML is that anyone can write it. That’s part of why the Web had such meteoric growth. XML is great for machine-to-machine communication, but it’s much more difficult for a person to produce by hand.”
During the coming year, Google hopes to increase its lead across the board. “We’ll be introducing new ways to search. We don’t want to give away any secrets, but Google will provide many helpful surprises in 2002,” volunteered Cutts. I understand the company’s focus will be on search and the user experience.
Google does its share of indexing the deep Web by rolling out support for hundreds of file formats found there: PDF, RTF, PostScript, Word, Excel, PowerPoint, and more. It crawls millions of dynamic pages. Google indexes 3 billion Web documents every 28 days and conducts a fresh crawl of more than 3 million important Web pages each day. Google’s news crawl provides up-to-the-minute headlines for news queries, and a subset of its fresh news content is available online.