SEM's Hidden Science

March 20, 2006

Search engine marketing may be an art and a science, but the science part is greatly misunderstood. Educate yourself.

When I was studying marketing at university, there was always a lively, ongoing debate about whether it was an art or a science. Eventually, the marketing industry adopted the idea of it being both.

And there's a healthy serving of both under the marketing umbrella these days. With SEO, however, an extraordinarily rich and frequently complex mixture of scientific disciplines is hidden below the surface of the major search engines. It's this science I find is so frequently misunderstood, misrepresented, or just plain ignored by many in the SEO community.

The science of information retrieval (IR) predates search engines by a very long time. It's at the heart of search engine algorithms. It's emerged as the third subject, along with logic and philosophy, that deals with relevance -- a very elusive human notion.

In 1976, library scientist Tefko Saracevic traced the notion of relevance to problems of scientific communication. Relevance, he said, is considered a measure of the effectiveness of a contact between a communication's source and destination. This perfectly sums up a search engine's job for end users.

Classic IR models take no account of HTML code, dynamic information delivery, or barriers to being crawled or indexed. These are, in the main, minor issues when a search engine builds its index (or tiered index, as it is in fact).

As many readers are aware, I'm noted for separating the reasonably straightforward SEO task of eliminating crawling barriers from the far more important issue of understanding ranking mechanisms. Without a decent rank nobody's ever going to find you, so there's really not much point in being in a search engine index.

IR, including ranking algorithms, is a fascinating field. I've become ultra-absorbed. My interest and research in it are purely from an online marketer's point of view, not as a researcher or scientist in the field.

I find incredible the number of people I meet at industry events who simply don't get the importance of understanding the real challenge of applying marketing communications to IR on the Web.

I have to prevent my jaw from dropping when people ask such extraneous questions as, "Can a search engine understand CSS code, Mike?" I'm dumbfounded by the number of times I hear people (often conference speakers) mention IR elements with more than mildly erroneous explanations. "Latent semantic indexing" is one term bandied around by all and sundry. Rarely do I hear it explained in its true context.

Latent semantic indexing (LSI) has been around for some time. Loosely described, it tackles the old IR problem of vocabulary diversity in human-computer interaction: specifically, that people use different words to describe the same object or concept. At the same time, some words can have more than one meaning (and these can be semantically very different).

At times, LSI can improve on the conventional vector space model. However, LSI's run-time performance is a major concern to search engines wishing to provide results to end users in less than a second.
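To make the vocabulary problem concrete, here is a minimal sketch of the conventional vector space model in Python. It is an illustration only, not how any particular search engine works: the toy documents, the raw term counts (no weighting scheme), and the function names are all assumptions made for this example. Each text becomes a vector of term counts, and relevance is scored as the cosine of the angle between the query vector and each document vector.

```python
import math
from collections import Counter

def term_vector(text):
    """Represent a text as raw term counts (a bag of words)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical toy collection, invented for illustration.
docs = {
    "d1": "used car for sale",
    "d2": "used automobile for sale",
    "d3": "apple pie recipe",
}

query = term_vector("car")
scores = {name: cosine(query, term_vector(text)) for name, text in docs.items()}

for name, score in sorted(scores.items()):
    print(name, round(score, 3))
```

In this sketch the query "car" matches d1 but scores zero against d2, even though "automobile" describes the same thing. That mismatch between words and concepts is the gap LSI is meant to narrow.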

With LSI, an inverted index isn't possible, as the end user query is represented as just another document. It must, therefore, be compared with all other documents. And that would take a long time for every user query. It's difficult to discuss the various methods used by search engines to index and rank documents without going into the science behind it, at whatever level.
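The contrast with an inverted index can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the toy documents and the function names are hypothetical, and real engines store far richer postings than a set of document IDs. The point is only that a term-based index lets a query jump straight to the documents containing its terms, whereas a query treated as "just another document" has to be scored against the whole collection.

```python
from collections import defaultdict

# Hypothetical toy collection, invented for illustration.
docs = {
    "d1": "used car for sale",
    "d2": "used automobile for sale",
    "d3": "apple pie recipe",
    "d4": "car insurance quotes",
}

def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def candidates(index, query):
    """Return documents sharing at least one term with the query."""
    found = set()
    for term in query.lower().split():
        found |= index.get(term, set())
    return found

index = build_inverted_index(docs)

# Term lookup: only the postings for "car" are touched.
hits = candidates(index, "car")
print(sorted(hits))

# Query-as-document comparison: every document must be examined.
docs_to_score = list(docs)
print(len(hits), "candidates via the index versus", len(docs_to_score), "full comparisons")
```

On four documents the difference is trivial. On a collection of billions, scoring every document for every query is the run-time cost described above.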

I've often overheard SEO experts talking to potential or existing clients at a conference using snippets of IR terminology, such as LSI, in some of the most out-of-context ways: "Yes, it's a symptom called the 'sandbox.' It's because Google uses latent semantic indexing. Now..."

What the notion of a sandbox could have to do with LSI is beyond me. Not to mention the fact LSI isn't a Google thing. It's an IR thing. It belongs to the entire IR research community.

In my experience, having a general understanding of IR techniques and how they can be applied to commercial search engines (an entirely different proposition to the homogeneous collections they were originally conceived for) can save an awful lot of wasted effort and mind clutter in SEO.

Such understanding also lets you see through a lot of the BS that's pitched at poor clients who are still scratching their heads trying to come to terms with the perceived technologically advanced concept of a meta tag.

There are tons of information about IR models and techniques in the literature. Much of the classic information still stands up today. In the realm of document space, however, much more research continues.

I'm extremely fortunate my friend Dr. Edel Garcia, who attended a recent workshop held by the applied mathematics community, was able to give me personal insight into the proceedings. He's allowed me to publish his report and share it with those who would like a high-level overview of what researchers in the field are currently engaged in.



Mike Grehan

Mike Grehan is Publisher of Search Engine Watch and ClickZ and Producer of the SES international conference series. He is the current president of global trade association SEMPO, having been elected to the board of directors in 2010.

Formerly, Mike worked as a search marketing consultant with a number of international agencies, handling such global clients as SAP and Motorola. Recognized as a leading search marketing expert, Mike came online in 1995 and is author of numerous books and white papers on the subject. He is currently in the process of writing his new book "From Search To Social: Marketing To The Connected Consumer" to be published by Wiley in 2013.
