SEM's Hidden Science

March 20, 2006

Search engine marketing may be an art and a science, but the science part is greatly misunderstood. Educate yourself.

When I was studying marketing at university, there was always a lively, ongoing debate about whether it was an art or a science. Eventually, the marketing industry adopted the idea of it being both.

And there's a healthy serving of both under the marketing umbrella these days. With SEO, however, an extraordinarily rich and frequently complex mixture of scientific disciplines is hidden below the surface of the major search engines. It's this science that I find so frequently misunderstood, misrepresented, or just plain ignored by many in the SEO community.

The science of information retrieval (IR) predates search engines by a very long time. It's at the heart of search engine algorithms. It's emerged as the third subject, along with logic and philosophy, that deals with relevance -- a very elusive human notion.

In 1976, library scientist Tefko Saracevic traced the notion of relevance to problems of scientific communication. Relevance, he said, is considered a measure of the effectiveness of a contact between a communication's source and destination. This perfectly sums up a search engine's job for end users.

Classic IR models take no account of HTML code, dynamic information delivery, or barriers to being crawled or indexed. These are, in the main, minor issues when a search engine builds its index (or, in fact, its tiered index).

As many readers are aware, I'm noted for separating the reasonably straightforward SEO task of eliminating crawling barriers from the far more important issue of understanding ranking mechanisms. Without a decent rank, nobody's ever going to find you, so there's really not much point in being in a search engine index.

IR, including ranking algorithms, is a fascinating field. I've become ultra-absorbed. My interest and research in it are purely from an online marketer's point of view, not as a researcher or scientist in the field.

I find it incredible how many people I meet at industry events simply don't get the importance of understanding the real challenge of applying marketing communications to IR on the Web.

I have to prevent my jaw from dropping when people ask such extraneous questions as, "Can a search engine understand CSS code, Mike?" I'm dumbfounded by the number of times I hear people (often conference speakers) mention IR elements with more than mildly erroneous explanations. "Latent semantic indexing" is one term bandied around by all and sundry. Rarely do I hear it explained in its true context.

Latent semantic indexing (LSI) has been around for some time. Loosely described, it tackles the old IR problem of vocabulary diversity in human-computer interaction. Specifically, people use different words to describe the same object or concept. At the same time, some words can have more than one meaning (and these meanings can be semantically very different).

At times, LSI can improve on the conventional vector space model. However, LSI's run-time performance is a major concern to search engines wishing to provide results to end users in less than a second.
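For readers who haven't met the vector space model, here's a minimal sketch of the idea, not any search engine's actual implementation. Documents and queries become term-count vectors and are compared by cosine similarity. The tiny collection, the whitespace tokenizer, and the function names are all assumptions made up for illustration. Notice that the query "car" scores zero against documents that say "automobile," which is exactly the vocabulary problem LSI tries to address.

```python
import math
from collections import Counter


def tf_vector(text):
    """Return a sparse term-frequency vector (term -> count) for a text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical three-document collection, for illustration only.
docs = {
    "d1": "used car prices",
    "d2": "used automobile prices",
    "d3": "automobile repair manual",
}

query = tf_vector("car")
scores = {doc_id: cosine(query, tf_vector(text)) for doc_id, text in docs.items()}

# Only d1 shares a literal term with the query; d2 and d3 score 0.0
# even though "automobile" means the same thing to a human reader.
for doc_id, score in sorted(scores.items()):
    print(doc_id, round(score, 3))
```

Real systems weight terms (for example, with tf-idf) rather than using raw counts, but the matching problem shown here is the same.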

With LSI, an inverted index isn't possible, as the end user query is represented as just another document. It must, therefore, be compared with all other documents. And that would take a long time for every user query. It's difficult to discuss the various methods used by search engines to index and rank documents without going into the science behind them, at whatever level.
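To make the run-time point concrete, here's a rough sketch of what an inverted index buys you. It maps each term to the set of documents containing it, so a query only touches the documents listed under its own terms. The five-document collection and the helper names are hypothetical, and the contrast with LSI is shown only as a count: when the query is treated as just another document in a reduced space, every document has to be scored.

```python
from collections import defaultdict

# Hypothetical five-document collection, for illustration only.
docs = {
    "d1": "search engine ranking algorithms",
    "d2": "how to write a meta tag",
    "d3": "latent semantic indexing explained",
    "d4": "meta description and title tag tips",
    "d5": "vector space model basics",
}


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def candidates(index, query):
    """Return ids of documents containing at least one query term."""
    result = set()
    for term in query.lower().split():
        result |= index.get(term, set())
    return result


index = build_inverted_index(docs)

# With an inverted index, only documents sharing a term are considered.
touched = candidates(index, "meta tag")
print(sorted(touched), "of", len(docs), "documents scored")

# Without one (the situation described for LSI above), every document
# in the collection would have to be compared with the query.
print(len(docs), "of", len(docs), "documents scored")
```

On five documents the difference is trivial. On a Web-scale index, it's the difference between answering in well under a second and not answering at all.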

I've often overheard SEO experts talking to potential or existing clients at a conference using snippets of IR terminology, such as LSI, in some of the most out-of-context ways: "Yes, it's a symptom called the 'sandbox.' It's because Google uses latent semantic indexing. Now..."

What the notion of a sandbox could have to do with LSI is beyond me. Not to mention the fact LSI isn't a Google thing. It's an IR thing. It belongs to the entire IR research community.

In my experience, having a general understanding of IR techniques and how they can be applied to commercial search engines (an entirely different proposition to the homogeneous collections they were originally conceived for) can save an awful lot of wasted effort and mind clutter in SEO.

Such understanding also lets you see through a lot of the BS that's pitched at poor clients who are still scratching their heads trying to come to terms with the perceived technologically advanced concept of a meta tag.

There's tons of information about IR models and techniques in the literature. Much of the classic information still stands up today. In the realm of document space, however, much more research continues.

I'm extremely fortunate that my friend Dr. Edel Garcia, who attended a recent workshop held by the applied mathematics community, was able to give me personal insight into the proceedings. He's allowed me to publish his report and share it with those who would like a high-level overview of what researchers in the field are currently engaged in.

Mike Grehan

Mike Grehan is Publisher of Search Engine Watch and ClickZ and Producer of the SES international conference series. He is the current president of global trade association SEMPO, having been elected to the board of directors in 2010.

Formerly, Mike worked as a search marketing consultant with a number of international agencies, handling such global clients as SAP and Motorola. Recognized as a leading search marketing expert, Mike came online in 1995 and is author of numerous books and white papers on the subject. He is currently in the process of writing his new book "From Search To Social: Marketing To The Connected Consumer" to be published by Wiley in 2013.