SEM's Hidden Science

March 20, 2006

Search engine marketing may be an art and a science, but the science part is greatly misunderstood. Educate yourself.

When I was studying marketing at university, there was always a lively, ongoing debate about whether it was an art or a science. Eventually, the marketing industry settled on the idea that it's both.

And there's a healthy serving of both under the marketing umbrella these days. With SEO, however, an extraordinarily rich and frequently complex mixture of scientific disciplines is hidden below the surface of the major search engines. It's this science that I find so frequently misunderstood, misrepresented, or just plain ignored by many in the SEO community.

The science of information retrieval (IR) predates search engines by a very long time, yet it sits at the heart of their algorithms. Alongside logic and philosophy, it has emerged as the third discipline that deals with relevance, a very elusive human notion.

In 1976, library scientist Tefko Saracevic traced the notion of relevance to problems of scientific communication. Relevance, he said, is considered a measure of the effectiveness of a contact between a communication's source and destination. This perfectly sums up a search engine's job for end users.

Classic IR models take no account of HTML code, dynamic information delivery, or barriers to crawling and indexing. These are, in the main, minor issues when a search engine builds its index (or, more accurately, its tiered index).

As many readers are aware, I'm noted for separating the reasonably straightforward SEO task of eliminating crawling barriers from the far more important issue of understanding ranking mechanisms. Without a decent ranking, nobody is ever going to find you, so there's really not much point in merely being in a search engine's index.

IR, including ranking algorithms, is a fascinating field, and I've become utterly absorbed in it. My interest and research are purely from an online marketer's point of view, not that of a researcher or scientist in the field.

I find it incredible how many people I meet at industry events simply don't grasp the importance of understanding the real challenge: applying marketing communications to IR on the Web.

I have to prevent my jaw from dropping when people ask such extraneous questions as, "Can a search engine understand CSS code, Mike?" I'm dumbfounded by the number of times I hear people (often conference speakers) mention IR elements with more than mildly erroneous explanations. "Latent semantic indexing" is one term bandied around by all and sundry. Rarely do I hear it explained in its true context.

Latent semantic indexing (LSI) has been around for some time. Loosely described, it tackles the old IR problem of vocabulary diversity in human-computer interaction. Specifically, people use different words to describe the same object or concept (synonymy), while a single word can have more than one meaning, and those meanings can be semantically very different (polysemy).
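To see the problem LSI sets out to solve, here is a minimal Python sketch of plain word matching. The four documents are invented for illustration, and exact_match is a hypothetical helper that retrieves any document sharing a literal word with the query. It misses the "automobile" page when someone searches for "car" (synonymy), and it lumps the big cat in with the carmaker when someone searches for "jaguar" (polysemy).

```python
from typing import Dict, List

# Invented mini-collection, for illustration only.
DOCS = {
    "d1": "buy a used car from a local dealer",
    "d2": "automobile insurance quotes for new drivers",
    "d3": "the jaguar is a big cat native to the americas",
    "d4": "jaguar unveils its new sports car",
}


def exact_match(query: str, docs: Dict[str, str]) -> List[str]:
    """Hypothetical helper: return ids of documents that share at least
    one literal (lower-cased, whitespace-split) term with the query."""
    query_terms = set(query.lower().split())
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if query_terms & set(text.lower().split())
    )


if __name__ == "__main__":
    # Synonymy: d2 is about cars too, but it says "automobile", so it's missed.
    print(exact_match("car", DOCS))     # ['d1', 'd4']
    # Polysemy: the animal (d3) and the carmaker (d4) come back together.
    print(exact_match("jaguar", DOCS))  # ['d3', 'd4']
```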

At times, LSI can improve on the conventional vector space model. However, LSI's run-time performance is a major concern for search engines wishing to provide results to end users in less than a second.
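For comparison, here is a minimal sketch of the conventional vector space model, assuming raw term-frequency weights (real systems typically use tf-idf or a similar weighting scheme). Documents and queries become term vectors, and relevance is approximated by the cosine of the angle between them. The documents and helper names are hypothetical. Note that "car" and "automobile" still score zero against each other, which is exactly the gap LSI tries to close.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Invented mini-collection, for illustration only.
DOCS = {
    "d1": "used car dealer with used car finance",
    "d2": "automobile insurance quotes",
    "d3": "car insurance quotes",
}


def tf_vector(text: str) -> Counter:
    """Raw term-frequency vector: term -> number of occurrences."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term vectors."""
    # A Counter returns 0 for a missing term, so unshared terms add nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: str, docs: Dict[str, str]) -> List[Tuple[str, float]]:
    """Score every document against the query, best first (ties broken by id)."""
    q = tf_vector(query)
    scores = {doc_id: cosine(q, tf_vector(text)) for doc_id, text in docs.items()}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for doc_id, score in rank("car insurance", DOCS):
        print(doc_id, round(score, 3))
    # Synonyms share no terms, so the model sees no similarity at all.
    print(cosine(tf_vector("car"), tf_vector("automobile")))  # 0.0
```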

With LSI, an inverted index isn't possible, because the end user's query is represented as just another document. It must, therefore, be compared with every other document in the collection, and that would take a long time for every single query. It's difficult to discuss the various methods search engines use to index and rank documents without going into the science behind them, at whatever level.
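Here is a minimal Python sketch of that run-time difference, under clearly stated assumptions: the documents are invented, and the two-dimensional "concept space" coordinates in DENSE_DOCS are hypothetical placeholders, not the output of a real singular value decomposition. The inverted index lets a term-based engine skip every document that shares no word with the query. LSI's dense vectors offer no such shortcut, so every document has to be scored.

```python
import math
from collections import defaultdict
from typing import Dict, Set, Tuple

# Invented mini-collection, for illustration only.
DOCS = {
    "d1": "used car dealer",
    "d2": "automobile insurance quotes",
    "d3": "car insurance quotes",
    "d4": "jaguar sightings in the rainforest",
}


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each term to its postings: the set of document ids containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def sparse_candidates(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """A term-based engine only scores documents in some query term's postings."""
    found: Set[str] = set()
    for term in query.lower().split():
        found |= index.get(term, set())
    return found


# HYPOTHETICAL two-dimensional "concept space" coordinates, made up for this
# sketch; a real LSI system would derive them from a truncated SVD.
DENSE_DOCS: Dict[str, Tuple[float, float]] = {
    "d1": (0.9, 0.1),
    "d2": (0.7, 0.6),
    "d3": (0.8, 0.5),
    "d4": (0.05, 0.9),
}


def dense_scan(query_vec, doc_vecs):
    """LSI vectors are typically dense (almost no zero entries), so there are
    no postings to prune with: the query is compared with every document."""
    scores = {}
    for doc_id, vec in doc_vecs.items():
        dot = sum(q * v for q, v in zip(query_vec, vec))
        norm = math.hypot(*query_vec) * math.hypot(*vec)
        scores[doc_id] = dot / norm if norm else 0.0
    return scores


if __name__ == "__main__":
    index = build_inverted_index(DOCS)
    print(sorted(sparse_candidates(index, "car insurance")))  # ['d1', 'd2', 'd3']
    print(len(dense_scan((0.8, 0.4), DENSE_DOCS)), "of", len(DENSE_DOCS), "scored")
```

With four documents the difference is trivial. Across a Web-scale index, however, scoring only the postings of a handful of query terms rather than the entire collection is what makes sub-second results feasible.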

I've often overheard SEO experts talking to potential or existing clients at a conference using snippets of IR terminology, such as LSI, in some of the most out-of-context ways: "Yes, it's a symptom called the 'sandbox.' It's because Google uses latent semantic indexing. Now..."

What the notion of a sandbox could have to do with LSI is beyond me. Not to mention the fact that LSI isn't a Google thing. It's an IR thing. It belongs to the entire IR research community.

In my experience, having a general understanding of IR techniques and how they can be applied to commercial search engines (an entirely different proposition to the homogeneous collections they were originally conceived for) can save an awful lot of wasted effort and mental clutter in SEO.

Such understanding also lets you see through a lot of the BS that's pitched at poor clients who are still scratching their heads trying to come to terms with the perceived technologically advanced concept of a meta tag.

There's tons of information about IR models and techniques in the literature, and much of the classic material still stands up today. In the realm of document space, however, a great deal of research is still under way.

I'm extremely fortunate that my friend Dr. Edel Garcia, who attended a recent workshop held by the applied mathematics community, was able to give me personal insight into the proceedings. He's allowed me to publish his report and share it with those who would like a high-level overview of what researchers in the field are currently engaged in.



Mike Grehan

Mike Grehan is currently chief marketing officer and managing director at Acronym, where he is responsible for directing thought leadership programs and cross-platform marketing initiatives, as well as developing new, innovative content marketing campaigns.

Prior to joining Acronym, Grehan was group publishing director at Incisive Media, publisher of Search Engine Watch and ClickZ, and producer of the SES international conference series. Previously, he worked as a search marketing consultant with a number of international agencies handling global clients such as SAP and Motorola. Recognized as a leading search marketing expert, Grehan came online in 1995. He is the author of numerous books and white papers on the subject and is currently writing his new book, From Search to Social: Marketing to the Connected Consumer, to be published by Wiley later in 2014.

In March 2010 he was elected to SEMPO's board of directors. After a year as vice president, he served two years as president and is now the board's chairman.
