SEM's Hidden Science

Search engine marketing may be an art and a science, but the science part is greatly misunderstood. Educate yourself.

When I was studying marketing at university, there was always a lively, ongoing debate about whether it was an art or a science. Eventually, the marketing industry adopted the idea of it being both.

And there’s a healthy serving of both under the marketing umbrella these days. With SEO, however, an extraordinarily rich and frequently complex mixture of scientific disciplines is hidden below the surface of the major search engines. It’s this science that I find so frequently misunderstood, misrepresented, or just plain ignored by many in the SEO community.

The science of information retrieval (IR) predates search engines by a very long time. It’s at the heart of search engine algorithms. It’s emerged as the third subject, along with logic and philosophy, that deals with relevance — a very elusive human notion.

In 1976, library scientist Tefko Saracevic traced the notion of relevance to problems of scientific communication. Relevance, he said, is considered a measure of the effectiveness of a contact between a communication’s source and destination. This perfectly sums up a search engine’s job for end users.

Classic IR models take no account of HTML code, dynamic information delivery, or barriers to crawling and indexing. These are, in the main, minor issues when a search engine builds its index (or, in practice, its tiered index).

As many readers are aware, I’m noted for separating the reasonably straightforward SEO task of eliminating crawling barriers from the far more important issue of understanding ranking mechanisms. Without a decent rank, nobody’s ever going to find you, so there’s really not much point in being in a search engine index.

IR, including ranking algorithms, is a fascinating field. I’ve become ultra-absorbed. My interest and research in it are purely from an online marketer’s point of view, not as a researcher or scientist in the field.

I find it incredible how many people I meet at industry events simply don’t get the importance of understanding the real challenge of applying marketing communications to IR on the Web.

I have to prevent my jaw from dropping when people ask such extraneous questions as, “Can a search engine understand CSS code, Mike?” I’m dumbfounded by the number of times I hear people (often conference speakers) mention IR elements with more than mildly erroneous explanations. “Latent semantic indexing” is one term bandied around by all and sundry. Rarely do I hear it explained in its true context.

Latent semantic indexing (LSI) has been around for some time. Loosely described, it tackles the old IR problem of vocabulary diversity in human-computer interaction. Specifically, people use different words to describe the same object or concept (synonymy). At the same time, some words have more than one meaning (polysemy), and those meanings can be semantically very different.
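To make the vocabulary problem concrete, here is a minimal sketch of plain term matching using a bag-of-words cosine similarity. The function name `cosine` and the example phrases are made up for illustration; this is not how any particular search engine scores documents.

```python
import math
from collections import Counter


def cosine(text_a, text_b):
    """Cosine similarity between two texts treated as bags of words."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only terms present in both texts contribute to the dot product.
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Different words, same concept: no shared term, so no match at all.
same_concept = cosine("cheap automobile", "inexpensive car")

# Same word, different concepts: the shared term still produces a match.
different_concepts = cosine("jaguar top speed", "jaguar jungle habitat")

print(same_concept, different_concepts)
```

The first pair is about the same thing but scores zero because the words differ. The second pair is about a car and an animal, yet scores above zero because of one shared word.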

At times, LSI can improve on the conventional vector space model. However, LSI’s run-time performance is a major concern for search engines that want to return results to end users in less than a second.
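As a rough illustration of where LSI can help, the sketch below builds a tiny term-document matrix, reduces it with a truncated singular value decomposition, and scores a query in the reduced space. It assumes NumPy is available. The toy corpus, the choice of `k=2`, and the function names are all hypothetical, and the query is mapped into the reduced space with the standard "folding-in" step. Treat it as a teaching sketch, not a description of any production system.

```python
import numpy as np

# Toy corpus (made up): three motoring documents, two baking documents.
DOCS = [
    "car engine repair",
    "automobile engine repair",
    "car automobile dealer",
    "banana bread recipe",
    "bread baking recipe",
]


def term_document_matrix(docs):
    """Return a terms x documents count matrix and a term -> row lookup."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    row = {word: i for i, word in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.split():
            matrix[row[word], j] += 1.0
    return matrix, row


def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def lsi_scores(query, docs, k=2):
    """Score every document against the query in a k-dimensional latent space."""
    matrix, row = term_document_matrix(docs)
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    u_k, s_k = u[:, :k], s[:k]
    docs_k = vt[:k].T  # one k-dimensional row per document

    # Build the query as a term vector, then fold it into the latent space.
    q = np.zeros(matrix.shape[0])
    for word in query.split():
        if word in row:
            q[row[word]] += 1.0
    q_k = (q @ u_k) / s_k

    # Note that the query is compared with every document in the collection.
    return [cosine(q_k, docs_k[j]) for j in range(len(docs))]


if __name__ == "__main__":
    for doc, score in zip(DOCS, lsi_scores("automobile", DOCS)):
        print(f"{score:6.3f}  {doc}")
```

On this corpus, the query "automobile" shares no word with "car engine repair", so plain term matching would give that document a score of zero. In the reduced space it scores highly because "car" and "automobile" keep the same company, while the baking documents stay near zero.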

With LSI, an inverted index isn’t possible, as the end user query is represented as just another document. It must, therefore, be compared with all other documents. And that would take a long time for every user query. It’s difficult to discuss the various methods used by search engines to index and rank documents without going into the science behind them, at whatever level.
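For contrast, here is a minimal sketch of what an inverted index buys you with ordinary term matching. The corpus and the names `build_inverted_index` and `candidates` are invented for illustration, and real indexes store far more than document IDs.

```python
from collections import defaultdict

# Toy corpus (made up for this example).
DOCS = [
    "car engine repair",
    "automobile engine repair",
    "car automobile dealer",
    "banana bread recipe",
    "bread baking recipe",
]


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def candidates(index, query):
    """Collect documents containing at least one query term.

    Only the posting lists for the query's own terms are read;
    every other document in the collection is never touched.
    """
    found = set()
    for term in query.lower().split():
        found |= index.get(term, set())
    return found


index = build_inverted_index(DOCS)
print(sorted(candidates(index, "automobile")))
```

The lookup goes straight to the documents that contain the query term and ignores the rest. When the query is treated as just another document in a reduced space, there is no such shortcut in this simple form, which is the run-time concern described above.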

I’ve often overheard SEO experts talking to potential or existing clients at a conference using snippets of IR terminology, such as LSI, in some of the most out-of-context ways: “Yes, it’s a symptom called the ‘sandbox.’ It’s because Google uses latent semantic indexing. Now…”

What the notion of a sandbox could have to do with LSI is beyond me. Not to mention the fact LSI isn’t a Google thing. It’s an IR thing. It belongs to the entire IR research community.

In my experience, having a general understanding of IR techniques and how they can be applied to commercial search engines (an entirely different proposition from the homogeneous collections they were originally conceived for) can save an awful lot of wasted effort and mind clutter in SEO.

Such understanding also lets you see through a lot of the BS that’s pitched at poor clients who are still scratching their heads trying to come to terms with the perceived technologically advanced concept of a meta tag.

There’s tons of information about IR models and techniques in the literature. Much of the classic information still stands up today. In the realm of document space, however, much more research continues.

I’m extremely fortunate my friend Dr. Edel Garcia, who attended a recent workshop held by the applied mathematics community, was able to give me personal insight into the proceedings. He’s allowed me to publish his report and share it with those who would like a high-level overview of what researchers in the field are currently engaged in.

