What Search Engines Know and You Don’t

In the SEO field, we can get a little starved of genuinely useful information regarding search engines’ ranking algorithms. Sure, the search engine reps don’t mind talking at length about Web crawling, server issues, and indexing. But it levels out pretty quickly when you ask very pertinent and probing questions about the ranking algorithm.

One of my marketing objectives is still very much about visible positioning, as opposed to the purely technical and functional process of getting pages into a search engine index. That, after all, isn’t the really difficult part of the job.

I’ve been involved in online marketing since I formed my first consultancy back in 1995, and I’ve been fascinated with the way search engines work from way back. I’ve always believed that for my clients’ sake, I must have a thorough understanding (or as thorough as I can manage, given the starvation factor) of the underlying principles of what really makes one Web page rank higher than another.

“Do we really need to know this scientific Web mining and information retrieval stuff?” you may ask. I firmly believe we do. When your client wonders why her competitors always rank higher than she does, she’ll ask, “How do search engines really work?” If you answer with, “I dunno, they just do,” don’t expect to hang on to that client for too long.

I remember watching a Department of Defense news briefing some time ago, when Donald Rumsfeld answered a question with this now-classic riff:

As we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns — the ones we don’t know we don’t know.

Rumsfeld could just as well have been describing the combined knowledge base of the SEO community, insofar as search engines keep us informed about their ranking algorithms.

Data mining and warehousing of structured data have been around for some time. Business decision makers understand the benefits of the knowledge that can be gained and used in many areas, ranging from scientific research to market analysis.

But the newer field of Web mining is multidisciplinary. It draws on areas such as data mining, information retrieval, pattern recognition, statistics, machine learning, artificial intelligence, and more.

I talked recently with scientist Apostolos Gerasoulis, founder of the Teoma search engine (which powers Ask Jeeves). He explained the concept of search marketing’s two galaxies: the galaxy of the content creators (developers and search marketers) and that of end users.

Search marketers control the first galaxy by developing semistructured, optimized pages for search engines to crawl. The search engines control the second galaxy by accumulating data on what people search for and why, as well as on user behavior.

This concept indicates a deeper layer of knowledge accessible only by the search engines and identifies another gap between what the search engines know and what you know.

Last week, I saw a presentation by Dr. Usama Fayyad, chief data officer at Yahoo. He also heads its research lab. Fayyad gave the most fascinating presentation, based on the 30 terabytes of data Yahoo slices, dices, rotates, rolls up, and drills down every day. The amount of end user information he finds is startling, particularly the data he’s able to process relating to Yahoo search.

After the presentation, I talked with Fayyad in the bar (as is my wont). I ran Gerasoulis’ two galaxies concept past him. Of course, Fayyad knows Gerasoulis and doesn’t find the concept strange at all. He’d touched on something very similar in his presentation.

We discussed some of the great researchers in information retrieval on the Web and future developments at Yahoo Labs (including the new Mindset beta test).

Fayyad very kindly agreed to shed more light on what’s likely to happen with search in the future, learning machines, artificial intelligence, and stuff already being cooked up in the labs.

Undoubtedly, SEO as we currently understand it is about to go through some major changes. I’m sure the search engines will help shed more light on changes they’re already incorporating, so we don’t end up going through systems and processes that really don’t add a great deal to search marketing endeavors.

So long as I find myself in the same vicinity as margaritas, peanuts, and the odd search engine scientist, I promise to keep you posted from this end.
