How Search Engines Work

Download this PDF, which covers the basics of the science of information retrieval on the web.

Author

Mike Grehan

Date published January 11, 2012

It’s been 10 years since I wrote the second edition of a book about search engines called “Search Engine Marketing: The Essential Best Practice Guide.” It was a very big seller and, in fact, it carried on selling through to the beginning of 2010, when I took it offline.

I’ve decided to start this year by revisiting the chapter in the book about how search engines work. I’ve said many times over the years that most books about SEO have a section called “How Search Engines Work.” But rarely (if ever) do they describe the interdisciplinary approach to information retrieval (IR), which covers mathematics, computer science, library science, information architecture, cognitive psychology, linguistics, and statistics, to name but a few.

Previously, I had written mainly about methods of manipulating rankings by keyword stuffing and other black-hat techniques of the time. But as I began to realize the importance of linkage data and, even more so, link anchor text, I became more and more inquisitive about exactly what search engines used in their ranking technologies.

After talking to one of the pioneers in web search (Brian Pinkerton of WebCrawler), I was introduced to the work of the foremost information retrieval scientist, Gerard Salton. This was a major breakthrough for me.

Salton’s work was cited in just about every IR research paper I read at the time. So I tracked down and bought a copy of his seminal work “Modern Information Retrieval” (written back in 1983, although Salton’s work in the field goes back to the early 1970s).

As a marketer, not a scientist, this was no easy read. Yet the more I grasped the basic concepts and drivers behind information retrieval (and the way it is applied to the web), the more I was able to understand the major challenges involved. That led me not just to abandon the amateurish and spammy techniques I’d used previously, but to think about SEO in an entirely different way.

To this day, I firmly believe that a basic understanding of the science of information retrieval on the web goes a long way toward helping search marketers dispel myths and do their jobs more professionally and proficiently.

Of course, 10 years later my personal library has grown to include a very large section of information retrieval and data mining texts as more and more become available. This is also largely due to the fact that the subject matter is so fascinating it’s hard not to become engrossed.

As I revisited the chapter I’d written on how search engines work from a decade ago, I expected it to be a bit stale, but it wasn’t at all. Although, I dare say, to an IR scientist, if not stale it probably seems about as elementary as it gets. I wrote the chapter placing great emphasis on trying to make it non-mathematical. By that I mean highlighting concepts and background theory rather than matrices and formulae. That said, it’s extremely hard to cover the subject without references to linear algebra and other mind-numbing math.

Anyway, if you’re genuinely interested in how search engines work (but really, not the anecdotal stuff generally bandied around), then it’s as good a place as any to start. I mention in the introduction that it is totally unchanged from the very quirky, very British flavor it had when it was first published. A few pages were eliminated purely because they were totally irrelevant a decade later. There are a few little gems in it that I’d forgotten about.

? !

No, the subhead above isn’t a typo or spelling mistake. It’s actually a conversation.

When the French author Victor Hugo had “Les Miserables” published, he was not living in Paris at that time. He was waiting to hear news from his publisher about the kind of reception his new book was having. When he could wait for news no longer, he sent a letter to his publisher that contained only the character: ?

On receiving this, his publisher knew exactly what it meant and he returned a note to him containing only the character: ! This let Victor Hugo know that his book was a huge success. It is said that this is the shortest correspondence in history.

What’s that got to do with anything? It was actually a good analogy I used relating to the length of the average query at search engines at the time and how difficult it is to deal with short queries.

I’m seriously thinking about trying to find the time to update the entire book this year and make it available free to Search Engine Watch and ClickZ subscribers. More recently I’ve been reading about a feature-centric view of information retrieval and also learning to rank for information retrieval and natural language processing (very hot research topics). I’ll be writing a couple of follow-up columns covering these subjects combined with fascinating insights into the strength of end user data and, of course, weaving that into the update of the book once I get time to make a start.

But right now, feel free to download the PDF of “How Search Engines Work.” If nothing else, I hope it acts as a very basic introduction.

This article was originally published on searchenginewatch.com.
