Part one looked at some of the tweaks and changes Google makes roughly once a month and how they affect sites in the search engine’s index.
Does Google have an index out there that’s missing some spam penalties? As with the link analysis situation, Google has some new spam filtering systems it’s preparing to release. References to spam filters not yet applied in the index relate to integrating scoring from those new systems once they’re ready, not to a lack of spam penalties in the index, said Matt Cutts, a Google software engineer who deals with Webmaster issues.
Cutts readily admits it’s possible to find pages in the current index that use tactics Google dislikes, such as hidden text and hidden links. It’s hoped the new filters will help eliminate these. Cutts added that the presence of such pages doesn’t necessarily translate into bad relevancy.
“For a long time, these things have been annoying Webmasters rather than users,” he said. “Scoring already takes care of this stuff, but we have seen posts like, ‘Why isn’t Google handling this?’”
How can a search at Google bring up different results if it’s rerun just a few seconds later? Here, understanding the dance part of the Google Dance may help. Let’s flash back to a different search engine, the AltaVista of old.
In the early AltaVista days, all the pages resided in one big, powerful, mainframe-style computer. Eventually, one computer wasn’t enough, so the index was spread across four mainframes. That helped with storage but not query load. As a result, AltaVista made a duplicate of its index, a mirror kept in a different physical location.
As the system got more complicated, the chance that something could go wrong increased. If one of the four computers at a mirror went down, a quarter of the index was effectively unavailable to searchers unknowingly directed to that mirror. If those searchers were switched to a different, fully working mirror when they tried again, they would search the entire index and could get different results.
Fast-forward to now. Google (like other search engines) distributes its index across hundreds of computers with processing power similar to that of your desktop. That solves the storage problem. What about query load? To help, Google has multiple copies of its index in various locations. When you search, you might hit a copy of the index located on the U.S.’s West Coast, on its East Coast, or in Europe, to name a few.
If the mirror you hit has a few computers down (fairly common), some pages might not be available. It’s not as bad as in the old AltaVista days. If 10 or 20 computers aren’t working, that’s a tiny amount compared to hundreds that still are. Nevertheless, having some computers down at one mirror could cause results to be slightly different if you’re directed to a different mirror on your next search.
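To make the mirror effect concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions, not Google’s actual architecture: the index is split round-robin into shards, one per machine, every mirror holds a full copy, and the function names, mirror names, and page IDs are all hypothetical. The same query sent to a mirror with one machine down silently loses the pages that machine held.

```python
# Hypothetical toy model, not Google's real system: the index is split into
# shards (one per machine), and each mirror holds a full copy of all shards.

def build_shards(docs, num_shards):
    """Spread documents round-robin across num_shards machines."""
    shards = [{} for _ in range(num_shards)]
    for position, (doc_id, text) in enumerate(docs.items()):
        shards[position % num_shards][doc_id] = text
    return shards

def search_mirror(shards, down_machines, term):
    """Return IDs of matching pages held by machines that are up."""
    hits = []
    for machine, shard in enumerate(shards):
        if machine in down_machines:
            continue  # this machine is down: its pages are simply missing
        hits.extend(doc_id for doc_id, text in shard.items() if term in text)
    return sorted(hits)

# Sixteen hypothetical pages that all match the query.
docs = {f"page{n}": "notes on the google dance" for n in range(16)}
shards = build_shards(docs, num_shards=8)

west = search_mirror(shards, down_machines=set(), term="dance")  # all up
east = search_mirror(shards, down_machines={3}, term="dance")    # one down

print(len(west), len(east))            # 16 14
print(sorted(set(west) - set(east)))   # ['page11', 'page3']
```

With hundreds of machines instead of eight, losing one or two removes a much smaller slice of the index, which matches the point above that a few down computers cause only slight differences.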
Now to the “dance.” When Google updates its index, it must spread the new information across these hundreds of computers in various locations. It takes a day or two until the new information is seeded and stable. During that time, some results seem to “dance” around with slight changes, especially to Webmasters who monitor positions like hawks.
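The rollout itself can be sketched the same way. This toy model assumes, purely for illustration, that the update reaches one mirror at a time and that repeated queries rotate through the mirrors; the mirror names, site names, and rankings are invented. While only some mirrors carry the new index, the top result for the same query flips back and forth, and it settles once every mirror is updated.

```python
from itertools import cycle

# Hypothetical top rankings for one query in the old and the new index.
RANKINGS = {
    "old": ["site-a.com", "site-b.com", "site-c.com"],
    "new": ["site-b.com", "site-a.com", "site-d.com"],
}

def roll_out(mirror_versions, mirror):
    """One step of the update: switch a single mirror to the new index."""
    updated = dict(mirror_versions)  # copy so earlier states stay intact
    updated[mirror] = "new"
    return updated

def repeat_query(mirror_versions, times):
    """Run the same query repeatedly, rotating through the mirrors,
    and return the top result seen each time."""
    routing = cycle(sorted(mirror_versions))
    return [RANKINGS[mirror_versions[next(routing)]][0] for _ in range(times)]

versions = {"east": "old", "europe": "old", "west": "old"}
print(repeat_query(versions, 3))  # ['site-a.com', 'site-a.com', 'site-a.com']

versions = roll_out(versions, "west")  # the update reaches one mirror
print(repeat_query(versions, 3))  # ['site-a.com', 'site-a.com', 'site-b.com']

versions = roll_out(roll_out(versions, "east"), "europe")
print(repeat_query(versions, 3))  # ['site-b.com', 'site-b.com', 'site-b.com']
```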
If you’ve done a search, then repeated it and gotten different results, one of two things is likely. You may have hit a different mirror of the index on your repeat search, one whose copy isn’t perfectly in sync with the first. Or (more likely) you’ve seen the Google Dance in action. To confirm, consider visiting the Google Dance Tool.
Blogs to Stay
One thing not in the cards for future index changes is a plan to pull blog content out of Google’s regular search results. During my recent interview, Google made a point of stressing that blogs are staying.
The idea that blogs would go came from a recent Register article. The piece suggested that if a blog tab were eventually added to Google, blogs themselves would be removed from the main Web page index to increase relevancy. As proof, the Register said this is what happened to Usenet posts after Google “acquired Usenet groups from Deja.”
Google didn’t acquire Usenet groups. No one owns them any more than anyone owns the Web. Deja had archives of posts made in those groups. Google acquired these and began crawling Usenet to add to the archives. As Usenet information was never part of the Web index, there was nothing to “pull.”
If a blog index is created, it’s not a given that blog content would be pulled from the Web index. Google has not deleted directory or news listings from the Web index, even though both types of content can be found via their own tabs.
Will a blog tab really come? Eventually, sure. But it’s not in immediate plans, says Google.
GDS: Canaries or Chicken Littles?
As I’ve said before, in the search engine coal mine there are two types of canaries that spot danger: research professionals and Webmasters/Web marketers. Both groups study search engine results intimately. They notice changes before ordinary data miners, your average Web searchers, do.
When Webmasters report GDS, their concerns must be taken seriously, especially during an epidemic. Some GDS complaints do indeed reflect changes at Google that may be bad or imperfect for searchers. Google knows this. “Is Google perfect? Of course not,” wrote GoogleGuy.
Despite Google’s imperfections, GDS reports do not necessarily mean the sky is falling for Google. How can you know for certain that it is? Searcher abandonment is a sure sign, though it shows up only as a long-term trend.
In the short term, an outcry from researchers and search engine optimizers (SEOs) lends credence to the idea that something has gone awry. I’d watch for major, growing GDS outbreaks among SEOs over several consecutive months before deciding Google made some sort of terrible mistake.