Search vs. the Splogosphere

E-commerce sites are genuinely complex because they’re routinely rendered from multifaceted technological systems, diverse data centers, and advanced software support. These site types often require organic optimization, way beyond using Google Sitemaps, to make content search-engine-friendly. Corporations operating commerce sites spend billions promoting their products and services in the search engines each year.

To a certain extent, some major search engines help big brands protect their hard-earned organic Web real estate. Most of us have observed Google’s tinkering with its interface to allow for more than two indexed listings, especially when big brands and major e-commerce sites are concerned. But you don’t have to search far to find big brands ravaged by bloggers riding on the coattails of popular, brand-name goods and services.

How Big Is the Problem?

Technorati, an authority on what’s going on in the blogosphere, is currently tracking 19.9 million sites. As of October 2005, it’s seeing an average of 70,000 new blogs created each day, or roughly one new blog every second.

Perseus Development randomly surveyed 3,634 blogs on eight leading blog-hosting services to develop a blogosphere model. Based on this research, Perseus estimates 4.12 million blogs have been created on those eight services.

Perseus found 66 percent of blogs surveyed hadn’t been updated in two months. This represents 2.72 million blogs that are essentially abandoned. Of these, 1.09 million were one-day wonders and never updated. At this rate, a new blog is abandoned every 2 minutes, 24 seconds.

If trends remain constant, nearly 13.1 million Technorati-tracked blogs are abandoned by their authors every two months, of which 5.3 million blogs are abandoned after a single post. What percentage of these blogs become spam remains an elusive figure.
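The extrapolation above is simple proportional arithmetic. The following is a minimal sketch of it, assuming the Perseus ratios carry over unchanged to the Technorati population; the constant and function names are illustrative, not from either source.

```python
# Figures quoted in the article (Perseus survey, Technorati as of October 2005).
PERSEUS_TOTAL = 4_120_000         # blogs on the eight surveyed services
ABANDONED_SHARE = 0.66            # share not updated in two months
ONE_DAY_WONDERS = 1_090_000       # posted once and never updated
TECHNORATI_TRACKED = 19_900_000   # sites tracked by Technorati


def extrapolate(tracked):
    """Apply the Perseus abandonment ratios to a different population size.

    Assumes the surveyed ratios hold for the larger population, which the
    article itself flags with "if trends remain constant".
    """
    abandoned = tracked * ABANDONED_SHARE
    one_day_share = ONE_DAY_WONDERS / PERSEUS_TOTAL
    single_post = tracked * one_day_share
    return round(abandoned), round(single_post)


abandoned, single_post = extrapolate(TECHNORATI_TRACKED)
print(f"abandoned:   {abandoned:,}")
print(f"single-post: {single_post:,}")
```

Both results land close to the 13.1 million and 5.3 million figures cited, so the published numbers appear to be straightforward scaling of the survey ratios.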

One thing’s for certain: 100 percent of abandoned blogs add to the cyber clutter crawled and indexed by search engine spiders. Small wonder the major engines have stopped focusing on index growth, with the blogosphere doubling in size every five months.

Relief From Tags and Plug-Ins?

In 2003, link spammers began taking advantage of the open nature of blog software, such as Movable Type (MT), by continually placing commercial links in comments and TrackBacks. Jay Allen created a content filter, dubbed the MT-Blacklist plug-in, to help free MT users from link spam.

In January 2005, Google, MSN Search, and Yahoo embraced the nofollow attribute as the primary means for preventing comment spam from reader-submitted links. Adding rel="nofollow" to a link is a simple, albeit voluntary, measure for bloggers to employ.
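In practice, blog software applies the attribute when it renders reader-submitted links. Here is a minimal sketch of that step, assuming a hypothetical helper named render_comment_link; it is not how any particular blog platform implements it.

```python
from html import escape


def render_comment_link(url, text):
    """Render a reader-submitted link with rel="nofollow".

    Hypothetical helper for illustration. Both values are HTML-escaped so
    that untrusted input cannot break out of the anchor markup.
    """
    safe_url = escape(url, quote=True)
    safe_text = escape(text)
    return f'<a href="{safe_url}" rel="nofollow">{safe_text}</a>'


print(render_comment_link("http://example.com/?a=1&b=2", "my <site>"))
```

The attribute only tells participating search engines not to credit the link. It does nothing to stop the comment from being posted, which is the crux of the disagreement that follows.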

Ten months down the road, results are mixed. Matthew Mullenweg, head of bug creation for WordPress, said nofollow was never really sold as the ultimate remedy for blog spam.

“It was meant to remove the economic incentives for spam in the long term. I think it’s on its way to doing that,” Mullenweg said. “WordPress is a huge target because of its popularity, but we’ve integrated a lot into the core, and there are plug-ins out there that really make it a nonissue for most people.”

John Keegan, president of Rackshare, the company behind BlogHarbor services, believes nofollow attributes provide no help in thwarting blog-borne link spam.

“Nofollow will not significantly reduce comment or TrackBack spam,” Keegan wrote on his own blog. “That typical blogger just wants to know if using nofollow will reduce the amount of spam heaped on his blog. Unfortunately, the answer is no. Won’t help at all.”

Block Results, Remove Results

Fortunately, splogs, spam blogs created by scraping RSS for automatic content posting, have very different characteristics from human-created blogs. The frequency, magnitude, and manner of their postings differ greatly from those of a typical human blogger. This makes this type of blog spam stand out when analyzed algorithmically.
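To make the idea concrete, here is a minimal sketch of a posting-pattern check. It is not Technorati's method, which the article does not describe. The thresholds and the function name are assumptions chosen for illustration: it flags a blog that posts at a very high daily rate or at almost perfectly regular intervals.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev

# Hypothetical thresholds, chosen only for illustration.
MAX_POSTS_PER_DAY = 50    # a sustained rate few human bloggers reach
MIN_GAP_VARIATION = 0.1   # coefficient of variation of gaps between posts


def looks_automated(timestamps):
    """Return True if the posting pattern resembles automated posting."""
    if len(timestamps) < 3:
        return False  # too little history to judge

    ts = sorted(timestamps)
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(ts, ts[1:])]

    # Frequency: posts per day, treating any span under one day as one day.
    span_days = max((ts[-1] - ts[0]).total_seconds() / 86400, 1.0)
    posts_per_day = len(ts) / span_days

    # Manner: machine-scheduled posts tend to arrive at near-constant gaps.
    avg_gap = mean(gaps)
    variation = pstdev(gaps) / avg_gap if avg_gap > 0 else 0.0

    return posts_per_day > MAX_POSTS_PER_DAY or variation < MIN_GAP_VARIATION


start = datetime(2005, 10, 1)
bot_posts = [start + timedelta(minutes=5 * i) for i in range(200)]
human_posts = [start + timedelta(days=d) for d in (0, 1, 4, 5, 11, 12.5)]
print(looks_automated(bot_posts), looks_automated(human_posts))
```

A real classifier would also weigh content signals, such as duplicated feed text, and would need tuning against known splogs. This sketch covers only the frequency and regularity of posts.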

Technorati estimates it snares about 90 percent of the splogs in the blogosphere and removes offenders from its search index. Technorati currently reports some 2 to 8 percent of new blogs are fake or spam. Extrapolated, Technorati has already tracked and eliminated some 400,000 to 1.6 million splogs from its index.
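The extrapolated range can be reproduced as follows. This sketch assumes, as the article's figures imply, that the 2 to 8 percent rate reported for new blogs is applied to all 19.9 million tracked sites; the variable names are illustrative.

```python
TRACKED = 19_900_000            # sites Technorati tracks, per the article
LOW_PCT, HIGH_PCT = 2, 8        # reported share of new blogs that are fake or spam

# Integer arithmetic avoids floating-point rounding noise.
low_estimate = TRACKED * LOW_PCT // 100
high_estimate = TRACKED * HIGH_PCT // 100

print(f"{low_estimate:,} to {high_estimate:,} splogs")
```

Rounded, that gives the 400,000 to 1.6 million range quoted above.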

Increased collaboration between blog service providers will help combat this particular type of spam in Technorati’s index. What can the major search engines learn from Technorati’s lead in de-indexing splogs? Enter blog-only search services currently in beta by the major search players.

Nowadays, links can’t be trusted, largely due to the burgeoning, spam-ridden blogosphere. Browse MSN Search results for nearly any popular brand-name product and you’ll see what I mean. Google and Yahoo, meanwhile, have responded with link dampening and link filters, along with testing options that remove or block certain search results for individual users in personalized (not general) search results.

Personalization features added to Yahoo’s personal search service let users save search results they like, block results they don’t, share saved results, and search saved items.

Recently, Matt Cutts published an overview of how Google users can remove results by tweaking their personal search settings. Cutts notes it’s too early to say if Google will use the remove results personalization feature to improve general search results.

The Bottom Line

The blogosphere caused search engine indexes to rapidly expand. As the blogosphere itself continues to grow dramatically, with both credible bloggers and self-propagating splogs, personalized search communities are making their results more relevant on an individual basis.

For the average surfer, the message is clear. If you seek more relevant, splog-free search results from the two most popular search engines, you must set up an account with those engines and do it yourself.

For commercial sites whose brand names and popular products are frequently the focus of splogs, the message is equally clear. If you want to continue to reach the Web audience, you can buy top visibility with ads that can’t be removed from search results, public or private. Small wonder Google reported record earnings last week.
