The Duplicate Content Debate

Duplicate content is a much-discussed (more like debated, harangued, and maligned) SEO issue that just won’t go away. That’s because distributive formats are generating more duplicate content on the Web than ever before. This challenges search engine algorithms to be smarter and faster at presenting users with definitive, relevant search results, usually from the original content source.

The major search engines know there are many valid reasons for creating duplicate content. Affiliate sales channels need to be fed content from a centralized source. Syndication services must earn their keep by distributing similar content to dissimilar sites. Content management systems render page after page of mildly realigned content as a matter of cataloging efficiency. RSS feeds distribute another rendition of the same-old content to new venues. The list goes on.

Search engines also understand there are many not-so-valid reasons for duplicate content, like hallway pages, doorway pages, and multiple-domain microsites. Then there are the unscrupulous scrapers that snag someone else’s content, duplicate it repeatedly, and interlink it in a wasteland of Web sites. These made-for-AdSense sites are just the sort of thing Google likes to keep out of its indices via penalties, filters, and dampening.

Google in particular understands the inherent differences between valid and invalid content duplication. Because there are many valid reasons for creating duplicate content, it isn’t readily penalized, unless one considers Google’s supplemental results a penalty.

Make no mistake — landing in supplemental results shouldn’t be considered a penalty. After all, the page is still indexed. Granted, supplemental results are a labyrinth of auxiliary Web pages that won’t generate much search-referred traffic. But it’s better to be supplemental than not be indexed at all, isn’t it?

In Google, supplemental results usually appear as regular search results if the regular index can’t serve up a myriad of relevant results. You’ll often find Web pages from supplemental results lingering on page 10 and beyond. Of course, if a searcher is performing a highly targeted query that should only return genuinely specific results, then supplementals can be on page one.

Supplemental results are a way for Google to extend its search database while preventing low-quality Web pages from getting high levels of search-referred traffic. Since supplemental results aren’t as trusted as regular results, they don’t get as much love and attention from the Google bot. Supplementals tend not to be crawled as frequently as pages in regular search results. So pages found in the supplemental index tend to be a bit stale in terms of cache, but being there isn’t usually the result of a penalty.

Most robust, highly search-referred Web sites have at least some pages in supplemental. If you’re concerned about your site’s performance in Google, keep some historical data around that can help you define whether duplicate content is contributing to poor levels of search-referred traffic.

Know how many pages you have in your site and do some domain drilldowns (site:www.domain.com) to determine when your pages go supplemental. Keep a running monthly tally of the percentage of pages in supplemental. Watch for changes, but always review content quality. Maybe duplicate content is a primary contributor toward receiving low-quality page scores from Google; maybe it’s low-value inbound links or an unrelated technical issue. You won’t know until you do some digging, and even then it will take time to understand the trends that would indicate duplicate content at the root of all evil.
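The running monthly tally described above can be kept in a spreadsheet, but here is a minimal Python sketch of the same arithmetic. The names (MonthlyCount, supplemental_share, month_over_month) and all of the numbers are made up for illustration; the counts are assumed to be entered by hand from your own page inventory and site: drilldowns.

```python
from dataclasses import dataclass


@dataclass
class MonthlyCount:
    month: str          # label for the month, e.g. "Jan"
    total_pages: int    # pages you know exist on the site
    supplemental: int   # pages seen in supplemental results that month


def supplemental_share(count: MonthlyCount) -> float:
    """Return the percentage of known pages that are in supplemental results."""
    if count.total_pages <= 0:
        raise ValueError("total_pages must be positive")
    return 100.0 * count.supplemental / count.total_pages


def month_over_month(tally: list[MonthlyCount]) -> list[tuple[str, float, float]]:
    """Return (month, share, change vs. previous month in percentage points)."""
    rows = []
    previous = None
    for count in tally:
        share = supplemental_share(count)
        change = 0.0 if previous is None else share - previous
        rows.append((count.month, round(share, 1), round(change, 1)))
        previous = share
    return rows


# Hypothetical counts, for illustration only.
tally = [
    MonthlyCount("Jan", 1200, 180),
    MonthlyCount("Feb", 1250, 200),
    MonthlyCount("Mar", 1300, 325),
]

for month, share, change in month_over_month(tally):
    print(f"{month}: {share:.1f}% supplemental ({change:+.1f} points)")
```

A jump like the one in the last hypothetical month is a cue to start digging, not proof that duplicate content is the cause.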

Duplicate content creates a real problem for site owners because it leaves a search engine to select which of your pages are important and which aren’t. Crawling the duplicates also wastes crawl time that could be spent in more pertinent site areas.

There are simple solutions for minimizing duplicate content, such as using robots.txt to keep the spiders out of print-only and PDF versions. Doing so will dedupe your content and enhance crawling efficiency. If dynamic URLs are at the root of your site’s duplicate content, rewrites to static URLs are in order. If you’re merging two sites into one, permanent (301) redirects offer a timely fix. No matter what’s creating the duplicate content, there are ways to minimize its effects on search-referred traffic.
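As a rough illustration of the robots.txt approach, the sketch below uses Python’s standard urllib.robotparser module to check which URLs a set of rules would keep spiders out of. The /print/ and /pdf/ directories, the sample URLs, and the blocked_urls helper are assumptions made for the example; your own site’s paths will differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules: keep all spiders out of the
# print-only and PDF copies, assumed to live under /print/ and /pdf/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /pdf/
"""


def blocked_urls(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs that the given robots.txt rules disallow for the agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


# Hypothetical URLs: one original page and its two duplicate renditions.
urls = [
    "http://www.domain.com/articles/widgets.html",
    "http://www.domain.com/print/widgets.html",
    "http://www.domain.com/pdf/widgets.pdf",
]

for url in blocked_urls(ROBOTS_TXT, urls):
    print("blocked:", url)
```

Running a check like this before publishing a robots.txt change helps confirm that only the duplicate versions are excluded and the original pages stay crawlable.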

Unfortunately, the effects of duplicate content are subject to debate in nearly mythical proportion and most certainly from a myopic perspective. Duplicate content penalties are very rare. It takes a great measure of dubious intent to earn a duplicate content penalty. Better to create one great site filled with unique content that naturally earns inbound links than worry about a couple of Web pages that land in supplemental results.

