Managing Duplicate Content on Your Website

Over the past year or so, much has been made of search engine optimization (SEO), thanks in large part to Google rolling out its sophisticated Panda and Penguin algorithm updates. And while most of the conversation among tech sites and bloggers has centered on how websites can avoid being penalized by Google for black hat SEO techniques, little has been said about a key issue that affects content merchandisers every day.

Namely, how do search engines treat duplicate content on individual sites?

Defining Duplicate Content

[Image: Product page for a KEEN shoe. Caption: You need to tell Google which color shoe you want to show up in search results.]

Duplicate content, to use Google’s definition, refers to “substantive blocks of content within or across domains that either completely match other content or are appreciably similar.” A common misconception is that Google arbitrarily penalizes websites for using such content. It does not.

For many content merchandisers, using duplicate content is a business necessity, given how similar many items are within a product line.

For example, if your company produces six laptop carrying cases that are identical except for their color, why bother writing six unique pieces of product text? It’s simply more efficient to write a single product description, without color information, that fits every product.

Google’s View on Duplicate Content

With the likes of Panda and Penguin helping root out uninformative, redundant, and spam-ridden content, one might assume that Google frowns on duplicate content. To an extent, that’s true, as the company has taken the aforementioned measures to combat artificially spiked search results.

But that doesn’t mean Google considers all duplicate content to be deceptive.

For instance, “store items shown or linked via multiple distinct URLs” is considered non-malicious in nature. But there’s a catch. Google does recommend minimizing similar content to provide an optimized user experience.

As a content merchandiser, you have a couple of ways to address this issue. One is to substantially rework the text of your duplicate content so that each version is distinct, which is no doubt a challenging prospect if you’re dealing with more than a few products.

The other is to use canonicalization, which means indicating your preferred URL to Google. As the search engine giant explains, “It’s common for a site to have several pages listing the same set of products. If Google knows that these pages have the same content, we may index only one version for our search results. Our algorithms select the page we think best answers the user’s query.”

With canonicalization, you, the site merchandiser (not Google), select which of these duplicate pages should get preference in a search. It may seem trivial who determines which page gets preference. But if you know that your gunmetal gray laptop case is your biggest seller, why give Google the chance to arbitrarily select honey mustard yellow as the default search listing?
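A canonical preference is commonly declared with a link element, rel="canonical", placed in the head of each duplicate page. The Python sketch below shows one way such an element might be generated for a set of color variants. The URLs, the color parameter, and the function name are illustrative assumptions, not details of any real store.

```python
from html import escape

# Hypothetical color-variant URLs for one laptop case (assumed for illustration).
PREFERRED_URL = "https://www.example.com/laptop-case?color=gunmetal-gray"
VARIANT_URLS = [
    "https://www.example.com/laptop-case?color=gunmetal-gray",
    "https://www.example.com/laptop-case?color=honey-mustard-yellow",
    "https://www.example.com/laptop-case?color=navy",
]


def canonical_link_tag(preferred_url: str) -> str:
    """Build the <link> element that goes in the <head> of every variant page."""
    # Escape the URL so characters such as & are valid inside an HTML attribute.
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'


# Every variant page, including the preferred one, points at the same URL.
head_tags = {url: canonical_link_tag(PREFERRED_URL) for url in VARIANT_URLS}

for page_url, tag in head_tags.items():
    print(page_url, "->", tag)
```

Search engines generally treat this element as a strong hint rather than a guarantee, so it works best when the duplicate pages really are near-identical.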

A Different View from Bing

Not to be left out of this conversation, Bing, the world’s second-largest search engine, offers its own solution to duplicate content. Championing it as “better than canonical,” Bing touts URL normalization as both a complement and an alternative to canonicalization.

URL normalization involves telling Bing which URL parameters on your site (in this case, those that only lead to duplicate content) its search crawler can ignore. In other words, this is the inverse of Google’s canonicalization. Instead of indicating your preferred URL, you’re telling Bing which URLs it can wholly disregard, such as those whose duplicate content is distinguished only by a color field or item number in the URL.
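To make the idea concrete, here is a minimal Python sketch of what ignoring parameters amounts to: once the ignorable parameters are dropped, the duplicate URLs collapse into one. This only illustrates the concept and is not Bing’s implementation or any Bing API. The parameter names (color, item) and the URLs are assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that only pick a variant and do not change the copy.
IGNORED_PARAMS = frozenset({"color", "item"})


def normalize_url(url: str, ignored: frozenset = IGNORED_PARAMS) -> str:
    """Drop ignorable query parameters so duplicate URLs collapse to one."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    ]
    # Rebuild the URL with only the parameters that still matter.
    return urlunsplit(parts._replace(query=urlencode(kept)))


urls = [
    "https://www.example.com/laptop-case?color=gunmetal-gray",
    "https://www.example.com/laptop-case?color=honey-mustard-yellow",
    "https://www.example.com/laptop-case?item=1042&color=navy",
]

# All three variants reduce to the same address.
unique_urls = {normalize_url(u) for u in urls}
print(unique_urls)
```

Parameters that do change what the page shows, such as a page number, are left alone because they are not in the ignore list.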

Doing so not only helps prevent duplicate content from showing up in the Bing index, but also results in “less pointless crawling which reduces resource loads on your server, fresher copy in our index of the canonical destination URLs, and less out-links being discovered with extra parameters,” says Bing.

While the measures offered by Google and Bing adequately address duplicate content on individual domains, neither addresses how duplicate content is managed across various e-commerce channels. For more on that, please take a look at this recent post by SEO expert Kate Moris, who advocates eliminating duplicate content whenever possible, an argument we’ve been making for years.

The Takeaway

You know better than Google and Bing which version of your site’s content should show up in search results. So take the initiative to specify which duplicate content gets precedence.
