Duplicate content: the effect on SEO and how to solve it

You’ve run an SEO test and the results are shocking: you have more than 200 duplicate content errors!

You look at your website bewildered and wonder how you’re going to get rid of all these errors. Just the thought alone makes you tired.

The question, however, is whether this panic is justified. In short: no. Generally speaking, duplicate content does not have a major impact on SEO, with a few exceptions.

There are a lot of misconceptions about duplicate content. Time to clear them up once and for all. In this article we tackle the following five topics:

What exactly is duplicate content?
How duplicate content comes about
Duplicate content is bad for SEO
Detecting duplicate content
How to prevent duplicate content

What is duplicate content anyway?

Duplicate content is a term you often come across in SEO land. But what is it really?

Duplicate content is website content that appears in more than one place on the internet. In other words, the same content appears on multiple web pages or URLs.

The question that is often asked is where the line is drawn. In other words, when is something duplicate content and when is it not? Google says it’s about “substantial blocks of content” that are either completely the same or significantly similar.

In practice, this means that a few copied sentences do not count as duplicate content; it has to involve large, substantial blocks. So you can copy and rewrite small pieces of text without being penalized for it.

However, it remains unclear whether the content has to be exactly the same or only similar to a certain percentage. Google leaves us in the dark on this point.
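Since Google does not publish a threshold, you can only estimate for yourself how similar two texts are. Below is a minimal sketch of one common approach: split both texts into overlapping word sequences (so-called shingles) and compare the two sets. The function names and the shingle size of five words are our own assumptions, and this is not how Google measures similarity.

```python
import re


def shingles(text: str, size: int = 5) -> set[tuple[str, ...]]:
    """Return the set of overlapping word sequences (shingles) in a text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a: str, text_b: str, size: int = 5) -> float:
    """Jaccard similarity of the two shingle sets, between 0.0 and 1.0."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a or not b:
        return 0.0  # nothing to compare
    return len(a & b) / len(a | b)
```

A score close to 1.0 means the texts are nearly identical, while a score close to 0.0 means they share hardly any phrases. Where you draw the line is a judgment call.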

Types of duplicate content

There are roughly two types of duplicate content. Two identical pieces of content that can be found on two different websites are called ‘external duplicate content’. Copied content within one domain is called ‘internal duplicate content’.

External duplicate content

Because a lot of content is stolen and reused on the Internet, this is a form of duplicate content that occurs regularly. Many ‘copy pasters’ think: “if I copy this piece of content, then my website will be a bit more filled out, and I can quickly benefit from a better ranking”.

Copying a piece of content from another website is allowed, as long as you ask permission. So are you planning to copy a piece of content from another website? Then be smart enough to ask the author for permission :). If you don’t do this and the author ever makes an issue of it, it can even get you a lawsuit or an expensive damage claim.

Unfortunately for these copy pasters, copying content does not give you an SEO advantage, even though many people think it does. A search engine generally shows only the original piece of content, which is logically the one that comes from the original source. This means that your copied piece is hardly shown in the search results, if at all.

To be fair, everyone copies a piece of content for their own website from time to time. However, how much you copy and the way in which you do it are of great importance. If you blindly copy a large piece of text, the copied content has no chance of ranking well. What works better is to copy a small part of the text, then rewrite it and complement it with your own content. This way, Google will see that the new piece of content is original and is more likely to reward it with a better ranking.

Internal duplicate content

Internal duplicate content is generally something you cause yourself, usually without realizing it. When Google crawls your website and discovers that two web pages are similar, this can cause confusion. Google then ranks the page it thinks is the most relevant. That could be the very page you do not want to rank at all.

For example: on the website of a web agency there is a services page with information about developing a website and a blog with information about why to develop a website. The text on both pages is almost identical and therefore there is a good chance that Google sees it as duplicate content.

This phenomenon is also called internal competition or keyword cannibalization. The pages cannibalize each other’s ranking, so to speak.

It occurs on all kinds of websites, but especially in web shops. In web shops you can often take different paths to the same product page. What does this look like? Just look at the fictitious example below:

www.lawnmowerwebshop.nl/lawnmower/budget/lawnmower/edition-a

www.lawnmowerwebshop.nl/lawnmower/brands/moizasie/edition-a

Even though the URLs are different, both lead to the same product page, so Google sees this as duplicate content.

How duplicate content comes about

Most website owners don’t intentionally create duplicate content. But that doesn’t mean it’s not there. In fact, more than 29% of content published on the web is considered duplicate content.

Let’s look at some of the most common ways duplicate content is unintentionally created:

Copy and paste

This occurs at both internal and external levels. Internal duplicate content occurs by creating multiple pages with the same text. External duplicate content occurs by literally copying the content of another website onto your own website. This happens a lot with web shops. They receive standard texts from suppliers which are then copied unthinkingly. The result is that many web shops have the same texts in the search results. This is not beneficial for their SEO.

URL variations

URL parameters, such as click tracking and certain analysis code, can also cause duplicate content problems.

For example:

www.wpupgrader.com/blue-widgets?c… is a duplicate of www.wpupgrader.com/blue-widgets?c…&cat=3

www.wpupgrader.com/blue-widgets is a duplicate of www.wpupgrader.com/blue-widgets?cat=3&color=blue

Session IDs are another common cause of duplicate content. This happens when each user visiting a website is assigned a different session ID that is stored in the URL.
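To find out which of your URLs are really the same page, you can strip the parameters that do not change the content and then compare what is left. The sketch below shows the idea. The parameter names in IGNORED_PARAMS are assumptions for illustration only: which parameters are safe to ignore differs per website.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of parameters that change the URL but not the page content.
# Adjust this to the tracking and session parameters your own site uses.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "sid"}


def strip_ignored_params(url: str) -> str:
    """Remove tracking and session parameters and sort the remaining ones."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    kept.sort()  # so ?a=1&b=2 and ?b=2&a=1 end up identical
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

If two different URLs from your crawl give the same result after this step, they are candidates for a canonical tag or a redirect.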

Printer-friendly versions of content can also cause duplicate content problems when multiple versions of the pages are indexed.

HTTP vs. HTTPS or WWW vs. non-WWW pages

If your website has two different versions (‘www.site.com’ and ‘site.com’, with and without the ‘www’ prefix), and the same content appears in both versions, you may also encounter duplicate content problems. The same goes for sites that maintain versions on both http:// and https://.
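A quick way to see the problem is to map every variant to one preferred form and check that they all end up at the same address. The sketch below assumes that you prefer https and the www prefix; the function name and the use_www switch are our own, and URLs with a port number are not handled.

```python
from urllib.parse import urlsplit, urlunsplit


def preferred_url(url: str, use_www: bool = True) -> str:
    """Rewrite a URL to https and to one host variant (with or without www)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    bare = host[4:] if host.startswith("www.") else host
    host = f"www.{bare}" if use_www else bare
    # An empty path and "/" are the same page, so always use "/".
    return urlunsplit(("https", host, parts.path or "/", parts.query, ""))
```

For example, http://site.com, http://www.site.com, https://site.com/ and https://www.site.com/ all map to https://www.site.com/. On a live website, the three other variants should redirect to that one.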

Duplicate content is bad for SEO

As we told you in the introduction, duplicate content technically can’t get you a penalty. But that still leaves the question: how bad is duplicate content for SEO? There are a number of less-than-pleasant scenarios that can occur. We’ll discuss them below.

Google shows the wrong web page

When there are multiple pieces of similar content in more than one location on the Internet, it can be difficult for search engines to determine which version is most relevant to a particular search. In many cases, Google will have to choose between the duplicate content pages and show only one.

The search query has a big impact on how Google handles duplicate content. Imagine you have both an American and a Canadian web shop and a potential customer wants to know your delivery costs. They search for ‘delivery costs’ together with the name of your website. The problem is that the two pages are exactly the same, so Google has to make a choice. Because the potential customer has not included a country name in the search query, Google will choose the page it considers to have the most authority. And that may be exactly the wrong page.

Weakened link strength of backlinks

In most cases, Google handles duplicate content very well, and it does not have a negative effect on your ranking. Still, it can affect link strength when other websites link to your duplicate content. Because the same content can be found in multiple places, some websites will link to one version and some to another.

Instead of all links pointing to one unique page, the links are distributed among all duplicates. As a result, you will rank lower than if there had only been one unique page on the website.

Keyword cannibalization

We’ve given it as an example before, but another negative effect on SEO from duplicate content is keyword cannibalization. Because search engines are forced to choose between two pages, the one with the most authority is shown. So in the case of the example of the web agency, only the services page or the blog will be ranked high in the search results. It is very unfortunate when you have relevant information on both pages and only one is shown.

You are then literally competing with yourself. To avoid this problem, it is better to use unique content and a unique keyword.

Wasting your crawl budget

Every so often, Google goes through your website. This is called crawling. In this way Google knows what can be found on your website and what you have to offer. Using this information, Google can match a search query with the content on your website. Google works with a crawl budget, which means that only a limited number of pages is crawled per visit. Every duplicate page that Google crawls uses up part of that budget, which could have gone to a unique page. That is why it is important that the right pages of your website are crawled. This is especially important for large websites, as their crawl budget runs out sooner.

In the extreme: a penalty

Duplicate content does not lead to a penalty, unless you are really out of line. Only in very rare cases, for instance when you are deliberately manipulating the ranking and misleading users, can it have an impact on the ranking of your website. In extreme cases, a website may be removed from Google’s index and therefore no longer appear in the search results. However, this can only occur if, for example, your entire website consists of duplicate content.

Detecting duplicate content

Solving duplicate content is not very difficult in most cases, but you need to detect duplicates first. There are several ways and tools to do this.

The easiest way: search in Google

A simple way to search for duplicate content is to take a piece of text from a page of your website and search for it in Google. If you put the piece in quotes, you can see if there is a website that has literally copied your text.
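If you want to check several passages, it saves time to generate the search links instead of typing them. The small sketch below builds a Google search URL with the text in quotes, so that only literal matches are shown. The function name is our own.

```python
from urllib.parse import urlencode


def exact_match_search_url(snippet: str) -> str:
    """Build a Google search URL that looks for the snippet as a literal phrase."""
    phrase = " ".join(snippet.split())  # collapse line breaks and double spaces
    return "https://www.google.com/search?" + urlencode({"q": f'"{phrase}"'})
```

Pick a sentence that is specific to your page. Generic sentences will return matches that have nothing to do with copying.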

Duplicate content tools

If you want to do large-scale research on internal and external duplicate content, it is wise to use a tool for this. There are many different tools to detect duplicate content. The ones we use ourselves are:

External duplicate content check

A handy tool that can help you check for external duplicate content is Copyscape. Enter the URL of your website in the tool and it will check whether there are duplicates of your texts. Do you see in the results that someone has copied your texts? Then send the owner of the website a request to remove the texts.

Internal duplicate content check

With the tool Siteliner you can check your own website for duplicate content problems. However, you should make a distinction between duplicate content and content that you deliberately repeat in multiple places on your website, such as your menu, footer and contact details. This type of content is also called common content and does not pose a threat to SEO.

Google Search Console: extensive duplicate content audit

If you want to do a more extensive audit of your duplicate content, we recommend Google Search Console. With this tool, you can see which pages are not indexed because they are seen as duplicate content, and where a duplicate page was found without an indication of which version is the original. You can also see which of the duplicates Google has selected as the canonical page. The handy thing is that when you click on one of these messages, the location of the problem is shown.
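The principle behind such an internal check can be sketched in a few lines: give every page a fingerprint of its text and group the pages that share one. This is our own simplified illustration and not how Siteliner works. It assumes that you have already stripped the common content (menu, footer) from each page, and it only finds texts that are identical apart from capitals, punctuation and spacing.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash the text after removing case, punctuation and spacing differences."""
    normalized = " ".join(re.findall(r"\w+", text.lower()))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_exact_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group page URLs whose body text has the same fingerprint."""
    groups = defaultdict(list)
    for url, body in pages.items():
        groups[fingerprint(body)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]
```

Pages that are only almost identical will not be caught this way. For those you need a similarity measure instead of an exact fingerprint.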

SE Ranking: duplicate keywords

Another very useful tool for detecting duplicate content is SE Ranking. In addition to showing you which keywords your website is shown for in Google, this tool also shows you whether a keyword is linked to one or more URL(s). For example, it can happen that a keyword is linked to three different web pages. As a result, Google does not know which page should rank highest because all three pages are very similar.
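If you export your rankings as a list of keyword and URL pairs, you can find these cases yourself with a simple grouping. The sketch below assumes such an export as a list of (keyword, URL) tuples; the format and the function name are hypothetical and not tied to SE Ranking.

```python
from collections import defaultdict


def cannibalized_keywords(rankings: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Return every keyword that is linked to more than one URL."""
    urls_by_keyword = defaultdict(set)
    for keyword, url in rankings:
        urls_by_keyword[keyword.strip().lower()].add(url)
    return {
        keyword: sorted(urls)
        for keyword, urls in urls_by_keyword.items()
        if len(urls) > 1
    }
```

Every keyword in the result is a candidate for one of the fixes further on: rewriting one of the pages, a redirect or a canonical tag.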

How to prevent duplicate content

In the best-case scenario, of course, you have no duplicate content on your website at all. Prevention is better than cure, and it saves a lot of work. What is the fastest way to achieve this?

Creating unique content!

The only way to really rank well in Google is by creating original and relevant content. Content that is unique and relevant to both the search engine and the search engine user is really valued. Obviously, duplicate content is not.

When creating texts for your website, try to copy as little content from other websites as possible. Otherwise, your website will never rank high in the search results. This is also the best way to avoid duplicate content.

Has the damage already been done and have you discovered duplicate content? Not a problem! With the following tips, you’ll have your duplicate content fixed quickly:

Adjust content

Have you discovered a duplicate content error on two or more pages? Try to make them all unique by picking a different keyword and rewriting the text. It can take quite a long time for Google to assign new pages a certain ranking. Therefore, adapting the content of already existing pages is definitely beneficial. Existing pages are also already indexed and have a ranking, so Google will notice the changes faster.

Use redirects

Are multiple pages on your website linked to the same keyword by Google? Then create a redirect from all duplicate pages to the main page. If a visitor lands on one of the duplicate pages, they will then be redirected to the original source.

The big advantage is that because of the redirects, a large part of the value of the duplicate pages is passed on to the original source. This gives the original source a higher SEO score.

You can create a redirect with a plugin like Yoast SEO (the redirect manager is part of its Premium version) or through your .htaccess file.
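If you have many duplicates, writing the redirect rules by hand is error-prone. The sketch below generates permanent (301) redirect lines for an Apache .htaccess file from a mapping of duplicate paths to the main page. It assumes an Apache server and simple paths without spaces; the function name and the example paths are hypothetical.

```python
def htaccess_redirects(duplicates_to_main: dict[str, str]) -> str:
    """Build one Apache 'Redirect 301' line per duplicate path."""
    lines = []
    for duplicate_path, main_path in sorted(duplicates_to_main.items()):
        if duplicate_path == main_path:
            raise ValueError(f"{duplicate_path} would redirect to itself")
        lines.append(f"Redirect 301 {duplicate_path} {main_path}")
    return "\n".join(lines)


# Example with made-up paths:
print(htaccess_redirects({
    "/blog/why-develop-a-website": "/services/web-development",
}))
```

Keep in mind that Apache’s Redirect also applies to everything below the given path, so test the rules before you put them live.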

Use canonicals

With a canonical tag, you let a search engine know that a page is a duplicate and which page is the preferred version. Just like with a redirect, most of the value is passed on. The big difference, however, is that pages with a canonical tag are still viewable: a visitor is not automatically redirected to the main page, as is the case with a redirect.

A canonical tag is placed in the source code of the page, in the <head> section. Just like the redirect, you can easily add it with Yoast SEO.
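If you build your pages from templates rather than with a plugin, you can generate the tag yourself. Below is a minimal sketch; the function name and the example URL are our own. The resulting line belongs in the head section of every duplicate page and points to the main page.

```python
from html import escape


def canonical_tag(main_url: str) -> str:
    """Return the canonical link element that points to the main page."""
    return f'<link rel="canonical" href="{escape(main_url, quote=True)}">'


# Example with a made-up URL:
print(canonical_tag("https://www.example.com/lawnmower/edition-a"))
```

Use the full URL of the main page, including https:// and the correct www variant, so that there is no doubt about which version you mean.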

No index

You can also choose not to have certain pages indexed. As a result, they will not be shown in the search results. This can come in handy for ‘thank you’ pages or old job application pages, for example. You can choose to have the page still crawled but no longer indexed by using noindex + follow to maintain link strength. Do you want Google to ignore the page completely, including all its links? Then choose noindex + nofollow. Again, you can set this up with Yoast SEO.
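Without a plugin, both combinations come down to one robots meta tag in the head section of the page. The sketch below builds that tag; the function name and its two switches are our own.

```python
def robots_meta(index: bool = False, follow: bool = True) -> str:
    """Return a robots meta tag for the chosen index/follow combination."""
    directives = [
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ]
    content = ", ".join(directives)
    return f'<meta name="robots" content="{content}">'


print(robots_meta())              # noindex + follow: links still count
print(robots_meta(follow=False))  # noindex + nofollow: ignore page and links
```

Note that Google has to be able to crawl the page in order to see this tag, so do not block the same page in your robots.txt file.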

Conclusion

Despite the fact that duplicate content in itself is not a reason for a severe penalty, it is too important to ignore. Therefore, we encourage you to work on it to improve your ranking and the user experience of your visitors. We wish you the best of luck.