Google Panda now forms part of the main Google algorithm, which updates on a continuous basis. That means it is possible to be hit at any time.
Panda looks for websites that contain:
By the way, if you are wondering why some sites with thin content never seem to get hit by Panda, skip to How sites with thin content avoid Panda.
Panda's intention is to rid the search results of web pages and websites which are really no more than a bunch of advertising banners and otherwise contain little of their own content, or little content of any merit.
However, it is possible for eCommerce websites or sites that contain mostly graphics to get caught in the crossfire if they are weak in other areas such as user behaviour metrics (bounce rate, average time on site, etc.). In other words, if a website appears to have thin content and it appears that internet users don't like it very much, it is open to a Panda hit.
Obviously, if you are taking content from someone else's website and putting it on yours, Google sees no reason to index your pages when it already has the same information from elsewhere.
But it should be recognised that copying is not always carried out in an evil way (like scraping). Legitimate copied content can occur because:
Webmasters of sites like those above who are hit by Panda often ask, "Why me and not them?" The answer lies in user behaviour - internet users respond better to the other sites, and so Google feels they can stay but you should go. You are not bringing anything new to the party, and the party is a great deal more vibrant on other sites!
Google also leans strongly towards indexing whoever posted the information first and discarding those who copy it later. It isn't perfect at this, and there have been cases where the copier is indexed and the originator dropped, but they are rare occurrences.
Whatever the circumstances, skip to Panda recovery options for ways to address copied content.
It is possible to serve up the same content on multiple pages within one website without realising it. This most commonly happens when the .htaccess file is used to create search engine friendly URLs, leaving duplicate links behind. In WordPress this can happen without the webmaster ever touching the .htaccess file because of the way the system is structured. Here's an example:
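(The URLs below are invented for illustration - the pattern, not the exact addresses, is what matters.)

http://mysite.com/thispage/
http://mysite.com/pages.php?page=thispage
http://mysite.com/?p=123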
In this case we have three URLs all showing the same content. To a search engine like Google this is duplication and possibly a website trying to make itself look larger than it is.
To find out if this has happened to you, check Google Analytics under Behaviour -> Site Content -> All Pages. If necessary, download these into a spreadsheet and work through the pages looking for duplication issues.
While the above might be an innocent mistake, many sites are blatant in duplicating their content, calling up the same (or very similar) information from a database on multiple pages in the hope of fooling search engines into thinking the website has far more pages than it actually has.
Whether you are innocent or guilty, the method of repairing the website is the same. Skip to Panda recovery options.
Before you devote too much time and effort to recovering your site from a Panda hit, it is worth taking a moment to check it is actually Panda you are trying to recover from!
A sudden loss of rankings and traffic over 1-2 days could suggest a Panda hit, a Penguin hit, or both. (Skip to Have you really been hit by Penguin? to consider whether this is what happened instead.)
However, a drop in rankings and traffic, coupled with the knowledge that your website's content is thin, copied or duplicated, is enough to conclude that you may well have been a target of Panda. Skip to Panda recovery options.
If this particular event was not Panda, but you know your website's content is thin, copied or duplicated, you could be a target in the future. Skip to Panda proofing your site.
It is critical to understand that Panda does not penalise websites just because of thin, copied or duplicated content in isolation. It also looks for confirmation that internet users themselves do not like the site. Such signals may take the form of:
In other words, Panda goes through a two-step process: first suspecting, via its own mathematical calculations, that the website is not worth ranking, then cross-checking this against how internet users react to the site.
Hence it is possible for a site with very thin content to rank extremely well. Say the website of an artist or cartoonist that is mainly graphics but has:
Before you can recover from Panda you need to know why you have been hit. Is it thin content, copied content, duplicated content or a mixture of all three?
You also need to consider the "Panda Scales". You have not been hit just because of a content issue, but because user behaviour metrics on your site were very poor. Visitors did not stay for long, backed out quickly, failed to find your content worth linking to or sharing, etc., so you will need to consider the design of your site during its repair.
Could you add some clearer calls to action so visitors stay longer? Is your website showing up correctly in all the major browsers or is it a jumbled heap in Internet Explorer and Safari so visitors end up pressing the back button in an instant? Is the layout clear and the navigation easy to use?
Your answer to many of these questions will be found in Google Analytics where you can study the behaviour of your visitors in great detail. Here you can identify the weak points of your site and address them for the future.
To recover your site you should follow the steps in the section Panda proofing your site.
One of the worst pieces of advice given to a Panda-hit site is that "content is king" and so the webmaster should add more "content". Valuable, original and engaging content is king; waffle for the sake of waffle is a killer.
Better than content is engagement (sometimes called Web 2.0) where the user can actually interact with the site - forums, comments, asking questions, contributing to content, etc. However these techniques need to be handled with extreme care to avoid:
The ways to recover from a Panda hit are covered below. The intensity with which you apply them is one factor in how long recovery will take, but ultimately it will depend greatly on whether you can make your site more engaging to internet users than those of your competitors. Here's an example:
Here's a more positive example:
Whether you have been hit, or you just don't want to be hit, the techniques are the same, and the main focus here is on improving user behaviour on your site: in all cases making visitors stay longer and persuading them to go deeper, and ideally getting them to engage, be this through commenting, newsletter sign-up, purchasing a product, etc.
We're following a logic here: People who spend longer on websites must by definition like the content and therefore they are more likely to share it on social media or link to it in other ways.
Now many make the error of thinking, "Well why don't I just build the links and social media shares myself and that will prove how good my website is."
I'm not talking here about buying 500 links for $50 or 1,000 Facebook likes for $20. Those days are gone and you're more likely to get hit by Google Penguin for doing that. But you could hire a professional link builder to help you gain some good quality links and social media mentions.
Why don't you? Because search engines can see through that as well. They will be thinking, "Why are there links appearing to a website that no one actually seems to like when they arrive there?" And they would be right to think that ... it doesn't stack up.
Your first tool for monitoring your success in improving user behaviour is Google Analytics. This will not only allow you to track your bounce rate and average time on site for the website as a whole, but also let you drill down to find the weakest points and address them. Importantly, look at:
You'll find strong overlaps: many of the pages with the worst bounce rate also have the worst average visitor time. Overall the question is, "How do I make these visitors stay longer and go deeper than they do on competitor websites?"
Secondly, you'll need to look at Google Webmaster Tools, specifically at your click-through rate for different keywords and phrases. Here your question will be, "How can I make my entry in the search results more attractive, so that more people choose to click through to my website rather than to those of my competitors?"
If your rankings are gone because you have been hit, then you will want to use Google AdWords to appear in the paid results and ask the same question of your advertisements. This is because if an advertisement proves very popular (lots of people click on it) for a certain search query, the landing page of that advertisement will rise in the organic results. A couple of definitions:
Now you know what you are monitoring it's time to address the key areas of weakness your website may have.
The kneejerk reaction is to add "content". The amount of pointless waffle I have seen added to websites is phenomenal, and it actually makes things worse. Internet visitors confronted with huge blocks of dull text reach for the back button just as fast as those who see no content at all.
Never think that "adding content" to please the search engines will solve your issues. Only adding valuable, original and engaging content in an easy to read and navigate design will do the trick.
To this end you have two choices:
The first option sounds the easiest but it is far from it. Unless you have a large list of users (such as Facebook fans) that you can reach and engage with quickly, most people are reluctant to spend their time adding a comment where no one else has, or joining a forum which seems deserted.
In the meantime you may find yourself overrun with spam from robots and from people trying to get links onto your site, because you have allowed some sort of contribution system.
Creating a two-way system where users can add content to your site requires a careful build-up, launch and very strict monitoring. Very often it also requires you to add some of the content yourself to get the ball rolling. Even Yelp in its early days had to pay people to write reviews as a way of persuading ordinary users that this was a happening site worth contributing to.
So this might leave you realising that it would be better to create the on-page content yourself. But can you?
Some people can write, and some just can't. If you can't you will need to outsource.
Outsourcing content is not easy. You need to find a talented writer who has some passion for your field. Gone are the days when you could pay an India-based firm to churn out "stuff" about "stuff".
Let me give you an example. I was recently approached by a company in Qatar that lists information about places to eat out. They were looking for a ghost blog writer to give their site more content. I turned them down.
That's not because my team or I are too good for this work; the truth is we would not be good enough. What they needed was someone who could connect with people who live in Qatar and like eating out. That someone would need to be local in order to share the in-jokes and make comments only a local could (the weather last Thursday, how crowded that street was on Tuesday, and so on). That someone would need to have been in some of these restaurants to make comments and observations that only a customer could have made.
Then, and only then, would readers of that blog connect and say "this person is one of us, he makes me smile and I find much of what he says turns out to be true". This builds a loyal following and positive user behaviour. Bland ramblings from a content house halfway round the globe don't.
Panda does not mind copying in itself, but if your page contains the same information as someone else's, and they wrote it first, what is the point in indexing yours?
Copied content is not evil if it is useful. You could, for example, have created a site which gathers together all the information that comes out about fly fishing (referencing the original sources - that's important by the way). OK, it is all copied content but that is no issue if it is useful and user behaviour will tell you if that is so.
Thus if you can give this copied content a twist, you're not going to have a Panda issue.
Earlier on this page I gave the example of a website that contained lyrics of songs but allowed users to comment on what they thought the lyrics meant. This gave the site original content to balance the copied content. More than that it encouraged visitors to stay on the site longer to read the interpretations of others and add their own thoughts.
This is not an idea that came out of my head by the way - it has been done. But it is that kind of innovative thinking that takes a website full of copied content and makes it absolutely fine in the eyes of Google's Panda.
Now with our Fly Fishing website we could allow commenting, start a blog with the webmaster's thoughts on the news coming through, start a forum, etc. But first we need to decide if it is necessary. If Google Analytics shows people come to the site (both new visitors and returning ones) and stay, then it is clear they appreciate the way you have pulled various pieces of information from the web into one handy location, and you are unlikely to get any issues from Panda.
On the other hand, if you have a very short visitor time on site, a high bounce rate and few people returning a second time, then you do have a very big issue. Fundamentally, internet users don't appreciate what you have done, so you had either better knock the whole idea on the head or, as mentioned above, find a way to add something that people do appreciate.
Duplicated content issues are fairly easy to address, although they can take a fair amount of time to resolve depending on the size of your site. Here's the step-by-step guide, remembering you need to follow all the steps:
STEP ONE: on each page, within the <head> section, place a canonical tag. It looks like this:
<link rel="canonical" href="http://mysite.com/thispage/">
This is supported by all the major search engines and says, "No matter what the URL says, remember this page as http://mysite.com/thispage/." So if they find the page through http://mysite.com/pages.php?page=thispage, they know you really want it to be indexed and remembered as http://mysite.com/thispage/.
For large sites you may need to create some scripting to achieve this, especially if you are drawing large amounts of information from databases.
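Here is a minimal sketch of what that scripting might look like, assuming a PHP site routed through a hypothetical pages.php front controller like the one in the URL examples on this page:

<?php
// pages.php - hypothetical front controller, for illustration only.
// Work out which page is being requested, however the visitor arrived.
$page = isset($_GET['page']) ? $_GET['page'] : 'home';
$page = strtolower(preg_replace('/[^a-zA-Z0-9-]/', '', $page)); // sanitise

// Declare one canonical form of the URL, whatever was actually typed in.
$canonical = 'http://mysite.com/' . $page . '/';
?>
<html>
<head>
<link rel="canonical" href="<?php echo htmlspecialchars($canonical); ?>">
</head>

For database-driven pages the idea is the same: build the canonical URL from the record's stored slug rather than from whatever URL the request happened to arrive on.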
STEP TWO: make sure you have a sitemap.xml file and go through it looking for cases where you may have listed the same page twice as different URLs. Remove the one you don't want.
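For example, a problem pair might look like this (reusing the hypothetical URLs from the canonical tag example above) - both entries point at the same content:

<url>
  <loc>http://mysite.com/thispage/</loc>
</url>
<url>
  <loc>http://mysite.com/pages.php?page=thispage</loc>
</url>

Keep the clean canonical form and delete the other entry.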
Automated sitemap.xml generators are notorious for doing this; I recommend you avoid them like the plague. Either write the code to make your own sitemap, get a plugin (WordPress) or make it manually.
Matt Cutts from Google has said that when Google finds two URLs offering the same content it will reference the sitemap to see which one the webmaster actually wants indexed. Hence if you say you want both, you are in trouble!
STEP THREE: next, seek out any links within your website that point to incorrect URL formats. You can use the canonical tag all you like and keep your sitemap.xml as clean as a whistle, but if you yourself then link to a different URL for the same page on your own website, search engines will get suspicious.
From Google Analytics you can download a complete list of the pages on your website that Google is aware of. When you see ones with bad URL formatting, search your own site to find where they are being linked from.
STEP FOUR: only when you have completed steps 1-3 should you reach for your .htaccess file and insert 301 redirects. A 301 redirect tells search engines a page has moved permanently.
So with a 301 redirect in your .htaccess you would say that http://mysite.com/pages.php?page=thispage has now permanently moved to http://mysite.com/thispage/.
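Here is a sketch of that redirect as a mod_rewrite rule (assuming an Apache server, which is implied by the mention of .htaccess, and reusing the hypothetical URLs from the canonical tag example):

# Permanently redirect the old dynamic URL to its clean equivalent.
RewriteEngine On
RewriteCond %{QUERY_STRING} ^page=thispage$
RewriteRule ^pages\.php$ http://mysite.com/thispage/? [R=301,L]

The trailing ? on the target strips the old query string, so page=thispage is not carried over to the new URL.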
Again, make sure you back this up by having no links on your own website pointing to pages that then 301 redirect, or you will have the search engines frowning at your activity - "why do you say a page has permanently moved and yet still link to it ... suspicious!"
If you have read and digested much of the above you will understand clearly that thin content and copied content are not always sins while duplicated content issues can be cleaned up technically.
Google's Panda won't blink an eye at a site with thin or copied content if humans signal that they like it. Google's aim is to provide the best possible user experience; if it does not, there are plenty of other search engines ready to fill that void. So a website that generates positive human behaviour stays, no matter what Panda thinks!
Let's go back to my Fly Fishing example above - a website that just contains copied articles from here, there and everywhere (with references to the original articles - important!) about fly fishing. On the face of it, that should be a prime target for a Panda slap. But let's say:
That's enough for Google's Panda to shrug its shoulders - people love it, copied or not - time to go and slap someone else!
In a sentence, for websites with thin or copied content: The better your user behaviour, the less likely you will be hit by Google Panda.
I've given a few examples above, but here are some other ideas for keeping your visitors on your site and doing the right things:
Not only do these measures help make sure your visitors stay longer, they also make it more likely they will share your content on social media or create links from their blogs/websites back to yours. Yes, ultimately link building is a part of user behaviour, so build the right content and design and your visitors will take care of your backlinks for you.