Are you getting different results when you search “site:www.yourdomain.com” and “site:yourdomain.com”? Maybe you have 100K pages indexed but only 18K that should be in Google, or your www version has 50K and the non-www has 200K. Here is how to find, troubleshoot, and fix these SEO issues with “site:” searches, plus a couple of tools to make life easy.
- Which Issue Are You Having?
- Different amounts of pages
- More pages indexed than there should be
- Devalued content/site visibility
- Finding the issues:
- Use your keyword tracking tools
- Google analytics
- Search your CMS
- Solve the problems:
- Add www and non-www to Search Console
- Set up redirects to your preferred version (extra bonus tips in this section)
- Canonical links
- No index queries
- Figuring out which pages should rank for what keywords
These site: issues can result in anything from a devalued site to thin content, duplicate content and an overall reduction in search engine visibility. That’s why it’s important to fix them. Although there is no one-size-fits-all solution, here are some starting points. The post is broken out by the three most common issues I see. After that, I go into determining the cause of the issues and ways to help resolve them.
As we’re jumping into this post, some of the links below are affiliate links (because duhhh) and I will earn a commission if you shop through them. If any of them aren’t and a program opens in the future, I can assure you I will be joining and changing them out. It won’t affect your prices at all.
Which Issue Are You Having?
Different amounts of pages:
The first thing to do here is to go to Google and type “site:yourdomain.com” and “site:www.yourdomain.com” into the search box. If one is giving you more than the other (e.g. site:www.yourdomain.com gives 400K and site:yourdomain.com gives you 200K), then you have a page indexing issue. These are extremely common, so don’t worry. I go into a lot of detail below to troubleshoot and resolve this.
More pages indexed than there should be:
The next thing to look at is how many pages are indexed on your preferred version (www or non-www) compared to what should be indexed. The important thing to know here is that MORE DOES NOT EQUAL BETTER. In fact, in many cases, more indexed pages could actually be worse for you and your site.
If you only have 18K pages with good content like blog posts, category/product pages, how to guides, etc… but you have more pages indexed, your site can become “thin” and may not appear as well in the indexes. You could also have pages competing for visibility which may reduce your ability to rank.
Devalued content/site visibility:
If you had stable rankings, have not gotten a penalty, but have noticed your pages or specific categories start to slip, this could be something that you troubleshoot in a similar way to the “site:” SEO issues. Some of the time when I see rankings slipping with no explanation and the competitors aren’t doing anything new, it’s because of competing pages. I’ll share a tool to find this and how to tackle it in the post.
Finding the issues:
Now that you’ve determined your site has issues with site: searches, it’s time to find what is causing them. There are a few places to look. The first thing to do is figure out the issue, then start using the strategies and resources below.
Here is how to troubleshoot why you have too many (or too few) pages showing up on a site: search in Google:
- Figure out how many pages are indexed
- Get an estimate of how many pages should be indexed by auditing your CMS and databases to include:
- product pages
- blog posts
- important company info (press room, about us, homepage, etc…)
- compare that to what is showing up (and see if it is more or less)
- if it’s more, you need to start narrowing down the excess pages
- if it’s less, you need to begin finding out why you aren’t being crawled and indexed correctly
- now compare that to the alternate version (www vs. non-www)
- read below to begin fixing the issues
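The counting and comparison steps above can be sketched in a few lines of Python. This assumes you can export the URLs that should exist (here, parsed from a sitemap.xml string) and a sample of what Google actually has indexed — the domain and URLs below are placeholders:

```python
import xml.etree.ElementTree as ET

def sitemap_urls(xml_text):
    """Pull every <loc> URL out of a sitemap, ignoring XML namespaces."""
    root = ET.fromstring(xml_text)
    return {el.text.strip() for el in root.iter() if el.tag.endswith("loc")}

def index_gap(expected, indexed):
    """Split the difference into excess pages (indexed but shouldn't be)
    and missing pages (should be indexed but aren't)."""
    return sorted(indexed - expected), sorted(expected - indexed)

# Placeholder data: your real sitemap and indexed-URL sample go here
sitemap = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yourdomain.com/</loc></url>
  <url><loc>https://yourdomain.com/blog/</loc></url>
</urlset>"""
indexed = {"https://yourdomain.com/", "https://yourdomain.com/?s=widgets"}

excess, missing = index_gap(sitemap_urls(sitemap), indexed)
# excess  -> candidates to block, canonicalize or noindex
# missing -> pages that need to get crawled and indexed
```

Excess pages point you toward blocking and canonical fixes; missing pages point toward crawl and internal linking problems.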
Use your keyword tracking tools:
I cannot stress how much I love this tool in particular. They’re also going to be launching some new features that will help with even more of these issues, as well as negative SEO, hopefully in the near future. (I link to it again at the bottom of the post so you don’t have to re-find it.) This one can be used as well, but the other one makes it a lot easier…if you don’t have either, then I recommend you try them both out. It isn’t as good for these solutions, but I do recommend it for many other things.
When looking at the ranking reports in the first tool, you want to see how many pages you have showing up for specific keywords. If there is more than one page per keyword, you have an issue. If they are duplicated pages, look at the x-robots and robots.txt sections below. If they are unique but similar pages, look at the “figuring out which pages should rank for what terms” section.
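As a rough sketch of that check, if your rank tracker can export keyword/URL pairs (the data below is made up), you can group them and flag any keyword where more than one of your own pages ranks:

```python
from collections import defaultdict

def competing_pages(rows):
    """rows: iterable of (keyword, ranking_url) pairs from a rank-tracking
    export. Returns only the keywords where more than one of your own
    URLs ranks -- i.e. pages competing with each other."""
    by_keyword = defaultdict(set)
    for keyword, url in rows:
        by_keyword[keyword].add(url)
    return {kw: sorted(urls) for kw, urls in by_keyword.items() if len(urls) > 1}

# Hypothetical export rows -- adapt to your tool's format
rows = [
    ("blue widgets", "https://yourdomain.com/blue-widgets/"),
    ("blue widgets", "https://yourdomain.com/blog/blue-widgets-guide/"),
    ("red widgets", "https://yourdomain.com/red-widgets/"),
]
# "blue widgets" has two ranking URLs -> a competing-pages issue
```

Duplicated URLs in a flagged group point toward the robots.txt and x-robots fixes; unique-but-similar pages point toward consolidation.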
Google analytics:

Go to Acquisition > Channels > Landing Page > Source, then sort by Google.
Now click on the open-in-a-new-window box that shows up and see if it is the right URL that should be indexed. Sometimes you’ll discover that other versions have been created and could be causing anything from duplication issues and competing pages to thin content. I come across this fairly often when doing site audits.
Search your CMS:
You’re probably thinking, what does my CMS have to do with this? It only hosts the content. That’s exactly the issue. From custom-built systems to WordPress, your CMS spits out the URLs, uses tags and creates the pages that show up in the wild. That is why it is important to know how and why it is producing pages.
Tags, for example, could be creating duplicate or thin content if they aren’t blocked. User generated content (UGC) that is controlled or placed by a CMS within your website or blog (forums, guest blogging accounts, reviews, feedback, etc…) could also cause issues. Excessive pages that aren’t being blocked could also be leftover from years ago or from a previous person at your company.
Want more potential indexing issues based on your CMS?
You could have category copy being pulled over to new pages on search queries without canonical links. If search queries are indexable, this could create thin content as indexed pages (i.e. product mixes with ugly URLs that have no unique content or actual purpose on the site). Allowing these to be found and crawled at random may be wasting the crawl time and bandwidth your site gets, which could also lead to less indexing of your most important pages. This could also be one of the reasons why not all of your pages are showing up. Although I’m not going to go into how to create a better crawl for your site to find the right pages (read this post on internal linking structures and other posts on this site), I am going to cover the rest of it to get you started on finding solutions to these issues.
Solve the problems:
Now comes the fun part. You hopefully know what the issue is, so we need to create a solution that works specifically for it. Below you are going to find multiple ways to block pages from showing up, guide Google to the right version that should show up, and help prevent specific pages, categories and parameters from being crawled. At the end of the post I’ll share a couple of tools I love that make life easy with this stuff.
Add www and non-www to Search Console
The first thing you need to do is see what Google sees for both the www and non-www versions of your website. Not only will Search Console give you different information for both, but you can find different backlinks, crawl rates, impressions and a lot of other things. If you don’t have this, go do it now. I’m repeating it multiple times for a reason.
Set up redirects to your preferred version
Now go to your tech team and set up a 301 redirect to the preferred version of your website. If your branding is YourDomain.com or www.YourDomain.com, choose the one that you prefer. Everything else needs to automatically redirect to it. This includes https vs. http.
Bonus: Make sure the redirects cover uppercase/lowercase on the main domain, trailing “/” and other URL endings. If your site has duplicate pages that can be indexed because one has a “/” at the end, or both capitalized and lowercase versions are inside Google’s indexes, set redirects on these too. They could cause duplicate content and competing page issues. i.e. YourDomain.com should redirect to yourdomain.com, and yourdomain.com/Blog/I-Loves-Me-Kitty should redirect to yourdomain.com/blog/i-loves-me-kitty if you prefer all lowercase to caps.
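On Apache, a minimal .htaccess sketch of these redirects might look like the following. This assumes https and non-www are your preferred versions and that mod_rewrite is enabled; nginx and other servers use different syntax, so have your tech team adapt it:

```apache
RewriteEngine On

# Force https first (assumes a valid SSL certificate is installed)
RewriteCond %{HTTPS} off
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

# Then strip www, preserving the full path
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^ https://%1%{REQUEST_URI} [R=301,L]
```

Note that forcing lowercase URLs generally requires a RewriteMap defined in the main server config rather than in .htaccess, so that piece in particular needs your tech team.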
Canonical links

Canonical links are how you tell Google and other search engines which version of a page to index. Go to your excess pages (tags, search queries, etc…) and look at the source code to see if canonical links are set.
Bonus: Make sure that if you’re on a secure site, the canonical is also https. If you’re on the mobile version and prefer the desktop as the main (until mobile-first launches officially), then make sure it is pointing to the right place.
Canonicals can also help to reduce the amount of excessive pages, especially if you have multiple versions of a product (same product with different sizes but not enough variation to justify a separate page…especially if you have size and variation options on the product page).
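For example, a size variant pointing back to the main product page would carry a tag like this in its head (the URLs are placeholders):

```html
<!-- In the <head> of the variant page, e.g. https://yourdomain.com/blue-widget-large/ -->
<link rel="canonical" href="https://yourdomain.com/blue-widget/">
```

The variant stays live for users, but search engines consolidate its signals onto the main page.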
If you’re able to get your tech team to update your robots.txt, you can disallow folders here like /search/ (replace search with the folder that is generated for search queries) to help block Google from crawling that folder and wasting your available crawl budget. You can also block tags from blog posts, secure areas, thin content areas (if you have categories for UX but they are thin and shouldn’t be indexed), UGC, etc…
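A robots.txt sketch along those lines (the folder names are placeholders — use whatever your CMS actually generates):

```txt
User-agent: *
Disallow: /search/
Disallow: /tag/
Disallow: /account/
```

One caveat: robots.txt blocks crawling, not indexing. A page that is already indexed needs to stay crawlable long enough for Google to see a noindex on it before you block the folder.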
If you want to see if a specific folder is indexed, look at your URL structure, go to Google and type in the base URL up to the folder: site:yourdomain.com/blog/folder/
If you can get your tech team to go one step further, have them set x-robots headers on pages that are being indexed but shouldn’t be. X-robots lets you tell Google to do things like noindex a page but follow the links to the important ones. Important ones could be the proper version to show, or important pages within your site like high-level categories; this also helps to increase your crawl efficiency.
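As a sketch, on Apache (with mod_headers enabled) the header can be set per file type — the PDF pattern below is just one common example; swap in whatever matches your excess pages:

```apache
# Keep PDF duplicates of your content out of the index while
# still letting Google follow the links inside them
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, follow"
</FilesMatch>
```

For regular HTML pages your CMS controls, the equivalent is a `<meta name="robots" content="noindex, follow">` tag in the page’s head.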
No index queries
One thing I learned from Alan Bleiweiss a while back is a trick to noindex parameters. I’m linking to his site here so you can check with him (and contact him for an SEO audit if you don’t want to use me, since he is amazing!). It was also part of his State of Search presentation a couple of years back (slide 24).
Figuring out which pages should rank for what terms
Now we’re back to one of the tools my agency could not survive without. Click here to go buy it…because it’s seriously worth it. What I love about this tool in particular is that it not only gives you accurate ranking data (including when search boxes, answer boxes, local packs, etc… are included), but it also expands to show you which pages on your website are actually showing up. Not many people realize how valuable this is.
Here’s a common issue. If you always ranked for “blue widgets” or another fictional term, but have now started to slip, this could help resolve it. Many marketers blame it on competitors or Amazon, but the reality is it could be your own efforts. Go into your rankings and see if other URLs are showing up and competing with your own site.
Maybe your copy team decided to go after multiple versions of blue widgets (travel size vs. family size) and created multiple pages with very similar content. Sometimes you could be adding additional content or refreshing category content and accidentally create something that competes with other pages. Now Google thinks these pages are similar and is making them compete with each other. All of these could cause rankings to slip or knock you off the first page.
Once you know which pages are showing up for the queries, you can look at each and begin determining which should show up for what terms. If they’re all similar (same shirt but one is in red and one is in blue), then my recommendation in most cases is to combine them and set the proper links for Google. This also goes for sizes and other smaller variations, especially when you have them available in a dropdown or as selectable options on the same page.
Now you want to determine which page should be showing up for those queries and either use canonicals, noindex, etc… from above, or figure out what the second page should be showing up for.
The original page should be showing up for buying the product, and the new one is better geared towards using the product (instead of selling a sump pump for your basement, it’s how to install one). Take that page and adjust it (title tag, content, code, etc…) so that it makes sense to show for different queries and does not compete with the page that should be showing up for the original query. Now go into Search Console and have Google fetch the page again.
Before or while you’re changing the copy, etc… go into your keyword tracking tools and add the new queries so you can monitor what happens to the pages. As Google and other search engines begin finding and indexing this new content, you should see it begin to drop off the old queries so those can recover, and hopefully start to climb for the new terms.
Issues with site: searches seem tricky, but they’re actually fairly basic. The thing to remember is a proper diagnosis: more does not equal better or more opportunity. Determine what should be in the search engine indexes and what is excessive, then block the excess pages, set the proper redirects and everything else from above, and watch the results adjust to what should be there. You can contact me if you’re having issues with this, and don’t forget to buy the tools to help if you are able.
Tool 3 (check your canonicals and .htaccess)