Meta Robots Tag Audit Guide


A meta robots tag is a short snippet of HTML code in the head section of a webpage that instructs search engine robots what they can or can’t do with that page.

There are many types of meta robots tags, but the most commonly used are the noindex, noarchive, and nofollow tags. If you do search engine optimization, particularly technical SEO, you will already be familiar with what these tags do. If you’re a webmaster, understanding what these tags do can be critical for taking your website’s search visibility to the next level.

These common meta robots tags instruct search engines to do the following:

  • Noindex: This tag instructs search engines not to index the page, regardless of its contents or construction. That means you absolutely do not want a noindex tag on a page you want to be found on Google Search.
  • Noarchive: This tag instructs search engines not to keep a cached copy of the page. Noarchive tags are rarely used these days and have little bearing on modern SEO; if you see one on a modern webpage, chances are it was added by mistake.
  • Nofollow: This tag instructs search engines not to follow any links on the page. The links remain functional for human visitors but no longer pass on link equity. Because they cut off link equity across an entire page, these tags are unusual for most SEO purposes. (The sketch after this list shows how these directives can be read from a live page.)
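In the page source, each of these directives appears inside a single tag in the head, such as <meta name="robots" content="noindex, nofollow">. As a rough illustration of how the directives can be read from a live page, here is a minimal Python sketch that uses only the standard library; the URL is just a placeholder, and a dedicated crawler does the same job at scale during an audit.

```python
# Minimal sketch: report which meta robots directives a single page declares.
# Standard library only; the URL below is a placeholder, not a real audit target.
from html.parser import HTMLParser
from urllib.request import urlopen


class MetaRobotsParser(HTMLParser):
    """Collects the content of any <meta name="robots"> tag on the page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            # e.g. content="noindex, nofollow" or content="noarchive"
            content = attrs.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]


url = "https://example.com/some-page/"  # placeholder URL
page = urlopen(url).read().decode("utf-8", errors="ignore")

parser = MetaRobotsParser()
parser.feed(page)

for directive in ("noindex", "nofollow", "noarchive"):
    print(f"{directive}: {'present' if directive in parser.directives else 'absent'}")
```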

Noindex vs Robots.txt

Robots.txt files and noindex tags are both used to keep pages out of search results, so is there any real difference? Technically, yes. A robots.txt file prevents crawling: compliant robots never fetch the blocked URLs at all, which also means they never see any meta tags on those pages. A noindex meta tag allows crawling, so robots can read the tag, but it tells them not to include the page in their indices. This is also why a page blocked only in robots.txt can still surface in search results if other sites link to it.
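To make the distinction concrete, here is a small Python sketch of the crawling side of the question, using the standard library’s robots.txt parser; the example.com URLs are placeholders. A URL that is disallowed here is never fetched by compliant robots, so any noindex tag sitting on that page is never read.

```python
# Minimal sketch: check whether a compliant robot is even allowed to crawl a URL.
# The robots.txt location and the page URL below are placeholders.
from urllib.robotparser import RobotFileParser

robots = RobotFileParser("https://example.com/robots.txt")
robots.read()

url = "https://example.com/private/report.html"  # placeholder URL
if robots.can_fetch("Googlebot", url):
    print("Crawling allowed: robots can fetch the page and read its meta robots tags.")
else:
    print("Crawling blocked: robots never see the page, so a noindex tag on it goes unread.")
```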

Where Do You Typically Add Noindex Tags?

Noindex tags are often added by SEOs to website pages that don’t make sense to show on search engines. Typically these pages will still be accessible through direct navigation or in-page links. Noindex tags are usually added to the head section of the following types of pages:

  • Tag archives
  • Author archives
  • Date archives
  • User-generated content
  • Image attachment pages

Why Do We Deindex Pages?

Once a website gets to a certain size, you’ll want to optimize the distribution of PageRank and keep low-value pages from being indexed in Google’s search results. To ensure that search engine users only get relevant hits from your site, you want only higher-value pages to be indexed.

Another reason is Google’s crawl budget. Google’s bots will only crawl a limited number of pages on your site in a given period. If you have a medium-sized or large website (10,000 or more pages) or a site with daily updates, you want to make the most of that budget and ensure that only high-value pages are crawled and indexed.

How We Audit Meta Robots Tags

If you want to ensure the best possible search engine visibility and user experience, you’ll want to periodically audit your meta robots tags. Here, we’ll show you the typical meta robots audit process we use at SearchWorks.

We use Screaming Frog to do our meta robots audits, but other similar SEO tools should work broadly the same way. Be sure to follow the video starting at 12:32 to get a better idea of what’s going on.

Step 1: Download and Open Screaming Frog

Screaming Frog is a traditional downloadable app that can be used on Windows or macOS devices. There’s a free version that limits the number of pages you can audit, as well as a paid version that lets you audit as many as you need. We recommend getting the paid licensed version of Screaming Frog or a similar app if you plan to do technical SEO at scale.

Step 2: Set Mode to “Spider”

Spider mode simulates the crawling of search engine robots, allowing the app to return site data similar to what real search engine robots would gather. In this case, that simulated crawling and indexing behavior is exactly what we want.

Step 3: Crawl Your Target Website

In the field labeled “Enter URL to spider”, enter the top-level URL of the website you want to audit. Hit “Enter” or click “Start” and wait for the app to crawl your website. In a few moments, it should start returning data. If your site is especially large, the crawl may take several minutes to complete.

Step 4: Set Filter to HTML

Even before the crawl is completed, you should have some website data to look at. To make it easier to work with, set the filter to HTML so that you only see webpages. Like any search engine robot, Screaming Frog’s spider will crawl through a variety of online assets, including downloadable files and PDFs, so filtering to HTML keeps the dataset manageable.

Step 5: Export Your Data

Once the app has finished crawling your target website, it’s time to export your data. Exporting your data to a spreadsheet like Excel or Google Sheets will make it easier to filter out the data you don’t need, speeding up your audit.

Find and click the “Export” button (you can check 15:37 in the video if you’re not sure where it is). Save your data as a .csv file and give the file a name that makes it clear what it contains. Make sure to save it somewhere you can immediately find it.

Step 6: Open Your File in Excel

Open your file in Excel or in your preferred spreadsheet app. You’ll be presented with a lot of data. While all of it is useful, a meta robots audit only needs a few columns.

Step 7: Filter Your Data

Filtering your data will reduce the odds of errors and make your audit much simpler. Delete or hide all the data columns except for the following:

  • Address
  • Indexability
  • Indexability Status
  • Meta Robots 1

Step 8: Sort the Indexability Status Column

You can sort the data any way you want, but we prefer to simply sort this column alphabetically. This groups the URLs by indexability status and makes it easier to flag problem pages.
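If you prefer to do Steps 7 and 8 in code rather than in a spreadsheet, a pandas sketch along the following lines works as well. The column names are the ones listed above; the file name is only an assumption, so match it to whatever you called your export.

```python
# Hedged sketch of Steps 7 and 8: trim the export to the audit columns and
# sort by Indexability Status. "internal_html.csv" is an assumed file name.
import pandas as pd

df = pd.read_csv("internal_html.csv")

# Step 7: keep only the columns the audit needs.
df = df[["Address", "Indexability", "Indexability Status", "Meta Robots 1"]]

# Step 8: sort so that pages with the same indexability status group together.
df = df.sort_values("Indexability Status").reset_index(drop=True)

print(df.head(20))
```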

Step 9: Check Each Non-Indexable URL to Ensure That Noindex Tags are Intentional

Currently non-indexable URLs should be checked to make sure they are really meant to be invisible to search engine users. To reiterate what we mentioned earlier, noindex tags are most often applied to tag, author, and date archives, user-generated content, and image attachment pages. You can noindex whichever pages you want, but generally you will want to noindex those page types and remove noindex tags from any page you want indexed. Flag any URLs that need their noindex tags removed.

Step 10: Check Currently Indexable URLs

Your job does not stop at checking non-indexable pages. Sometimes, you’ll have indexable pages that need to be given a noindex tag. You’ll want to make sure that this tag is properly applied to account for Google’s crawl budget and to optimize your site’s indexability. Again, flag any URL that needs to be updated.
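The review in Steps 9 and 10 stays manual, but splitting the export into the two lists you will walk through can be scripted. A rough sketch, again assuming the same export file; the exact labels in the Indexability column may differ in your version of the tool, so check them against your own data.

```python
# Rough sketch: split the crawl export into the two review lists from Steps 9 and 10.
# File name and the "Indexable"/"Non-Indexable" labels are assumptions to verify.
import pandas as pd

df = pd.read_csv("internal_html.csv")
df = df[["Address", "Indexability", "Indexability Status", "Meta Robots 1"]]

# Step 9: review these and flag any URL whose noindex tag should be removed.
non_indexable = df[df["Indexability"] == "Non-Indexable"]
non_indexable.to_csv("review_non_indexable.csv", index=False)

# Step 10: review these and flag any URL that still needs a noindex tag.
indexable = df[df["Indexability"] == "Indexable"]
indexable.to_csv("review_indexable.csv", index=False)

print(len(non_indexable), "non-indexable URLs to review")
print(len(indexable), "indexable URLs to review")
```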

Step 11: Log in to Your CMS

Now that you’ve flagged the problem URLs, it’s time to start editing the robots tags. To do this, you’ll need to get into your CMS. This process will be different depending on the CMS you use and the modifications and plugins it has. In any case, you’ll want to log in to your CMS and start editing the HTML tags.

The site we audited in the example video runs on WordPress, the most popular CMS out there by far. The great thing about WordPress is that you can use a variety of free plugins to easily optimize your website’s SEO.

Step 12: Check If Your Noindex Settings Are Correct

In our tutorial starting at 22:06, we went into the example site’s WordPress CMS, which already has the Yoast SEO plugin installed. This plugin will make it easier to sort and edit the URLs we flagged earlier.

If Yoast SEO is installed, find the “SEO” option on the left-side menu, click on it, and click “Search Appearance”. Click the “Taxonomies” tab. From here, you should be able to specify whether certain pages are searchable on Google and other search engines according to categories, tags, or formats.

After you verify that the Taxonomies settings are correct, you can move on to the “Archives” tab. From here you should be able to specify whether you want the author and date archives of your site to be visible on search engines. Whether you want these pages indexed or not all depends on your preferences and overall SEO strategy.

Step 13: Edit Individual Problem Pages

If you have flagged individual problem pages that can’t be batch edited with Yoast SEO, you can still edit them one by one. First, make sure you’re logged in to WordPress. Next, navigate to the page and find the “Edit Post” option. Alternatively, you can go to the WordPress dashboard and locate the page you need to edit through the “Posts” option on the left-side menu.

Once you’re in the post editing screen for the problem webpage, scroll down past the page content editor box until you find the Yoast SEO section. Find the “Advanced” drop-down panel.

Once you’ve expanded this, you should see an option labeled “Allow search engines to show this Post in search results?”. Select “Yes” if you want the page to be indexable and “No” if you want it tagged as noindex. Note that immediately below this option is a way to set nofollow tags for the specific post, as well as options for adding other meta robots directives and canonical tags.
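Once the edits are live, it can be worth re-checking the flagged URLs against the production site. Below is a rough verification sketch; the review_indexable.csv file name carries over from the earlier sketch, and the regex check is deliberately simple, so treat it as a starting point rather than a definitive test.

```python
# Rough sketch: re-fetch previously flagged URLs and report whether each one
# still carries a noindex directive. File name and check are illustrative only.
import csv
import re
from urllib.request import urlopen

with open("review_indexable.csv", newline="") as f:
    urls = [row["Address"] for row in csv.DictReader(f)]

for url in urls:
    page = urlopen(url).read().decode("utf-8", errors="ignore")
    # Look inside any <meta ... name="robots" ...> tag for a noindex directive.
    robots_tags = re.findall(r'<meta[^>]+name=["\']robots["\'][^>]*>', page, re.I)
    has_noindex = any("noindex" in tag.lower() for tag in robots_tags)
    print(("noindex" if has_noindex else "indexable").ljust(10), url)
```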

Final Thoughts

We run a meta robots tag audit as part of our initial website audits before doing other SEO activities. This simple yet sometimes overlooked step helps ensure that a website stays within Google’s crawl budget and that its public-facing pages are visible on Google Search. Thus, getting meta robots tags in order is often a prerequisite for better search visibility as well as more relevant results for search engine users.

While there are a fair number of steps involved in auditing meta robots tags, they’re actually very intuitive and straightforward. If you have an SEO-friendly CMS and tools like Screaming Frog, then the job can be made even simpler.

If you want to learn more about technical SEO or need the Philippines’ most respected SEO agency to work on your site, we’re ready to help. Set up a meeting to begin dominating your niche on Google Search.

Glen Dimaandal
Glen Dimaandal is the founder and CEO of SearchWorks.Ph. He has been doing SEO since 2008 and is consistently featured in mainstream media and industry conferences. His core skills include SEO, SEM, data analytics and business development.