Solid implementation of proven technical SEO techniques is one of the foundations of a successful SEO campaign. A well-performing site that judiciously allows the indexing of only the right kinds of pages will almost always have a leg up in Google’s SERPs over competitors that neglect technical SEO.
Far too often, SEOs overlook the value of technical SEO and focus more on areas such as on-page SEO and link acquisition. While those facets are crucial in their own right, doing technical SEO correctly maximizes the impact of everything else you do in a campaign.
If you’re looking for an up-to-date guide on how to perform a comprehensive technical SEO audit, this post is for you. We’re listing all the areas that you need to look into, what exactly to look for and how to fix common issues in any given website:
Traffic History Check
When you’re starting an SEO campaign, it’s only natural to have a future-oriented mindset. After all, your efforts will be judged primarily on the basis of organic traffic growth, organic-driven conversions and rankings for target keywords. It’s also natural for you to plan your campaign on what you’re seeing right now. In theory, fixing all the issues in front of you should make for greater search visibility when Google catches on to what you’re doing.
In reality, that works most of the time but not always. In some cases, significant events in the website’s past can negatively affect the site’s current performance in the SERPs. Your client might not be aware – or isn’t telling you – that something they did prior to hiring you negatively impacted their search visibility. Penalties, redesigns, domain migrations and other major events could have led to a major drop in organic traffic which, in turn, prompted the client to hire you in the first place.
Knowing exactly how the website performed historically will help you uncover problems which may still be haunting the site to this day. You can do this by:
- Examine Google Analytics organic traffic data. A 2-year data range is ideal. Look for peaks and valleys in the graph and determine when they happened. Often, these traffic fluctuations are caused by major changes in the website or changes in Google’s algorithms.
- Interview the client on any changes that might have been done to the site at that time. Redesigns usually produce significant organic traffic fluctuations. However, domain migrations, penalties and the launch of new sections can also have the same effect.
- Check for Google Updates. If no significant changes were made to the site at the time, check for possible Google updates and algorithm rollouts that might coincide with the traffic drops and spikes. I recommend using Moz’s Google algo tracker for that as it’s one of the most complete on the web and Moz tends to have supplementary information about the nature of the updates.
- Correlate Updates with the Site’s State at the Time. If organic traffic fluctuations coincide with an update, read up on what the update is about and see if your site might have been among those affected.
- Check the Link Profile. If the update is link-related like a Penguin refresh, you can examine your link data using Google Search Console’s Links to Your Site report. If you have subscriptions to Moz Pro, Ahrefs or Majestic, their link data could also yield valuable clues.
- Investigate the Website. If the update is more technical or on-page in nature, you can manually examine affected pages. If you suspect that a change in the past triggered the traffic fluctuations and the website no longer looks like it did at the time when the update impacted the client’s site, you can use archiving tools like Wayback Machine to try and see what the site was like when it potentially got hit by an update.
Here’s an example of a website from a client we recently on-boarded:
If you look at this organic traffic graph from November 2018 to January 2021, it looks somewhat normal given the annual US holiday slowdown. You could reasonably figure that you can start working on it and get some results going in a couple of months or so.
However, a broader look at its Google Analytics organic traffic data will show that in the past couple of years, this website has had a series of traffic declines which indicate a far more serious problem than what the initial data range would have led you to believe:
It turns out that a series of redesigns had thrown the site’s technical and on-page SEO out of whack. In this case, the client did not mention it prior to on-boarding and it took a bit of digging for us to uncover the problem and start fixing it.
Google Search Console Setup
Google Search Console is a free and useful platform from Google that provides a variety of reports related to your website’s visibility in the SERPs. You’ll want to ask your client for access to it so you can perform a more thorough website audit. If it hasn’t been set up yet, ask the client if you can do it for them. The process is similar to that of Google Analytics: a string of code just needs to be inserted to verify website ownership.
You can use this guide from Google to get it up and running.
Robots.txt Validation
The robots.txt file is a tiny document used by ethical web spiders to determine which pages they can crawl and which ones are off limits. While it may seem sensible to have search engines index as many pages in your site as possible, this just isn’t a good idea. Pages that are of no interest to the general public should be kept off the search engines. The same goes for pages that you want to keep under wraps: you don’t want your admin login pages floating around on the web, inviting hackers to take a crack at them.
The most common types of pages that you’ll want to see blocked off using robots.txt would be:
- Dynamic URLs – These are easy to spot as they have the unmistakable “?” and/or “=” characters. These are usually generated by your CMS when internal searches or filters are used by users.
- Checkout Path Pages – These are usually dynamic as well and are generated when users make their way through your payment system.
- Backend Pages – These include login pages to your CMS’ back end.
If you don’t already know how to write disallow parameters on the robots.txt file, you can pick up the basics here.
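To illustrate, a minimal robots.txt sketch covering the page types above might look something like this (the paths here are hypothetical – match them to whatever your own CMS actually generates):

```
# Hypothetical example - adjust the paths to your own CMS
User-agent: *
# Dynamic URLs generated by internal searches and filters
Disallow: /*?
Disallow: /*=
# Checkout path pages
Disallow: /checkout/
# Backend/login pages
Disallow: /admin/

Sitemap: https://www.example.com/sitemap_index.xml
```

Note that the wildcard syntax (*) is supported by Google and Bing but isn’t guaranteed for every crawler out there.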
Blocking bot crawls on these pages allows you to conserve your site’s crawl budget for more important pages within its domain.
Generally, we recommend using robots.txt over the noindex tag on these types of pages because a robots.txt disallow parameter prevents a page from being crawled in the first place. A noindex tag may prevent indexing, but it still allows a page to be crawled and expend some crawl budget in the process. Since these pages are not part of a typical website’s navigation paths, it makes more sense just to block them altogether with robots.txt.
Note that not all pages which you don’t want indexed should be blocked by robots.txt. There are cases where noindex tags make more sense as we’ll discuss in the next section.
Meta Robots Tag Audit
A meta robots tag is a simple string of code that’s inserted into the <head> part of a webpage. It gives search engine bots instructions on how to treat the page from indexing and crawling standpoints. As stated earlier, these tags don’t prevent bots from crawling a website but they can tell the bot that a particular page shouldn’t be shown in the SERPs, among other things.
The most common meta robots tags you’ll see on webpages are:
- Noindex – Prevents the indexing of a page and its listing in search results. This means that all pages it links to will not receive link equity. However, if the tag is modified to “noindex,follow” the page still isn’t indexed but will pass on whatever link equity it has to the URLs it links to.
- Nofollow – Prevents all the hyperlinks in the webpage from passing along link equity to their destination URLs.
- Noarchive – Prevents Google from storing a cached version of the webpage.
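For reference, here’s roughly what these tags look like inside a page’s <head> (a minimal sketch):

```
<!-- Keep the page out of the index but let its links pass equity -->
<meta name="robots" content="noindex,follow">

<!-- Keep the page indexed but stop its links from passing equity -->
<meta name="robots" content="nofollow">

<!-- Prevent Google from showing a cached copy of the page -->
<meta name="robots" content="noarchive">
```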
You can audit the meta robots tags of a website by using site crawling tools. There are plenty of options out there but we prefer the Screaming Frog SEO Spider. If you have the full version, you can try crawling your site using spider mode and filter the data to HTML so you only see webpages.
If you don’t have the full version, you can just extract all the URLs in the site’s XML sitemap and crawl them using Screaming Frog’s list mode. Here’s how:
- Go to your site’s XML sitemap. If the main XML sitemap contains a sitemap index, open each child sitemap using your browser.
- In each child sitemap page, hit CTRL+S to save the sitemaps as XML files.
- Open the XML files using Microsoft Excel. You should see something like this:
- The URLs on the left side are the URLs listed in the sitemap. Just copy all of them to your clipboard.
- Open Screaming Frog and set it to List mode.
- Click Upload List and choose Paste. Wait for the crawl to finish.
- Export the data as a CSV file and open it on Excel.
- Go to the Meta Robots column to see the meta robots tags on each page.
Sort the column from A-Z to see which pages contain what types of tags. Generally, you’ll want to watch out for the following:
- Tag, Author and Date Archives – While these archive-type pages are part of a typical blog’s navigation scheme, they don’t really hold a lot of unique value. However, they might receive an inbound link here and there which you’ll want to put to good use. Make sure these pages have the “noindex,follow” tag so they can pass link equity while still being kept off the SERPs.
- User-Generated Content – If you allow UGC on your site but don’t have tight controls on what’s being put out there, you might want to add noindex tags to those pages by default. Keep in mind that spammers, hackers and perverts are all over the web and you might find your site flooded by UGC containing pornographic, promotional, or offensive content. Your site can also become a vector for the spread of malware or a host for link spam.
If you have people who diligently check the material being published, you can allow indexing. However, be very careful with the number and type of outbound links you allow in UGC. The last thing you’ll want is to get plenty of UGC that’s primarily geared towards passing link equity to other sites.
In cases like this, you may want to add a nofollow meta tag to all UGC pages, which allows them to get indexed but not to pass link equity. Ideally, though, you would add nofollow attributes only to the individual outbound links rather than a meta tag that covers every link on the page.
- Blog Pages – As long as the blog articles are unique, original and contain substantial value that’s contextually relevant to the website’s themes, they shouldn’t have meta robots tags restricting indexation.
- Static Pages – The same conditions apply as those for blog pages. No meta robots tags are necessary as long as the content has value and is original.
- Cart and Checkout Pages – If they’re being kept off the index using just noindex tags, add a robots.txt parameter that disallows the crawling of these pages entirely.
- Image Attachment Pages – Some CMS platforms like WordPress automatically generate these pages which exist for the sole purpose of publishing media assets. These should be disabled as much as possible. If for whatever reason you can’t do that, add noindex tags to them instead.
As a general rule, any page that’s within your site’s main navigation tree but happens to be low on content or has non-original material should have noindex tags.
Canonical Tag Audit
The canonical tag, also known as rel=canonical, is a string of code in the <head> of a webpage which helps webmasters communicate to search engines how to index pages with very similar content. This helps you avoid content duplication issues while allowing you to have your preferred organic landing pages represent your site in the SERPs.
In smaller websites, duplicate content is rarely an issue. In larger ones such as ecommerce sites, content duplication can get out of hand if you don’t know exactly what you’re doing.
For instance, if you’re selling a certain type of shirt which comes in 10 different colors, you might have 10 different webpages, one for each SKU. When that happens, it’s not hard to imagine how the content for each page would be identical except for the color details and the images. That kind of distinction is usually insufficient for search engines and the pages may be viewed as duplicates of each other. As a result, only one of the pages will be indexed and Google will arbitrarily choose which one that is.
Despite not being indexed, the remaining pages will continue to be crawled and will consume both your crawl budget and internal link equity. This can stunt your site’s overall search visibility and organic traffic performance. Fortunately, you can tell Google that these pages are indeed very similar and that you have one page that you prefer to represent all of them on the SERPs.
That’s where the canonical tag comes in. This tag allows you to point duplicate pages to one “master” version that you prefer. To easily view the canonical tags of your site’s pages, simply crawl the site using Screaming Frog and look under the Canonical Link column heading. The URLs you see there are the pages that each page’s canonical tag points to. Seeing no URL simply means that the page has no canonical tag, which is fine if it isn’t a duplicate or a very similar page to another one.
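In practice, the tag is a single line in the <head> of each duplicate page pointing to your preferred master URL. A sketch with hypothetical URLs:

```
<!-- On www.example.com/shirts/classic-tee-blue, pointing to the master version -->
<link rel="canonical" href="https://www.example.com/shirts/classic-tee" />
```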
What to Look For:
- Pages with Self-Referring Canonical Tags – If you see pages that have canonical tags pointing back to themselves, that’s fine. As long as these pages are unique or are the master versions in a set of similar pages, this will have no adverse effects on rankings.
- Pages with Canonical Tags Pointing to Other Pages – If a page has a canonical tag pointing to another URL, verify that the destination URL is indeed the right master version of the page. Make sure that the destination page isn’t yielding a 404 Not Found response. If the target of the canonical tag is a URL that’s already been redirected, update the canonical tag URL to the destination of the redirect.
- Pages with No Canonical Tags – Very similar or identical pages should be gathered and you’ll need to decide which one you want to be the master version. Once you’ve made a choice, apply canonical tags on the rest of the URLs and point them to the master version. Meanwhile, the master version should have a canonical tag that refers to itself.
On a related note, self-referring canonical tags are used as a countermeasure to content scrapers. Sometimes, unethical webmasters copy your site’s content from a code level and put up another site that mirrors it just for the purpose of running ads. Having self-referring canonical tags in place allows the pages to point back to your website and prevents the copycat from getting indexed by search engines.
Pagination Tag Audit
Note: Google’s John Mueller recently stated that they were no longer supporting pagination tags and have not done so in years. This was a little strange because Google actually encouraged the use of these tags just a few weeks prior to that statement. Take from it what you will. For our part, we’ll continue to use these tags just for good measure.
Most websites have sections that need to be divided into several pages to avoid placing too much content in just one page. Blog index pages, category pages or even lengthy articles can be chopped up into segments and put on a series of pages to avoid overloading one URL with everything in the series.
Ideally, you’ll want only the first page in the series to be indexed and ranked. However, it’s not as simple as implementing a paginated series and leaving things to bots. You have to “tell” search engines that the pages belong to a series and should be treated as such. If you don’t, Google will index each page it can find in the series and will most likely view them as duplicates of each other or thin content. That’s because most CMS platforms assign identical or very similar title tags and meta descriptions to each page in a series.
What to Look For:
To see if this is a problem for you, simply do a “site:www.example.com <name of the series>” query on Google and see if other pages in the series come up. If they do, that’s a problem. Alternatively, you can check Google Search Console’s Duplicate Content section and look at the Duplicate Title Tags report. If there’s available data, you might find that some title tag duplication instances emanate from pagination indexing issues.
If you find that URLs in a series of pages are being indexed individually, implement rel=”prev” and rel=”next” header tags and link attributes. We won’t discuss exactly how to do that as Google itself already wrote a guide explaining it which you can find here.
Some Notes:
- Rel=”prev” and rel=”next” are compatible with self-referring rel=canonical tags.
- Don’t use rel=canonical tags to point succeeding pages in a series to the first one. Rel=”prev” and “next” are the ways to do it.
- Only a rel=”next” tag should appear in the first page of a series while only a rel=”prev” tag should appear in the last page (duh).
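As a quick illustration, the tags on page 2 of a hypothetical three-page series would look roughly like this:

```
<!-- In the <head> of https://www.example.com/blog/page/2/ -->
<link rel="prev" href="https://www.example.com/blog/" />
<link rel="next" href="https://www.example.com/blog/page/3/" />
```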
XML Sitemap Audit
XML sitemaps are important – but optional – parts of modern websites. These are XML documents that list all the URLs that you want search engines to crawl and index. They also give search engines clues as to the hierarchy of your site’s pages and its general taxonomy.
Having a properly written XML sitemap allows search engines to crawl your site more efficiently while discovering URLs more easily. Though a high level of visibility on search engines is achievable without an XML sitemap, you’ll have a much easier time doing it if you use one.
What to Look For:
- Presence of the XML Sitemap. Check if an XML sitemap is up and running. Keep in mind that this and the site’s HTML sitemap are two very different things. Below is an example from Yoast’s SEO plugin, which looks neat.
- Other XML sitemap generators will produce sitemaps that look more like raw code, but as long as the syntax and format are correct they should work just fine (a bare-bones example appears after this list).
- Submission to Search Console. Check if the client has submitted the XML sitemap to Google Search Console. If not, feel free to do it for him.
- Indexing Ratio. Once Search Console reads the sitemap and provides you with data, you’ll see the actual number of URLs that Google indexed labeled as “valid” and the URLs that have not been indexed labeled as “excluded.” This is a feature that was introduced with the new version of Search Console. In the old version, you were only shown the ratio between the indexed and non-indexed URLs in your sitemaps; you weren’t told which pages had been left off the index or why.
This time, Google is more generous with the information as it cites the issues found in excluded URLs and identifies which URLs are affected.
- Sitemap Errors. Search Console can detect and report several different types of sitemap errors including listed URLs that are blocked with robots.txt and sitemap rendering issues.
- 404 Not Found and Redirected URLs – Webmasters also have the tendency to leave pages listed after they’ve already been deleted or redirected to another location. You can detect this by crawling the XML sitemap with List mode on Screaming Frog and checking under the Server Status and Status Code columns. Sort the values from largest to smallest on Excel and see which URLs are producing responses other than 200 OK.
All URLs producing 404 Not found errors should be removed from the XML sitemap unless the page is due to be set live again. URLs that have been redirected also need to be removed and replaced by the destination URLs of the redirects.
- Meta Robots and Canonical Tags – Listing URLs in an XML sitemap signals to search engines that you want them crawled and indexed. Having pages in there that carry noindex meta robots tags, are blocked by robots.txt or have canonical tags pointing to other pages is therefore contradictory and illogical. If those directives were placed deliberately, take the affected URLs off your XML sitemap.
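For reference, here’s the bare-bones structure of a child sitemap with a couple of hypothetical entries – this is essentially all a valid sitemap needs:

```
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/african-safaris/</loc>
    <lastmod>2021-01-15</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/blog/packing-for-a-safari/</loc>
    <lastmod>2021-01-10</lastmod>
  </url>
</urlset>
```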
Preferred Domain Check
You may not be aware of it, but your site has at least two different versions: the one with www at the beginning and the one that doesn’t. If your site moved from non-secure (http) to secure (https), you likely have four versions. All of these are viewed by search engines as different websites and it’s up to you to communicate to them which one you prefer them to index and rank.
What to Look For:
- In Google, enter a query for “site:yoursite.com” and see which URLs appear.
- Create Search Console accounts for each version of the site.
- Enter the accounts one by one and look for the Gear icon and click Settings>Preferred Domain.
- Choose the one you’d like Google to treat as the “real” version of your site.
- Do this for all versions of your domain.
Making your preferred domain clear ensures that users will only find the version of your site that you intend to show to the public. It also prevents your site’s authority from being divided between two versions of the same thing.
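On top of the Search Console setting, most sites also enforce the preferred version with a server-level 301 redirect. A minimal sketch, assuming Apache, a www preference and an https site (your developer or host may handle this differently):

```
# Hypothetical .htaccess rules - assumes Apache with mod_rewrite enabled
RewriteEngine On
# 301 redirect the bare domain to the preferred www version
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [L,R=301]
```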
URL Structure Check
How the slugs of your site’s page URLs look matters in SEO. As much as possible, they need to be real words instead of alphanumeric strings. For example:
www.example.com/1234-abcd.html doesn’t give humans or bots any ideas on what the page is about. On the other hand, www.example.com/african-safaris.html sets the expectation that the page is about African safaris. URLs that have readable and sensible slugs are called “canonical URLs,” which are unrelated to canonical tags in case you’re wondering.
If you see mostly non-canonical URLs in your client’s site, you can recommend moving the pages to URLs with canonical slugs. This will be a major effort especially for bigger websites but it will be worth it in the end. Fortunately, most modern CMS platforms do use canonical URLs out of the box based on the pages’ titles or file names.
Crawl Anomaly Audit
Crawl anomalies occur when search engines crawl URLs in your website successfully, then come back and find that the same URLs can no longer be accessed. The most common reason is that the pages now return 404 Not Found server responses. However, soft 404s and server errors can also be classified as crawl errors.
To tell whether your site has crawl errors or not, you need to set up Google Search Console for your website. Search Console is similar to Google Analytics but instead of traffic stats, it provides you data that’s relevant to the search visibility of your online property. If you’re using the new version of Search Console, you can find crawl anomaly data under Index > Coverage and it should look something like this:
The URLs listed can be tested using Screaming Frog and other crawling tools to validate the server response codes that they’re giving off. Search Console is more up to date than ever with the data in its reports, but the data still isn’t real-time. There are many cases where crawl anomalies have already been fixed but still appear in Search Console. To save yourself time and energy, validate the listed URLs first before attempting to apply fixes.
What to Look For:
- False Positives on 404 Not Found URLs. Don’t rely on just the crawl anomaly data of Google Search Console. There are lots of cases when Google crawls coincide with times when sites have temporary issues. To verify the validity of the crawl errors, download the report on Search Console and have Screaming Frog crawl all of them in list mode. You may be surprised that some of them now yield 200 OK or 301 Moved Permanently responses. In cases like those, no further action is necessary.
- Soft 404s. A soft 404 happens when a bot visits a page that returns a 200 OK response code but displays a 404 Not Found message. Like 404 Not Found crawl errors, these need to be verified with a Screaming Frog crawl.
- Site Errors. Site errors are rare in modern, soundly-developed websites. However, you’ll find some clients who have websites plagued by sitewide crawl errors caused by server problems and poor coding. SEOs typically report problems like these but will rely on web developer cooperation to get things fixed.
How to Handle Crawl Errors
- Robots.txt Blockage. A lot of websites accumulate crawl errors because they allow the crawling and indexing of dynamic URLs. These URLs are specific to user sessions and, in time, they’ll expire and turn into 404 Not Found URLs. To prevent these from showing up in Search Console as crawl errors, simply add a disallow parameter to your site’s robots.txt file for all URLs that have “?” and “=” characters in their slugs.
- Leave Them As Is. If a page was deleted deliberately and it will neither be brought back nor replaced with a new version of itself, Google recommends leaving it alone. Search engines tend to flush URLs out of their caches within a few months’ time.
- 301 Redirect Them. If the URLs listed in Google’s crawl error reports have been replaced by new versions on different URLs, you’ll have to 301 redirect them to the new pages. Crawl errors like these are common after websites undergo redesigns or when they have constantly evolving webpage libraries like those in ecommerce sites.
Keep in mind that having crawl anomalies is normal. Most websites have them and never experience profoundly negative impacts on their rankings. However, if the number is significant, you’ll have to address them immediately. If the number of crawl errors reaches 5% or more of the total number of pages in your site, consider implementing the suggested actions above.
International Targeting
Google places a premium on providing search results to users that are geographically relevant to them. That means that a search for plumbing services in Reykjavik, Iceland shouldn’t yield listings for plumbers based in Singapore. Having a country-specific domain (.ph, .au, .jp) certainly helps Google get a better idea of which local version of its SERPs to prioritize you in, but a lot of websites use geographically neutral top level domains (TLDs) such as the classic .com, .net and .org.
If you have a neutral TLD, Google will determine which country you should rank for using your website’s content and geographic signals based on the inbound links that your webpages receive. However, this isn’t always reliable as widely spoken languages can be used in websites all over the planet. Link signals also aren’t a reliable way of determining which country your site should rank best for due to the fact that these can be easily manipulated.
The best way to do it is by setting your target country in Search Console’s International Targeting section. As long as your TLD is neutral, the option to target any nation will be open.
Conversely, if your site has a country-specific TLD, this option will not be available.
Special Note
If you have a website that targets several different countries within the same domain, leave the International Targeting setting alone. Don’t set a single target country; let Google figure out which country each section of the website targets through hreflang tags and the content of the pages.
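If you go the hreflang route, each page declares its own language/country version along with its counterparts in the <head>. A sketch with hypothetical URLs:

```
<!-- On the US English version of a page -->
<link rel="alternate" hreflang="en-us" href="https://www.example.com/us/" />
<link rel="alternate" hreflang="en-au" href="https://www.example.com/au/" />
<link rel="alternate" hreflang="x-default" href="https://www.example.com/" />
```

Each country version needs to carry the same set of tags so the annotations are reciprocal.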
Structured Data Audit
Structured data is one of the most significant milestones in the evolution of Google’s algorithms. It is a convention for marking up certain entities in a webpage so search engines can gain better insights into what they are. Entities such as people’s names, addresses, phone numbers, events and a lot more can all be marked up with structured data to help Google understand that the information isn’t just a bunch of alphanumeric strings, but rather very specific, well-defined entities.
What to Look For
To check if a website has structured data, you can use this free tool from Google which examines a page’s code and lists whatever structured data it finds:
The most common type of markup used is based on Schema.org. There’s a bit of a learning curve involved, but you can go over this guide to make it a little easier. If you’re not a coder, you can still apply Schema markups to a website using plugins and built-in features in CMS themes.
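To give you an idea of what Schema.org markup looks like, here’s a minimal JSON-LD sketch with hypothetical business details:

```
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "name": "Example Safaris",
  "telephone": "+1-555-0100",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "123 Example Street",
    "addressLocality": "Nairobi",
    "addressCountry": "KE"
  }
}
</script>
```

JSON-LD is generally the format Google recommends because it sits in a single <script> block instead of being woven into the page’s HTML.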
Internal Linking Audit
Making sure that your internal links are properly set up can have a profound impact on your ability to rank for your target keywords. A good internal link should have the following attributes:
- Easy to see
- Dofollow
- Has anchor text that’s relevant to the destination page’s context
- Does not point to a redirected URL
- Does not point to a URL with a 404 Not Found response
You can easily scan each link in your site and diagnose them using the ever-dependable Screaming Frog tool. Just follow these steps:
- Set Screaming Frog to Spider mode.
- Enter the home page URL of your website.
- Hit start and give your crawl some time to finish.
- Once the crawl is 100% complete, go to Bulk Export > All Anchor Text. This will allow you to export all link data to a CSV file. Name it whatever you want and save it.
- Open the CSV with Excel. It should look like this:
- Use conditional formatting to mark the internal links.
- Sort the data according to cell color. Remove the non-colored rows.
- Sort the remaining data set by status codes from largest to smallest.
- Look for links that have 5xx, 4xx and 3xx response codes. Highlight their rows as these will be the ones that will need fixing. These codes basically mean:
- 5xx – The link is pointing to a page afflicted by server issues. This means the link is broken and is a dead end for bots.
- 4xx – The page is unavailable and may have been deleted. This means the link is broken and is a dead end for bots.
- 3xx – The link points to a page that has been redirected. The link is not broken but puts unnecessary stress on your servers due to the fact that it passes through a redirect.
Handling Internal Link Issues
- 5xx Errors – Your first recourse is to consult the website’s developer on possible server issues affecting some link destination pages. These errors might also emanate from pages that forbid certain IP addresses from accessing them. In most cases, links that yield these responses don’t need individual action.
- 4xx Errors – Pages that yield 404 Not Found responses were either deleted or have expired. Check whether the destination page was deleted deliberately or not. If the deletion was accidental, set the page live again. If the deletion was deliberate, either update the destination URL of the link or remove the link altogether.
- 3xx Redirects – Collect the destination URLs of all the links that yield 3xx responses in your Screaming Frog crawl. Crawl them again with Screaming Frog but this time with List mode. This will allow you to see their redirect destination pages. Update the URLs of the links with the final destination URLs of the redirects.
Follow/Nofollow Usage
The nofollow link attribute was conceived by search engines as a countermeasure against web spam. Basically, adding it to a link’s HTML prevents the link from passing PageRank to its destination URL, whether it’s an internal page or an external one. Websites that rely heavily on user-generated content such as Wikipedia, Facebook and YouTube automatically add the nofollow attribute to external links to discourage low-level SEOs from dropping links as they please.
To human users, a link with a nofollow attribute will look and work just like any other link. Only someone with knowledge of its use will even recognize it when they see a page’s code. Search engines, though, read pages at a code level, and a nofollow attribute sends a clear signal that the destination page should not benefit in the SERPs because of that link.
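Here’s what the attribute looks like at that code level, next to a normal link (hypothetical URLs):

```
<!-- Normal (dofollow) link - passes link equity -->
<a href="https://www.example.com/resources/">African safari resources</a>

<!-- Nofollowed link - renders identically but passes no link equity -->
<a href="https://www.example.com/promo/" rel="nofollow">Sponsored offer</a>
```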
To audit the follow/nofollow status of all the links in your site, simply use the same All Anchor Text report that you exported from Screaming Frog when you audited for broken and redirected links. The rightmost column with the heading “Follow” is the one to look at.
What to Look For
- Dofollow Links – These are basically the normal links in your site. Denoted by the value “TRUE,” these are the type of links that you should be using on your internal pages. If you find any FALSE values pointing to internal pages, simply remove the nofollow attributes. There may be exceptions to this here and there, but they’re pretty rare.
- Nofollow Links – If you spot any FALSE values, that indicates that the link has the nofollow attribute. See what the destination pages for these links are: if they’re internal, you’ll likely have a good reason to remove the nofollow attribute. If the destination URLs are external, it’s your prerogative on whether you want those pages to benefit from your links or not.
Personally, I keep most of my sites’ links Dofollow unless the destination pages are promotional. That’s because I like giving credit to sites that I reference. However, it would be wise to keep user-created links in comments and forum sections nofollow. If your site has ads or affiliate links, I recommend adding nofollow attributes to those as well. Basically, Dofollow links are reserved for editorially cited pages and internal pages.
Information Architecture (IA) Audit
Information architecture (IA) is generally defined as the structural design of shared information environments, including websites. How the website is put together from technical, functional and design standpoints defines the IA that a website uses.
While that may not sound like it has a lot to do with SEO, IA inevitably sets the tone on how well your site is crawled and indexed by Google. Good IA also produces good usage signals which can indicate to search engines that visitors’ intents are being satisfied.
Things to Look Out For
- Use of Subdomains – A subdomain is a domain that is a part of a larger domain under the Domain Name System (DNS) hierarchy. It’s often used to separate content that’s distinct from what the main domain is intended for. To human users, a subdomain won’t look or feel any different as they navigate to it unless the webmaster designs it in a noticeably different way. To search engines, however, subdomains are an entirely different thing.
Most modern search engines view each subdomain as a completely different website from the main domain and the other subdomains under it. Hence, each subdomain will require separate Google Analytics and Search Console accounts if you want to receive SEO-related data.
In large platforms like Blogger or WordPress.com where users are issued their own subdomains when they create free properties, the authority of the main website does not flow to the subdomains. This was done purposely because these platforms run on user-generated content that isn’t regulated by the companies behind them. As such, subdomain owners will have to optimize their own pages if they want to attain a certain level of search visibility.
Google claims that it treats subdomains and subdirectories in websites the same way, but tests and case studies run by many credible SEOs seem to indicate otherwise. Whatever the case, it’s generally best practice to use subdirectories in a website’s hierarchy rather than subdomains. Unless there’s a burning reason to use subdomains, we suggest staying away from them.
There’s no need to risk your site’s SEO success by using subdomains when it can be avoided. Some sites, like this one, made this misstep by creating one subdomain for every product or service that they offer. There was no real reason to do it and subdirectories would have worked fine. SEO hasn’t worked out nearly as well for them as it should have. This is a case where the IA got in the way of SEO.
- Fancy Navigation Schemes. Some webmasters want to promote good user experience through ease of navigation – and that’s a great thing. It’s why IA schemes such as filtering, search-driven navigation and Parallax were conceived. If Google’s claim that simply building a good website users enjoy is the key to good SEO held true, you would be able to gain search engine prominence with the aforementioned IA schemes.
But that’s not really the case, is it? As a matter of fact, websites that rely solely on filter, search and infinite scrolling navigation schemes are notorious for their poor performance on search results. Just take this website for example:
The site was designed with no traditional navigation menu and internal linking pathways. In its place was a search box and a few filters. While the intentions for ease of use deserve credit, the setup meant that the site would rely primarily on dynamic URLs which would be unique to each user session. This meant that there would be thousands of URLs indexed which wouldn’t have unique content. These same URLs would also expire in time, generating tons of crawl errors in the process.
If that wasn’t enough, the lack of a traditional navigation menu tree meant that the flow of internal link equity to pages beyond the home page would be limited. Since there aren’t any static category pages to link to anyway, all the site’s link equity tends to stay in the home page and the dynamic URLs generated by internal searches and filters. Needless to say, the site did not succeed in SEO and is now undergoing an overhaul.
- Cascading Navigational Links. Speaking of navigation menus, you need to make sure that your site’s IA allows both human users and bots to find pages in your site using internal links. These links are still the surest way to facilitate effective bot crawls on your site and they’re also the conduits by which PageRank flows.
In smaller sites, internal linking is pretty straightforward and is done almost without a thought. Since there are only a few pages in the site, it’s unlikely that anyone will forget to link to them via the navigation menu. In larger properties such as ecommerce sites, it’s an entirely different thing. If the online store has hundreds or thousands of pages, it’s easy to neglect proper internal linking, which can deprive your innermost pages of the vital PageRank they need for good rankings.
For starters, it’s tough to have a menu link for all your pages in a big site just because the size of the menu will get out of hand. Therefore, you’ll likely be able to link to just level 2 or level 3 category pages at most like this site:
But what about subcategories deeper than level 2 or level 3 when adding links to them in the main menu is not an option?
Simple: you don’t link to those pages through the menu. You link to them via the subcategories they’re found in like this:
These are all subcategories. Only after you click on them do you go to the individual product pages:
In a scenario like this, the 3rd and 4th-level category pages don’t receive link equity from the home page directly via navigation menu links. However, they will still be found by bots and they will still receive link equity thanks to the links that they get in the bodies of the submenus.
Generally, the classic pyramid/silo site architecture still works best. That is, you need to recognize that the highest concentration of PageRank is in your home page and it has to cascade from there to the inner pages through internal links. As long as you can follow the model in the diagram below, you should be fine:
Redirect Audit
Redirects are essential to both SEO and day-to-day content management. They patch the “holes” left behind by deleted pages and they route both human users and bots to webpages that have replaced the deleted ones. This maintains the good flow of PageRank within your site and prevents the emergence of crawl errors.
However, overdoing redirection and not keeping track of your redirects can create problems of their own. Daisy-chained redirects or redirects that point to dead pages will create SEO issues for your site if you allow them to accumulate.
To audit the redirects on your site, follow this simple process:
- If Your CMS Has a Redirect Plugin. Go to the plugin’s settings at the back end of your site. More often than not, the plugin will allow you to export a CSV or a TXT file containing all your site’s redirect information.
- If Your Redirects Were Set Up Using .htaccess. If you prefer to set up redirects using .htaccess, that’s fine too. In the absence of a plugin, you can extract your list of redirects from this file (a short example of .htaccess redirect rules appears after this list). Web developers typically prefer this method, though plugins are often handier for SEOs.
- Export the Source URLs. Gather the list of source URLs and crawl them with Screaming Frog on list mode. Export the report to a CSV file. The response codes should all come out as 301. If not, you’ll need to investigate what’s causing the issue.
- Crawl the Destination URLs. Next, crawl the destination URLs on list mode as well. This will give you the status of the URLs where the redirects are pointed. These should all come out as 200 OK. If it’s anything other than that, there’s a cause to investigate.
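For reference, one-to-one 301 redirects in an .htaccess file look like this (the paths here are hypothetical):

```
# Hypothetical one-to-one 301 redirects: old path on the left, live destination on the right
Redirect 301 /old-safari-guide/ https://www.example.com/african-safaris/
Redirect 301 /2018/packing-tips/ https://www.example.com/blog/packing-for-a-safari/
```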
What to Look For
- Redirect Chains. If the destination URL yields a 3xx response, the redirect is pointing to another redirect, which is bad news. Check where the destination URL is leading to and update the redirect to make that the new redirect destination. Redirects should always point to live pages.
- Redirects to 4xx and 5xx URLs. This means that the redirect is pointing to a dead page. You will need to find another destination page for the redirect and update it. If there’s no suitable page for the redirect, simply disable it.
- 2xx OK. This response in the destination URL of the redirect means that the redirect is working properly. However, if this is the response you get from the source URL, it means the redirect is not working at all.
SSL Audit and Setup
A few years back, Google declared that the use of secure URLs (https) was a positive ranking signal, sending SEO-conscious webmasters into a frenzy. After the smoke cleared, a greater percentage of websites were using SSL, but the ranking benefits were anecdotal at best.
Today, most SEOs see SSL as a minor ranking factor which you should leverage if it isn’t too much trouble for you. After all, you’ll want every little SEO advantage you can get. In the hands of an experienced developer, SSL setup is pretty straightforward. However, doing the process wrong can do your rankings more harm than good.
What to Look For:
When doing a technical SEO audit, these are the things you should keep an eye out for:
- Redirects from http to https. Google and other search engines view the http and https versions of a domain as separate websites. When you switch over from http to https, you need to make sure that all the non-secure URLs are redirected to their secure counterparts on a 1:1 basis (a server-level sketch appears at the end of this section). This not only helps Google recognize the changes, it also facilitates the passing of link equity from the old pages to the new ones.
If you’re auditing a website that you feel might have mishandled this transition, you can take the following steps to make sure everything’s working fine:
- Ask for the web developer’s notes on how the migration process went.
- If you can get a list of all the URLs in the old website, do so. Crawl the URL list with Screaming Frog on List mode. If everything turns up with 301 Moved Permanently server responses to their counterpart pages, you’re good to go.
If you see 200 OK responses, it means that some pages from the non-https version are still live. If you see 404 Not Found responses on some URLs, it means that the old pages were deleted but not redirected.
- 301 redirect all pages with 200 OK response codes immediately to their https counterparts. Do the same for all URLs with 404 Not Found responses, except when the deletion was deliberate and there’s a good reason for it (i.e. the page has no counterpart in the https version of the site).
- Clerical Mistakes. In some cases, websites that transitioned to secure URLs don’t get the SEO boost they’re looking for due to technical or clerical mistakes made during the acquisition of an SSL certificate and its subsequent use in the development process. According to SSL Dragon, these are the typical ways people mess up the SSL implementation process:
- Browsers don’t trust the SSL Certificate of a particular website
- Intermediate SSL Certificate is missing
- The website has a Self-Signed Certificate
- Mixed HTTP and HTTPS content error
- The SSL Certificate name mismatch error
Make sure that the person implementing the migration from http to https has the knowledge, resources and tools necessary to pull off a decent job.
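For reference, a blanket http-to-https redirect at the server level is usually just a couple of lines. A minimal .htaccess sketch, assuming Apache with mod_rewrite (your server setup may differ):

```
# Hypothetical rules - force every http request to its https counterpart on a 1:1 basis
RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [L,R=301]
```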
Site Speed Test and Improvement
The speed at which your pages load has become one of the most important technical ranking factors on Google over the past few years. Multiple case studies have shown that faster load times – especially on mobile devices – correlate with better performance in Google’s SERPs. This has forced the hand of web designers, developers and SEOs to shoot for the highest speed scores they can possibly attain without sacrificing the vision that they have for their respective site designs.
When doing an audit, you can gauge your site’s performance through several tools. The most popular are Google’s own PageSpeed Insights and Pingdom’s site speed test. Both tools are free and they provide both performance grades for page load times and recommendations on how to improve them.
What to Look For
There are several factors that could be contributing to slow load times on your webpages. When doing a technical SEO audit, be sure to check the website for:
- Code Bloat. As the name of the issue implies, code bloat happens when there’s unnecessarily long programming code running on your website. This not only wastes server resources, it also wastes search engine bot bandwidth. Have a seasoned web developer examine your website’s code if you’re not a coder yourself and make sure that every line of HTML, JavaScript and CSS in each page’s source actually has a function. If code can be shortened to achieve the same results onscreen, by all means go for it – that’s called minification.
- CDN Usage. A Content Distribution Network (CDN) is a network of spatially separate proxy servers which distribute tasks to several data centers in order to achieve greater stability and better performance for a website. With a CDN, several servers work together to deliver content to browsers, allowing for faster load times and greater resiliency against traffic spikes.
When auditing a website, inquire with the developer on whether or not a CDN is in use and which one is in place. Smaller websites that receive hundreds to just a few thousand visits per month can probably go without a CDN and have decent load speeds. However, higher-traffic sites may suffer from slowdown if all their resources are stored at and loaded from a single server.
- Hosting Quality. Not all webhosting providers are built equal and some are just plain faster than others. If you’ve exhausted all the site speed optimization techniques that can possibly be applied and you still haven’t accomplished the level of speed you need, it might be time to start shopping for a new web hosting plan. In our experience, Amazon Web Services (AWS), SiteGround and BlueHost are all reliable and fast. I’m sure there are other great options out there, but we haven’t tried them yet.
- Image Optimization. Images are often the most resource-intensive components of webpages and thus, contribute the most to the slowdown of pages if they aren’t optimized for use on the web. Thankfully, there are plenty of ways to add a reasonable number of images to a webpage while still maintaining a healthy level of speed. These include:
- Making images only as big as they have to be.
- Using lighter image file formats such as JPEG, PNG and, when necessary, GIF.
- Enabling the caching of images
On a related note, using a CDN to load your images can also help reduce server strain and speed up the serving of these visual assets.
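Here’s a sketch of an image tag that applies a few of these ideas – an appropriately sized file with explicit dimensions, served from a CDN and lazy-loaded (the loading attribute is a progressive enhancement; older browsers simply ignore it). The URL and dimensions are hypothetical:

```
<img src="https://cdn.example.com/images/lion-800w.jpg"
     width="800" height="533"
     loading="lazy"
     alt="Lion resting in the savanna at dusk">
```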
Mobile Usability Test
A few years ago, Google announced that mobile-friendly designs would become a ranking factor for its search results on mobile devices. In a day and age when phones have overtaken desktop computers as the primary means of getting on the Internet, making your website legible even to people who are using smaller screens has become a must-do.
If you’re auditing a website and you’re not quite sure if it’s mobile-friendly, use Google’s own mobile-friendliness tool. Simply enter your site’s URL and wait for the results to come up. If your site design isn’t deemed mobile-friendly, it might be time to consider a redesign towards a mobile-responsive or adaptive scheme.
Mobile Web Crawling
In March of 2018, Google announced that it was rolling out its mobile-first indexing. The change is precisely as the name suggests: the search engine now reads websites with the mobile version as the baseline while the desktop version comes after it.
This is a dramatic paradigm shift as desktop versions of webpages were once considered the primary (canonical) versions of themselves while the mobile versions were treated with a more supplementary view. The reversal was likely due to the fact that smartphones and tablets have overtaken desktop devices as the primary devices for search usage.
What to Look For
When doing a technical SEO audit, you’ll want to make sure that the mobile versions of your webpages are just as optimized for search as their desktop counterparts. For sites that use responsive and adaptive designs, this isn’t a huge deal and there may not be much to do. The code essentially remains the same for both desktop and mobile use. You’ll just have to keep an eye out for the following:
- Mobile Page Load Speeds. Optimizing for speed on mobile pages is often more challenging than doing it for desktop ones.
- Load Prioritization. Make sure that above-the-fold content loads first, particularly the text. Images will have to be optimized for mobile devices to help them load quickly enough.
- General Usability. Some webmasters assume that simply using a responsive theme makes their websites “mobile-friendly.” While that may be technically true from a strict code perspective, the human eye test may say otherwise. It’s not enough for a webpage’s text to be legible on a smartphone screen to consider it mobile-friendly. The quality of the user experience that it delivers when it’s displayed on a smaller screen ultimately decides whether a webpage is well-suited for mobile or not.
On the other hand, having a dedicated mobile version (m.yourdomain.com) of your website might make things a little more hectic for you. These are the elements that you have to look at when doing an audit for a site like that:
- Switchboard Tags. Switchboard tags are HTML elements that act similarly to canonical tags, but instead of pointing the bot to a very similar webpage within the desktop version of a site, they point the bot to the mobile version of the webpage. When doing an audit, make sure that the rel=alternate HTML element is present and the mobile declaration is made along with the mobile counterpart URL (a sketch appears after this list). Search Engine Land has a lengthy piece explaining how these tags work and how to implement them, which I suggest you read.
- Search Console Verification. Most webmasters only verify their websites’ desktop versions on Search Console and neglect the mobile version. Google actually views the two as separate domains, so you’ll want to have both in there for optimal crawls and indexing. You’ll also get more accurate reports on the mobile site’s status this way, as having only the desktop version verified will provide you with desktop site data alone.
- Structured Data URLs. When implementing structured data markups such as Schema.org tags, it’s important to make sure that any marked-up information containing URLs references the correct version of the site. When markup is lifted directly from the desktop to the mobile version of a page, the tendency is for marked-up desktop URLs to end up on the mobile site.
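To tie the first point together, here’s a sketch of the switchboard tag pair for a hypothetical desktop/mobile URL pair:

```
<!-- On the desktop page: https://www.example.com/african-safaris/ -->
<link rel="alternate" media="only screen and (max-width: 640px)"
      href="https://m.example.com/african-safaris/" />

<!-- On the mobile page: https://m.example.com/african-safaris/ -->
<link rel="canonical" href="https://www.example.com/african-safaris/" />
```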
For most websites, there’s no escaping the fact that you need to optimize for search using a mobile device lens. There are a few niches that can afford to ignore mobile devices due to the desktop-heavy nature of their target users but those are few and far between.
Conclusion
At the end of the day, technical SEO comes down to these basic principles:
- Website availability
- Website performance
- Crawlability
- Cleanliness of code
- Proper use of server resources
As long as you can apply the best practices under each, you should do just fine in the search results.