# SEO & SEM GUIDE
> Google publishes its official SEO documentation here: https://developers.google.com/search/docs. It is split into two sections:
> 1. [Beginner SEO](https://developers.google.com/search/docs/beginner/get-started)
> 2. [Advanced SEO](https://developers.google.com/search/docs/advanced/guidelines/get-started)
> If you're only interested in performance, please refer to the [Web Vitals document](https://gist.github.com/nicolasdao/fad8bb808970805ff2fef6a84ee61af0).
# Table of contents
> * [Concepts](#concepts)
> - [SERP](#serp)
> * [Keywords](#keywords)
> - [Optimizing a web page for keywords](#optimizing-a-web-page-for-keywords)
> - [Finding the keywords ranking for a domain](#finding-the-keywords-ranking-for-a-domain)
> - [Finding the historical keywords ranking for a domain](#finding-the-historical-keywords-ranking-for-a-domain)
> * [Crawl budget or how to use `robots.txt`, `sitemap.xml`, and `noindex`](#crawl-budget-or-how-to-use-robotstxt-sitemapxml-and-noindex)
> - [What is the crawl budget](#what-is-the-crawl-budget)
> - [Factors that improve the crawl budget](#factors-that-improve-the-crawl-budget)
> - [Understanding how the crawl budget is spent](#understanding-how-the-crawl-budget-is-spent)
> - [Non-marketing pages](#non-marketing-pages)
> - [Dealing with duplicate content](#dealing-with-duplicate-content)
> - [Canonical URL](#canonical-url)
> - [`robots.txt` or how to block pages with no marketing value from wasting your crawl budget](#robotstxt-or-how-to-block-pages-with-no-marketing-value-from-wasting-your-crawl-budget)
> - [Making Google aware of the robots.txt](#making-google-aware-of-the-robotstxt)
> - [sitemap.xml](#sitemapxml-overview)
> - [Making Google aware of the sitemap.xml](#making-google-aware-of-the-sitemapxml)
> - [`X-Robots-Tag` with `noindex`](#x-robots-tag-with-noindex)
> - [`robots.txt` vs `noindex` or the difference between crawling and indexing](#robotstxt-vs-noindex-or-the-difference-between-crawling-and-indexing)
> * [Website to-do list](#website-to-do-list)
> - [Standard](#standard)
> - [Metadata in the `head`](#metadata-in-the-head)
> - [Pathname and information architecture](#pathname-and-information-architecture)
> - [JSON-LD](#json-ld)
> - [Culture](#culture)
> - [Images](#images)
> - [Content](#content)
> - [`robots.txt`](#robotstxt)
> - [sitemap](#sitemap)
> - [`sitemap.xml`](#sitemapxml)
> - [`sitemap.html`](#sitemaphtml)
> - [`sitemapimages.xml`](#sitemapimagesxml)
> - [Multi-languages](#multi-languages)
> * [Progressive Web Apps aka PWA & SEO](#progressive-web-apps-aka-pwa--seo)
> * [Videos](#videos)
> - [Key moments](#key-moments)
> * [Google Search Console](#google-search-console)
> - [Understanding how your pages are performing](#understanding-how-your-pages-are-performing)
> - [Analysing a specific URL](#analysing-a-specific-url)
> - [Tips](#google-search-console-tips)
> * [UX and SEO](#ux-and-seo)
> * [Tools](#tools)
> * [Tips and tricks](#tips-and-tricks)
> - [Red flags](#red-flags)
> - [Red flags - Google Search Console](#red-flags---google-search-console)
> * [How to](#how-to)
> - [How to test how your page is seen by Google?](#how-to-test-how-your-page-is-seen-by-google)
> - [How to check keywords ranking history for a domain?](#how-to-check-keywords-ranking-history-for-a-domain)
> - [How to request Google to recrawl your website?](#how-to-request-google-to-recrawl-your-website)
> * [Annex](#annex)
> - [JSONLD examples](#jsonld-examples)
> - [ahrefs recipes to rank](#ahrefs-recipes-to-rank)
> * [References](#references)
# Concepts
## SERP
Stands for `Search Engine Results Page`.
# Keywords
## Optimizing a web page for keywords
Once you've found the list of keywords you wish a web page to rank for (ideally only a few, because each technique below only supports one keyword at a time; too many different keywords will dilute your results), place them in the following HTML tags (sorted by order of importance):
1. `title` tag in the HTML head.
2. `meta description` tag in the HTML head.
3. `canonical link` tag in the HTML head (e.g., `<link rel="canonical" href="https://example.com/">`). The Google bot hates duplicate content. This tag tells the GoogleBot which page is the one and only page that should receive SEO love from it. Use it even if you think you don't have any duplicate pages, because in reality, you do. Indeed, as far as the GoogleBot is concerned, https://example.com and https://example.com?refer=facebook are duplicate pages.
4. `h1`(1) tag in the HTML body.
5. `h2`(1) tag in the HTML body.
6. `h3`(1) tag in the HTML body.
7. `alt` attributes on `img` tags in the HTML body. Many web pages are missing `alt` attributes, which is a shame, as each one is a missed opportunity to rank.
8. `anchor text`. Make sure that the text you use in your `a` tags describes the link as clearly as possible. If that link points to an external website, that website's domain authority will benefit from your good description. The same applies to internal links. The Google bot loves organized content.
> (1) A typical mistake is to use an H2 with no H1 because the H1 looks too big. The issue is that H1 tags are worth more than H2s when it comes to SEO. If the content of your header contains keywords you wish to rank for, try to use an H1 and use CSS to change its style so it matches your design.
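For illustration, here is a minimal sketch of a page optimized for a made-up keyword ("handmade leather bags"); every name and URL below is hypothetical:

```html
<!DOCTYPE html>
<html lang="en">
<head>
	<!-- 1. Title tag containing the target keyword -->
	<title>Handmade Leather Bags | Example Store</title>
	<!-- 2. Meta description containing the target keyword -->
	<meta name="description" content="Shop handmade leather bags crafted in small batches.">
	<!-- 3. Canonical URL pointing to the one page that should rank -->
	<link rel="canonical" href="https://example.com/handmade-leather-bags">
</head>
<body>
	<!-- 4. H1 containing the target keyword (restyle it with CSS if it looks too big) -->
	<h1>Handmade Leather Bags</h1>
	<!-- 5. H2 reusing the keyword where it reads naturally -->
	<h2>Why our handmade leather bags last a lifetime</h2>
	<!-- 7. Descriptive alt attribute on images -->
	<img src="/img/tote.jpg" alt="Brown handmade leather tote bag">
	<!-- 8. Descriptive anchor text -->
	<a href="/handmade-leather-bags/care-guide">How to care for a handmade leather bag</a>
</body>
</html>
```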
## Finding the keywords ranking for a domain
To figure out which keywords a specific domain ranks for, use the `Organic Keywords Trend` chart. To see that chart:
1. Login to [Semrush](https://www.semrush.com/).
2. Select `Domain Analytics/Overview`, enter the domain in the top input and click the `Search` button.
3. Select the `Organic Research` in the left pane to see the `Organic Keywords Trend` chart.
### Understanding the `Organic Keywords Trend` chart
The top horizontal bar can be read as follows:
- `Keywords`: Current number of keywords for which the domain ranks within Google's top 100 organic positions.
- `Traffic`: Estimated number of visitors those keywords have brought to your website this month.
- `Traffic cost`: An estimate of how much it would cost to acquire the same traffic through paid ads.
The __*Organic Keywords Trend*__ chart can be read as follows:
- The legend shows the colors that represent the keyword categories based on how they rank in the SERP (e.g., `Top 3` are keywords for which your domain ranks in the top 3 positions).
- Each vertical bar is a snapshot of the keyword rankings. For example, hovering over the _March 20_ bar might show that 33 keywords in total ranked your website inside the first 100 positions. Amongst those 33 keywords:
    - 0 ranked in the top 3 positions.
    - 5 ranked between positions 4 and 10.
    - 4 ranked between positions 11 and 20.
    - 11 ranked between positions 21 and 50.
    - 13 ranked between positions 51 and 100.
When you click on that bar, you can see the details of those keywords.
## Finding the historical keywords ranking for a domain
This is achieved by following the same steps as the [Finding the keywords ranking for a domain](#finding-the-keywords-ranking-for-a-domain) section. However, you'll need to have the Semrush Guru tier at minimum (almost USD200/month). In the `Organic Keywords Trend` chart, click on any bar in the chart to see the keywords ranking details for that point in time.
# Crawl budget or how to use `robots.txt`, `sitemap.xml`, and `noindex`
TL;DR: these three techniques aim to optimize your crawl budget:
- Use a `robots.txt` to prevent non-marketing pages from consuming your crawl budget.
- Use one or more `sitemap.xml` files to make sure that marketing pages are crawled, making the best out of your crawl budget.
- Use the `noindex` directive (in the HTML head or via the `X-Robots-Tag` HTTP header) to prevent pages that can't be listed in the `robots.txt` from being indexed. This technique is used to:
    - Prevent duplicate content.
    - Deal with faceted navigation.
    - Handle soft 404 pages (i.e., pages that return a 200 status code while saying that the page was not found, instead of an explicit 404 status).
    - Handle infinite spaces (e.g., calendar pages where the URL contains the date).
## What is the crawl budget
> WARNING: Optimizing the crawl budget is only worth it if your website contains at least a few thousand web pages. Otherwise, it is a waste of time. That being said, nurturing good SEO habits doesn't hurt and will make it easier to grow.

Crawling your website is not effortless. This means that search engine companies don't allocate an infinite amount of resources to crawl your precious website. Instead, they allocate it a specific _budget_ called the _crawl budget_. This budget is usually denominated in the number of pages that the search engine will crawl. It depends on many factors that are left to the discretion of each search engine company, though [some factors have become public](#factors-that-improve-the-crawl-budget). Without knowing exactly what your budget is, you should do your best to configure your website to prioritize the pages you want indexed and de-prioritize the pages that should not consume any of your precious crawl budget. The pages you should block from consuming your crawl budget are:
- [Duplicate content](#dealing-with-duplicate-content).
- [Pages that are important to users but present no marketing value](#robotstxt-or-how-to-block-pages-with-no-marketing-value-from-wasting-your-crawl-budget) (e.g., admin panel, settings page).
- Soft 404 pages (i.e., pages that return a 200 status code while saying that the page was not found, instead of an explicit 404 status).
## Factors that improve the crawl budget
- Fast web pages, even under pressure: if the GoogleBot notices that your pages load very quickly even with a lot of traffic, it may decide to increase the number of pages it schedules to crawl.
- No errors: frequent server errors (5xx) and timeouts signal that your site can't cope, and crawling slows down.
- JS and CSS files: every resource that the GoogleBot needs to fetch to render your page counts toward your crawl budget. To mitigate this, ensure these resources can be cached by Google and avoid cache-busting URLs (those that change frequently); see the sketch below.
- Avoid long redirect chains. Each redirect counts as an additional page to crawl in your budget.
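To illustrate the caching point above, here is a sketch contrasting a cache-friendly asset URL with a cache-busting one (the file names and version scheme are made up):

```html
<!-- Cache-friendly: the URL only changes when the file content actually changes,
     so Google can reuse its cached copy across crawls. -->
<link rel="stylesheet" href="/assets/main.v42.css">
<script src="/assets/app.v42.js"></script>

<!-- Cache-busting: a fresh query string on every deploy (or every request) forces
     Google to re-fetch the file each time, eating into the crawl budget. -->
<link rel="stylesheet" href="/assets/main.css?ts=1718034522">
<script src="/assets/app.js?ts=1718034522"></script>
```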
## Understanding how the crawl budget is spent
> For a detailed explanation of the Google Search Console's `Coverage` report, please refer to https://support.google.com/webmasters/answer/7440203?hl=en.
- Use the `Coverage` section of the Google Search Console.
- Review the URLs in the `Valid` category to confirm they are listed as expected. Unexpected pages are:
- Duplicated content (often due to faceted URLs).
- Soft 404s.
- Non-marketing pages.
## Non-marketing pages
- __Thank you page__. Those pages could rank for long-funnel keywords.
- __User settings__.
## Dealing with duplicate content
- Determine whether duplicate pages have already been indexed:
- Login to the [Google Search Console](https://search.google.com/search-console).
- Select the correct property.
- Click on the `Coverage` section in the menu.
- Review all `Valid` URLs and look for duplicate URLs.
- For all duplicate URLs:
    1. Do not block them in the `robots.txt` yet. Otherwise, Google won't get a chance to deindex them first.
    2. Make sure they have a canonical URL set in the `head`.
    3. Add the [`noindex`](#x-robots-tag-with-noindex) directive.
    4. Wait until the effect of the previous steps moves the duplicate pages into the `Excluded` URLs category (this could take a couple of days).
    5. Block those pages in the `robots.txt`.
    6. Optionally, if a duplicate page is useless and can be deleted in favor of the canonical version, delete it and create a 301 redirect from the duplicate URL to the canonical one.
### Canonical URL
For example (the URL is illustrative):
```html
<head>
	<!-- Both https://example.com/shoes and https://example.com/shoes?refer=facebook declare the same canonical URL. -->
	<link rel="canonical" href="https://example.com/shoes">
</head>
```
- A canonical URL impacts both indexing and crawlability:
    - Indexing: when all duplicate pages use the same canonical URL, only the canonical URL is indexed.
    - Crawlability: once the page has been crawled and indexed once, Google will know which page is a duplicate. This means that subsequent crawls will only crawl the canonical URL and skip the duplicate content, which avoids wasting the crawl budget. Also, by making sure that only the canonical URLs are added to the sitemap.xml, we implicitly improve the crawl budget (as opposed to listing the duplicate links in the sitemap).
- Both `rel="canonical"` and `content="noindex"` will prevent the page from being indexed by Google.
- Do not mix canonical URLs with `noindex`. This confuses the GoogleBot. If it sees both, it will choose to follow the canonical URL signal (ref: [Google: Don’t Mix Noindex & Rel=Canonical](https://www.searchenginejournal.com/google-dont-mix-noindex-relcanonical/262607/)).
- A canonical URL has the same effect as a 301 permanent redirect. In fact, the canonical URL was originally made for situations where a 301 redirect was not possible.
- Use only canonical URLs in your sitemap.
## `robots.txt` or how to block pages with no marketing value from wasting your crawl budget
> [Shopify Robots.txt Guide: How To Create & Edit The Robots.txt.liquid](https://gofishdigital.com/shopify-robots-txt/)
> - To create a robots.txt online, please refer to https://www.seoptimer.com/robots-txt-generator
> - To test a robots.txt file, use the [Google robots.txt tester tool](https://www.google.com/webmasters/tools/robots-testing-tool).
> - The [`X-Robots-Tag` with `noindex`](#x-robots-tag-with-noindex) will not block crawling. It just prevents the GoogleBot from indexing the page. Pages with `noindex` will still consume crawl budget.
```
User-agent: Googlebot
Disallow: /nogooglebot/
User-agent: *
Allow: /
Sitemap: http://www.example.com/sitemap.xml
```
Where:
- The user agent named Googlebot is not allowed to crawl any URL that starts with http://example.com/nogooglebot/.
- All other user agents are allowed to crawl the entire site. This could have been omitted and the result would be the same; the default behavior is that user agents are allowed to crawl the entire site.
- The site's sitemap file is located at http://www.example.com/sitemap.xml.
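Building on the example above, here is an illustrative sketch of a `robots.txt` that blocks the non-marketing and faceted pages discussed earlier. The paths and query parameters are made up; the `*` wildcard is supported by the GoogleBot:

```
User-agent: *
# Block non-marketing pages (admin panel, user settings)
Disallow: /admin/
Disallow: /settings/
# Block faceted navigation URLs (e.g., /shoes?sort=price or /shoes?color=red)
Disallow: /*?sort=
Disallow: /*&sort=
Disallow: /*?color=
Disallow: /*&color=

Sitemap: https://www.example.com/sitemap.xml
```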
### Making Google aware of the robots.txt
Google automatically looks for the `robots.txt` at the root of your domain (e.g., https://example.com/robots.txt). You can also manually submit it to the Google Search Console.
## sitemap.xml overview
> To create a sitemap.xml online, please refer to https://www.xml-sitemaps.com/
> - [Everything you need to know about multilingual and multinational sitemaps](https://skryvets.com/blog/2018/05/01/everything-you-need-to-know-about-multilingual-and-multinational-sitemaps/)
Pages that are neither listed in the sitemap.xml nor blocked in the robots.txt will still eventually be crawled, but they won't receive as much attention from Google.
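For reference, a minimal sitemap.xml looks like the sketch below (the URLs and dates are illustrative). Only list canonical URLs, and keep the `lastmod` dates accurate since they are what you update when requesting a batch re-crawl (see the [How to request Google to recrawl your website?](#how-to-request-google-to-recrawl-your-website) section):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
	<url>
		<loc>https://www.example.com/</loc>
		<lastmod>2021-06-01</lastmod>
	</url>
	<url>
		<loc>https://www.example.com/handmade-leather-bags</loc>
		<lastmod>2021-05-20</lastmod>
	</url>
</urlset>
```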
### Making Google aware of the sitemap.xml
There are two ways to make Google aware of your sitemap.xml:
1. Include it in the robots.txt. To see an example, please refer to the [`robots.txt`](#robotstxt-or-how-to-block-pages-with-no-marketing-value-from-wasting-your-crawl-budget) section.
2. Manually submit it to the Google Search Console.
> #1 is considered a best practice.
## `X-Robots-Tag` with `noindex`
The `noindex` directive can be sent either as an HTTP response header or as a `meta` tag in the HTML head. The header version looks like this:
```
HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
X-Robots-Tag: noindex
```
The HTML equivalent is `<meta name="robots" content="noindex">` placed in the page's `head`.
## `robots.txt` vs `noindex` or the difference between crawling and indexing
The `robots.txt` controls __crawling__: a disallowed page is not fetched at all, so it doesn't consume any crawl budget, but it can still appear in the index if other sites link to it. The `noindex` directive controls __indexing__: the page is still crawled (and therefore still consumes crawl budget), but it is dropped from the index. This is why, when deindexing duplicate pages, you add `noindex` first and only block the page in the `robots.txt` once Google has moved it to the `Excluded` category; a page blocked by `robots.txt` is never re-crawled, so Google would never see the `noindex`.
# Website to-do list
To double-check that the list below is correctly implemented, refer to successful websites that tick all the SEO boxes:
- https://www.nytimes.com/
## Standard
### Metadata in the `head`
At a minimum, the page's `head` tag must contain a `title`, a `meta description`, and a `canonical link`. Here is a minimal, illustrative sketch (the values are made up):
```html
<head>
	<meta charset="utf-8">
	<meta name="viewport" content="width=device-width, initial-scale=1">
	<title>Handmade Leather Bags | Example Store</title>
	<meta name="description" content="Shop handmade leather bags crafted in small batches.">
	<link rel="canonical" href="https://example.com/handmade-leather-bags">
</head>
```
# Videos
## Key moments
> To mark up key moments in your videos, please refer to https://developers.google.com/search/docs/data-types/video#clip.
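As a hedged sketch (the video, URLs, and timestamps below are made up), key moments can be declared by nesting `Clip` objects inside the page's `VideoObject` JSON-LD:

```html
<script type="application/ld+json">
{
	"@context": "https://schema.org",
	"@type": "VideoObject",
	"name": "How our leather bags are made",
	"description": "A tour of the workshop, from cutting to stitching.",
	"thumbnailUrl": "https://example.com/img/workshop-thumb.jpg",
	"uploadDate": "2021-05-20",
	"contentUrl": "https://example.com/videos/workshop.mp4",
	"hasPart": [
		{
			"@type": "Clip",
			"name": "Cutting the leather",
			"startOffset": 30,
			"endOffset": 95,
			"url": "https://example.com/videos/workshop?t=30"
		},
		{
			"@type": "Clip",
			"name": "Stitching",
			"startOffset": 95,
			"endOffset": 240,
			"url": "https://example.com/videos/workshop?t=95"
		}
	]
}
</script>
```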
# Google Search Console
This online tool from Google gives you insight into how your website is being crawled by the GoogleBot. Among other things, it lets you:
- Submit new sitemap.xml or explicit new URLs.
- Get alerted on issues.
- Understand how Google sees your pages.
- Test:
- Mobile usability
    - Rich results
## Understanding how your pages are performing
This is mainly detailed under the `Performance` section.
- __`Queries`__: Details which keywords drive the most traffic.
- __`Pages`__: Shows which pages receive the most traffic.
How to use this section:
- __Improve conversion__: use the `Pages` section to identify the pages that receive a lot of impressions but do not convert into clicks.
- __Optimize your website for the best keywords__: use the `Queries` report to understand which keywords drive the most traffic and create dedicated pages just for those keywords.
- __Compare your page performance from one period to another__:
    - Select a filter at the top (e.g., `Query` with specific keywords).
    - Click on the `Date` filter at the top and select `Compare` rather than `Filter`.
    - You may see an increase in traffic due to:
        - Seasonality.
        - Better content optimization for specific keywords.
        - Improvements in Web Vitals and fixed issues.
    - You may see a decrease in traffic due to:
        - Seasonality.
        - Page errors (jump to the [Analysing a specific URL](#analysing-a-specific-url) section to diagnose issues).
        - Content that is less popular.
        - You've cannibalized that page with a new, optimized landing page.
## Analysing a specific URL
Simply paste the URL in the search bar at the top.
## Google Search Console Tips
- Link your property in Google Analytics with the one in Google Search Console:
    - Open your property in Google Analytics.
    - Click `Admin`.
    - Under the `Property` section, under `PRODUCT LINKING`, click `All Products`.
    - Link Google Search Console to feed new, valuable data into your Google Analytics.
# UX and SEO
[Rendering on the Web](https://developers.google.com/web/updates/2019/02/rendering-on-the-web)
# Tools
> - List of all Google SEO tools: https://support.google.com/webmasters/topic/9456557
> - JS minification techniques: [Optimizing JavaScript bundle size](https://www.debugbear.com/blog/reducing-javascript-bundle-size)
> - [Article: Small Bundles, Fast Pages: What To Do With Too Much JavaScript](https://calibreapp.com/blog/bundle-size-optimization)
| Topic | Description | Link |
|:------|:------------|:-----|
| `robots.txt` | Create a robots.txt online | https://www.seoptimer.com/robots-txt-generator |
| `robots.txt` | Test the validity of a robots.txt | https://www.google.com/webmasters/tools/robots-testing-tool |
| `robots.txt` | Test URLs against an inline robots.txt | https://technicalseo.com/tools/robots-txt/ |
| `sitemap.xml` | Create a sitemap.xml online | https://www.xml-sitemaps.com/ |
| `sitemap.xml` | Validate a sitemap.xml online | https://www.xml-sitemaps.com/validate-xml-sitemap.html |
# Tips and tricks
## Red flags
### Red flags - Google Search Console
- Sudden spike in valid URLs in the `Coverage` section. This is usually due to misconfigured faceted pages.
# How to
## How to test how your page is seen by Google?
- Go to the [Google Search Console](https://search.google.com/search-console).
- Select the __*URL inspection*__ tool and paste the page's URL.
- Click on __*View crawled page*__.
This shows the full crawled HTML, but unfortunately, it won't render a full screenshot of that HTML, just the beginning. To see the full render, you have no choice but to copy-paste the HTML into a local file and render it yourself 😫.
## How to check keywords ranking history for a domain?
Please refer to the [Finding the historical keywords ranking for a domain](#finding-the-historical-keywords-ranking-for-a-domain) section.
## How to request Google to recrawl your website?
1. Login to the Google Search Console (https://search.google.com/search-console).
2. Choose one of the two options:
    1. Upload a new sitemap.xml with new `lastmod` dates for the URLs you wish to refresh. That's the fastest way to perform a batch re-crawl.
    2. Paste a URL in the `Inspect` search bar at the top, then click the `REQUEST INDEXING` button.
# Annex
## JSONLD examples
- [General website description](#general-website-description)
- [Describing a home page structure](#describing-a-home-page-structure)
- [Describing the position in the website](#describing-the-position-in-the-website)
### General website description
A hedged sketch (the organization name, logo, and social profiles are made up) using the schema.org `Organization` type:
```html
<script type="application/ld+json">
{
	"@context": "https://schema.org",
	"@type": "Organization",
	"name": "Example Store",
	"url": "https://example.com",
	"logo": "https://example.com/img/logo.png",
	"sameAs": [
		"https://www.facebook.com/examplestore",
		"https://twitter.com/examplestore"
	]
}
</script>
```
### Describing a home page structure
One common pattern (a sketch only; the section names and URLs are made up) is to describe the main sections of the home page with `SiteNavigationElement` items:
```html
<script type="application/ld+json">
{
	"@context": "https://schema.org",
	"@type": "ItemList",
	"itemListElement": [
		{
			"@type": "SiteNavigationElement",
			"position": 1,
			"name": "Shop",
			"url": "https://example.com/shop"
		},
		{
			"@type": "SiteNavigationElement",
			"position": 2,
			"name": "Blog",
			"url": "https://example.com/blog"
		}
	]
}
</script>
```
### Describing the position in the website
This is typically done with a `BreadcrumbList` (a sketch with made-up URLs):
```html
<script type="application/ld+json">
{
	"@context": "https://schema.org",
	"@type": "BreadcrumbList",
	"itemListElement": [
		{
			"@type": "ListItem",
			"position": 1,
			"name": "Home",
			"item": "https://example.com"
		},
		{
			"@type": "ListItem",
			"position": 2,
			"name": "Handmade leather bags",
			"item": "https://example.com/handmade-leather-bags"
		}
	]
}
</script>
```
## ahrefs recipes to rank
### General SEO
- [How to Get on the First Page of Google](https://ahrefs.com/blog/how-to-get-on-the-first-page-of-google/)
- [A Simple (But Effective) 31-Point SEO Checklist](https://ahrefs.com/blog/seo-checklist/)
- [How to Improve SEO: 8 Tactics That Don’t Require New Content](https://ahrefs.com/blog/how-to-improve-seo/)
- [SEO For Beginners: A Basic Search Engine Optimization Tutorial for Higher Google Rankings](https://www.youtube.com/watch?v=DvwS7cV9GmQ&list=PLvJ_dXFSpd2uHtGoHf8K06ebr-TIrgM0G&index=2)
### Keyword research
- [How To Do Keyword Research for SEO — Ahrefs’ Guide](https://ahrefs.com/blog/keyword-research/)
- [Keyword Difficulty: How to Determine Your Chances of Ranking in Google](https://ahrefs.com/blog/keyword-difficulty/)
- [How many keywords can you rank for with one page? (Ahrefs’ study of 3M searches)](https://ahrefs.com/blog/also-rank-for-study/)
### Link building
- [The Noob Friendly Guide To Link Building](https://ahrefs.com/blog/link-building/)
- [9 EASY Link Building Strategies (That ANYONE Can Use)](https://ahrefs.com/blog/link-building-strategies/)
- [Guest Blogging for SEO: How to Build High-quality Links at Scale](https://ahrefs.com/blog/guest-blogging/)
# References
- [Introducing Progressive Web Apps: What They Might Mean for Your Website and SEO](https://moz.com/blog/introducing-progressive-web-apps)
- [Mobile-first indexing best practices](https://developers.google.com/search/mobile-sites/mobile-first-indexing)
- [Rendering on the Web](https://developers.google.com/web/updates/2019/02/rendering-on-the-web)
- [Small Bundles, Fast Pages: What To Do With Too Much JavaScript](https://calibreapp.com/blog/bundle-size-optimization)