What is an XML Sitemap?
An XML sitemap (Extensible Markup Language sitemap) is an XML file that lists the URLs on a website. For each URL it can include additional information, such as the page's relative priority, when it was last modified and how frequently it changes. The purpose of an XML sitemap is to help search engines (like Google) crawl a website more efficiently; when a page is added, removed or changed, the sitemap can be updated automatically to reflect this.
There are no real risks or disadvantages to having an XML sitemap, although there is no guarantee that search engines will read it. You will not be penalised for having one, and your website stands to benefit from having one rather than not.
Types of files supported by XML Sitemaps
XML sitemaps support several different content types, and search engines can detect each of these sitemap variations.
Here is a list of file types supported by XML Sitemaps:
- Images
- Video
- News
- Geo Data
Why have an XML Sitemap?
- XML sitemaps help search engines understand what you would like indexed on your site, and they highlight the pages you want prioritised in the crawling process. Through its priority values, a sitemap can indicate the relative importance of your pages, from most to least important.
- An XML sitemap can be highly useful if your site is not well structured or lacks good internal linking.
- If your site has a lot of pages and a deep structure, a sitemap makes it easier for a search bot to navigate the site without missing the important pages.
How to create an XML sitemap?
There are various tools that can create XML sitemaps, one of them being Screaming Frog, and the process is simple. First, crawl your site with the tool; Screaming Frog then lists all the URLs it finds on your site, both indexable and non-indexable, along with their status codes. Once the crawl is complete, click the option to create an XML sitemap, choose which pages to include or exclude, then save the file as “sitemap.xml”.
Below is an example of what an XML sitemap looks like:
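As an illustration, here is a minimal, hand-written sitemap following the standard sitemaps.org protocol; the example.com URLs, dates and values are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2021-06-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://www.example.com/blog/</loc>
    <lastmod>2021-05-20</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

Each `<url>` entry only requires the `<loc>` tag; the other tags are optional hints for search engines.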
After creating your sitemap and uploading it to your site's server, it is essential to submit the file to search engines, via tools like Google Search Console, to speed up the indexing of your site.
What is a robots.txt?
A robots.txt is a text file created to direct search engine bots on how to crawl the pages on your site. Robots.txt files form part of the robots exclusion protocol, a set of standards that regulate how bots should crawl a site and index its content.
In simple terms, a robots.txt file tells user agents whether or not to crawl parts of a website. These instructions are specified by allowing or disallowing particular user-agent behaviour. Robots.txt files can also be dangerous, because it is easy to accidentally block your whole site from being crawled, so be careful when setting the rules that indicate which pages to block.
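To see how allow and disallow rules play out in practice, here is a short sketch using Python's built-in urllib.robotparser module; the rules and URLs below are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules: block /private/ for all user agents
rules = """User-agent: *
Disallow: /private/"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A URL under the disallowed directory is blocked
print(parser.can_fetch("*", "https://www.example.com/private/page.html"))  # False

# Everything else remains crawlable
print(parser.can_fetch("*", "https://www.example.com/blog/post.html"))  # True
```

Well-behaved crawlers apply the same logic before fetching any page on your site.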
How to create a robots.txt file?
Creating a robots.txt file is as simple as creating any other .txt file with a tool like Notepad. Open an empty file, enter the preferred format (see below), set the allowed and disallowed pages, save the file as robots.txt, then implement it on your site.
A typical robots.txt file has the following basic format:
User-agent: [user-agent name]
Disallow: [URL string not to be crawled]
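For instance, a simple robots.txt that blocks a private directory for all bots and points crawlers to the sitemap might look like this; the directory and domain are placeholders:

```
User-agent: *
Disallow: /private/
Sitemap: https://www.example.com/sitemap.xml
```

The `*` wildcard applies the rule to every user agent, and the optional Sitemap line tells crawlers where to find your sitemap file.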
Why have a robots.txt file?
Robots.txt files can be very useful, as they control bots' access to specific areas of a site. A great use for a robots.txt file is to indicate where your sitemap is; that way, search engines do not have to crawl your entire website to distinguish the important pages from the less important ones.
Having a sitemap.xml and a robots.txt file can be very good for your SEO efforts; these small files give your site structure and order when it is crawled by search engines. Neither file is mandatory, but we highly recommend them as good SEO practice. Having them is better than not having them at all: they give search engines a clearer picture of the relative importance of your webpages and of what you do and do not want crawled on your site.
Are your website's XML sitemap and robots.txt file set up correctly? The Algorithm Technical Audit will help accelerate your digital strategy. Find out more.
Article Author : Kevin Machimana