How to Create a Multi-Site Sitemap

Let's assume for a moment that you know how to create a plain old Google Sitemap: one that lists every page and entry on a web site. Now, what if you wanted to create a Sitemap for a collection of web sites? And what if that collection was large? What then?

Well, you could do what you normally would for a single blog and simply add the blog_ids="all" parameter. That would work, but is it the best solution? Absolutely not.
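
For reference, that single-file approach might look roughly like the sketch below. It assumes the standard mt:Entries, mt:EntryPermalink and mt:EntryModifiedDate tags and leaves out pages and any entry-limit settings, so treat it as an illustration of the idea rather than a finished template:

     <?xml version="1.0" encoding="UTF-8"?>
     <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
     <mt:Entries blog_ids="all">
        <url>
             <loc><$mt:EntryPermalink encode_xml="1"$></loc>
             <lastmod><$mt:EntryModifiedDate utc="1" format="%Y-%m-%dT%H:%M:%SZ"$></lastmod>
        </url>
     </mt:Entries>
     </urlset>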

Aggregating that many entries and pages into a single file is almost certainly going to hurt publishing performance, especially over time. Creating a file of that size is time-consuming in and of itself, not to mention the database load of pulling together and iterating over such an enormous dataset. So how do you compensate?

Luckily there is a pretty straightforward solution that Google has provided for us:

  1. Define a Google Sitemap for each blog in your installation. We wrote an article about this previously to help you.

  2. Create a Google Sitemap Index which lists all of your individual Google Sitemaps for each blog in your installation. Set the template to output a file called sitemap_index.xml. The template is simple and looks like this:

     <?xml version="1.0" encoding="UTF-8"?>
     <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
     <mt:Blogs>
        <sitemap>
             <loc><$mt:Link template="sitemap"$></loc>
        </sitemap>
     </mt:Blogs>
     </sitemapindex>
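
To make the result concrete, here is roughly what the published sitemap_index.xml might look like for an installation with two blogs. The example.com URLs and the sitemap.xml file names are placeholders assumed for illustration; the real values depend on each blog's site URL and on the output file of its sitemap template:

     <?xml version="1.0" encoding="UTF-8"?>
     <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
        <sitemap>
             <loc>http://www.example.com/blog-one/sitemap.xml</loc>
        </sitemap>
        <sitemap>
             <loc>http://www.example.com/blog-two/sitemap.xml</loc>
        </sitemap>
     </sitemapindex>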
    

The template above makes a pretty fundamental assumption: that every blog in your install has a template, installed via a theme, with a basename or id of "sitemap". If a blog has no such template, a publishing error will occur. So when creating and installing this template you may wish to modify it to account for this. Listed below is an advanced sitemap index template that uses some plugin options I installed via Config Assistant, which make hiding blogs from the sitemap much easier, and that also controls which blogs get listed.

Advanced Sitemap Index

The following template is a slightly more advanced version of the one above. It does two things:

  1. It relies on a blog-level checkbox custom field called "ExcludeBlogFromListing", which controls whether a blog is listed for Google. When the box is checked, the blog is left out of the sitemap index. You can create this custom field manually, define it in your theme.yaml, or define it as a plugin option via Config Assistant.

  2. It links to a blog's sitemap template only if the blog's theme id starts with my_theme_id (replace this with the id of your own theme).

These two options give you more control over the visibility of blogs in your system, allowing you to hide blogs from Google when you need to.

    <?xml version="1.0" encoding="UTF-8"?>
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <mt:Blogs>
      <mt:if tag="BlogTemplateSetID" like="/^(my_theme_id)/">
        <mt:ExcludeBlogFromListing>
        <mt:else>
          <sitemap>
            <loc><$mt:Link template="sitemap"$></loc>
          </sitemap>
        </mt:ExcludeBlogFromListing>
      </mt:if>
    </mt:Blogs>
    </sitemapindex>