About Movable Type's Publish Queue

The Movable Type Publish Queue is an essential component of any large-scale Movable Type-powered web site because it plays a crucial role in optimizing publishing performance. Using the publish queue has several benefits:

  • It eliminates redundant, duplicated and unnecessary publication of files.

  • It offloads publishing to a standalone process which can be throttled and scaled independently from the Movable Type web application itself.

  • It speeds up the commenting experience by reducing the number of files that an end user must wait to be published prior to being able to navigate the web site again.

How it Works

It might be best to describe how the publish queue works by examining a scenario in which it would be utilized: republishing the necessary files in response to a comment.

Adding Jobs to the Queue

When a comment comes in to Movable Type, multiple files often need to be updated: the comment must be published to the entry’s permalink page, and any other pages that display a comment count for that entry may need to be updated as well.

Each of those pages (assuming they are configured to be published via the publish queue) will then be added to the “publish queue.” When this happens, a publishing “job” is created and added to the database for each page that needs to be published. There is one row in the database for each individual job in the system.

Now let’s assume for a moment that shortly after receiving the first comment, a second one is submitted by a different visitor to your web site. This action also results in pages needing to be republished. This time, however, before those pages are added to the queue as jobs, the system checks whether a job corresponding to each page is already on the queue. If there is one, the new job is discarded because it would only duplicate work that is already scheduled. If there is not, the job is added. This ensures that no unnecessary work is performed by the system.
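
The check described above can be pictured with a small in-memory sketch. It is only an illustration: the PublishQueue class, its add method and the page paths are hypothetical names invented here, and Movable Type itself keeps its jobs in database rows rather than in a Python dictionary.

```python
class PublishQueue:
    """Toy in-memory stand-in for the jobs table (hypothetical, not MT's schema)."""

    def __init__(self):
        self.jobs = {}  # page path -> job record, one entry per queued page

    def add(self, page):
        """Queue a page for publishing; return False if it is already queued."""
        if page in self.jobs:
            return False  # an identical job is waiting, so discard the new one
        self.jobs[page] = {"page": page}
        return True


queue = PublishQueue()

# First comment on an entry: the permalink page and the main index need rebuilding.
first = [queue.add(p) for p in ("2008/05/my-entry.html", "index.html")]

# Second comment on the same entry, before a worker has processed anything.
second = [queue.add(p) for p in ("2008/05/my-entry.html", "index.html")]

print(first)            # [True, True]
print(second)           # [False, False]
print(len(queue.jobs))  # 2
```

Two comments produce four publishing requests but only two jobs, which is the saving the publish queue is after.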

In addition, each page that is added to the publish queue is given a priority which dictates the order in which the corresponding job will be processed. The higher the priority, the sooner the system will work on the job. Movable Type assigns priority based upon the following criteria:

Page/Template Type                                                    Priority
Preferred Page and Entry archives                                     10
Index templates with a filename beginning with “index” or “default”   9
Feed index templates                                                  9
All other index templates                                             8
Non-preferred Page and Entry archives                                 5
Daily archives                                                        4
Weekly archives                                                       3
Monthly archives                                                      2
Any Category archive                                                  1
Any Author archive                                                    1
Yearly archives                                                       1
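
To see what these priorities mean in practice, here is a minimal sketch that orders a handful of queued pages the way the table suggests. The dictionary keys (such as "main_index") and the page paths are labels made up for this example. Only the numeric priorities come from the table.

```python
# Priorities from the table above; the keys are illustrative labels only.
PRIORITY = {
    "preferred_individual": 10,
    "main_index": 9,
    "feed_index": 9,
    "other_index": 8,
    "non_preferred_individual": 5,
    "daily": 4,
    "weekly": 3,
    "monthly": 2,
    "category": 1,
    "author": 1,
    "yearly": 1,
}


def processing_order(jobs):
    """Return jobs sorted so that higher-priority pages are published first."""
    return sorted(jobs, key=lambda job: PRIORITY[job[1]], reverse=True)


jobs = [
    ("archives/2008/05/index.html", "monthly"),
    ("index.html", "main_index"),
    ("2008/05/my-entry.html", "preferred_individual"),
    ("cat/news/index.html", "category"),
]

for page, kind in processing_order(jobs):
    print(PRIORITY[kind], page)
```

The entry's own page comes out first and the category archive last, so the pages a commenter is most likely to look at next are refreshed soonest.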

And that is how jobs are added to the queue. A separate process is then responsible for publishing them.

Creating Publish Queue Workers

One or more publish queue “workers” can be created to process jobs on the queue. The number of workers needed by a system is based largely upon two variables:

  • The capacity of any one worker to process jobs on the queue.
  • The volume of jobs being added to the queue over time.
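
As a rough rule of thumb, these two variables can be combined into a back-of-the-envelope estimate. The function below is a sketch and not a formula Movable Type provides. The numbers are invented for illustration, and real capacity depends on your templates, hardware and database.

```python
import math


def workers_needed(jobs_per_minute, jobs_per_worker_per_minute):
    """Smallest whole number of workers whose combined capacity covers the load."""
    return max(1, math.ceil(jobs_per_minute / jobs_per_worker_per_minute))


# Hypothetical figures: 90 jobs arrive per minute and one worker publishes about 40.
print(workers_needed(jobs_per_minute=90, jobs_per_worker_per_minute=40))  # 3
```

If jobs arrive faster than the workers can clear them, the queue simply grows, so it is worth measuring both figures at peak commenting times rather than on average.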

A worker is created by running the “run-periodic-tasks” script that comes with every copy of Movable Type. This script can be run in three modes:

  • daemon mode - in this mode the script never quits; instead it constantly monitors the job queue for work to be done, and almost the instant a job becomes available the script will begin work on it.

  • run-once - in this mode the script is run via the command line and will quit only after there is no more work on the queue to be done.

  • scheduled task - in this mode the script is executed in the “run-once” mode periodically according to a schedule defined by cron or a similar service.
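
The difference between these modes comes down to what the worker does when the queue is empty. The sketch below illustrates that idea only; it is not the run-periodic-tasks script. The function names, the poll_seconds parameter and the keep_running hook are all assumptions made for this example.

```python
import time
from collections import deque


def run_once(queue, publish):
    """Process jobs until the queue is empty, then return the number handled."""
    handled = 0
    while queue:
        publish(queue.popleft())
        handled += 1
    return handled


def run_daemon(queue, publish, poll_seconds=5, keep_running=lambda: True):
    """Keep polling the queue; sleep briefly whenever there is nothing to do."""
    while keep_running():
        if run_once(queue, publish) == 0:
            time.sleep(poll_seconds)


published = []
queue = deque(["index.html", "atom.xml"])
print(run_once(queue, published.append))  # 2
print(published)                          # ['index.html', 'atom.xml']
```

In these terms, the scheduled task mode is cron (or a similar service) calling the equivalent of run_once at fixed intervals, so a new job may wait up to one interval before anyone looks at it.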

Processing Jobs on the Queue

Each worker will monitor the queue for jobs. When one becomes available it is pulled off the queue to be worked on. Once it is "off the queue" no other workers can claim it. This makes sure that no two workers are trying to work on the same job at the same time.

In the event that something goes wrong during the publishing process and the file is not published, the system will notice, saying something akin to, "uh-oh, look at this job that was claimed on the queue, but was never successfully finished," and then free up the job for a worker to pick up and try again. If the job is retried more than 5 times, it is marked as failed and left on the queue. In this state it is possible for a similar job to be placed on the queue, and if the problem that caused the publishing failure is not transient, then that job is likely to fail again.

An important thing to note is that once a job has been pulled off the queue by a worker, it remains possible for that same page to be added to the queue again in response to another comment. The rationale is that by the time the page has finished being rebuilt it is most likely out of date, and so needs to be published again.
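
The life cycle described in this section can be summarized in a short sketch. Everything here is a simplified model under stated assumptions: the Job and Queue classes, the state names and the exact way retries are counted are invented for illustration and are not taken from Movable Type or The Schwartz.

```python
MAX_RETRIES = 5  # from the text: a job retried more than 5 times is marked failed


class Job:
    def __init__(self, page):
        self.page = page
        self.state = "waiting"  # waiting -> claimed -> (finished | waiting | failed)
        self.retries = 0


class Queue:
    def __init__(self):
        self.jobs = []

    def add(self, page):
        """Skip the page only if an unclaimed job for it is still waiting."""
        if any(j.page == page and j.state == "waiting" for j in self.jobs):
            return None
        job = Job(page)
        self.jobs.append(job)
        return job

    def claim(self):
        """Hand the next waiting job to exactly one worker."""
        for job in self.jobs:
            if job.state == "waiting":
                job.state = "claimed"
                return job
        return None

    def finish(self, job):
        """Remove a job whose page was published successfully."""
        self.jobs.remove(job)

    def release(self, job):
        """Called when a claimed job did not finish: retry it or give up."""
        job.retries += 1
        job.state = "failed" if job.retries > MAX_RETRIES else "waiting"


queue = Queue()
queue.add("index.html")
first = queue.claim()
second_worker_sees = queue.claim()  # None: a claimed job is never offered twice
requeued = queue.add("index.html")  # accepted: the first job is already claimed

# Simulate a page that fails to publish on every attempt.
attempts = 0
while (job := queue.claim()) is not None:
    attempts += 1
    queue.release(job)

print(second_worker_sees)        # None
print(attempts, requeued.state)  # 6 failed
```

In this model the failing job gets its first attempt plus five retries before it is parked as failed, and because the duplicate check only looks at waiting jobs, neither a claimed job nor a failed one stops the same page from being queued again.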

What Powers It?

The Publish Queue is powered by a standalone job/queue management library called “The Schwartz.” The Schwartz is actually a more generic and abstract job management system capable of processing any number of tasks via a similar queuing mechanism.

For the time being, Movable Type only utilizes the Schwartz for publishing, but in the future may use this framework for sending emails or other non-critical system tasks.

Publish Queue Tools

There is one tool in particular that is recommended for most systems that utilize the Publish Queue, aptly named the Publish Queue Manager.

This tool provides a user interface within Movable Type that allows administrators to monitor and inspect jobs on the queue. Each job can be deleted, or have its priority changed.

For more information, visit the plugin’s web site.

Using RSync and/or NFS in Multi-Server Environments

In large multi-server environments it often becomes necessary to take a single file that has been published by a Publish Queue worker and somehow get it to show up on multiple front end web servers.

Confused? Well, consider the following scenario. Suppose your system employs multiple machines for the express purpose of processing publishing jobs on the publish queue. Now let’s say one of those machines updates one set of files and another one of those machines updates a different set of files. How then does one get these disparate files to a machine intended to serve them to your readers?

In a single server environment this is never a problem because the machine serving the files and the machine publishing them are one and the same. Publish queue workers therefore publish directly to your web server’s document root and thus make updated pages available. In a multi-server environment there are generally two different ways to solve this problem:

  • Link your front end web servers and your publish queue machine together via a shared filesystem like NFS.
  • Physically copy files from your publishing machines to your front end web servers via rsync or scp.

Now, let’s explain each of these options in more detail.

NFS

With the NFS solution, all of your publishing servers (or Publish Queue workers) write files to an external NFS mount. These files never physically reside on the publishing server; they only appear to be local thanks to NFS, which lets different servers share the same set of files. The front end web server then mounts this shared NFS directory for reading.

Pros:

  • Scales better because each file is written once and immediately made visible on the front end web server.
  • Easier to set up IMHO.

Cons:

  • Very poor performance in a geographically dispersed setup (e.g. Amazon EC2 or other cloud services).
  • Single point of failure. If something were to go wrong with your shared filesystem, then much of your system will be hosed. This can be mitigated with a solid RAID config or other highly reliable disk storage.

Note: “NFS” here is used only for illustrative purposes. Technically any shared file system technology will do.

RSync

When using rsync, Movable Type will invoke a command line utility designed for keeping two different file systems in sync with one another. This is what happens when Movable Type is configured to use RSync:

  1. User leaves a comment.
  2. Job is created in Publish Queue.
  3. Worker pulls job off queue and publishes file to local file system.
  4. Worker then begins to rsync (usually over SSH) to each of the designated servers.

Pros:

  • Failure tolerance - by replicating your published content you ensure that if one file system or server goes bad, you still have something to fall back on.
  • Great for cloud hosting services like Amazon EC2, or any time your publishing server and front end web servers are not likely to be on the same subnet.

Cons:

  • Slightly harder to set up IMHO.
  • Scalability - the more front end web servers you have, the more servers you will need to synchronize with. This can add latency to your publishing process and cause some servers to have slightly different content from one another for a brief period of time.
  • Only works in Unix environments.

Setting Up Publish Queue and Rsync

To get started using Publish Queue and rsync you will need to follow these steps:

  1. Make sure that your publishing servers are configured to publish files to the exact same path as your front end web servers are configured to read from. In other words, your publishing server should mirror exactly the file/directory/path structure of your front end web server.

  2. Set up a user on your front end web server that has write access to the directory that serves your blog’s published files to the outside world. Make sure this user can connect via SSH to your front end web server from each of your publishing servers - without having to supply a password. This is often done by adding the user’s public key to the ~/.ssh/authorized_keys file on the front end web server.

Testing Your Setup

Once this is complete, it is best to test that you can transfer files between the two hosts. To do so, execute the following commands from one of your publishing servers and confirm that they succeed:

prompt> cd /
prompt> scp /path/to/a/file.txt username@someserver.com:/path/to/a/

If it is not obvious, please make sure to replace “/path/to/a/file.txt” with an absolute path to a file in your blog’s document root. Also, replace “username” and “someserver.com” with the username and server address to transfer files to.

Your mt-config.cgi file

Once you have tested that files can be transferred between hosts without being prompted for a password, then add this to the mt-config.cgi file on each of your publish queue servers:

SyncTarget username@someserver.com:/
RsyncOptions -e ssh
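
To make the role of these two settings more concrete, the sketch below assembles the kind of rsync command line they could translate into. How Movable Type actually invokes rsync is not described here, so treat the whole shape as an assumption: the build_rsync_command function and the /tmp/publish-list.txt file list are hypothetical, and only the SyncTarget and RsyncOptions values come from the configuration above. The --files-from and -e options are standard rsync options.

```python
import shlex


def build_rsync_command(sync_target, rsync_options, file_list_path):
    """Assemble an rsync invocation that mirrors the paths listed in a file."""
    return (
        ["rsync"]
        + shlex.split(rsync_options)  # e.g. "-e ssh" becomes ["-e", "ssh"]
        + ["--files-from=" + file_list_path, "/", sync_target]
    )


cmd = build_rsync_command(
    sync_target="username@someserver.com:/",
    rsync_options="-e ssh",
    file_list_path="/tmp/publish-list.txt",  # hypothetical list of published files
)
print(" ".join(shlex.quote(part) for part in cmd))
```

With the source given as / and the target ending in :/, each listed file lands at the same absolute path on the front end web server, which is why the publishing servers need to mirror the front end's directory structure exactly.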

Additional Reading

To learn more about the Publish Queue, consider reading the following resources: