Publishing is an integral part of the content management process in Sitecore. While all Sitecore developers and users should be familiar with the differences between the types of publishing (Incremental, Smart, Republish), most will probably not know some of the nitty-gritty details that separate them.
Let's focus on one of the most-used and (most likely) least understood: Incremental Publishing. In order to understand the purposes and operation of incremental publishing, one has to learn all about the Sitecore publish queue.
Sitecore Publishing Refresher
There are three types of publish operations in Sitecore, and they are (listed from fastest to slowest):
- Incremental: publishes items that have been updated since the last publish
- Smart: publishes all items that have differences between the master and web databases
- Republish: publishes all items from the master to web databases
These are the options that users see in the publishing wizard.
Here's a one-sentence history lesson: Incremental publishing was added to Sitecore as a way to provide a speedier alternative to Smart publishing (which is always going to be slow due to its item-by-item comparisons between two databases).
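To make the difference between the three types concrete, here's a small Python model. It's only a sketch: the Item class, the revision field, and the in-memory master/web dictionaries are hypothetical stand-ins I made up for illustration, not Sitecore's actual data model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    revision: str  # hypothetical marker that changes whenever the item changes


def incremental(queue_ids, master):
    """Look only at items recorded in the publish queue."""
    return sorted(i for i in queue_ids if i in master)


def smart(master, web):
    """Compare every master item against its web counterpart."""
    return sorted(
        item_id for item_id, item in master.items()
        if web.get(item_id) != item
    )


def republish(master):
    """Publish everything in master, changed or not."""
    return sorted(master)


master = {
    "home": Item("home", "r2"),
    "about": Item("about", "r1"),
    "news": Item("news", "r5"),
}
web = {
    "home": Item("home", "r1"),    # stale copy
    "about": Item("about", "r1"),  # identical to master
    # "news" has not been published yet
}
queue_ids = {"home", "news"}  # items touched since the last publish

incremental_result = incremental(queue_ids, master)
smart_result = smart(master, web)
republish_result = republish(master)
```

In this toy data, Incremental and Smart end up with the same two items, but Incremental only had to look at the two queue entries while Smart had to compare all three items in master. Republish skips the comparison and pushes all three.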
What is the publish queue?
The Sitecore publish queue stores a list of items that will be published during the next Incremental Publish operation. During an Incremental publish, items in the publish queue will be updated in the web database.
Most functionality for the queue is provided by:
- the dbo.PublishQueue database table (master, web, and core all have this table, but master is used for content)
- API methods in Sitecore.Kernel.dll (across multiple namespaces)
- Sitecore pipelines that handle CRUD and publish operations
There is a Sitecore Admin page with information about the publish queue (/sitecore/admin/PublishQueueStats.aspx), although it only shows totals and other high-level information. As it turns out, getting detailed information on the contents of the publish queue is tricky (shameless plug: my SPRK module includes a Publish Queue Report with a detailed breakdown of all items in the publish queue).
How does the publish queue work?
At a high level, the publish queue's role in publishing looks like this:
1. Certain actions - such as an item being created, saved, or deleted - trigger pipelines that update the publish queue table
2. When an Incremental Publish starts, Sitecore grabs the last publish date (a DateTime of when the last publish operation took place) from the dbo.Properties table (in the master database)
3. Items in the dbo.PublishQueue table are sorted/filtered based on the last publish date to determine what should be published - these are called publishing candidates
   - a helper class is responsible for sorting/filtering of the table; it's located at Sitecore.Publishing.Pipelines.Publish.IncrementalPublishHelpers
   - there is a PublishingCandidate concrete class, located at Sitecore.Publishing.Pipelines.Publish.PublishingCandidate
4. Each item is sent through the standard publish item pipeline
5. After the Incremental Publish is complete, a new last publish date is added to the database's Properties table
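The steps above can be sketched in a few lines of Python. This is a conceptual model, not Sitecore code: QueueEntry, publishing_candidates, incremental_publish, and the properties dictionary are all hypothetical names, and collapsing repeated entries for the same item is my assumption about what "sorted/filtered" involves.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class QueueEntry:
    item_id: str
    action: str          # e.g. "Created", "Saved", "Deleted"
    queued_at: datetime  # when the entry was added to the queue


def publishing_candidates(queue, last_publish):
    """Return ids of items queued after last_publish, oldest first, no repeats."""
    recent = sorted(
        (e for e in queue if e.queued_at > last_publish),
        key=lambda e: e.queued_at,
    )
    seen, candidates = set(), []
    for entry in recent:
        if entry.item_id not in seen:  # assumption: one candidate per item
            seen.add(entry.item_id)
            candidates.append(entry.item_id)
    return candidates


def incremental_publish(queue, properties, now, publish_item):
    """Read the last publish date, publish each candidate, store the new date."""
    candidates = publishing_candidates(queue, properties["last_publish"])
    for item_id in candidates:
        publish_item(item_id)  # stand-in for the publish item pipeline
    properties["last_publish"] = now
    return candidates


queue = [
    QueueEntry("home", "Saved", datetime(2024, 1, 1, 9, 0)),    # before last publish
    QueueEntry("news", "Created", datetime(2024, 1, 2, 10, 0)),
    QueueEntry("news", "Saved", datetime(2024, 1, 2, 10, 5)),   # same item, queued twice
    QueueEntry("about", "Saved", datetime(2024, 1, 2, 11, 0)),
]
properties = {"last_publish": datetime(2024, 1, 2, 0, 0)}
published = []

result = incremental_publish(
    queue, properties, datetime(2024, 1, 2, 12, 0), published.append
)
```

Here "home" is skipped because its entry predates the stored last publish date, and "news" is published once even though it has two queue entries. Afterwards the stored date moves forward, so an immediate second run would find no candidates.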
Going Deeper
There are a few oddities and quirks with the publish queue that are not easily understood by most developers. I've covered a few major ones here.
Cleaning Up the Publish Queue
Much like the Sitecore event queue, the publish queue can grow quite large and potentially impact publishing performance. Sitecore has an agent that (by default) runs every 4 hours and removes publish queue entries older than 30 days. This configuration is located in Sitecore.config:
<!-- Agent to clean up publishing queue -->
<agent type="Sitecore.Tasks.CleanupPublishQueue, Sitecore.Kernel" method="Run" interval="04:00:00">
  <DaysToKeep>30</DaysToKeep>
</agent>
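What the agent does with DaysToKeep boils down to a date cutoff. Here's a rough Python illustration of that idea; cleanup_publish_queue and the dictionary-based entries are hypothetical and say nothing about how Sitecore.Tasks.CleanupPublishQueue is actually implemented.

```python
from datetime import datetime, timedelta


def cleanup_publish_queue(entries, now, days_to_keep=30):
    """Keep only entries queued within the last days_to_keep days."""
    cutoff = now - timedelta(days=days_to_keep)
    return [e for e in entries if e["queued_at"] >= cutoff]


now = datetime(2024, 3, 1)
entries = [
    {"item_id": "home", "queued_at": datetime(2024, 1, 15)},  # 46 days old
    {"item_id": "news", "queued_at": datetime(2024, 2, 20)},  # 10 days old
]

kept = cleanup_publish_queue(entries, now)
```

With the default of 30 days, only the "news" entry survives. Lowering DaysToKeep shrinks the table faster, at the cost of a shorter queue history.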
Accessing the Publish Queue Directly
The general rule of accessing the Sitecore databases: don't. Always use the Sitecore API to interact with content, and the publish queue is a great example of why.
Trying to make sense of the dbo.PublishQueue table right from SQL Server will result in confusion: it looks more like an event queue and can easily contain hundreds of thousands of entries.
The Sitecore API has a few classes and methods that parse the publish queue - along with the last publish date - to build an accurate picture of what content is truly awaiting an Incremental publish.
- DefaultPublishManager: located at Sitecore.Publishing.DefaultPublishManager, this class and supporting helpers handle CRUD operations to the publish queue for CMS operations (such as creating, updating, and deleting items in the content tree)
- PublishQueue: located at Sitecore.Publishing.Pipelines.Publish.PublishQueue, this class is used to parse the publish queue and provides an accurate list of publishing candidate items
There are about a dozen other classes that relate to the publish queue in some way. Unfortunately, some are deprecated, some duplicate existing functionality, and others exist solely to support other areas of the platform. Consult your favorite decompiler for more information on these classes.
About That Publish Queue Stats Page...
The aforementioned Publish Queue Stats page is the built-in admin tool for getting basic information about the publish queue - but it's just that: basic.
During my time developing my SPRK module, I learned that the "Items to Process" totals on this page don't necessarily reflect the actual number of items that will be published from the queue. As it turns out, fewer items might be published than this number indicates. During an Incremental publish, Sitecore checks that each item actually has changes that need to be published, and skips items that don't need to be updated.
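A toy example shows how a queue total can overstate what actually gets published. Everything here is an assumption for illustration: I'm treating the reported total as a simple count of queue rows and using a made-up revision string as the "has changes" check, neither of which is confirmed to be how Sitecore does it.

```python
def has_pending_changes(item_id, master, web):
    """Hypothetical check: the item differs between master and web."""
    return master.get(item_id) != web.get(item_id)


# One row per queue entry; "home" was saved twice.
queue_item_ids = ["home", "home", "news", "about"]

# item id -> made-up revision marker
master = {"home": "r3", "news": "r1", "about": "r7"}
web = {"home": "r2", "about": "r7"}  # "about" is already up to date

reported_total = len(queue_item_ids)
actually_published = sorted(
    item_id for item_id in set(queue_item_ids)
    if has_pending_changes(item_id, master, web)
)
```

The queue holds four rows, but only "home" and "news" have anything to publish: the duplicate "home" row collapses into one item, and "about" is skipped because the two databases already match.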