In the scenario above, a visitor gets an inconsistent view of the website. They load an old version of index.html from before the deployment and a new version of style.css from after the deployment. Depending on how the website layout was changed, this could result in a totally broken experience.
I call this the client-side consistency problem. It is a textbook example of a distributed systems problem.
The problem is even worse if the website relies on scripts. New scripts might contain code with wrong expectations about the old version of index.html. If the page loads more than one script, they might be downloaded in any order, with a mix of scripts coming from before and after the deployment.
Atomic deployments don't solve the problem. In the scenario above, the deployment was atomic! Both files got deployed at exactly the same instant.
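A minimal Python sketch makes the first scenario concrete. It assumes a toy in-memory server whose deploy method swaps every file at the same instant. The Server class, the V1 and V2 file contents, and page_load_during_deploy are hypothetical names used only for illustration. Even with an atomic swap, a page load that straddles the deployment receives index.html from before it and style.css from after it.

```python
# Toy model of the first scenario. All names here are hypothetical.
V1 = {"index.html": "index v1", "style.css": "style v1"}
V2 = {"index.html": "index v2", "style.css": "style v2"}


class Server:
    def __init__(self, site: dict[str, str]) -> None:
        self.site = dict(site)

    def deploy(self, new_site: dict[str, str]) -> None:
        # Atomic deployment: every file changes at exactly the same instant.
        self.site = dict(new_site)

    def get(self, path: str) -> str:
        return self.site[path]


def page_load_during_deploy(server: Server) -> tuple[str, str]:
    # The browser requests index.html first, then the stylesheet it links to.
    html = server.get("index.html")
    server.deploy(V2)  # the deployment lands between the two requests
    css = server.get("style.css")
    return html, css


html, css = page_load_during_deploy(Server(V1))
print(html, "+", css)  # index v1 + style v2: an inconsistent page
```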
It's easy to think that the problem is unimportant because it is low-probability. After all, a visitor has to get unlucky enough to load a page during a deployment for the problem to affect them.
However, that's the wrong way to think about it. A better approach is to conservatively assume that every visitor who comes to your site during a deployment will get a broken copy of the site. Framed in those terms, deployments suddenly seem very dangerous!
To make matters worse, visitors don't have to visit your site during a deployment to be affected. Depending on how their browsers and your server are configured, a second scenario is possible.
In this second scenario, a visitor gets a broken version of your site because their browser cached the style.css file from an old version of your site. When they later load the new index.html from after the deployment, they again have a broken experience, even though their page load didn't overlap with a deployment.
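The second scenario can be modeled the same way. This sketch assumes a hypothetical browser that always refetches index.html but reuses a cached style.css without checking back with the server. Real behavior depends on the caching headers the server sends and on the browser. The names site, browser_cache, fetch and load_page are illustrative only.

```python
# Toy model of the second scenario. All names here are hypothetical.
site = {"index.html": "index v1", "style.css": "style v1"}
browser_cache: dict[str, str] = {}


def fetch(path: str) -> str:
    # Assumed caching policy: stylesheets are cached and reused without
    # revalidation, while index.html is always fetched from the server.
    if path in browser_cache:
        return browser_cache[path]
    body = site[path]
    if path.endswith(".css"):
        browser_cache[path] = body
    return body


def load_page() -> tuple[str, str]:
    return fetch("index.html"), fetch("style.css")


first_visit = load_page()  # ('index v1', 'style v1')

# A deployment happens between visits, not during a page load.
site.update({"index.html": "index v2", "style.css": "style v2"})

second_visit = load_page()  # ('index v2', 'style v1'): the cached CSS is stale
```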
I have some reason to believe that a lot of big companies have independently identified and worked around this problem.
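Specific technologies vary, but the general technique works like this: every asset other than index.html gets a new, versioned filename in each deployment, so style.css becomes style__v1.css, style__v2.css, etc. Each version of index.html references the versioned names it was built with, and old versions stay on the server after a new deployment.
That rules out the first scenario because the client's old version of index.html will still point to the correct, old stylesheet (style__v1.css).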
And it rules out the second scenario because the new version of index.html will point to style__v2.css, which the visitor's browser has not yet cached.
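Below is a minimal Python sketch of the versioned-filename technique. It assumes a hypothetical build step that receives a version number for each deployment, and it models the server as a plain dictionary. The names versioned_name, build and deploy are illustrative. The quoted-string replacement stands in for the HTML parsing a real build tool would do. Many real tools derive the version from a hash of the file contents rather than from a counter. The property that matters is that a deployment adds new asset files and overwrites only index.html, so pages from both before and after the deployment keep resolving.

```python
# Hypothetical build and deploy steps for versioned asset filenames.


def versioned_name(path: str, version: int) -> str:
    # style.css -> style__v1.css, following the naming scheme in the text.
    stem, dot, ext = path.rpartition(".")
    if not dot:
        return f"{path}__v{version}"
    return f"{stem}__v{version}.{ext}"


def build(assets: dict[str, str], index_html: str, version: int) -> dict[str, str]:
    """Return the files to upload for one deployment."""
    renamed = {name: versioned_name(name, version) for name in assets}
    # Point index.html at the versioned names. Quoted string replacement is
    # enough for this sketch; a real build tool would parse the HTML.
    for old, new in renamed.items():
        index_html = index_html.replace(f'"{old}"', f'"{new}"')
    files = {new: assets[old] for old, new in renamed.items()}
    files["index.html"] = index_html
    return files


def deploy(server: dict[str, str], files: dict[str, str]) -> None:
    # New versioned assets are added next to the old ones; index.html is the
    # only file that gets overwritten. Old assets stay until garbage collection.
    server.update(files)


page = '<link rel="stylesheet" href="style.css">'
server: dict[str, str] = {}
deploy(server, build({"style.css": "body { color: black }"}, page, version=1))
deploy(server, build({"style.css": "body { color: navy }"}, page, version=2))

print(sorted(server))  # ['index.html', 'style__v1.css', 'style__v2.css']
print(server["index.html"])  # <link rel="stylesheet" href="style__v2.css">
```

A visitor still holding the version 1 page requests style__v1.css, which remains on the server. A visitor who loads the version 2 page requests style__v2.css, a name their browser has never cached.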
Eventually, old assets like style__v1.css will have to be removed from the server. This is akin to garbage collection, and it is notoriously difficult in distributed systems. Ideally we would somehow "fence out" old clients so their browsers would never use a version of index.html that references style__v1.css... but there is no practical way to implement such a fence.
Instead, we have to settle for a carefully chosen retention time. The retention time has to be long enough that we can be virtually certain no clients will be using an old version of index.html that references style__v1.css. The correct retention time depends on many details of how the website is configured. Because disk space is cheap, though, it can probably be very long: up to a year or even several years.
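Retention-based garbage collection can be sketched in Python under a few assumptions. The deploy process records the moment each asset stops being referenced by the live index.html. Anything that has been retired for longer than the retention period is then deleted. The retention clock starts at retirement rather than at the asset's original deployment, because clients can pick up an old index.html at any point until it is replaced. The helper names and the one-year value below are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical retention period; the text suggests a year or more is reasonable.
RETENTION = timedelta(days=365)


def record_retirements(old_refs: set[str], new_refs: set[str], now: datetime,
                       retired_at: dict[str, datetime]) -> None:
    """Update retirement times after index.html changes from old_refs to new_refs."""
    for name in old_refs - new_refs:
        # These assets were referenced until now, so their retention clock starts here.
        retired_at[name] = now
    for name in new_refs:
        # An asset referenced by the live index.html is never garbage.
        retired_at.pop(name, None)


def assets_to_delete(retired_at: dict[str, datetime], now: datetime,
                     retention: timedelta = RETENTION) -> list[str]:
    """Return assets that have been unreferenced for at least `retention`."""
    return sorted(name for name, t in retired_at.items() if now - t >= retention)


retired: dict[str, datetime] = {}
record_retirements({"style__v1.css"}, {"style__v2.css"}, datetime(2024, 1, 1), retired)
record_retirements({"style__v2.css"}, {"style__v3.css"}, datetime(2024, 6, 1), retired)

print(assets_to_delete(retired, now=datetime(2025, 3, 1)))  # ['style__v1.css']
```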