With components, each change can be isolated. Instead of one single executable file, the pages at Etsy are in separate files. A change "push" is just a few files at a time and doesn't require a server restart. This reduces risk while making deploys (and rollbacks) much more manageable.
Separate test/deploy strategies by risk. Updating a simple web page is one thing but what about services that require such personal information as credit cards, emails and passwords? Each of these might require a different test strategy or process. As reported three years ago, Zappos (a division of Amazon), separates the code that's regulated from the rest. Changes that impact the regulated systems go through a more stringent process that is released less often, with, yes, more formal checks.
Continuously monitor production. The damage buggy code makes in production is a multiple of how bad the bug is multiplied by the amount of time the bug stays in production. If the team can find and fix the defect fast, the risk is much lower. One key to finding the problems is seeing errors. For Web applications, this includes 500 errors, 404s (redirects to nowhere), crashes and other defects and all of that can be graphed and visualized with tools.
Here's Web developer/scientist Noah Sussman who designed Etsy's Continuous Integration (CI) System explaining the monitoring system at Etsy:
Automatic deploy and rollback. Monitoring production to find bugs is great; fixing it with the touch of a few keystrokes or a web-based app is even better.
(Bonus) Configuration flags. Instead of a patch or manual rollback, it may be possible to turn the feature off with a web-based app. All the programmer has to do is wrap the feature in an "if ()" statement that ties back to a code library. Change the config flag to "Off" and the new feature disappears. Sussman's article Config Flags: A Love Story.
(Bonus) Incremental rollout. Imagine config flags that are not global, but instead tied to the user. Users who want risk employees, their friends/family and known early adopters get to see the feature. Free and trial users see the feature when that config flag is flipped, and so on. The general theme is that more conservative users, the ones who use the software to run their business, see the most stable, well-tested version of the system. In their book How We Test Software at Microsoft, Alan Page and his co-authors refer to this as "testing in production." Instead of config flags, their team has different versions of the application running on different servers, and migrates processes to the right server based on user type.
Sign up for Computerworld eNewsletters.