DevOps & The Fire Fighting Dilemma


So there you are, happily working on your code, when an outage is reported and your full attention shifts to restoring the affected systems. As the restoration time stretches on and on, you and your team are pulled deeper into resolving the outage, in fire fighting mode. Depending on how often the team ends up in this fire fighting mode, no code gets finished, no improvements happen, and the outages continue to eat up all of the team’s time.
If this situation sounds familiar, it is because it happens far more often than IT organizations care to admit. That is exactly what happened to a friend of mine, who asked: “How am I supposed to go on like this?”

Accept Reality and Look for Clues

Well, you had better get used to the idea that when systems are down and your customers are hurting, most organizations respond with an all-hands-on-deck approach. I have yet to come across a situation where the star players who are able to resolve outages were not pulled in to solve the big ones, dropping everything else in the process.
So what should you do when you get sucked into a fire fighting exercise? Work to restore service, but also go in with an open and inquisitive mind! Along the way, look for “the lesson to be learned” and identify what could be done differently to avoid a recurrence.

The Bright Side of Disappointment

If you like things done right, you will often be disappointed when the results you expect from the team do not materialize. And here is what I found out: by learning to cope with disappointment and letting it build up, I have repeatedly been forced to look for change: a solution, then another, and then another. In fact, this has left a trail of directionally correct and meaningful continuous improvements!
So, as disappointment mounts, choose calculated rebellion (a.k.a. leadership) over conformity. Identify what is broken and figure out how to fix it. There is no need to look for “end world hunger” type solutions; just identify a directionally correct and meaningful improvement. And keep at it!

The Need for a “Buffer”

Being able to stay out of fire fighting mode is as important as being able to go into it. But how can anyone stay out of it when there is always a fire going on? That is where the need for a “buffer” comes up.
The buffer may be your ability (and willingness) to work extra hours, or it may be a relief crew (e.g., a contractor who augments the team for a while). As counterintuitive as it might sound, the goal of the buffer is actually to “free up time”. It is therefore imperative to use the buffer to deliver results that will prevent the next fire. This way, time invested now actually means time saved later!

The Light at the End of the Tunnel

So next time you find yourself asking “How am I supposed to go on like this?”, just give these steps a try:
  1. The nature of fire fighting (and daily operations) is beyond anyone’s control, so embrace it! Go towards it with your eyes and mind open. While your primary goal is to restore order and minimize damage, keep in the back of your mind the need to learn from this experience and identify how to avoid a recurrence (develop a problem statement).
  2. Turn disappointment into a force for good. Choose to lead change and identify directionally correct, meaningful improvements (postulate a solution).
  3. Create a “buffer” to code the envisioned automation.
  4. Assess the impact of the code (e.g., reduced downtime) during the next fire drill, and look for the next area of improvement!
In the end, DevOps is about living the dream (or the nightmare ;-)) in order to learn how to use code to deliver continuous improvements and truly live the dream! Joining Ops and Dev is the most effective way to learn about the problems and get better at addressing them. Hmmm, seems like this whole DevOps thing might make sense after all. What do you think? Feel free to use the comment box below to share your thoughts. And please follow me if you would like to be notified automatically when I publish new articles.


This work by Marcelo Bernardes (@marcelobern) was cross-posted on Medium and LinkedIn and is licensed under a Creative Commons Attribution 4.0 International License.

