On Saturday mornings, my five-year-old son goes to football training. This means I stand on a cold school playing field next to some other cold (and socially awkward) dads, making small talk about how cold it is. A few Saturdays ago, a dad who works in the IT department of a large city bank was on the phone for large parts of the session. When you work in IT, certain words and expressions trigger the cocktail party effect.
So, without wishing to eavesdrop, I could not help but overhear these phrases: “unix patching”, “how long till we can restore service” and “rollback plan”. I could tell he was dealing with a legacy piece of his IT estate, so I did not ask him why he did not have a resilient setup that allowed him to use A/B patching groups, and I chose not to comment that ‘planned downtime’ is still downtime and should not really be tolerated.
Unless you work in a start-up and/or have only one product line, you probably have something running that you really should have shut down and put on Freecycle long ago.
However, decommissioning old systems is neither glamorous nor much of a business priority most of the time. The FT celebrated 125 years of business last year, and we have numerous product lines and publications in online and paper-based media. So, although we don’t have anything relying on a Babbage difference engine, there are a few systems that could do with consigning to a skip. That said, we have managed to start addressing this old kit. We are most of the way through turning off some legacy systems, with the products that rely on them being either switched off completely or migrated onto newer, shinier equipment.
Generating the momentum to attack these systems and getting the cooperation from other teams (with their own competing priorities) was the first part of this process. To foster enough desire to dedicate people’s time and effort to decommissioning, and to undertake the work itself, we stumbled across a series of steps which can be distilled into a few bullet points.
- Make a realistic Total Cost of Ownership (TCO) assessment and put it in front of everyone who makes decisions. This must include not only the hosting and power costs but also the support costs of looking after old systems and the development costs of having to code around them.
- Frighten the decision makers into action. Detail how many components of the system are End Of Life (EOL) or dropping out of support, and use graphs such as the bathtub curve of hardware failure rates from Wikipedia.
- Tell people you will take on the problem. The first two points (if you have repeated them often enough) should have convinced people that there is a need to remove some old systems. They will now be aware that there is a problem and will want someone to tidy up for them. This is not the exciting end of IT, and your senior management team will just want someone to make the problem go away while they think about newer, more interesting kit. However, as one of my decommissioning team put it, “We are going to put away our old, broken toys before we start playing with the new ones”.
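The TCO sum in the first bullet does not need heavyweight tooling. As a rough sketch only: the cost categories below mirror the ones listed above, but the system name, field names, hourly rate and every figure are invented for illustration and are not FT data.

```python
from dataclasses import dataclass


@dataclass
class LegacySystem:
    """Yearly costs for one old system (all fields are hypothetical)."""
    name: str
    hosting_per_year: float               # hosting or data-centre charges
    power_per_year: float                 # electricity and cooling
    support_hours_per_year: float         # time spent keeping it alive
    workaround_dev_hours_per_year: float  # time spent coding around it


def annual_tco(system: LegacySystem, hourly_rate: float) -> float:
    """Add the people costs to the hosting and power bills."""
    people_hours = (system.support_hours_per_year
                    + system.workaround_dev_hours_per_year)
    return (system.hosting_per_year
            + system.power_per_year
            + people_hours * hourly_rate)


# Made-up example system and rate.
old_cms = LegacySystem("old-cms", 12000.0, 3000.0, 400.0, 250.0)
print(annual_tco(old_cms, hourly_rate=60.0))  # 54000.0
```

In this invented example the people costs (39,000) dwarf the hosting and power bills (15,000), which is exactly the part that tends to get left out of the conversation.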
Once you have the green light (and some budget) to start work:
- Turn up the logging and use something to collate and examine the traffic and work your legacy systems are actually handling. At the FT we used Splunk. The devil is always in the detail here; you need to understand your data. How much of the traffic comes from your own bots or monitoring systems? How much should be dealt with by a different system already in place? And so on.
- Graph your progress. Drive down the traffic piece by painful piece and graph it so that the team and the rest of the business can see there is progress, and so do not lose heart.
- Set a deadline for other teams you depend on. When you understand the traffic the old systems are carrying and have started to prune it back, set a deadline and consider a department-wide countdown to warn other teams what’s coming and focus minds.
- Remind the business about the value of killing the old kit. Show metrics and other examples of benefits (reduced costs, increased simplicity, reduced power consumption, etc.) while you drive traffic away from the old systems.
- Finally, turn it off anyway. There will still be something that you have not been able to trim back or migrate. Ultimately, the only way to find out if it matters may well be to simply pull the plug and see if anyone complains. I once worked with a network manager who, on inheriting a rat’s nest of Cat 5 cabling, said he simply pulled a handful out every day to find out if anyone used them. If no one complained, he knew he could tidy up.
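To make the "understand your traffic" and "graph your progress" steps a little more concrete, here is a minimal sketch of the idea in plain Python. It is not how we did it (we used Splunk), and the record format, client names and dates are all made up: it simply separates requests from known internal sources from the traffic nobody has explained yet, then counts the unexplained remainder per day so it can be plotted.

```python
from collections import Counter, defaultdict

# Hypothetical, already-parsed log records: (day, client, path).
SAMPLE = [
    ("2014-03-01", "monitoring-probe", "/health"),
    ("2014-03-01", "internal-crawler", "/feed"),
    ("2014-03-01", "10.0.0.7", "/article/1"),
    ("2014-03-01", "10.0.0.8", "/article/2"),
    ("2014-03-01", "10.0.0.7", "/feed"),
    ("2014-03-02", "monitoring-probe", "/health"),
    ("2014-03-02", "10.0.0.8", "/article/2"),
]

# Clients we already understand (names are assumptions for this example).
KNOWN_INTERNAL = {
    "monitoring-probe": "monitoring",
    "internal-crawler": "bot",
}


def classify(client):
    """Label a request by its source; anything unknown needs chasing."""
    return KNOWN_INTERNAL.get(client, "unexplained")


def daily_breakdown(records):
    """Count requests per day and per category."""
    days = defaultdict(Counter)
    for day, client, _path in records:
        days[day][classify(client)] += 1
    return days


def remaining_traffic(records):
    """Per-day count of unexplained requests, oldest day first."""
    breakdown = daily_breakdown(records)
    return [(day, breakdown[day]["unexplained"]) for day in sorted(breakdown)]


for day, count in remaining_traffic(SAMPLE):
    print(day, count)
```

The list returned by `remaining_traffic` is the line you want to see heading towards zero before anyone pulls the plug.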
Rejoice in turning stuff off; it should be celebrated. However, having decommissioned a system, there is a pretty good chance no one will thank you or, indeed, if you have done a very good job, even notice. In addition, because of the transient nature, pace of change and relentless progress of IT, you will have to start all over again almost immediately. However, I believe that without the janitors, all that relentless progress would eventually grind to a shuddering halt.