By: Anurag Sharma
In an increasingly distributed world, we often ask ourselves whether continuous delivery can accommodate legacy software systems, and whether we can use chaos engineering to improve reliability in these environments. There is a frequent assumption that legacy can’t be agile, or an attitude that its future is uncertain but sure to be short. But nobody knows how short: it could be years or even decades before this kind of technical debt is worked away. Legacy is heritage. It is cherished and often represents an organization’s core business systems: the cash cows the business depends on, which have grown highly complex over their lifetime and are therefore incredibly difficult to replace and cannot simply be abandoned.
Can Chaos Engineering Help with Legacy Systems?
Not everyone is a unicorn born on the web like the FANG companies, and even these organizations carry some legacy, although they may not find it as crippling as many older enterprises do. Legacy applications bring with them legacy databases built on older technologies, and there’s a good chance any organization has to deal with a massive old-style RDBMS. Whilst most of the world is moving from monoliths to microservices, there is always a transition period in which legacy applications and components need to operate close to the speed of the new world. These legacy monoliths typically have a tightly coupled architecture, which needs to be loosened to allow incremental, small-batch change, testing and release, and so increase velocity.
Applying the best approaches in isolation is like looking at a single tree and trying to assess the impact on the whole rainforest.
Continuous delivery is a proven approach for building consistency into software development, and it is here to stay. On the IT operations side, we have chaos engineering, which can swiftly uncover software failures that teams aren’t aware of but that have the potential to ruin a business. Chaos engineering carries a very clear message: it is preferable to practice small failures constantly than to increase the risk of a catastrophic public failure that can seriously damage a business’s reputation.
The idea is to consider the overall ecosystem, which includes all of your critical legacy systems, and to choose your battles with legacy appropriately.
Before we proceed further, let me clarify a few things about resilience assessment approaches: Disaster Recovery versus Game Days.
What doesn’t matter for Chaos:
Adding Chaos to Your Legacy CI/CD Pipelines
The ultimate goal of CI/CD is to automate the software build process and enhance velocity. Once it is set up, it makes sense to integrate chaos into it: as part of the deployment pipeline, you can push your chaos files to start disruption in a specific environment. Here are a few scenarios:
When you include chaos in your CI/CD pipeline, remember that the aim is to continuously validate key hypotheses, such as that a deployment should always succeed, even when capacity is low.
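As a rough illustration of the two paragraphs above, here is a minimal Python sketch of a pipeline stage that reads a chaos file, injects a capacity fault only in the environments that file allows, and then checks the hypothesis that the deployment still succeeds. The chaos file schema, the simulated `rolling_deploy` and the `run_chaos_stage` helper are all hypothetical; a real pipeline would call an actual chaos tool and deployment system instead.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical chaos file pushed by the pipeline. The schema is an assumption
# for illustration only, not the format of any particular chaos tool.
CHAOS_FILE = """
{
  "name": "deploy-under-low-capacity",
  "environments": ["test", "staging"],
  "fault": {"type": "remove_instances", "fraction": 0.5},
  "hypothesis": "deployment succeeds even when capacity is low"
}
"""


@dataclass
class DeployResult:
    succeeded: bool
    batches: int


def rolling_deploy(instances: int, min_in_service: int) -> DeployResult:
    """Simulated rolling deploy (hypothetical): update instances in batches,
    keeping `min_in_service` serving when possible and falling back to
    one-at-a-time updates when capacity is below that floor."""
    if instances <= 0:
        return DeployResult(succeeded=False, batches=0)
    batch_size = max(1, instances - min_in_service)
    remaining, batches = instances, 0
    while remaining > 0:
        remaining -= min(batch_size, remaining)
        batches += 1
    return DeployResult(succeeded=True, batches=batches)


def run_chaos_stage(chaos_file_text: str, environment: str,
                    instances: int, min_in_service: int) -> Optional[bool]:
    """Inject the chaos file's capacity fault in `environment`, then check the
    hypothesis. Returns None if the file does not allow this environment."""
    experiment = json.loads(chaos_file_text)
    if environment not in experiment["environments"]:
        return None  # never disrupt an environment the file does not list
    fraction = experiment["fault"]["fraction"]
    surviving = instances - int(instances * fraction)  # simulated lost capacity
    return rolling_deploy(surviving, min_in_service).succeeded


if __name__ == "__main__":
    outcome = run_chaos_stage(CHAOS_FILE, "staging", instances=10, min_in_service=8)
    if outcome is False:
        # A non-zero exit fails the CI job, surfacing the broken hypothesis.
        raise SystemExit("Hypothesis violated: deployment failed under low capacity")
    print("staging hypothesis holds:", outcome)
```

Returning `None` for unlisted environments is a deliberate choice in this sketch: the chaos file, not the pipeline, decides where disruption is allowed, so pushing the same file through every stage cannot disrupt production by accident.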
Useful Experiments on Your Build Cycle
For legacy pipelines, let’s take the mainframe as an example. The pipeline starts with version control, using tools such as ISPW and ChangeMAN; moves through build, release and deployment, using tools such as Topaz and IBM tooling; then operation, whether manual or automated; and finally monitoring, with tools such as BMC and Splunk. Here are two chaos experiments that help to assess your pipeline:
The idea here is to speed up deployment and find issues before they hit production.
Which are the best monkeys for Game Days?
For build pipelines, the sweet spot remains in the middle because the software itself usually plays a role in responding to the failure. For example, the software might include automated restart, throttling or failover. If those are software functions, they either work or they don’t, and the build should be able to uncover which.
What truly differentiates the best from the rest is a growing focus on the reliability of the entire ecosystem: how effectively you test the resilience of your system all the way from build through to production. Chaos engineering and continuous delivery, the future of releases, are the two best set-ups for doing this effectively and getting the maximum value from it.
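To make this concrete, the following minimal Python sketch shows how a build could check one such software function, an automated restart. `FlakyDependency` injects a configurable number of connection failures, and the two test functions assert that the hypothetical `call_with_restart` wrapper recovers from transient faults but gives up on persistent ones. The names and the retry policy are assumptions for illustration, not part of any specific legacy system.

```python
class FlakyDependency:
    """Test double that raises ConnectionError on its first `failures` calls."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def __call__(self) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("injected fault")
        return "ok"


def call_with_restart(operation, max_attempts: int = 3):
    """Hypothetical automated-restart logic: re-invoke `operation` after a
    ConnectionError, up to `max_attempts` calls in total."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return operation()
        except ConnectionError as exc:
            last_error = exc
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


def test_restart_recovers_from_transient_faults():
    dependency = FlakyDependency(failures=2)
    assert call_with_restart(dependency, max_attempts=3) == "ok"
    assert dependency.calls == 3


def test_restart_gives_up_on_persistent_faults():
    dependency = FlakyDependency(failures=5)
    try:
        call_with_restart(dependency, max_attempts=3)
    except RuntimeError:
        assert dependency.calls == 3  # stopped after exactly max_attempts
    else:
        raise AssertionError("expected the restart logic to give up")


if __name__ == "__main__":
    test_restart_recovers_from_transient_faults()
    test_restart_gives_up_on_persistent_faults()
    print("restart behaviour verified")
```

Because the fault is injected in-process, checks like these can run as part of an ordinary build (for example in a pytest test module, which collects `test_*` functions) without any extra infrastructure.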