Digital Immune System
Gartner has started talking about a new ‘stalking horse’ called the Digital Immune System (DIS). In fact, they say, “By 2025, organizations that invest in building digital immunity will increase customer satisfaction by decreasing downtime by 80%.” As you might imagine, when a NonStop person hears about ‘reducing downtime’ and ‘increasing customer satisfaction’, our ears perk up. The whole NonStop mission is to create a computer that doesn’t fail, and the reason for the mission is the consequences that occur when a system fails: in many cases loss of money, but in some even loss of life. Availability has always been a key feature and will remain so.
Back in the old days we had some percentages for why a system failed. The causes were generally physical (actual hardware failures), design (hardware, software and application errors), environmental (power and cooling, natural disasters, network connections) and operations (the things people did to inadvertently crash a system). The NonStop system was designed to ride through the physical and design issues through hardware fault tolerance and its unique operating system, which would fail over into a different environment. Environmental issues were solved by a business continuity or disaster recovery system. Human error, though harder to control, is reduced by automation and by a system that is easier to manage thanks to our Single-System-Image virtualization. And our record speaks for itself.
There is a somewhat new threat these days. I suppose it could be lumped under environmental, but it is specifically tied to security. More and more, the outages we read about involve cyberattacks. In a previous “Trends & Wins” I mentioned a ransomware recovery strategy we are working to release. That covers one of a number of cyber issues. This kind of threat is what Gartner is talking about in their Digital Immune System.
Just as the human body has natural defenses against viruses and bacteria to avoid illness, Gartner envisions a digital immune system within datacenters that can detect and contain a rogue cyberattack, limiting exposure within an organization. An immune system, according to Gartner, would have six prerequisites:
- Observability
- AI-Augmentation
- Chaos Engineering
- Auto Remediation
- Site Reliability Engineering (SRE)
- Software Supply Chain Security
Observability is a key and crucial component of a DIS since the trick is to observe what’s normal, to see how systems interact, to determine data flow within the environment and then to be on the lookout for changes.
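The idea of learning what is normal and then watching for changes can be illustrated with a very small sketch. It assumes a single metric (hypothetical transactions-per-second readings) and a simple standard-deviation rule; the function name, the threshold and the sample numbers are all made up for illustration, and a real observability platform would track many more signals.

```python
from statistics import mean, stdev

def find_anomalies(baseline, observed, threshold=3.0):
    """Return the observed values that sit more than `threshold`
    standard deviations away from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value is a change.
        return [x for x in observed if x != mu]
    return [x for x in observed if abs(x - mu) / sigma > threshold]

# Hypothetical readings taken while the system was behaving normally.
baseline_tps = [100, 102, 98, 101, 99, 100, 103, 97]
# New readings; one of them is far outside the normal range.
new_tps = [101, 99, 250, 100]

print(find_anomalies(baseline_tps, new_tps))  # → [250]
```

The baseline has to be collected first; only then does a departure from it mean anything.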
AI-Augmentation is the new term for what NonStop saw in the operations category of downtime. Humans, when they don’t do something on a regular basis, mess up now and again. AI-Augmentation is a way to bring machine learning and AI into testing situations, especially software testing. Can AI produce better testing and therefore better harden applications against attacks? The belief is yes.
Chaos engineering follows the approach of Chaos Monkey, a software tool developed by Netflix to test the resiliency and recoverability of its services running on Amazon Web Services (AWS). The software simulates failures of running service instances by shutting down one or more of the virtual machines. Netflix would ‘inject failures’ into their systems and got better and better not only at recovery but at creating more durable code from the beginning. By having a DIS apply these techniques, a much stronger environment can be developed and maintained.
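A toy version of failure injection might look like the following. This is a sketch only, not Netflix's tool: the `Instance` and `Service` classes and the node names are hypothetical, and the "service" is just an in-memory list that fails over to the next healthy instance.

```python
import random

class Instance:
    """Hypothetical service instance that can be switched off."""
    def __init__(self, name):
        self.name = name
        self.alive = True

    def handle(self, request):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"

class Service:
    """Tries each instance in turn and fails over when one is down."""
    def __init__(self, instances):
        self.instances = instances

    def handle(self, request):
        for inst in self.instances:
            try:
                return inst.handle(request)
            except ConnectionError:
                continue
        raise RuntimeError("no healthy instance available")

def inject_failure(service, rng):
    """Shut down one randomly chosen healthy instance and return it."""
    victim = rng.choice([i for i in service.instances if i.alive])
    victim.alive = False
    return victim

rng = random.Random(42)  # fixed seed so the experiment is repeatable
service = Service([Instance("node-a"), Instance("node-b"), Instance("node-c")])
victim = inject_failure(service, rng)
reply = service.handle("order-1")
print(f"killed {victim.name}; {reply}")
```

The point of the experiment is the last step: after an instance is taken away, the request must still be answered by one of the survivors.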
Auto remediation is the concept that applications can be designed to heal themselves when they start detecting variances from the normal state. This would work in conjunction with AI-augmentation and chaos engineering to better design applications that are more self-aware.
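As a minimal sketch of self-healing, assume each component reports an error rate and that a restart is an acceptable healing action. The `Worker` class, the component names and the 5% threshold are hypothetical; a real system might fail over, roll back or shed load instead of restarting.

```python
class Worker:
    """Hypothetical application component that reports its own error rate."""
    def __init__(self, name):
        self.name = name
        self.error_rate = 0.0
        self.restarts = 0

    def restart(self):
        # Restarting clears the fault in this toy model.
        self.error_rate = 0.0
        self.restarts += 1

def remediate(workers, max_error_rate=0.05):
    """Restart every worker whose error rate exceeds the threshold
    and return the names of the workers that were healed."""
    healed = []
    for worker in workers:
        if worker.error_rate > max_error_rate:
            worker.restart()
            healed.append(worker.name)
    return healed

workers = [Worker("payments"), Worker("ledger")]
workers[1].error_rate = 0.20  # simulate a variance from the normal state

print(remediate(workers))  # → ['ledger']
```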
Site reliability engineering is really a combination of all of the above, but with a focus on customer experience. How well is everything performing from the perspective of the people interacting with it? SRE can be thought of as the implied or perceived service level agreement, or really the satisfaction level of the users of the system. SRE is designing toward a beautiful customer/user experience.
Finally, software supply chain security is a way to protect the overall software environment. Today, environments are made up of proprietary/custom applications, open-source, SaaS and public cloud areas. Supply chain security needs a ‘bill of materials’ for all the software components, ways of verifying downloads, and strict version-control policies.
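Verifying a download against a bill of materials can be sketched in a few lines. The example assumes the bill of materials is simply a table of component name and version mapped to a SHA-256 digest; the component name `hello-lib` and the helper functions are hypothetical, and real bills of materials use richer formats.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()

def verify_artifact(name, version, data, bill_of_materials):
    """Accept a download only if this exact name and version is listed
    in the bill of materials and its digest matches the recorded one."""
    expected = bill_of_materials.get((name, version))
    if expected is None:
        return False  # unknown component or version: reject
    return sha256_hex(data) == expected

# Hypothetical trusted release and the bill of materials built from it.
trusted = b"print('hello')\n"
bom = {("hello-lib", "1.0.0"): sha256_hex(trusted)}

print(verify_artifact("hello-lib", "1.0.0", trusted, bom))                 # → True
print(verify_artifact("hello-lib", "1.0.0", trusted + b"# extra\n", bom))  # → False
print(verify_artifact("hello-lib", "2.0.0", trusted, bom))                 # → False
```

Rejecting anything that is not listed is what enforces the strict version policy: an unlisted version fails even when its contents are unchanged.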
This is what Gartner considers important for a DIS. In my mind, such a system, or at least the heart of such a system, also has some requirements of its own:
- Available – 24x7x365
- Scalable – linear and without limits
- Secure and Immutable
- High-speed Interconnectivity to spokes
- Secure, available, scalable Database capability
And as one might expect from me, that sure sounds like a NonStop system. So will we work to create a core DIS? Keith Moore and I (along with a few others) are discussing the idea. The requirements seem to be a good fit for the NonStop system, and the whole concept of a DIS matches our mission statement. All I can say is ‘stay tuned’.