News from HPE’s NonStop Division

HPE NonStop: Because business resilience does not tolerate shortcuts

Dear NonStop community,

As has become evident this year, we’re all going to be discussing cyber resilience, aren’t we? Is it just marketing fluff and a new name for cyber security? Or is there more to it? Whenever there is a new trend in the industry, you invariably see everyone jumping on the bandwagon, claiming they have the best offering to address that problem. And why not? After all, if you have a good product or even a good feature, why wouldn’t you take the chance to promote it the best way you can? Fair game, maybe, but not all offerings are equal, and you end up with a whole spectrum of solutions, all claiming to achieve the goal. Some will be good, but others, under immense market pressure, will simply make the claim with whatever they are able to deliver in a short time frame, in some cases taking ridiculous shortcuts.

Take High Availability (HA), a well-known domain under the umbrella of business resilience. As we know in our community, it takes quite a bit of design, architecture, features, integration, processes, partnerships, and skills to reach the level of fault tolerance that NonStop typically achieves. Yet, often in our industry, one can simply label a product “HA” and be done, drawing a line of responsibility that stops at that single product. Many won’t investigate the details any further, yet we know how those details can produce dramatically different outcomes. A 20-second failover time for one product, possible data loss for another, replicas that serve only database reads for a third, a forced choice between availability and consistency for database writes (the CAP theorem issue): these are all part of the disappointing reality of those HA implementations. They put the responsibility on customers to close the gap between keeping a product always up and keeping an application always up. And we should keep in mind that your availability is only as good as your weakest link. This is why the HA category is unforgiving: it cannot tolerate shortcuts, or a vision of high availability that is limited to one product at a time.
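To make the weakest-link point concrete, here is a minimal sketch in Python. It assumes a simple chain in which every component must be up for the application to be up, and that failures are independent. The component names and availability figures are hypothetical and do not describe any real product.

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60


def series_availability(components):
    """Availability of a chain in which every component must be up.

    Assumes independent failures, so the availabilities multiply.
    """
    return prod(components.values())


def downtime_minutes_per_year(availability):
    """Expected unavailable minutes per year for a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


# Hypothetical figures for illustration only.
stack = {
    "load_balancer": 0.9999,
    "app_server": 0.999,
    "database": 0.9999,
    "storage": 0.9995,
}

end_to_end = series_availability(stack)
weakest = min(stack.values())

print(f"weakest component: {weakest:.4%}")
print(f"end-to-end:        {end_to_end:.4%}")
print(f"downtime per year: {downtime_minutes_per_year(end_to_end):.0f} minutes")
```

Under these assumptions the end-to-end figure can never exceed the weakest component, and every additional link pulls it lower still. That is why an “HA” label on one product says little about the availability of the application as a whole.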

One of the most underrated shortcuts, in my opinion, is ignoring the complexity required to reach that high availability outcome. One example goes back about eight years, to when OpenStack was the biggest trend in the industry. It was the new IaaS framework, a response to VMware that was based entirely on open source. In record time, a highly active community was able to design and roll out this new framework and win quick and significant adoption. However, while providing a control plane for IaaS was the main goal, high availability came as an afterthought in many places. For example, each OpenStack service (about 20 of them, such as Keystone, Horizon, Nova…) had its own metadata, and if you wanted that metadata to be highly available, the suggestion was to implement three replicas of a MySQL or MariaDB database. Wait, what? Isn’t that an immense complexity overhead just dumped on the end user to figure out and manage forever?

The root cause of this unacceptable and unnecessary complexity is that those OpenStack services could not rely on a standard highly available file system or database that exists in all Linux distributions. Linux does not come with a single system image (SSI) cluster out of the box, nor does it come with a database cluster that you can always expect to be there. The consequence is that those services each solve their own HA issue in a silo. Now you have hundreds of smallish databases popping up everywhere, you multiply servers to host the replicas, and each database has its own set of users and passwords that become potential security breaches if not carefully managed. Let’s not even go into the cost and supportability implications. And remember, those manual steps are only part of setting up the control plane, which is supposed to make your life easier in managing your IaaS, but now you have to make that same control plane reliable by yourself! Sorry, this is too complex, too insecure, and too risky, and these are the outcomes of taking design shortcuts.

NonStop, on the other hand, does not take shortcuts. From the beginning it was designed to solve the most complex issues in achieving high availability at scale. It did so in a way that propagates elegantly and uniformly through the software stack all the way up to the application, which is what really needs to be highly available. Ultimately, this is how applications running on NonStop achieve superior availability.

Today, while other vendors still have to figure out better solutions for high availability, NonStop can move on serenely to tackle new and urgent matters such as cyber resilience, another key element of the overarching business resilience domain. Like high availability, security does not tolerate any weak link. It requires a comprehensive approach, such as the one NIST describes with the CIA triad (Confidentiality, Integrity, and Availability). While the security umbrella includes mechanisms and concepts such as strong authentication, zero trust security, and encryption, it also requires strong availability, the A in CIA. The NonStop platform knows those attributes extremely well, but the high availability of the application needs to be extended to include high availability of your data, no matter what. Ransomware is the pain point to be addressed, and cyber resilience therefore defines your capacity to protect your business against this new type of attack.

 

In a recent business resilience presentation, a quote stuck with me: “According to the UN every dollar invested in disaster preparedness can save 7 dollars’ worth of disaster related economic loss” (Building Resilience: the importance of prioritizing disaster risk reduction, 15 Aug 2012, Helen Clark, UN Development Program Administrator). It becomes clear that my earlier question, whether cyber resilience is marketing fluff, was merely provocative. Companies that do not embrace a strong plan for cyber resilience put the company itself at serious risk. Cyber resilience is the realization that business continuity, reliability, and security are all equally important. You cannot take shortcuts, and while innovating is key to staying competitive, doing it on a reliable platform is what will bring the best outcome.

To find out more about cyber resilience, do not miss our NonStop TBC 2023 conference, taking place September 12-14 in Denver, Colorado. You’ll meet a community whose DNA is business continuity, where domain experts in security and hybrid clouds can exchange ideas or start new partnerships, and where decision makers can learn how to increase their business resilience.

Author

  • Roland Lemoine

    Roland Lemoine has been working on NonStop for 23 years and is currently the product manager for database and blockchain languages and development products. Previous experience includes customer support for middleware products, Open Source advocacy and a strong UNIX background.