HPE Virtualized NonStop: Journey to the Cloud

HPE Virtualized NonStop (vNS) systems deployed in private VMware clouds have elicited positive feedback on performance, availability, and seamless application migration from traditional converged NonStop servers. Although vNS adoption on private VMware clouds continues to increase, HPE has heard from several NonStop customers about their plans to migrate mission-critical workloads to public clouds.

Driving forces for this trend include reducing real estate and power consumption costs, meeting net-zero carbon targets, and having the systems managed by HPE and/or cloud providers. In many cases, NonStop customers have signed multi-year contracts with cloud providers, setting horizons to migrate workloads in the next couple of years.

This article provides a sneak peek at the investigative work being carried out by the HPE NonStop R&D organization to deploy vNS systems in public clouds.

Requirements for vNS in the Cloud

NonStop R&D identified several high-level requirements through discussions with customers:

  • Hyperscalers: vNS deployments in Microsoft Azure, Google Cloud Platform (GCP), and Amazon Web Services (AWS).
  • Production system geographies: in-country data centers in multiple geographies.
  • Development and test system geographies: flexibility for out-of-country data centers in multiple geographies.
  • Hardware: general fleet servers are preferred to broaden deployment options and reduce costs.
  • Software compatibility: must run existing L-Series applications without changes.
  • Deployment and configuration: must be easy to deploy and configure.
  • Security: must secure system and data.
  • Availability: must achieve the same availability as the current vNS solution.
  • Manageability and operations support: Inquiries were received for HPE GreenLake Managed Services for NonStop, but cloud providers will also play a role in the upkeep of the infrastructure.
  • Billing: must support HPE GreenLake and NonStop Dynamic Capacity (NSDC), and possibly offer usage-based billing in the future.
  • Licensing: A Bring Your Own NonStop License (BYOL) model is a possibility.

Hyperscaler Offerings

In addition to discussions with customers, NonStop R&D engaged with Microsoft Azure, GCP, and AWS to evaluate whether their cloud offerings meet the current architectural requirements for deploying vNS systems. The key findings about these hyperscalers’ offerings are:

  • RoCE NICs and VMware: These are supported by specific Microsoft Azure and GCP cloud solutions available in select geographies. These solutions are called Skytap and Google Cloud VMware Engine (GCVE), respectively.
  • System fabrics: Redundant Ethernet switches are available, but they may carry traffic from other servers in the same rack.
  • Storage: Redundant storage options identified so far include VMware vSAN and internal drives in hosts without vSAN.
  • Security: Security capabilities offered include dedicated servers with secure access through the cloud provider network security architecture.
  • General fleet servers: These are widely available as dedicated servers in multiple geographies, but are configured with general-purpose Ethernet NICs and the hyperscaler’s native hypervisor (e.g. Hyper-V for Microsoft Azure) in lieu of RoCE NICs and VMware.
  • Orchestration: Multi-cloud orchestration frameworks such as Terraform are generally available; VMware vRealize Orchestrator (vRO) is offered by the Microsoft Azure Skytap solution.
  • Fault-tolerant deployment: Supported through anti-affinity rules, but approaches to specify VM placement vary between offerings.
  • Dedicated cores and memory: Supported, but approaches to specifying these resources vary between hypervisors.
  • Infrastructure upkeep: Hyperscalers generally handle hardware and virtualization infrastructure maintenance and upgrades, and generate alerts for planned and unplanned maintenance.

Addressing Main Gaps to Deploy vNS in Public Clouds

NonStop R&D is pursuing multiple areas of investigation to address four main gaps identified during discussions with hyperscalers:

  • RoCE NICs and VMware: NonStop R&D collaborated with Microsoft Azure and GCP to confirm the feasibility of deploying Proof of Concept (PoC) vNS systems on Skytap and GCVE, respectively. A solution based on RoCE NICs and VMware has the potential to be the first vNS solution in public clouds. However, availability of this solution will be limited to geographies that offer the Skytap and GCVE cloud solutions.
  • General fleet servers: Broadly available general fleet servers are typically configured with general-purpose Ethernet adapters and the hyperscaler’s native hypervisor. Because of this, NonStop R&D is investigating a long-term vNS solution that can use general-purpose Ethernet adapters for the system fabric traffic, in addition to supporting cloud-native hypervisors.
  • Orchestration: NonStop R&D carried out PoC tests to confirm the feasibility of deploying vNS systems with two popular multi-cloud orchestration frameworks (namely, Terraform and Pulumi).
  • Infrastructure upkeep: Discussions are currently underway with hyperscalers on how operational scenarios such as host upgrades will be handled.

These areas of investigation are being pursued in parallel to chart a path to two possible vNS on public cloud solutions:

  • An initial solution that leverages RoCE and VMware offerings from Microsoft Azure and GCP.
  • A long-term solution that can be deployed on general fleet servers from all major hyperscalers.

Initial vNS on Public Cloud Solution

A potential initial vNS on public cloud solution leverages cloud offerings that support RoCE NICs and VMware (namely, Microsoft Azure Skytap and GCVE). These offerings are available in several, but not all, geographies supported by Microsoft Azure and GCP.

After initial planning discussions with hyperscalers, PoC vNS systems with redundant fabric switches were successfully deployed in Microsoft Azure and GCP data centers in October 2022. This exciting milestone was achieved through collaborative work between NonStop R&D and the hyperscalers. Both PoC systems passed the functional and fault tolerance tests carried out by the HPE NonStop QA team. Live demos of both systems were offered during the 2022 NonStop Technical Boot Camp.

The Microsoft Azure vNS PoC system leverages the Skytap cloud offering. This is a small system deployed on two servers running the L22.09 RVU. The vNS configuration consists of 2 virtual NonStop CPUs, 2 virtual storage CLIMs, 2 virtual network CLIMs, and 1 virtual NSC. This PoC system uses internal drives in the servers as VMware datastores. The VMware environment includes vRO (vRealize Orchestrator), allowing the system to be deployed with the current vNS Deployment Tools. The Microsoft Azure Skytap portal shown below provides access to the vNS PoC system.

After logging in to the portal, jump host VMs provide access to VMware vCenter and the vNS PoC system. As the OSM and TACL screenshots below show, the “look and feel” of the system is exactly the same as that of a vNS system deployed on a private VMware cloud.

The GCP vNS PoC system leverages the GCVE cloud offering. The system was deployed on 4 servers running the L22.09 RVU. This configuration consists of 4 virtual NonStop CPUs, 2 virtual storage CLIMs, 2 virtual network CLIMs, 1 virtual NSC, and a spare server for rolling upgrade tests. This PoC system uses VMware vSAN datastores. The system was deployed through direct interactions with VMware vCenter, since VMware vRO was not installed in this environment. Terraform is a standard offering in GCVE, and NonStop R&D is investigating using the GCVE Terraform provider plug-in for vNS deployment. The GCP portal shown below provides access to the vNS PoC system.

Once inside, the “look and feel” of the system is also the same as that of a vNS system deployed on a private VMware cloud (see OSM and TACL screenshots below for details).

Long-term vNS on Public Cloud Solution

NonStop R&D is also investigating a long-term vNS solution that can be deployed on general fleet servers from all major hyperscalers. The key areas being explored are:

  • Use of general-purpose Ethernet NICs for system fabric traffic,
  • Support for hyperscaler native hypervisors, and
  • Multi-cloud orchestration.

Leveraging General-Purpose Ethernet NICs for System Fabric Traffic

The following diagram depicts the evolution of the current vNS stack with RoCE fabric NICs to a future stack with general-purpose Ethernet fabric NICs:

The following table compares the current and future vNS stacks with respect to system fabrics:

Preliminary fabric performance measurements with general-purpose Ethernet NICs have shown reasonable results in comparison with RoCE adapters. Even so, NonStop R&D aims to generalize the vNS stack to support both general-purpose Ethernet and RoCE fabric NICs, so that customers aiming for the lowest possible latencies can still consider a vNS solution with RoCE NICs.

A Performance Modeling Library (PML) that closely simulates the latency and software costs of the vNS solution with general-purpose Ethernet NICs was implemented for additional validation. QA stress tests carried out on NonStop servers running the PML did not find any adverse impacts on critical system operations, particularly time-sensitive messages and I/O operations.

Support for Hyperscaler Native Hypervisors

A vNS system has virtual machine (VM) instances that run three different guest OSes:

  • NonStop CPU VMs run the NonStop OS (NSOS).
  • NonStop CLIM VMs run the HPDE (HPE Debian Enablement) Linux OS.
  • The virtual NonStop System Console (NSC) runs the Windows Server OS.

Like other Linux distributions, Debian can run as a guest OS on many hypervisors. Likewise, the Windows Server OS is widely supported on many hypervisors. However, the NSOS can currently be deployed as a guest OS on only two hypervisors: VMware ESXi (the hypervisor used in the current vNS solution) and Linux KVM (the hypervisor used in the HPE Virtualized Converged NS2 system).

Deploying vNS systems on general fleet servers from all major hyperscalers will require running the NSOS as a guest OS on these hypervisors:

  • Hyper-V (Microsoft Azure hypervisor),
  • Google KVM (this is based on KVM but has security hardening enhancements implemented by Google), and
  • AWS Nitro.

NonStop R&D is investigating a generic approach to accelerate support for these hypervisors in lieu of pursuing separate solutions for each hypervisor.

Support for Multi-Cloud vNS Deployment Tools

NonStop R&D has implemented three different environment-specific deployment tools for virtualized NonStop solutions to date:

  • vNS Deployment Tools for VMware,
  • vNS Deployment Tools for OpenStack, and
  • vmconfig (a script-based tool used in the HPE Virtualized Converged NS2 system).

Recently, popular open-source frameworks such as Terraform and Pulumi have emerged to support multi-cloud orchestration. These frameworks are supported by all major hyperscalers and are also available for on-premises private clouds such as VMware.

In order to explore the feasibility of future multi-cloud vNS deployment tools capable of supporting all major hyperscalers, NonStop R&D successfully carried out PoC vNS deployments with both Terraform and Pulumi. These frameworks were also evaluated to confirm that they support relevant features required for vNS deployments, such as anti-affinity rules to specify VM placements.
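
As an illustration of what an anti-affinity rule is meant to guarantee, the following minimal Python sketch checks a proposed VM placement and reports any redundant VMs that would share a host. It is not part of the vNS Deployment Tools, Terraform, or Pulumi; the function name, VM names, host names, and the placement itself are hypothetical, loosely modeled on the two-server PoC configuration described earlier.

```python
from collections import defaultdict


def anti_affinity_violations(placement, groups):
    """Return {group: {host: [vms]}} for every host that holds more than
    one VM from the same anti-affinity group. An empty dict means the
    placement satisfies all rules."""
    violations = {}
    for group_name, vms in groups.items():
        by_host = defaultdict(list)
        for vm in vms:
            by_host[placement[vm]].append(vm)
        shared = {host: names for host, names in by_host.items() if len(names) > 1}
        if shared:
            violations[group_name] = shared
    return violations


# Hypothetical anti-affinity groups: VMs in the same group back each other up,
# so they must never run on the same physical server.
groups = {
    "cpus": ["cpu0", "cpu1"],
    "storage_clims": ["sclim0", "sclim1"],
    "network_clims": ["nclim0", "nclim1"],
}

# Assumed placement on two servers: each redundant pair is split across hosts.
good_placement = {
    "cpu0": "host-a", "cpu1": "host-b",
    "sclim0": "host-a", "sclim1": "host-b",
    "nclim0": "host-a", "nclim1": "host-b",
}

# Same placement, except both CPUs land on host-a (a single point of failure).
bad_placement = dict(good_placement, cpu1="host-a")

print(anti_affinity_violations(good_placement, groups))
print(anti_affinity_violations(bad_placement, groups))
```

In an orchestration framework the same constraint would normally be declared to the platform rather than checked after the fact; a check like this one is only a way to validate that whatever placement the platform chose still keeps redundant components on separate servers.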

Additional Areas of Investigation

Security, support, and billing are important areas that will require additional investigation for a future vNS on public cloud solution.

Security is a known concern for public cloud workloads. The security model for a vNS on public cloud solution will need to consider:

  • Security hardening best practices for the vNS system.
  • Isolation of the cloud resources used by the vNS system including dedicated servers, storage, and networking.
  • Security hardening best practices for external network traffic and system manageability access.
  • Protecting the data.
  • Compliance with industry security standards and regulations such as PCI-DSS, GDPR, and HIPAA.

Because shared resources such as ToR (Top of Rack) switches are a possibility, the system interconnect traffic will continue to require dedicated VLAN tags. Hardware encryption of the system interconnect traffic is also a future possibility with “smart” Ethernet NICs, but this will require further investigation.
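
For readers less familiar with VLAN tagging, the sketch below shows how a standard IEEE 802.1Q tag is laid out: a 16-bit tag protocol identifier (0x8100) followed by 16 bits holding a 3-bit priority, a 1-bit drop-eligible indicator, and a 12-bit VLAN ID. It only illustrates the tag format that lets a shared switch keep traffic separated; the VLAN IDs and fabric names are hypothetical and do not reflect actual vNS fabric settings.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an IEEE 802.1Q tagged frame


def build_vlan_tag(vlan_id, priority=0, dei=0):
    """Build the 4-byte 802.1Q tag inserted after the source MAC address."""
    if not 1 <= vlan_id <= 4094:  # 0 and 4095 are reserved values
        raise ValueError("VLAN ID must be in the range 1..4094")
    if not 0 <= priority <= 7:
        raise ValueError("priority must be in the range 0..7")
    if dei not in (0, 1):
        raise ValueError("DEI must be 0 or 1")
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)  # network byte order


def parse_vlan_id(tag):
    """Extract the 12-bit VLAN ID from a 4-byte 802.1Q tag."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return tci & 0x0FFF


# Hypothetical VLAN IDs, one per redundant fabric, so that interconnect
# traffic stays separated from other tenants on a shared ToR switch.
FABRIC_VLANS = {"fabric_a": 100, "fabric_b": 200}

for name, vid in FABRIC_VLANS.items():
    tag = build_vlan_tag(vid)
    print(name, tag.hex(), parse_vlan_id(tag))
```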

Some possible support model options that will need to be explored further include:

  • HPE supports the vNS system and the cloud vendor supports the underlying infrastructure.
  • HPE supports the entire stack (vNS system and infrastructure).
  • The cloud vendor supports the entire stack.

Another challenge to be investigated is the synchronization of cloud environment maintenance windows to ensure compliance with mission-critical SLAs for NonStop applications.

Possible options to be explored for billing models include separate invoices (one from the cloud vendor and the other from HPE) or one consolidated invoice. A real-time view of consumption data is another challenge that will require further investigation.

Call to Action!

NonStop R&D would love to hear more about your questions and thoughts for vNS in the cloud. A broader understanding of your requirements will help HPE to chart the path for a vNS solution in the cloud.

Authors

  • Marcelo de Azevedo

    Marcelo has been with HPE/HP/Compaq/Tandem since 1997. Marcelo is the R&D architect of several generations of NonStop clustering solutions based on the InfiniBand, RoCE, and ServerNet interconnects. He is also an architect for HPE Virtualized NonStop servers. Marcelo has a PhD in Computer Engineering from the University of California, Irvine.

  • Jim Hamrick

    I have worked in HPE NonStop development for 25 years. I’ve worked on new prototypes in Tandem Labs, helped develop the connection management protocol for the InfiniBand Trade Association, and most recently I’ve taken a leadership role in planning the cluster communications architecture for future HPE NonStop products.

  • Ken James

    I have been working for NonStop for nearly 22 years as a hardware and software design engineer. I work on adapting the NonStop Operating System to run on new platforms and in new environments, including private and public clouds.

  • Lars Plum

    Lars has worked on the NonStop system since 1993, spending the first 5 years in the OSS File System and moving to the NonStop Kernel in 1998. He has worked on bringing up many generations of NonStop systems, including NonStop i, NonStop X, and vNS, including cloud platforms. He has also handled many critical customer escalations over the years.
