NonStop Clustering Update

NonStop servers have a long tradition in clustering, and HPE is proud to introduce new NonStop X Cluster Solution (NSXCS) features with the L21.06.01 RVU. The new NSXCS features include support for HPE NonStop NS8 servers and new cluster switches based on InfiniBand HDR 200 Gbps technology.

Do you want to learn more about NonStop clustering and its latest features? Well, you came to the right place!

NonStop Clustering: An Overview

What is NonStop clustering, exactly? A single NonStop server is itself a cluster of up to 16 CPUs with a shared-nothing architecture that is integral to the NonStop platform. However, NonStop terminology has historically used the terms node or system to refer to a single NonStop server, whereas cluster refers to high-performance, low-latency solutions that allow multiple NonStop nodes to be clustered together through a tight interconnection. Thus, all NonStop clustering solutions to date can also be thought of as a “cluster of clusters” (i.e., a cluster hierarchy composed of NonStop nodes, with each NonStop node also being a cluster).

Scalability is an important aspect of NonStop clustering solutions, and NSXCS allows customers to create a single system image with up to 384 CPUs distributed across up to 24 NonStop NS7 and NS8 nodes. NonStop clustering solutions fully coexist with the Expand network, and they have their own Expand line type. For NSXCS, the Expand line type is called Expand-over-IB.
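The 384-CPU figure follows directly from the two limits quoted above (16 CPUs per node, 24 nodes per cluster). The short Python sketch below just makes that arithmetic explicit; the function and constant names are illustrative and do not correspond to any NonStop tool or API.

```python
# Limits quoted in this article; the names are hypothetical, for illustration only.
MAX_CPUS_PER_NODE = 16   # a single NonStop node has up to 16 CPUs
MAX_NODES = 24           # an NSXCS cluster has up to 24 NS7 and NS8 nodes


def total_cpus(nodes: int = MAX_NODES, cpus_per_node: int = MAX_CPUS_PER_NODE) -> int:
    """Return the CPU count of a cluster, enforcing the limits stated above."""
    if not 1 <= nodes <= MAX_NODES:
        raise ValueError(f"node count must be between 1 and {MAX_NODES}")
    if not 1 <= cpus_per_node <= MAX_CPUS_PER_NODE:
        raise ValueError(f"CPUs per node must be between 1 and {MAX_CPUS_PER_NODE}")
    return nodes * cpus_per_node


print(total_cpus())  # 384
```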

So, if NonStop clustering is also part of the Expand network architecture, how does it differ from other Expand connectivity options such as Expand-over-IP? NonStop clustering offers an Expand bypass mechanism that allows processes in different nodes to communicate directly, without relying on Expand line handlers to store and forward messages between the source and destination processes. This bypass mechanism delivers very low latency and low CPU processing costs for inter-node messaging, in addition to high inter-node messaging rates.

A Brief History of NonStop Clustering

FOX clustering

Looking back in time, FOX (Fiber Optic Extension) was the genesis of the NonStop clustering solutions. FOX clustering was introduced in 1983 as an extension of the original Tandem architecture based on the redundant InterProcessor Bus (IPB) interconnect. FOX had an Expand line type called Expand-over-FOX, and it was supported in many generations of Tandem NonStop servers with the IPB interconnect as well as on NonStop S-Series servers with the ServerNet interconnect. However, FOX clustering required FOX controllers (or FOX Gateway controllers for NonStop S-Series servers) with embedded firmware, in addition to specialized software drivers to provide its Expand bypass messaging capabilities. Thus, FOX was a controller-based extension of the internal IPB and ServerNet interconnects, as opposed to utilizing the same native interconnect found within the nodes.

ServerNet clustering

The advent of the ServerNet interconnect made it possible to evolve NonStop clustering to a truly native clustering solution without specialized controllers, starting with the introduction of the ServerNet Cluster 6770 product in 2000. Two more generations of ServerNet-based clustering solutions followed: ServerNet Cluster 6780 in 2003 and the BladeCluster Solution (BCS) in 2009. The Expand line type in all generations of ServerNet-based clustering solutions is called Expand-over-ServerNet.

ServerNet is a point-to-point interconnect and supports wormhole routing, and all ServerNet-based clustering solutions leverage these capabilities natively. Processes rely on the same native ServerNet software and hardware protocol layers to communicate, regardless of whether they are running on different CPUs within the same node or in different nodes. Additional ServerNet hardware components are required to connect the nodes natively via the ServerNet interconnect, such as cables and cluster switches (or Advanced Cluster Hubs in the specific case of BCS). Packets between CPUs within the same node or in different nodes are routed based on the destination ServerNet ID (DID) field in the packet header. Wormhole routing ensures that packets are routed to the proper outgoing port of each ServerNet switch as soon as the DID field is received.
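As a rough illustration of DID-based wormhole routing, the Python sketch below models a switch that chooses the outgoing port as soon as the first unit of a packet (carrying the DID) arrives, and then streams the rest of the packet to that same port. This is a toy model under stated assumptions: the class name, the routing table, and the representation of a packet as a sequence with the DID first are all hypothetical and do not reflect the actual ServerNet packet format or switch implementation.

```python
from typing import Dict, Iterable, Iterator, Tuple


class WormholeSwitch:
    """Toy switch that picks an output port from the DID alone (hypothetical model)."""

    def __init__(self, routing_table: Dict[int, int]) -> None:
        # Maps a destination ServerNet ID (DID) to an outgoing port number.
        self.routing_table = routing_table

    def forward(self, units: Iterable[object]) -> Iterator[Tuple[int, object]]:
        """Yield (port, unit) pairs; the port is fixed once the DID is seen."""
        stream = iter(units)
        try:
            did = next(stream)  # assumption: the DID is the first unit to arrive
        except StopIteration:
            raise ValueError("empty packet") from None
        if did not in self.routing_table:
            raise ValueError(f"no route for DID {did}")
        port = self.routing_table[did]  # routing decision is made here
        yield port, did
        for unit in stream:  # the rest of the packet follows the same path
            yield port, unit


switch = WormholeSwitch({0x21: 3, 0x42: 1})
for port, unit in switch.forward([0x42, "payload-0", "payload-1"]):
    print(port, unit)
```

Because `forward` is a generator, the port is selected after consuming only the first unit of the input, which mirrors the idea that the switch does not wait for the whole packet before routing it.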

InfiniBand clustering

NonStop servers with X86 processors and the InfiniBand interconnect started shipping in 2015, and in that same year, HPE launched a new native InfiniBand clustering solution for HPE NonStop NS7 servers called the NonStop X Cluster Solution (NSXCS). Originally based on InfiniBand FDR 56 Gbps technology, NSXCS uses the same native InfiniBand software and hardware protocol layers for intra- and inter-node communications. The NSXCS Expand line type is called Expand-over-IB. In the L21.06.01 RVU, NSXCS added support for HPE NonStop NS8 servers and new cluster switches based on InfiniBand HDR 200 Gbps technology. NSXCS also supports high-end HPE Virtualized NonStop servers, but the native clustering interconnect in this case is RDMA over Converged Ethernet (RoCE).

NSXCS Advantages

Some of the key advantages of the NonStop X Cluster Solution are:

Enable customers to smoothly meet the needs of rapid application growth and scale-out.
Make it possible for customers to architect application load balancing via front-end and back-end systems for efficiency, as well as other approaches to segregate applications in multiple nodes.
Make it easy to deploy backup systems for high-performance business continuity solutions.
Use InfiniBand to provide the fastest and most efficient data transfer method between nodes in the cluster with respect to latency, message transfer rates, and CPU software path length.
Fault-tolerance features include:
- Redundant fabrics.
- Redundant clustering software components that support upgrades without requiring system loads.
- Co-existence with all Expand connectivity options for additional redundancy.

Clustering Expand Bypass Mechanism

The diagram below shows the clustering Expand bypass mechanism in more detail. The diagram depicts Expand-over-IB line handlers, but previous clustering Expand line types (Expand-over-FOX and Expand-over-ServerNet) have a similar Expand bypass mechanism. The Expand bypass is shown with red solid lines in the diagram.

Security-checked messages do not use the Expand bypass. Instead, they flow through the Expand-over-IB line handlers, along the path shown with black dashed lines in the diagram. Examples of security-checked messages include remote file opens, remote file purges, and remote process creations. The Expand-over-IB line handler in the destination node interacts with the applicable security subsystem (such as NonStop Standard Security, Safeguard, or XYPRO XYGATE) to determine whether the remote security-checked operation has the proper access permissions.

Once a remote open has completed successfully, subsequent write and read operations are carried out with ordinary messages. Ordinary messages flow via the Expand bypass. In NSXCS, the native InfiniBand software and hardware protocol layers that provide intra-node communications are also used for the Expand bypass.
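The split between the two message paths can be summarized with a small Python sketch. It is a conceptual model only: the enum members, the set of security-checked kinds (limited to the examples named in this article), and the path labels are hypothetical names chosen for illustration, not NonStop or Expand interfaces.

```python
from enum import Enum


class MessageKind(Enum):
    REMOTE_FILE_OPEN = "remote file open"
    REMOTE_FILE_PURGE = "remote file purge"
    REMOTE_PROCESS_CREATE = "remote process creation"
    READ = "read"
    WRITE = "write"


# Only the examples named in the article; the real list is longer.
SECURITY_CHECKED = {
    MessageKind.REMOTE_FILE_OPEN,
    MessageKind.REMOTE_FILE_PURGE,
    MessageKind.REMOTE_PROCESS_CREATE,
}


def select_path(kind: MessageKind) -> str:
    """Return which path a message takes between two nodes in the cluster."""
    if kind in SECURITY_CHECKED:
        # Goes through the Expand-over-IB line handlers, where the destination
        # node's security subsystem checks access permissions.
        return "line-handler"
    # Ordinary messages travel directly between the processes.
    return "expand-bypass"


for kind in (MessageKind.REMOTE_FILE_OPEN, MessageKind.WRITE, MessageKind.READ):
    print(kind.value, "->", select_path(kind))
```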

NSXCS Topology Configurations

NSXCS uses InfiniBand cluster switches to connect NS7 and NS8 nodes. A pair of cluster switches (one X fabric and one Y fabric) and the nodes directly connected to those switches is referred to as a zone. The first generation of InfiniBand cluster switches is referred to as FDR cluster switches. These switches support a maximum port speed of 56 Gbps, and they are limited to NS7 nodes only. The new generation of InfiniBand cluster switches, introduced with the L21.06.01 RVU, is referred to as HDR cluster switches. These switches support a maximum port speed of 200 Gbps. HDR cluster switches are required for clusters with NS8 nodes, and they also support NS7 nodes.

NSXCS topologies can be categorized according to the types of nodes in the cluster as follows:

NS7 only: This can be described as a homogeneous NS7 cluster with FDR cluster switches. This topology supports up to three zones and up to 24 nodes. The supported RVUs depend on the type of NS7 node (namely, L18.08 and later for NS7 X3, L16.05 and later for NS7 X2, and L15.08 and later for NS7 X1).
NS8 only: This can be described as a homogeneous NS8 cluster. It requires HDR cluster switches and L21.06.01 or a later RVU. This topology supports one zone with up to 24 nodes.
NS8 and NS7: This can be described as a heterogeneous cluster with at least one NS8 and one NS7. It requires HDR cluster switches and L21.06.01 or a later RVU. This topology supports one zone with up to 24 nodes.
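The three categories above can be captured in a small lookup, sketched here in Python. The sketch encodes only the limits listed in this section (switch generation, maximum zones, maximum nodes); the type and function names are hypothetical, and RVU requirements are deliberately left out.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TopologyLimits:
    name: str
    switch_generation: str  # "FDR" or "HDR"
    max_zones: int
    max_nodes: int


def classify_topology(node_models: Iterable[str]) -> TopologyLimits:
    """Map the node models in a cluster ("NS7" / "NS8") to a topology category."""
    nodes = list(node_models)
    kinds = set(nodes)
    if not kinds or not kinds <= {"NS7", "NS8"}:
        raise ValueError("cluster must contain only NS7 and/or NS8 nodes")
    if kinds == {"NS7"}:
        limits = TopologyLimits("NS7 only", "FDR", 3, 24)
    elif kinds == {"NS8"}:
        limits = TopologyLimits("NS8 only", "HDR", 1, 24)
    else:
        limits = TopologyLimits("NS8 and NS7", "HDR", 1, 24)
    if len(nodes) > limits.max_nodes:
        raise ValueError(f"{limits.name} supports at most {limits.max_nodes} nodes")
    return limits


print(classify_topology(["NS7"] * 6))
print(classify_topology(["NS8", "NS7", "NS7"]))
```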

NSXCS Technology Specifications

The next table summarizes the technical specifications for NSXCS configurations based on FDR and HDR InfiniBand cluster switches.

IB Cluster Switches	NS7 only (FDR)	NS7+NS8 or NS8 only (HDR)
Node to cluster switch connections	FDR Active Optical Cables (AOCs)	EDR AOCs (NS7 nodes); HDR AOCs (NS8 nodes)
Maximum distance between a node and a cluster switch	30 meters (node to switch)	30 meters (node to switch)
Maximum distance between nodes in the same zone	60 meters (node to switch to node)	60 meters (node to switch to node)
Maximum distance between zones	30 meters with AOCs; 65 km with link extenders and DWDMs	N/A
Supported zone configurations	1, 2, or 3 zones	1 zone
Maximum number of nodes per zone	24	24
Maximum number of nodes (total)	24 – connected through 1, 2, or 3 zones	24 – connected through a single zone

NSXCS Configuration Examples

This first set of examples shows possible NSXCS configurations with FDR cluster switches and NS7 nodes. The diagrams show only one of two fabrics for simplicity. Note that NS8 nodes are not allowed in any of these configurations because the configurations use FDR cluster switches.

This next example shows a possible NSXCS configuration with HDR cluster switches and NS8 nodes.

Finally, the next example shows a possible NSXCS configuration with HDR cluster switches and both NS8 and NS7 nodes.

HDR InfiniBand Cluster Switches

The picture below shows the port side view of the new HDR InfiniBand cluster switch. The switch is physically identical to the root fabric switches and I/O expansion switches used in NS8 nodes, but its role is to provide connectivity between nodes in an HDR NSXCS cluster. The power supply side view is not shown for simplicity, but it contains 2 hot-swappable power supplies and 6 hot-swappable fans.

The HDR InfiniBand switch has the following specifications:

40 QSFP56 ports.
- These are mechanically similar to QSFP+ ports and support higher link speeds.
Maximum HDR cluster switch link speeds:
- Ports connected to NS8 nodes: 200 Gbps
- Ports connected to NS7 nodes: 56 Gbps
Actual operating link speed of a switch port is determined by:
- Capability of connected cable.
- Capability of connected port at other end of link.
- Signal quality during link initialization.
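One simplified way to think about the factors above is that a link runs at the lowest speed that the cable and both ports support, possibly reduced further by signal quality. The Python sketch below encodes that "minimum of capabilities" model. It is an assumption made for illustration, not a description of the actual InfiniBand link initialization procedure; the function name and the `signal_quality_cap` parameter are hypothetical, and the NS7 port capability of 56 Gbps is taken from the maximum link speed listed above for ports connected to NS7 nodes.

```python
from typing import Optional

# Nominal InfiniBand link speeds in Gbps.
FDR, EDR, HDR = 56, 100, 200


def operating_speed_gbps(
    switch_port: int,
    cable: int,
    remote_port: int,
    signal_quality_cap: Optional[int] = None,
) -> int:
    """Simplified model: the link runs at the lowest capability along the path."""
    speed = min(switch_port, cable, remote_port)
    if signal_quality_cap is not None:
        # Poor signal quality during link initialization may lower the speed.
        speed = min(speed, signal_quality_cap)
    return speed


# NS8 node on an HDR cluster switch with an HDR AOC.
print(operating_speed_gbps(HDR, HDR, HDR))  # 200
# NS7 node on an HDR cluster switch with an EDR AOC.
print(operating_speed_gbps(HDR, EDR, FDR))  # 56
```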

Requirements for NSXCS Clusters with NS8 Nodes

Several software and hardware requirements must be observed to implement a homogeneous NS8 cluster or a heterogeneous cluster with both NS8 and NS7 nodes.

Software requirements:

All nodes require L21.06.01 or later.
The above requirement also applies to the CLIM software for all CLIMs, in all nodes.
The firmware revision level of all IB switches must be as specified in the NonStop Firmware Matrices L21.06.01 or later. This includes IB switches in NS7 nodes, IB switches in NS8 nodes, and HDR cluster switches.

Hardware requirements:

If NS8 nodes will be added to an existing cluster with NS7 nodes (with IB FDR cluster switches), then the cluster infrastructure must be upgraded to IB HDR cluster switches before the NS8 nodes can be added.
HDR cluster switches are required.
NS8 nodes are not supported in an NS7 cluster (e.g., connected to FDR infrastructure components).
Cabling requirements to connect nodes to HDR cluster switches:
- NS8 nodes: HDR AOCs
- NS7 nodes: EDR AOCs
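The node-level requirements above lend themselves to a simple pre-deployment checklist. The Python sketch below checks the RVU level, the cluster switch generation, and the cable type for each node. It is a hypothetical helper, not an HPE tool: the `Node` record, the function names, and the RVU parsing (which assumes names of the form `Lyy.mm` or `Lyy.mm.nn`) are assumptions, and the CLIM software and switch firmware requirements are not modeled.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

MIN_RVU = "L21.06.01"
REQUIRED_CABLE = {"NS8": "HDR", "NS7": "EDR"}  # AOC type per node model


def parse_rvu(rvu: str) -> Tuple[int, int, int]:
    """Turn an RVU name such as 'L21.06.01' or 'L18.08' into a comparable tuple."""
    match = re.fullmatch(r"L(\d+)\.(\d+)(?:\.(\d+))?", rvu)
    if match is None:
        raise ValueError(f"unrecognized RVU name: {rvu!r}")
    return (int(match.group(1)), int(match.group(2)), int(match.group(3) or 0))


@dataclass
class Node:
    name: str
    model: str  # "NS7" or "NS8"
    rvu: str
    cable: str  # "EDR" or "HDR"


def check_requirements(nodes: List[Node], switch_generation: str) -> List[str]:
    """Return a list of problems; an empty list means all modeled checks pass."""
    problems = []
    if switch_generation != "HDR":
        problems.append("HDR cluster switches are required")
    for node in nodes:
        if parse_rvu(node.rvu) < parse_rvu(MIN_RVU):
            problems.append(f"{node.name}: RVU {node.rvu} is older than {MIN_RVU}")
        expected = REQUIRED_CABLE.get(node.model)
        if expected is None:
            problems.append(f"{node.name}: unsupported model {node.model}")
        elif node.cable != expected:
            problems.append(f"{node.name}: needs {expected} AOCs, has {node.cable}")
    return problems


cluster = [Node("A", "NS8", "L21.06.01", "HDR"), Node("B", "NS7", "L20.10", "EDR")]
for problem in check_requirements(cluster, "HDR"):
    print(problem)
```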

Conclusion

NonStop has a rich legacy in system interconnects and clustering, and we’re carrying on that tradition with NSXCS. Hopefully, this article has given you a reasonable understanding of NonStop clustering and the latest NSXCS features. If you would like to learn more, consider watching the TBC21-002 (“NSXCS Technical Update”) session recording available at the NonStop TBC 2021 On-Demand Library.

Author

Marcelo de Azevedo

Marcelo has been with HPE/HP/Compaq/Tandem since 1997. Marcelo is the R&D architect of several generations of NonStop clustering solutions based on the InfiniBand, RoCE, and ServerNet interconnects. He is also an architect for HPE Virtualized NonStop servers. Marcelo has a PhD in Computer Engineering from the University of California, Irvine.

The Connection

A Journal for the HPE NonStop Business Technology Community