Managing NonStop applications is very cost-effective


At a time when the cost of systems is decreasing and the cost of labor is increasing, it is worthwhile to examine how many people are needed to run a particular application.

In this article, I compare the system management effort required by one company to run a payments application on a high-availability cluster of Linux servers with the effort required by another company to run the same application on a fault-tolerant HPE NonStop server. Both applications are configured to handle about the same transaction load: about 1,000 transactions per second.

The applications were implemented with a remote disaster-recovery site.

Although the two companies used the same application software, they used different hardware and database software, and there was a considerable difference in the number of staff required to keep the systems up and running: the NonStop implementation required 9 FTEs, while the alternative required over 50 people to manage the same application.

The next sections show the details of the two implementations and discuss how the NonStop system architecture presents a cluster as a single system and therefore simplifies systems management. In this example, a factor-of-five reduction in system management effort was observed.

Application Overview

Linux application overview

The Linux application is implemented on a vSphere ESXi cluster. A copy of the cluster with software and storage is implemented in a backup site. The backup site is kept up to date using replication software.

One cluster includes

  • The application cluster of 9 VMs, each with 8 vCPUs, hosted on 4 DL380 G10 servers with 2 sockets of 22 cores each.
  • The 3-node database server, hosted on 3 DL380 G10 servers with 2 sockets of 8 cores each. The database is Oracle RAC.
  • For storage, a 4-node 3PAR 8000-series system is used.

NonStop Application Overview

The NonStop version of the application is implemented on an HPE four-processor, two-core (4p2c) NonStop X system with local storage. A NonStop X system uses x86 processor technology with standard HPE servers. A backup configuration is implemented in a backup site. The backup site is kept up to date using replication software.

It is important to understand that in the NonStop system architecture, a processor is in fact a separate compute module that is part of a cluster of multiple NonStop processors and interface modules that together form a system. A NonStop NSX system is built by HPE and shipped as an integrated system to the customers[1].

With this understanding, we can describe the NonStop implementation in common terms:

  • 4 NonStop processor modules hosted on 4 DL380 servers, each with two cores activated. When needed, up to 6 cores can be activated to accommodate additional (peak) loads.
  • 4 cluster interface modules (CLIMs), of which two manage communication to the external network and two manage the JBOD storage system. Note that the 4 CLIMs provide fault-tolerance for storage and communication. CLIMs are DL380 servers with 8 cores that run a hardened version of Linux.
  • The 8 nodes in the cluster are connected via a dual high-speed, low-latency InfiniBand network.
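The core counts of this configuration can be tallied directly from the figures above (a quick sketch; the only numbers used are those stated in the list):

```python
# Core arithmetic for the NonStop configuration described above.
processors = 4        # NonStop processor modules (DL380 servers)
cores_active = 2      # cores activated per processor module
cores_max = 6         # maximum activatable cores per module
clims = 4             # cluster interface modules
cores_per_clim = 8    # each CLIM is an 8-core DL380

app_cores = processors * cores_active        # 4 * 2 = 8
app_cores_peak = processors * cores_max      # 4 * 6 = 24
clim_cores = clims * cores_per_clim          # 4 * 8 = 32

print(app_cores, app_cores_peak, app_cores + clim_cores)  # 8 24 40
```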

Hardware requirements

Both applications require a high availability set-up. For the Linux application, the application servers and the database servers are configured as high-availability clusters. The NonStop configuration is a fault-tolerant system that combines application and database.

The two tables below show how many physical servers (“boxes”), logical servers (VMs) and cores are required to run the same peak load of 1,000 business transactions per second.

Linux application on one site    Physical servers    Logical servers    Number of cores
Linux application                       4                   9                 176
Linux DB server                         3                   3                  48
3PAR storage                            4                   4                   -
Totals                                 11                  16                 224

Table 1: Hardware requirements of Linux implementation

 

NonStop application on one site     Physical servers    Logical servers    Number of cores
NonStop application and database           4                   -            8 (dynamically expandable to 4*6 = 24)
CLIMs                                      4                   -            32
Storage                                    -                   -             -
Totals                                     8                   1            40

Table 2: Hardware requirements of NonStop implementation

The Linux implementation uses a 3PAR storage array that holds an Oracle database, which is accessed by a 3-node RAC cluster. The NonStop database is managed by the integrated NonStop SQL database and shares its hardware with the application. This is why the hardware requirements for NonStop are lower than those of the Linux/Oracle/3PAR combination.
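As a quick check of Tables 1 and 2, the per-site core totals and their ratio can be computed (figures taken from the tables; the 3PAR cores are not counted, as in Table 1):

```python
# Per-site core totals from Tables 1 and 2.
linux_cores = {"application": 176, "database": 48}   # 3PAR cores not counted
nonstop_cores = {"processors": 8, "clims": 32}       # CLIMs included

linux_total = sum(linux_cores.values())      # 224
nonstop_total = sum(nonstop_cores.values())  # 40

print(linux_total, nonstop_total, round(linux_total / nonstop_total, 1))  # 224 40 5.6
```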

System Management Requirements

Managing clusters of systems requires more effort than managing a single system, and the bigger a cluster gets, the more time may be spent on system maintenance. Every (virtual) server in the cluster runs its own copy of the Operating System (OS) and application software plus configuration files. There are ways to automate cluster management, but getting this right is not a trivial task.

The best way to manage a cluster, however, is to make it look like a single system to the system manager, and that is what happens with the NonStop implementation.

  • The NonStop OS runs from the same binary (disk) copy on all NonStop processors in the system, which in this use case means on all four nodes that run the OS. When the system starts, the OS is loaded from the system disk and copied to the other nodes of the cluster, where it is started and initialized. Each node then coordinates the information about its local processes with the global information that describes the entire system.
  • Even though the CLIMs run a Linux OS, they are managed from the NonStop environment, using the NonStop OS toolset and scripts. For example, IO performance is measured and reported by the same (MEASURE) tool that reports on other system resource consumption.
  • Application availability is ensured by the NonStop OS which automatically lets backup processes take over the workload from a failed component (be it a process or a NonStop processor / node), and this is possible because all processes in a system have access to data, including configuration files, that are stored anywhere in the NonStop cluster.
  • Host-based disk mirroring protects the system against hardware issues with the storage devices.
  • A single, cluster-wide transaction monitor (called TMF) is the resource manager that guarantees transaction integrity across all data volumes and provides backup and restore of transaction logs.

Management benefits of NonStop implementation

A NonStop system administrator manages a 4-processor NonStop system as a single system: operating system maintenance needs to be performed only once, even though in this example 8 servers are updated when one system is upgraded or patched. Compare this to an upgrade of 16 virtual machines in the case of the Linux application, where the VMs are of different types (application server image, database server image, or 3PAR storage image).

In a production environment, a second installation, for business resilience or disaster recovery, is also a requirement, and this doubles the number of maintenance tasks. The use of one or more test systems adds further system management tasks.
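A rough tally of per-release OS maintenance tasks makes the doubling effect visible (a sketch; it assumes one update task per managed image per site, with the NonStop cluster counting as a single managed system as described above, and it ignores the test systems):

```python
# Rough count of OS-update tasks per software release.
linux_vms_per_site = 16       # VMs of several image types (Table 1)
nonstop_systems_per_site = 1  # the whole cluster is managed as one system
sites = 2                     # primary + disaster-recovery site

print(linux_vms_per_site * sites, nonstop_systems_per_site * sites)  # 32 2
```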

Required number of FTEs

How much more effort is required becomes visible when the applications and personnel requirements are compared.

The NonStop environment (a primary site, a DR site, and one smaller test system) required 9 FTEs: 4 for system management and 5 for application management. The company that ran the application on Linux, however, required more than 50 FTEs to manage its systems and applications: 5 times the personnel cost of the NonStop-based solution. Note that the Linux test environment also included a backup site, and that Linux system management includes VMware management.
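The factor of 5 follows directly from the staffing numbers quoted above (50 is a lower bound, since the Linux figure is "50+"):

```python
# Staffing ratio from the FTE counts above.
nonstop_ftes = 9    # 4 system management + 5 application management
linux_ftes = 50     # lower bound ("50+")

print(round(linux_ftes / nonstop_ftes, 1))  # 5.6
```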

Hardware requirements

We have seen how many people are needed to manage these applications. The hardware requirements show a similar difference: measured in CPU cores, the compute power needed to run the application is significantly lower in the case of NonStop. One (primary) NonStop production cluster totals 40 cores, while the Linux implementation requires 224 cores for the application and database servers alone, not counting the 3PAR storage system.

Conclusion

Technology improvements lead to lower costs of hardware, while the cost of personnel is rising and, therefore, becoming a larger part of the Total Cost of Ownership (TCO). The examples in this article show that significant savings in TCO can be achieved by using a system architecture that is designed to provide availability, scalability, and data integrity from the ground up. Cloud-native applications run clustered applications; however, bear in mind that each node in those clusters requires looking after.

[1] It is also possible to build a virtualized NonStop system as a VMware vSphere ESXi cluster. Such a cluster uses an interconnect based on RoCE (Remote Direct Memory Access over Converged Ethernet) and VMware to implement virtual machines that run the NonStop and Linux OS images.

Author

  • Frans Jongma

    Frans Jongma is a Master Technologist for the NonStop Advanced Technology Center (ATC) and is based in The Netherlands. Frans has worked in several consulting positions for the NonStop Enterprise Division since 1989. His main areas of expertise are NonStop SQL application design, performance analysis, and high availability. Prior to joining Tandem, Frans worked on the design and implementation of database management systems and developer productivity tools for UNIX and proprietary systems. As part of the ATC, Frans has been involved in proving the concept of SQL/MX as a Service, which resulted in the SQL/MX DBS product and an HTML-based prototype of a web-based user interface to SQL/MX DBS. This HTML version is the basis of WebDBS, the GUI interface for administrators and users of NonStop SQL/MX DBS.
