My name is Jonathan Ziegler, and I have worked at Gravic for 12 years. I currently manage the Shadowbase software development group, spanning NonStop, Linux, Windows, and a variety of other technologies. Before I jump into discussing our Shadowbase synchronous replication product, I was asked to share a bit about my background.
Interest in Programming
Computers have fascinated me since I was a child playing Oregon Trail and Sim Farm in the 90s. From there, I created a personal website in middle school. I have always loved mathematics, and that helped guide me towards programming. My formal technical education began with a couple of high school courses, then progressed to majoring in Computer Science and Business at Lehigh University, which ultimately led to my software development career today.
Satisfaction of Accomplishment
Technical experimenting and learning are challenging, yet fun for me. The best part about programming is the sense of accomplishment from rising to the challenge. It could be finding a gnarly bug (“eureka!”), writing a particularly elegant or efficient function, designing a scalable process architecture, or ensuring fast recovery in the event of failures, among many other things!
Another great, exciting part of working at Gravic is the annual Hackathon that we host. Participants break into small teams based on project interest, and experiment towards achieving a Minimum Viable Product (MVP). Working with other employees from different areas of the company with unique skillsets and backgrounds makes the projects interesting, allows for learning about the challenge from another perspective, and enables the positive side of “hacking” (see https://en.wikipedia.org/wiki/Hackathon for more information). This past year, we even helped bring the hackathon concept to the Connect NonStop TBC event by assisting Connect with the first annual TBC Hackathon!
Zero Data Loss
For the past several years, I have been working on a particularly challenging Shadowbase technology called Zero Data Loss (ZDL). Data loss is one of the main issues whenever an unplanned outage of an IT service occurs. Losing data can cost a company millions of dollars or even cause loss of life, and the more data is lost, the greater the cost. While asynchronous data replication solutions for business continuity provide excellent protection against downtime and data loss after an outage, they cannot guarantee that data will not be lost. Shadowbase ZDL therefore enhances asynchronous replication with a patented method of synchronous replication, which eliminates the possibility of data loss caused by a disaster. This can potentially save a business a huge expense, preserve lives, uphold its reputation, prevent lawsuits, and satisfy zero data loss regulatory requirements, among other benefits. Below, I have included a figure, “HPE Shadowbase Business Continuity Continuum,” to highlight the issue of data loss.
For more information, https://www.
The figure maps the potential for data loss (known as the recovery point objective, or RPO) on the vertical axis; moving from bottom to top, the amount of data that can be lost lessens from a massive amount to none. As shown in the figure, it is primarily the data replication technology that improves RPO: asynchronous replication always has the potential to lose data when there is a failure, whereas synchronous replication does not. Therefore, asynchronous replication solutions are lower on the figure than synchronous replication solutions.
The figure also maps the speed of recovery after a failure (known as the recovery time objective, or RTO) on the horizontal axis; moving from left to right, the speed of recovery improves from “out of business” to what is known as the high availability quadrant and then the continuous availability quadrant. As shown, it is primarily the data replication architecture that improves RTO: active/passive (A/P) architectures have the worst (slowest recovery) RTO, while sizzling-hot-takeover (SZT) and fully active/active (A/A) architectures have the best (fastest recovery) RTO. Because A/A architectures always have multiple copies of the database and application actively running in geographically separated datacenters, these architectures are called continuously available; the loss of one datacenter does not impact the application services provided by the other(s).
Please notice that all of the same architectures can be implemented with either asynchronous or synchronous replication. The corresponding architectures are graphed at the potential-for-data-loss level (asynchronous replication) or at the no-potential-for-data-loss level (synchronous replication). It is here, at the top of the figure (the no-data-loss level), where Shadowbase ZDL holds its promise.
The Main Challenge
Shadowbase ZDL is challenging due to the complexity of ensuring that all data changes associated with a particular application source transaction are safe-stored (fully replicated) to a backup system before the source transaction is allowed to commit. It is not straightforward, particularly when considering all the possible failure scenarios that could arise during this process, and how recovery should be handled in each case. For example, what should happen when the application’s data changes cannot be safe-stored due to a network issue? What if the network is just slow periodically, and how slow is “too slow”?
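To make the idea of gating a commit on safe-storage more concrete, here is a minimal Python sketch. It is not Shadowbase code and does not describe how ZDL is implemented. The names BackupLink, safe_store, and commit_synchronously are hypothetical, the network is simulated with a thread and a sleep, and aborting on timeout is just one possible policy chosen for illustration.

```python
import threading
import time


class BackupLink:
    """Hypothetical stand-in for the connection to a backup system."""

    def __init__(self, delay_seconds):
        self.delay_seconds = delay_seconds  # simulated network + storage delay
        self.safe_stored = []               # changes the backup has stored

    def safe_store(self, changes, ack):
        """Store the changes on the backup in the background, then signal ack."""
        def work():
            time.sleep(self.delay_seconds)
            self.safe_stored.extend(changes)
            ack.set()

        threading.Thread(target=work, daemon=True).start()


def commit_synchronously(changes, link, timeout_seconds):
    """Allow the commit only after the backup acknowledges the changes.

    timeout_seconds is the (assumed) answer to "how slow is too slow?".
    """
    ack = threading.Event()
    link.safe_store(changes, ack)
    if ack.wait(timeout_seconds):
        return "committed"
    # Illustrative policy only: other designs might retry or fall back
    # to asynchronous replication instead of aborting.
    return "aborted"


if __name__ == "__main__":
    print(commit_synchronously(["debit", "credit"], BackupLink(0.01), 1.0))
    print(commit_synchronously(["debit"], BackupLink(0.5), 0.05))
```

Even this toy version hints at one of the harder cases: when the timeout fires, the slow backup may still store the changes a moment later, so the two systems can disagree about a transaction that the source aborted. Handling cases like that correctly is part of what makes the real problem difficult.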
In addition, to be practically useful, ZDL must introduce minimal additional system overhead and minimal additional application latency (the delay that synchronous replication adds to the source transaction at commit time). Furthermore, ZDL must be scalable, able to handle large amounts of data and high transaction rates, while being highly available. All of these requirements create programming challenges and the need for elegant, efficient, scalable, and reliable solutions.
What is exciting is that we are solving these challenges! It is a very rewarding time for the entire Gravic team to be working on such important technology to avoid the significant costs from data loss!