The Frigg Project – Open Observability for NonStop

Exploring open-source observability for NonStop.

Norse Goddess Frigg


What is Frigg?

Frigg was a Norse goddess, wife of Odin, who was said to know the future. She, however, rarely told it to anyone. This has been the view of observability on NonStop: the data is there, but it is viewed as “hard” to externalize to enterprise-wide observability stacks. The Frigg project shows that NonStop can integrate easily with observability software stacks such as Grafana, ELK, SigNoz, and many others.

Background

As part of a complete lifecycle management methodology, most enterprise IT deployments have standard processes and systems that feed into a development and operations process known as “DevOps” or, when security is included, “DevSecOps”. Most often, the “Dev” side follows an Agile approach built around a loop of development, change, and deployment, while the “Ops” side emphasizes deploying services into the enterprise. The aspect of “Ops” that has been somewhat overlooked is the collective observation of the environment.

Over the past decades, there have been attempts to deliver a dashboard or “single pane” approach to enterprise observability. Historically, the tools tended to be derived from network monitoring tools (ca. 1990-2010) that added various hooks and exits for customization. Over the past 10+ years, this perspective has changed, and the “Ops” part of DevOps has embraced enterprise-level monitoring, or the “observability” component of the DevOps paradigm.

 

Fortunately, the open-source community has attacked this area with a plethora of tools for storing, searching, and visualizing operational data. Unfortunately, the number of options is overwhelming. Fortunately again, the APIs and interfaces have become well defined and have converged on a small set of choices for robust delivery of information into these enterprise observability software stacks. In other words, it is easier than ever to deliver enterprise observability for all systems, including and especially NonStop!

What is “Observability”?

Observability is a term used to describe the function of “observing” information about the IT systems in an enterprise. Within “DevOps” processes, it covers discovering what is happening operationally and using that knowledge to influence operational change. When the term is used in this article, it refers to the entirety of the operational information that can be derived and collected from all systems in the enterprise.

Key Concepts

  • NonStop is an enterprise element
  • Observability is an enterprise need
  • NonStop integrates with enterprise observability stacks

Introduction to the technology

Because the tools that expose observational data have grown up around a long history and a diverse set of interface requirements, there are dozens (hundreds?) of options for collecting, searching, and visualizing this sort of data. The key need is for a few common interfaces that every platform can use to deliver its data for ingestion and presentation.

Fortunately, a few tools and interfaces have become common and can be used as de facto standards.

The Operational Process

Our team at HPE has come up with a brief description of what it takes to become “observable”. While this is our own interpretation, it is intended to align directly with the industry on how to participate and how to leverage as many combinations of tools as possible. The choice of tools is user dependent; the general perspective is universal.

1) Produce

Data on systems is produced by application software, operating systems, subsystems, remote activity, and other various sources. In general, there are three types of useful data for operational visibility:

Metric data: As the name implies, this is metered information such as counters, gauges, and histograms.

Trace data: Data that represents a trace of processing. Examples include both internal trace data (e.g. procedures and functions inside a program) and external trace data (i.e. multiple services that combine into one logical traceable activity, e.g. debit + credit = a single transfer trace).

Log data: Generic or specific time-sequenced information of any kind. Examples are system event data (like EMS) or security events.
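To make the three data types concrete, here is a minimal Python sketch. The class names (`Metric`, `Span`, `LogRecord`), their fields, and the example metric name are hypothetical and simplified for illustration; they are not the OpenTelemetry data model, nor anything shipped by the Frigg project. The `transfer_trace` helper models the debit + credit example as spans that share one trace ID.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Metric:
    """A metered value: a counter, gauge, or histogram sample."""
    name: str
    kind: str  # "counter", "gauge", or "histogram"
    value: float
    timestamp: float = field(default_factory=time.time)


@dataclass
class Span:
    """One step in a trace; spans sharing a trace_id form one logical activity."""
    trace_id: str
    name: str
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    parent_id: Optional[str] = None


@dataclass
class LogRecord:
    """A time-sequenced event, such as a system or security event."""
    source: str
    message: str
    timestamp: float = field(default_factory=time.time)


def transfer_trace() -> List[Span]:
    """Model a transfer as one trace: a root span with debit and credit children."""
    trace_id = uuid.uuid4().hex
    root = Span(trace_id, "transfer")
    debit = Span(trace_id, "debit", parent_id=root.span_id)
    credit = Span(trace_id, "credit", parent_id=root.span_id)
    return [root, debit, credit]


if __name__ == "__main__":
    print(Metric("cpu_busy_percent", "gauge", 37.5))  # hypothetical metric name
    print(LogRecord("EMS", "example event text"))
    for span in transfer_trace():
        print(span)
```

The point of the sketch is only that all three types are small, timestamped records, which is why a single collector can accept them from any platform.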

2) Capture

The data then needs to be captured in a consistent way. Capture is often overlooked, yet it is the most critical piece of the puzzle. Deciding on the capture (and distribution) tool is possibly the most important decision to be made. Fortunately, the industry is finally settling on a common capture tool and API called OpenTelemetry. The name is written as one word but joins two ideas: “open”, for open access to the data, and “telemetry”, for the measuring and collecting of operational data.
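As an illustration of what capture through OpenTelemetry can look like on the wire, the following Python sketch wraps one log line in the JSON encoding of the OpenTelemetry Protocol (OTLP) and posts it to a collector over HTTP. It uses only the standard library. The function names, the service name `nonstop-demo`, the scope name, and the assumption that a collector is listening on `localhost:4318` are ours for illustration; the article does not describe how the Frigg project actually ships its data.

```python
import json
import time
import urllib.request


def build_otlp_log_payload(service_name: str, message: str,
                           time_unix_nano: int) -> dict:
    """Wrap one log line in the OTLP/JSON structure for logs."""
    return {
        "resourceLogs": [{
            "resource": {
                "attributes": [{
                    "key": "service.name",
                    "value": {"stringValue": service_name},
                }]
            },
            "scopeLogs": [{
                "scope": {"name": "frigg-sketch"},  # hypothetical scope name
                "logRecords": [{
                    # 64-bit integers are encoded as strings in OTLP/JSON
                    "timeUnixNano": str(time_unix_nano),
                    "severityNumber": 9,  # 9 = INFO in the OpenTelemetry log model
                    "severityText": "INFO",
                    "body": {"stringValue": message},
                }],
            }],
        }]
    }


def send_logs(payload: dict,
              endpoint: str = "http://localhost:4318/v1/logs") -> int:
    """POST the payload to a collector's OTLP/HTTP logs endpoint; return the status."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


if __name__ == "__main__":
    payload = build_otlp_log_payload("nonstop-demo", "example event text",
                                     time.time_ns())
    print(json.dumps(payload, indent=2))
    # send_logs(payload)  # uncomment when a collector is listening on localhost:4318
```

In practice an OpenTelemetry SDK or an agent would normally build and send this payload; the hand-built version is shown only to make the structure visible.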

3) Visualize

The flashy part is the visualization. This is what most people think of when they hear “observability”. There are tools for selecting, searching, aggregating, filtering, highlighting, and visualizing all of the data types above. These tools are very powerful, can be integrated into AI systems, and can present dashboards for the entire enterprise with drill-down and drill-in capabilities for metric, trace, and log data.

friggPicture4

The Design and Capabilities

A key goal of this project is to provide the most flexibility and the most open access to the widest possible choices for all three steps of the process. For Produce, we chose to deliver with standard capabilities of the NonStop platform itself, minimizing any new specialty system software or user application intervention. For Capture, we chose the OTEL Collector (described below) but also tested and leveraged alternatives like Logstash. For Visualize (and distribution), we continually rearchitect and deploy different tools. The flexibility in the visualization suite of options is impressive: we have used dozens of different configurations and visualization tools.

Frigg Project Goals

  • Demonstrate NonStop open ops
  • Minimize/eliminate NonStop server customization
  • Integrate with existing NonStop
  • Mixed system integration
  • Leverage open-source to minimize cost
  • Integrate into a dev/ops lifecycle management process (future)

The Collector

The current design leverages a single collector. This is the recommended approach and is representative of most large enterprise designs. We chose to use OpenTelemetry Collector (OTEL Collector) for several reasons:

  • OpenTelemetry is the most common and most well-defined open API and collector design in the industry.
  • The current CNCF guide embraces OTEL as a full observability stack and the collector component seems to be emerging as the recommended collector (and API) standard.
    (see: CNCF Landscape in references)
  • OTEL integrates with nearly everything we have considered. It allows the use of specialized visualization data structures and visualization tools independent of the first-level OTEL API.
  • The OTEL libraries for Java and Python work on NonStop today, and future library availability will enable TS/MP services to be instrumented for trace data observation.

The collector is the critical piece of the design, but the choice of which one to use is not as critical. In the Frigg project design, it is designated as the single place to send all information from any of the systems. Our demonstration includes up to 10 different servers, but an enterprise would likely have far more than that. When you add in the number of containers executing within the enterprise, the collector becomes the single best vault to monitor, research, and document the enterprise execution state. This powerful capability is one of the key reasons to leverage an observability stack of any kind. One final note about the collector: while we chose the OTEL Collector for this design, we could just as well use alternatives and combinations. Changing the collector has minimal impact on the systems that send to it.
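One reason a collector change touches the sending systems so little is that senders usually take the collector address from configuration rather than hard-coding it. The Python sketch below shows the idea under stated assumptions: `OTEL_EXPORTER_OTLP_ENDPOINT` is the standard OpenTelemetry environment variable for the OTLP base endpoint, while the `signal_url` helper and the host name in the usage note are hypothetical and not part of the Frigg project.

```python
import os
from typing import Mapping, Optional

DEFAULT_ENDPOINT = "http://localhost:4318"  # conventional OTLP/HTTP port


def signal_url(signal: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the OTLP/HTTP URL for "traces", "metrics", or "logs"."""
    if signal not in ("traces", "metrics", "logs"):
        raise ValueError(f"unknown signal: {signal}")
    # Fall back to the process environment when no mapping is supplied.
    env = os.environ if env is None else env
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", DEFAULT_ENDPOINT)
    # OTLP/HTTP appends a per-signal path to the base endpoint.
    return f"{base.rstrip('/')}/v1/{signal}"


if __name__ == "__main__":
    for signal in ("traces", "metrics", "logs"):
        print(signal_url(signal))
```

With this arrangement, pointing every sender at a different collector means setting one variable, for example `OTEL_EXPORTER_OTLP_ENDPOINT=http://collector.example:4318`, and leaving the sending code unchanged.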

frigg-Picture2

Aggregation, Indexing, Distribution and Visualization

It is beyond the scope of this article to describe each of the components that we leverage. We encourage users to embrace the standards of their organization. The Frigg project attempts to demonstrate as many tools as possible, without regard to what would normally be used in an enterprise. In the normal use case, a single visualization tool, or a few, would be used in combination with a collector and possibly additional tools for aggregation and indexing (e.g. Prometheus). The most important consideration for the project is flexibility of choice.

Frigg currently demonstrates the use of Prometheus for metrics data, with Grafana and SigNoz used for dashboard presentation. We have also used other tools and leverage different software stacks depending on the resources available and which stacks seem desirable.
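For readers unfamiliar with how metrics reach Prometheus, the sketch below renders gauge readings in the Prometheus text exposition format, which is what Prometheus reads when it scrapes a target. This is one possible approach, not a description of the Frigg scripts: the `render_gauge` helper, the metric name `nonstop_cpu_busy_percent`, and the sample values are all hypothetical.

```python
from typing import Dict, Tuple

# A label set is a tuple of (label_name, label_value) pairs.
LabelSet = Tuple[Tuple[str, str], ...]


def render_gauge(name: str, help_text: str,
                 samples: Dict[LabelSet, float]) -> str:
    """Render gauge samples in Prometheus text exposition format.

    Assumes label values contain no quotes, backslashes, or newlines,
    so no escaping is performed.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples.items():
        label_text = ",".join(f'{key}="{val}"' for key, val in labels)
        lines.append(f"{name}{{{label_text}}} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    readings = {
        (("cpu", "0"),): 37.5,  # made-up sample values
        (("cpu", "1"),): 12.0,
    }
    print(render_gauge("nonstop_cpu_busy_percent",
                       "CPU busy percentage (hypothetical metric)",
                       readings), end="")
```

Text in this format could be served over HTTP for Prometheus to scrape, after which Grafana or a similar tool can chart it.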

The team welcomes interaction with NonStop users to better understand the common needs of our enterprise users. We intend to add more tools and dashboards based upon user requests and availability of the various software stacks. Further, we encourage NonStop users to see the demonstration in person. A picture is worth... well, you know.

frigg-Picture3

Conclusion

The Frigg team was surprised at how quickly, easily, and flexibly the observability ecosystem was deployed. For log data, no development was required. For much of the trace data, minimal software effort was required, and for metrics, some of the data was readily available and sent to this ecosystem via scripts. The goals of the project were met and continue to be enriched as we learn more and follow this explosive area of IT. A key takeaway is that it takes very little work to instrument NonStop to use these observability resources.

For more information

Frigg is being presented and demonstrated at regional Connect user meetings.  We welcome interaction and input on where to take the project next. Please feel free to contact any of the team members with questions and further needs in the observability space.

The Frigg Project Team

  • Keith Moore – NonstopTalker@hpe.com
  • Robert Martin – Martin@hpe.com
  • Rodney James – Rodney.James@hpe.com
  • Tom Miller – Tom.Joe.Miller@hpe.com

Further Resources

Author

  • Keith is a Distinguished Technologist and genuinely curious fellow at HPE and is a senior member of the Association for Computing Machinery (ACM). Keith is an ISC2 CISSP emeritus member and has extensive experience with real-time, “always-on” architectures, security and software design for enterprise usage. He works with HPE Labs and product development on integration techniques for cloud-enablement technologies and is an annual featured speaker at HPE’s Technology Forum and local User Group meetings. Keith spends spare time working as a volunteer with universities and other groups sharing computer design and computer history. Keith joined HPE in 1987 and has been in the IT industry for over 45 years.

NonStop TBC 2025, the Woodlands, Texas
