Why data analytics?
Twenty years ago, data analytics was a field reserved for geeks and academics, something discussed during breaks in lectures on chaos theory. The idea was to take a very random system, with lots of seemingly disparate data, and be able to tie that tangled web of data together so that meaningful outcomes could be predicted. The most obvious example is weather forecasting, which involves analyzing all the earth’s atmospheric randomness in terms of ever-changing temperature, moisture, and wind, somehow using that data to tell people to bring an umbrella to that 1 pm picnic a week from Saturday.
Data analysis can spot terrorist attacks during the planning phase before the attacks are carried out. Prior to the 9/11 terrorist attacks in New York and Washington, U.S. government officials had all the information they needed to spot and prevent these attacks. Proactive data analysis could have spotted an issue just from phone records, credit card usage, and a sudden increase in flight lessons by foreigners. But nobody connected the dots.
Data analysis can tell farmers when, where, and what to plant. Companies can learn exactly what their customers are looking for. Auto insurance companies can find out which drivers pose the lowest risk. Health care companies can analyze a patient’s medications, genealogy, diet, lifestyle, location, etc., to predict adverse health outcomes. Data analytics can often predict illness long before your doctor can.
Security monitoring on HPE NonStop and beyond
It has always been possible to monitor security on a NonStop server. In ancient times (the mid-1980s and earlier), one could theoretically monitor $CMON logs, for example, to see logon attempts, program executions, changes in process priority, and a few other types of system events. But nobody really ever bothered much with $CMON logs. Likewise, application programs would log error messages, and there were event collectors with names like $0 and $AOPR, which could collect and report system trouble.
Then came Safeguard, which provided simple audit log files. Customers learned to write ENFORM programs to produce reports from the Safeguard audit data. Just as people became used to that, Safeguard was updated (version C22) to provide more complex auditing, just complex enough to render all that ENFORM work obsolete. Vendors were now selling products with names like AuditView (CSP) and Safeguard Reports Plus (Baker Street Software). These products hid the esoteric details of Safeguard audit records from customers.
Figure 1: NonStop Data Analytics: Many Raw Data Sources, Many Data Formats
Government regulations required long-term storage of data and real-time data monitoring, so more products appeared providing security alarms and alerts, as well as the ability to offload audit log data from the NonStop to SIEM (Security Information and Event Management) systems. These SIEM platforms have significantly improved over the years so that now the data is fully indexed and available via relational database searches. These systems (Splunk, RSA, QRadar, etc.) are much more than off-platform SQL machines: they can be taught to recognize and parse out data fields, they can react to data events by issuing their own alerts, and they can produce graphs, charts, and dashboards.
Best of all, a SIEM can help you spot trends, problems, and even ongoing security breaches. Unlike those old ENFORM reports, a SIEM can combine and utilize data from multiple sources on the NonStop, as well as from multiple platforms. Instead of seeing that user Lee logged on yesterday at 1:14 pm and opened a few files and then logged off, you can see that user Lee did all of that and also interacted with a web application and ran a privileged program in OSS. You can see his activity in the NonStop SSH logs, web server logs, Safeguard logs, and keystroke logs, as well as in other sources that you choose to stream to a SIEM system. Maybe Lee uses an HP-UX system or an IBM mainframe; events from these multiple sources can be analyzed together. The data from various sources both on and off the NonStop are breadcrumbs that tell a story when analyzed in aggregate.
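To make the idea of combining sources concrete, here is a minimal Python sketch of the kind of correlation a SIEM performs when it builds one user's story from several logs. It is illustrative only: the event records, field names (time, user, source, action), and the user_timeline function are assumptions invented for this example, not the output format of any NonStop subsystem or SIEM product.

```python
from datetime import datetime
from heapq import merge

# Hypothetical, already-normalized events from three log sources.
# Each list is assumed to be sorted by time, as log files usually are.
ssh_log = [
    {"time": "2021-03-01T13:14:02", "user": "lee", "source": "ssh", "action": "logon"},
    {"time": "2021-03-01T13:40:10", "user": "lee", "source": "ssh", "action": "logoff"},
]
safeguard_log = [
    {"time": "2021-03-01T13:15:30", "user": "lee", "source": "safeguard", "action": "file open"},
    {"time": "2021-03-01T13:16:00", "user": "terry", "source": "safeguard", "action": "logon"},
]
web_log = [
    {"time": "2021-03-01T13:20:45", "user": "lee", "source": "webserver", "action": "POST /transfer"},
]


def event_time(event):
    """Parse the ISO 8601 timestamp carried by every event."""
    return datetime.fromisoformat(event["time"])


def user_timeline(user, start, end, *sources):
    """Merge time-sorted sources and keep one user's events within [start, end]."""
    merged = merge(*sources, key=event_time)
    return [e for e in merged if e["user"] == user and start <= event_time(e) <= end]


timeline = user_timeline(
    "lee",
    datetime(2021, 3, 1, 13, 0),
    datetime(2021, 3, 1, 14, 0),
    ssh_log, safeguard_log, web_log,
)
for e in timeline:
    print(e["time"], e["source"], e["action"])
```

The merged timeline interleaves Lee's SSH, Safeguard, and web server events in time order, which is exactly the view that no single-source report can give you.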
Many NonStop customers don’t send data to a SIEM at all. Or maybe they send data but never look at it. This may allow them to tick a few audit check boxes, and let them tell their auditors the good news that “We’re sending data to a SIEM”. Often these customers invest in a full-time person who deals with complex third-party software on the NonStop, making sure it is correctly configured and running and moving audit log data to a SIEM or maybe just to a NonStop database.
This person earns her keep by making sure that the archiving/alerting software is working correctly. Sometimes that’s about as far as it goes. Perhaps local alerts like EMS messages and email are generated, and maybe the data is sent to a SIEM, but, often, there it sits. Nobody is looking for the breadcrumbs. It’s like putting lots of great books on your bookshelf but never reading them: the knowledge is there, but nobody’s consuming it.
If we are willing to invest in the software to move audit log data out to a SIEM, then surely we shouldn’t hesitate to invest in the absent data analytics piece. Data analytics, in fact, really provides the greatest bang for the buck. This is where you can finally obtain a real return on your overall data security investment. Data analytics provides the capability of viewing NonStop activity within the actual context of that activity. Previously you might see from a Safeguard report that user Terry logged on to a production system at 8:47 am. Now you can see why Terry logged on and what Terry was doing (sorry, Terry).
A data analytics solution for NonStop
Comforte is primarily a provider of security solutions, and we see the convergence of big data analytics and data security. Comforte’s turnkey NonStop data analytics solution encompasses three areas: data collection, data streaming, and data analysis.
Various fault-tolerant log reader processes ingest NonStop data from multiple log sources, like SSH, EMS, Safeguard, web logs, etc. Features of the product architecture help ensure that the data collection keeps pace with fast-moving auditing and that it can pick up where it left off in case of disruption. The collected data is normalized, if possible, and then written to a collector process. The collector streams the data to the SIEM via a TCP (or UDP) syslog connection.
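The collect-and-stream pattern described here can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of how the product is built: a byte offset saved in a small checkpoint file stands in for the fault-tolerant restart logic, and the function names, file names, hostname, and app name are all hypothetical. The message header follows the RFC 5424 syslog layout.

```python
import json
import os
import socket
import tempfile
from datetime import datetime, timezone


def load_checkpoint(path):
    """Return the saved byte offset, or 0 if no checkpoint exists yet."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["offset"]


def save_checkpoint(path, offset):
    with open(path, "w") as f:
        json.dump({"offset": offset}, f)


def read_complete_lines(log_path, offset):
    """Read whole lines written after `offset`; return (lines, new_offset)."""
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1  # ignore a partially written last line
    return data[:end].decode("utf-8").splitlines(), offset + end


def to_syslog(message, hostname="nonstop1", app="logreader", facility=13, severity=6):
    """Wrap one record in an RFC 5424 header (hostname and app are made up)."""
    pri = facility * 8 + severity
    timestamp = datetime.now(timezone.utc).isoformat()
    return f"<{pri}>1 {timestamp} {hostname} {app} - - - {message}"


def udp_sender(host, port):
    """Return a send(message) callable that emits one UDP datagram per record."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send(message):
        sock.sendto(message.encode("utf-8"), (host, port))

    return send


def forward(log_path, checkpoint_path, send):
    """Send every new complete line, then record how far we got."""
    offset = load_checkpoint(checkpoint_path)
    lines, new_offset = read_complete_lines(log_path, offset)
    for line in lines:
        send(to_syslog(line))
    # Checkpoint only after all sends succeeded, so a failure causes a
    # resend on the next run rather than a silent gap.
    save_checkpoint(checkpoint_path, new_offset)
    return len(lines)


# Demo with a throwaway log file and an in-memory "SIEM" (a plain list).
with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "audit.log")
    ckpt = os.path.join(tmp, "audit.ckpt")
    sent = []
    with open(log, "w") as f:
        f.write("logon user=lee\n")
    first = forward(log, ckpt, sent.append)
    with open(log, "a") as f:
        f.write("logoff user=lee\n")
    second = forward(log, ckpt, sent.append)  # resumes after the first record
```

In a real deployment the send argument would be something like udp_sender("siem.example.com", 514), where the host name is a placeholder. Saving the checkpoint only after a successful send favors occasional duplicates over lost records, which is usually the safer trade-off for audit data.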
Comforte’s solution includes an app specifically for Splunk, and future releases may provide similar apps for other SIEM systems. The Splunk app provides more normalization of NonStop-specific data. It includes NonStop data definitions, field discovery, and various NonStop-specific dashboards, dashboard panels, and built-in alerts. This is all completely customizable by the customer but is also very usable right out of the box for immediately displaying charts, graphs, and other visualizations of NonStop data. Just like that, you’re there!
Each of the dashboard graphs and charts is more than just a pretty face: You can, for example, click on a slice of a pie chart showing alias IDs that have logged on as SUPER.SUPER (or any ID), and you’ll see search results for each of those logons. You can narrow the timeframe with a few more clicks and focus on just the events you really want to look at. There you can see low-level details of those events, or take a look at other log records showing what that user was doing at that time.
Figure 2: Dashboard Example – Monitoring Logon Attempts
Customizable timeframes afford access to both historical and real-time data, making the product extremely useful both for detecting security events as they occur and for forensic analysis after the fact. Drilldown capabilities provide the ability to analyze a large amount of data and narrow the focus repeatedly until the field-level details of the actual events of interest are displayed.
Combined data collection, data streaming, and data analytics
Perhaps you want to look at events from multiple sources to see what a particular user was doing during a certain time period. You can bring in data from multiple log sources to see details of that user’s overall activity during that time range. Really, the possibilities are limited only by the data. This is why comforte has combined data collection, data streaming, and data analytics into a single solution.
One of the keys is that the data is sent off-platform to an industry-standard SIEM. We don’t use NonStop SQL but instead stream the data entirely off the NonStop server. This simple action accomplishes several important goals:
- The data is safely archived for long-term storage;
- The data is available for analysis with full-featured search capabilities, including data discovery;
- The data analytics search engine is an industry-standard product that can include data from all the platforms in your enterprise; and
- The data analytics processing is not going to utilize any NonStop CPU cycles.
One important general note about data collection and streaming: a solution should collect and stream, not interpret or decide that it knows anything about the context of the data at this point. Data analytics is best done at the SIEM, not on the NonStop while the data is being collected. It is OK to normalize fields in a particular record, but the collection software should not attempt to analyze/interpret/aggregate clusters of data records at this stage.
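The distinction between normalizing a record and interpreting it can be shown with a small Python sketch. The field names and the FIELD_MAP table below are hypothetical, chosen only for illustration; real Safeguard or SSH records look different. The point is that the function is stateless: it renames and tidies fields in a single record and never counts, correlates, or filters across records.

```python
# Hypothetical mapping from source-specific field names to common names.
FIELD_MAP = {
    "safeguard": {"SubjectUser": "user", "Operation": "action", "Outcome": "result"},
    "ssh": {"login": "user", "event": "action", "status": "result"},
}


def normalize(source, record):
    """Rename fields of ONE record to a common schema; keep unknown fields as-is."""
    mapping = FIELD_MAP.get(source, {})
    out = {"source": source}
    for name, value in record.items():
        out[mapping.get(name, name)] = value
    if "user" in out:
        # Tidy the value, but draw no conclusions from it.
        out["user"] = out["user"].strip().upper()
    return out


normalized = normalize("ssh", {"login": " super.super ", "event": "logon", "status": "ok"})
print(normalized)
```

Because the function looks at one record at a time, every record still reaches the SIEM, where questions such as "how many failed logons in the last hour?" can be asked against the complete data set.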
Know your data and your risk profile
Before committing to this type of machine data analysis for a NonStop environment, it is important to examine your organization’s priorities, including standard risk analysis. What kinds of threats are you worried about? These threats may include actual data breaches (accidental or intentional), or maybe they are not security breaches at all, but simply issues related to your systems’ performance. How serious are these issues and threats? Are there any existential threats?
Management commitment is a prerequisite here, not just to acquire the right software but also to hire and train people who can leverage all the potential of NonStop data analytics. It isn’t good enough just to stream your log data to Splunk or another SIEM: as mentioned earlier, you can’t just buy the books; you need to read them as well. Once your NonStop data is safely archived on a SIEM, the massive search capabilities of these SIEMs need to be exploited to the fullest extent possible so that the data is always at your fingertips and so that real-time trends and events can be spotted at the earliest possible moment.
Obviously, you want to detect an issue before it becomes serious. At the time of writing, government officials investigating the January 6 storming of the U.S. Capitol are asking the FBI and other intelligence organizations why data analysis didn’t forecast the attack in the days leading up to it.
Like 9/11, January 6 was a physical attack. Exactly how does this relate to the topic of protecting NonStop systems from hackers and rogue employees?
9/11 was an attack from outside the United States, and 1/6 was an attack from within. Likewise, data attacks against a NonStop system emanate from both outside and inside your organization. The 9/11 and 1/6 attacks were preventable. In both cases, proper use of data collection and data analytics would have pointed to a strong probability of the actual attacks, allowing law enforcement authorities to prevent the attacks from ever happening. Instead, the data was analyzed post mortem, but by then, the damage had been done.
There is a very strong parallel between these physical attacks and the case of a digital attack on your organization. The biggest difference is that it may be hard to spot a digital attack until it is underway. In that case, it is the real-time analysis of live, streaming data that will detect an intruder in time to take protective action. On the other hand, maybe a rogue employee prepares for a future attack by adding a Safeguard alias for SUPER.SUPER so that it can be exploited in the future. Perhaps an external attack is coming in from a web page, and the clues are there in the web server logs. Maybe a rogue developer replaces a Pathway server for no apparent reason. These are our breadcrumbs.
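As a rough illustration of looking for such breadcrumbs in streaming data, the Python sketch below runs a handful of simple rules over each event as it arrives. Everything in it is an assumption made for the example: the event fields (action, target_user, object_type, path), the rule names, and the path-traversal check for web requests are invented here, and a real SIEM would express these as saved searches or alerts in its own query language.

```python
# Each rule is a (name, predicate) pair; field names are hypothetical.
RULES = [
    (
        "alias added for SUPER.SUPER",
        lambda e: e.get("action") == "add alias" and e.get("target_user") == "SUPER.SUPER",
    ),
    (
        "Pathway server replaced",
        lambda e: e.get("action") == "replace program" and e.get("object_type") == "pathway server",
    ),
    (
        "suspicious web request",
        # Illustrative check only: a path-traversal pattern in the URL.
        lambda e: e.get("source") == "webserver" and "../" in e.get("path", ""),
    ),
]


def alerts(events):
    """Yield (rule_name, event) for every rule an incoming event matches."""
    for event in events:
        for name, matches in RULES:
            if matches(event):
                yield name, event


stream = [
    {"source": "safeguard", "user": "DEV.PAT", "action": "add alias", "target_user": "SUPER.SUPER"},
    {"source": "safeguard", "user": "OPS.LEE", "action": "logon"},
    {"source": "webserver", "path": "/app/../../etc/passwd"},
    {"source": "ems", "user": "DEV.PAT", "action": "replace program", "object_type": "pathway server"},
]

found = [name for name, _ in alerts(stream)]
for name in found:
    print("ALERT:", name)
```

Because alerts is a generator, it can consume an open-ended stream and raise each alert as soon as the matching event arrives, rather than waiting for a batch to complete.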
The physical attacks mentioned required airplanes (for 9/11) and pepper spray and hockey sticks (for 1/6). The digital attacks require valid user credentials to access the NonStop. In all cases, the clues are there in the data.
The world of data analytics separates the hero who prevents the attack from the uninitiated data security manager who has to explain why the breach was not detected in time.
Know your data!