z/OS Performance Monitors – Why Real-Time is Too Late

By Morgan Oats

Real-time z/OS performance monitors are often advertised as the top tier of performance management. Real-time monitoring means just that: system and storage administrators can view performance data and/or alerts indicating service disruptions continuously as they happen.

In theory, this enables administrators to quickly fix the problem. For some companies, service disruptions may not be too serious if they are resolved quickly enough. Even though those disruptions could be costing them a lot more than they think, they believe a real-time monitor is the best they can do to meet their business needs.

For other companies, optimal z/OS performance is essential for day-to-day operations: banks processing billions of transactions per day; global retailers, especially on Black Friday or Cyber Monday; government agencies and insurance companies that must support millions of customers at any given time; transportation companies with 24/7 online delivery tracking. The list goes on.

For these organizations and many others, real-time performance information is, in fact, too late. They need information that enables them to prevent disruptions – not information that simply tells them when something is already broken.

The Critical Nature of Availability in Mainframes

Mainframes are the backbone of the application infrastructure in many leading enterprises because they offer better security, scalability, reliability, availability, serviceability, and compatibility.

Companies such as the banks, retailers, government agencies, insurers, and transportation companies mentioned above count on mainframe technologies for their core business needs every hour of every day.

For these organizations, a minute of downtime can cost more than $5,000, and if a disruption occurs on a transaction-critical day, the result can be millions of dollars in lost revenue on top of damaged credibility and future revenue losses.

A real-time monitor's alert about a service disruption that is already occurring does not provide the preventative information these organizations need. It is the equivalent of an alert in your car telling you that you've just been hit while driving on a busy highway.

Wouldn’t it be better if your car could alert you that your current trajectory will lead to a crash if you don’t correct your course, and suggest that you slow down by 10 mph and move one lane over to stay safe? That’s meaningful intelligence.

Transforming RMF & SMF Data into Predictive Intelligence

z/OS generates volumes of rich measurement data via SMF and RMF. Prediction and prevention require that information be created from all that raw data, and acting on that information is crucial to maintaining availability.

Creating intelligence out of that massive amount of data is nearly impossible using traditional approaches built on static reports and manual interpretation. But when software continuously processes, correlates, assesses, and rates the data against the specific workloads in your environment and hardware best practices, the raw data is transformed into actionable intelligence.
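As a rough illustration of what "assess and rate" could look like, here is a minimal Python sketch. It assumes the RMF/SMF records have already been parsed into simple interval samples. The workload names, the metric name, and every threshold value are hypothetical placeholders, not figures from any product or vendor guideline.

```python
from dataclasses import dataclass

# Hypothetical (warning, critical) limits per (workload, metric) pair.
# Real limits would come from your own service levels and hardware guidance.
THRESHOLDS = {
    ("online_banking", "response_time_ms"): (200.0, 400.0),
    ("batch", "response_time_ms"): (2000.0, 5000.0),
}


@dataclass
class Sample:
    """One measurement for one workload in one reporting interval."""
    workload: str
    metric: str
    value: float


def rate(sample: Sample) -> str:
    """Rate a sample as 'ok', 'warning', 'critical', or 'unrated'."""
    limits = THRESHOLDS.get((sample.workload, sample.metric))
    if limits is None:
        return "unrated"  # no rule defined for this workload/metric pair
    warning, critical = limits
    if sample.value >= critical:
        return "critical"
    if sample.value >= warning:
        return "warning"
    return "ok"


def summarize(samples) -> dict:
    """Count how many samples fall into each rating."""
    counts = {}
    for sample in samples:
        rating = rate(sample)
        counts[rating] = counts.get(rating, 0) + 1
    return counts
```

In this sketch the same 250 ms reading rates as a warning for the hypothetical online workload but as fine for batch, which is the point of rating data against the workload rather than against one global limit.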

With this intelligence, reducing availability incidents is no longer a matter of resolving them with a workaround after the fact, but preventing them before they can occur.
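To make the idea of acting before an incident concrete, the following Python sketch fits a straight-line trend to recent interval values and estimates how many intervals remain before a limit is reached. This is a deliberate simplification for illustration: the function name, the sample utilization figures, and the 90% limit are all assumptions, and real predictive tooling would need far more than a linear fit.

```python
def intervals_until_limit(values, limit):
    """Estimate intervals remaining until a linear trend reaches `limit`.

    `values` holds one measurement per interval, oldest first.
    Returns None when there is too little data or the trend is flat or falling.
    """
    n = len(values)
    if n < 2:
        return None

    # Ordinary least-squares fit of value against interval index 0..n-1.
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx

    if slope <= 0:
        return None  # not trending toward the limit

    intercept = mean_y - slope * mean_x
    fitted_now = intercept + slope * (n - 1)  # trend value at the latest interval
    if fitted_now >= limit:
        return 0.0  # the trend line is already at or past the limit
    return (limit - fitted_now) / slope


# Hypothetical utilization percentages from five consecutive intervals.
recent = [50.0, 55.0, 60.0, 65.0, 70.0]
remaining = intervals_until_limit(recent, limit=90.0)
```

A real-time monitor would stay silent for every one of these readings, since none exceeds the limit, whereas the trend already indicates how much time is left to act.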

The challenge is interpreting the data in an efficient and meaningful way. Existing performance reporting solutions that have been in use for years are unable to create this kind of intelligence from the data. Often, the preventative information is well hidden and only understandable to an absolute expert doing a thorough analysis.

A performance expert with 20 years of experience may know where to look when problems occur and is usually pretty adept at optimizing the environment to reduce incidents. However, without broad and deep visibility and intelligence, they are oftentimes working on a reactive basis.

In fact, since there are often too many problems to work on, there is no time for proactive analysis, and no time or inclination to build the analytics into the IT management process.

Furthermore, continuously analyzing the relevant details of every subsystem quickly becomes overwhelming, and experts with that depth of skill are scarce.

Preventing z/OS Performance Disruptions Before They Occur

Modernized solutions use artificial intelligence to build the tedious, time-consuming data processing into the software, allowing expert analysts to make the right decisions based on this new intelligence.

As Forrester Research put it: “The [monitoring] tools present us with the raw data, and lots of it, but sufficient insight into the actual meaning buried in all that data is still remarkably scarce.” To unlock the hidden predictive and preventative value, the data should not be treated as unstructured ‘big data’, but interpreted using detailed knowledge of the IT architecture and the meaning of each metric. This is where software with embedded expert knowledge sets itself apart.

Once we realize that incidents can be prevented, the myth that real-time monitors are the ‘top tier’ of z/OS performance management is debunked. The value of this predictive intelligence is the ability to avoid service incidents, which frees your z/OS experts to implement new features, improve support for new applications, reduce costs, and preserve the reliability and availability that mainframes are known for.

 
