AI and z/OS Performance and Capacity Analysis: 2018 Predictions

By Brent Phillips

2018 is gearing up to be a watershed year for z/OS performance and capacity professionals.

Industry analysts have been talking for some years now about Artificial Intelligence (AI) and the role it will play in our work. But what that truly means, and what value it has in day-to-day operations, has not yet been understood or realized by most professionals in this field.

There are many different types of AI, but not all of them are suited to the kind of infrastructure performance and availability health assessment that human analysts can no longer feasibly do proactively every day. When properly designed and deployed, however, automated, AI-driven decision making has proven very effective at interpreting what all the data means and at identifying current or near-term performance problems and their root causes.

The computer can be more effective at this because it is far more efficient than humans at continuously checking the application workloads against the hundreds or thousands of most common issues that cause service disruptions on the specific infrastructure components running those workloads.

It can answer questions such as: which z/OS best practices are indicating performance risk or problems? Which z/OS components are nearing saturation, have lost redundancy, or are being used inefficiently? This automated application of domain-specific expert knowledge enables human analysts to focus on the most important issues and root causes that are affecting, or will affect, the required application service levels.
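As a rough illustration of what checking workloads against a library of known issues can look like, here is a minimal Python sketch. It is not IntelliMagic's implementation: the `Rule` and `Finding` types, the metric names (`cpu_busy_pct`, `channel_util_pct`), and the thresholds are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Rule:
    """One piece of encoded expert knowledge (hypothetical structure)."""
    name: str
    metric: str       # hypothetical metric name, not a real RMF/SMF field
    warn: float       # value at or above which the rule raises a warning
    critical: float   # value at or above which the rule is critical


@dataclass(frozen=True)
class Finding:
    rule: str
    component: str
    value: float
    severity: str


def assess(samples: Dict[str, Dict[str, float]], rules: List[Rule]) -> List[Finding]:
    """Check every component's metrics against every rule.

    Returns only the rules that fired, critical findings first,
    then by descending metric value.
    """
    findings: List[Finding] = []
    for component, metrics in samples.items():
        for rule in rules:
            value = metrics.get(rule.metric)
            if value is None:
                continue  # this component does not report the metric
            if value >= rule.critical:
                severity = "critical"
            elif value >= rule.warn:
                severity = "warning"
            else:
                continue  # within normal range, nothing to report
            findings.append(Finding(rule.name, component, value, severity))
    findings.sort(key=lambda f: (f.severity != "critical", -f.value))
    return findings


# Hypothetical rules and one interval of hypothetical measurements.
rules = [
    Rule("CPU saturation", "cpu_busy_pct", warn=80.0, critical=95.0),
    Rule("Channel utilization", "channel_util_pct", warn=50.0, critical=70.0),
]
samples = {
    "LPAR1": {"cpu_busy_pct": 97.0, "channel_util_pct": 40.0},
    "LPAR2": {"cpu_busy_pct": 85.0},
}
findings = assess(samples, rules)
for f in findings:
    print(f"{f.severity:8} {f.component}: {f.rule} ({f.value})")
```

The analyst sees only the short list of rules that fired rather than every metric for every component. A real rule base would presumably hold far more than two threshold checks, but the division of labor is the same.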

2018 Predictions: AI and the IT Infrastructure

Applying AI techniques to IT infrastructure operations data has already proven effective for quite some time, at least in IntelliMagic solutions. Based on our experience in the market, as well as comments in the press and by industry analysts, we expect 2018 to be a watershed year in terms of mainstream recognition of the benefits of using AI to operate the IT infrastructure for optimal service levels.

In the bigger picture, this modernized, AI-driven analytics approach addresses issues such as:

Closing the Performance and Capacity Skills Gap

Many organizations are hiring new staff to complement their deep z/OS performance and capacity planning experts who are due to retire in the coming years. Yet the required skills take years to develop, and in the meantime the team must deliver continuous availability for the production applications. Solutions that make deep, platform-specific expert knowledge accessible to the algorithms help new staff learn faster what is important, and they expose the sometimes-obscure root causes behind the more easily visible performance problem symptoms.

Augmenting Human RMF/SMF Data Analysis with Artificial Intelligence

Given the complexity and scope of today's infrastructure and the limited staff most teams have, manual, proactive analysis of vast amounts of performance metrics is neither effective nor feasible. Instead, teams typically dig into the data only after performance issues arise. Inviting “AI to the team” enables the entire team to be more productive, more quickly.

Predictive & Preventative Performance Intelligence

Organizations do not need more reports; they already have more than enough for their staff to look at. What they need is refined intelligence about what is important in all the data and what it means for the performance, capacity, and efficiency of the infrastructure. Reacting to application availability disruptions after they occur, however quickly, is fast becoming too expensive and unreliable, and even real-time monitors report the production problem too late to avoid it.
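To make the difference between reacting and predicting concrete, here is a minimal sketch of one simple predictive technique: fitting a straight line to recent interval values and estimating how many intervals remain before a threshold is crossed. This is an ordinary least-squares trend, used purely as an illustration and not as a description of any product's method. The function name, the sample values, and the 90% threshold are hypothetical; the 15-minute interval is the typical RMF interval.

```python
from typing import List, Optional


def intervals_until_threshold(values: List[float], threshold: float) -> Optional[float]:
    """Estimate how many more intervals until a linear trend reaches `threshold`.

    `values` are equally spaced measurements, oldest first.
    Returns None when there is too little data or no upward trend,
    and 0.0 when the fitted line is already at or above the threshold.
    """
    n = len(values)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    # Ordinary least-squares slope and intercept over x = 0, 1, ..., n-1.
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    if slope <= 0:
        return None  # flat or falling: the trend never reaches the threshold
    intercept = mean_y - slope * mean_x
    crossing = (threshold - intercept) / slope  # x position where the line hits it
    return max(crossing - (n - 1), 0.0)


# Hypothetical utilization percentages from four consecutive intervals.
recent = [70.0, 72.0, 74.0, 76.0]
remaining = intervals_until_threshold(recent, threshold=90.0)
if remaining is not None:
    print(f"about {remaining * 15:.0f} minutes until 90% at the current trend")
```

A straight-line fit is the crudest possible predictor and will raise false alarms on noisy or cyclical metrics, which is one reason domain knowledge about which trends actually matter is needed on top of the arithmetic.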

The need for proactively predicting and preventing service disruptions will soon become a fundamental requirement for all organizations – not just the largest financial institutions. Only AI technology that utilizes platform-specific expert domain knowledge can provide effective predictive capabilities with minimal false positives (alerting about unimportant issues) and without false negatives (missing the important problems).
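False positives and false negatives can be counted directly if you keep a record of which intervals raised an alert and which intervals turned out to have a real problem. The sketch below assumes both are available as sets of interval identifiers; the function name and the sample data are hypothetical.

```python
from typing import Dict, Optional, Set


def alert_quality(alerted: Set[int], actual: Set[int]) -> Dict[str, Optional[float]]:
    """Compare alerted intervals with intervals that had a real problem."""
    true_pos = len(alerted & actual)    # alerted, and there was a problem
    false_pos = len(alerted - actual)   # alerted about an unimportant issue
    false_neg = len(actual - alerted)   # important problem that was missed
    return {
        "false_positives": false_pos,
        "false_negatives": false_neg,
        # Share of alerts that were worth looking at (None if no alerts).
        "precision": true_pos / len(alerted) if alerted else None,
        # Share of real problems that were caught (None if no problems).
        "recall": true_pos / len(actual) if actual else None,
    }


# Hypothetical interval numbers for one day.
quality = alert_quality(alerted={1, 2, 3, 4}, actual={2, 3, 5})
print(quality)
```

Tracking both numbers over time shows whether tuning is reducing noise (higher precision) without missing more real problems (lower recall).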

Reducing Costs without Impacting Performance

Finding ways to reduce ever-rising costs has always been a priority for organizations, but not at the expense of performance and availability. AI-driven analysis can automatically and continuously assess whether common inefficiencies have arisen in the dynamic infrastructure operation.

Keeping up with Modern Technologies

The z/OS infrastructure continues to extend its already rich set of metrics with new ones covering technologies such as Pervasive Encryption, data compression, and other features. Properly analyzing these new data sources using antiquated reporting techniques and products requires custom coding and manual interpretation. Consequently, many sites today have significant gaps in visibility into the metrics required to support newer technologies.

Better intelligence means processing and assessing all of these new data types. Presenting that information in an easy-to-understand manner that is flexible and interactive eliminates the need to invest resources in learning, developing, and maintaining one’s own custom reports to understand and manage these new infrastructure components.

Moving Ahead with AIOps

The integration of artificial intelligence with IT operations analysis is now being referred to by some in the market as “AIOps”. 2018 is likely to see the emergence of AIOps on a much larger scale than in previous years, because it provides a breakthrough in productivity and effectiveness at a time when human analysts are carrying heavier loads: staff has been reduced in the past while workload and infrastructure complexity keep growing.

View our recorded webinar, 2018 RMF/SMF Analytics – Status & Predictions, where we discussed many of these topics in greater detail and demonstrated ways that you can apply modern strategies to the problems you may currently be facing.


2 thoughts on “AI and z/OS Performance and Capacity Analysis: 2018 Predictions”

  1. Ron says:

    I gather this/reporting and so on…is after SMF/RMF is read into system…NO real time monitoring?

    1. Thanks for the question, Ron. Typically the interval for RMF data is 15 minutes. We can load and process the data in near real time (usually every few intervals), so it is fresh. Even more important, though, the metrics are continuously and automatically assessed against machine-readable z/OS expert knowledge that monitors root causes, not just symptoms. That gives you accurate predictive visibility sooner than the real-time firefighting information people are usually after.
