The Roots and Evolution of the RMF and SMF for Mainframe Performance Data (Part 2)

By George Dodson

This is Part 2 of this blog. If you haven’t read Part 1, you can read it here.

After being announced as a product in 1974, RMF was further expanded to provide more capabilities such as RMF Monitor 2 and RMF Monitor 3. These provided real-time insight into the internal workings of z/OS to help understand and manage the performance of the z/OS infrastructure. The value of the RMF performance measurement data has been proven over the decades: it, or the compatible CMF product from BMC, is used in every mainframe shop today. Many new record types have been added in recent years as the z/OS infrastructure capabilities continue to evolve.

A related product – System Management Facilities, or SMF – was originally created to provide resource usage information for chargeback purposes. SMF captured application usage statistics, but was not always able to capture all of the associated system overhead. Eventually, SMF and RMF were expanded to capture detailed statistics about all parts of the mainframe workloads and infrastructure operation, including details about third-party vendor devices such as storage arrays. RMF and SMF now generate what is likely the most robust and detailed performance and configuration data of any commercial computing environment in the data center.

As the data sources for reporting on the performance of workloads and the computer infrastructure grew, different performance tools were created to display and analyze the data. The information in the data is very complex, and the total amount of data captured can be overwhelming, which makes it challenging to identify performance problems. Typically, this requires analysts who have extensive insight into the specific infrastructure areas being analyzed and an understanding of how those areas respond to different application workloads. As applications have grown more complex and more real-time, spanning more platforms and components, the performance analysis task has also become more difficult.

Continue reading

The Roots and Evolution of the RMF and SMF for Mainframe Performance Data (Part 1)

By George Dodson

This blog originally appeared as an article in Enterprise Executive.

For more than 50 years, computer professionals have been interested in making computer applications run faster and in determining the causes of slow-running applications. In the early days, computer performance was in some ways easy to measure because electronic components were soldered in place. To understand what was happening at any point in the circuitry, we simply attached a probe and examined the electronic wave information on an oscilloscope.

Eventually, we were able to measure activity at key points in the computer circuitry to determine things like CPU utilization, channel utilization and I/O response times. However, this method still had many shortcomings. First, the number of probes was very small, usually less than 40. Second, this method gave no insight into operating system functions or application operations that might be causing tremendous overhead. And of course, when integrated circuits were developed, the probe points went away.

In 1966 I joined an IBM team that was focusing on a better way to conduct benchmarks in what was then named an IBM Systems Center. Customers considering computer upgrades would come to our data center to determine how their programs would operate on newly released hardware. But it was simply not possible to host every customer in this way.

Continue reading

z/OS Performance Monitors – Why Real-Time is Too Late

By Morgan Oats

Real-time z/OS performance monitors are often advertised as the top tier of performance management. Real-time monitoring means just that: system and storage administrators can view performance data and/or alerts indicating service disruptions continuously as they happen. In theory, this enables administrators to quickly fix the problem. For some companies, service disruptions may not be too serious if they are resolved quickly enough. Even though those disruptions could be costing them a lot more than they think, they believe a real-time monitor is the best they can do to meet their business needs.

For leading companies, optimal z/OS performance is essential for day-to-day operations: banks with billions of transactions per day; global retailers, especially on Black Friday or Cyber Monday; government agencies and insurance companies that need to support millions of customers at any given time; transportation companies with 24/7 online delivery tracking; the list goes on and on. For these organizations and many others, real-time performance information is, in fact, too late. They need information that enables them to prevent disruptions – not simply tell them when something is already broken.

Continue reading

Dragging the Right Information Out of SMF/RMF/CMF for z/OS Disk Performance Analysis

By Dave Heggen

Internal processing in IntelliMagic Vision is performed on a Sysplex boundary. We want the SMF data from all LPARs in a Sysplex, and if multiple Sysplexes attach to the same hardware, then we want these Sysplexes together in the same interest group. By processing the data in this manner, an interest group provides an accurate representation of the hardware’s perspective of activity and allows an evaluation of whether this activity is below, equal to, or above the hardware’s capability. It is also true that the shorter the interval, the more accurately the data will show peaks and lulls. The shortest interval you can define is 1 minute, which would typically be the average of 60 samples (1 cycle per second). It is always a balancing act between the accuracy of the data and the size/cost of storing and processing it.
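To make that trade-off concrete, here is a minimal sketch (not IntelliMagic Vision internals; the utilization numbers are invented) showing how a longer reporting interval averages away a peak that 1-minute data would still reveal.

```python
# Minimal sketch (not IntelliMagic Vision internals): a longer interval averages
# away a peak that 1-minute data still shows. Utilization numbers are invented.
from statistics import mean

# Hypothetical per-minute device utilization samples (percent) for one 15-minute span.
one_minute_samples = [12, 10, 11, 95, 92, 14, 13, 12, 11, 10, 13, 12, 11, 10, 12]

fifteen_minute_average = mean(one_minute_samples)  # what a 15-minute interval would report
one_minute_peak = max(one_minute_samples)          # what 1-minute intervals still reveal

print(f"15-minute average: {fifteen_minute_average:.1f}%")  # ~22.5% - looks healthy
print(f"1-minute peak:     {one_minute_peak}%")             # 95% - a real hot spot
```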

Continue reading

Game Changer for z/OS Transaction Reporting

By Todd Havekost

Periodically, a change comes to an industry that introduces a completely new and improved way to accomplish an existing task that had previously been difficult, if not daunting. Netflix transformed the home movie viewing industry by offering video streaming that was convenient, affordable, and technically feasible – a change so far-reaching that it ultimately led to the closing of thousands of Blockbuster stores. We feel that IBM recently introduced a similar “game changer” for transaction reporting for CICS, IMS and DB2.

Continue reading

Which Workloads Should I Migrate to the Cloud?

By Brett Allison

By now, we have just about all heard it from our bosses, “Alright folks we need to evaluate our workloads and determine which ones are a good fit for the cloud.” After feeling a tightening in your chest, you remember to breathe and ask yourself, “How the heck do I accomplish this task as I know very little about the cloud and to be honest it seems crazy to move data to the cloud!”

According to this TechTarget article, “A public cloud is one based on the standard cloud computing model, in which a service provider makes resources, such as applications and storage, available to the general public over the internet. Public cloud services may be free or offered on a pay-per-usage model.” Most organizations have private clouds, and some have moved workloads into public clouds. For the purpose of this conversation, I will focus on the public cloud.

Continue reading

HDS G1000 and Better Protecting Availability of z/OS Disk Storage

By Brent Phillips

 

If your job includes avoiding service disruptions on z/OS infrastructure, you may not have everything you need to do your job.

Disk storage in particular is typically the least visible part of the z/OS infrastructure. It is largely a black box. You can see what goes in and what comes out, but not what happens inside. Storage arrays these days are very complex devices with internal components that may become overloaded and introduce unacceptable service time delays for production work without providing an early warning.

Consequently, in the 10+ years I have been with IntelliMagic, I have yet to meet a single mainframe site that (prior to using IntelliMagic) automatically monitors for threats to availability due to storage component overload.

Continue reading

What HDS VSP and HP XP P9500 Should Be Reporting in RMF/SMF – But Aren’t

By Gilbert Houtekamer, Ph.D.

This is the last blog post in a series of four in which we share our experience with the instrumentation that is available through RMF and SMF for the IBM DS8000, EMC VMAX, and HDS VSP or HP XP P9500 storage arrays. This post is about the Hitachi high-end storage array that is sold by HDS as the VSP and by HP as the XP P9500.

RMF has been developed over the years by IBM, based on IBM storage announcements. Even for the IBM DS8000, not nearly all functions are covered; see the blog post “What IBM DS8000 Should Be Reporting in RMF/SMF – But Isn’t.” For the other vendors it is harder still: they will have to make do with what IBM provides in RMF, or create their own SMF records.

Hitachi has supported the RMF 74.5 cache counters for a long time, and those counters are fully applicable to the Hitachi arrays. For other RMF record types, though, it is not always a perfect match. The Hitachi back-end uses RAID groups that are very similar to IBM’s, which allowed Hitachi to use the RMF 74.5 RAID Rank and 74.8 Link records that were designed for the IBM ESS. But for Hitachi arrays with concatenated RAID groups, not all information was properly captured. To interpret data from those arrays, additional external information from configuration files was needed.

With their new Hitachi Dynamic Provisioning (HDP) architecture, the foundation for both Thin Provisioning and automated tiering, Hitachi updated their RMF 74.5 and 74.8 support such that each HDP pool is reflected in the RMF records as if it were an IBM Extent Pool.   This allows you to track the back-end activity on each of the physical drive tiers, just like for IBM.
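As an illustration, the sketch below shows the kind of per-pool, per-tier roll-up this enables. The record fields are placeholder names for already-parsed data, not the actual RMF 74.5/74.8 record layouts.

```python
# Hedged sketch: once each HDP pool is reflected in RMF as if it were an extent pool,
# back-end activity can be rolled up per pool and per drive tier. The dictionaries
# below are illustrative placeholders, not the actual RMF 74.5/74.8 record layouts.
from collections import defaultdict

parsed_records = [
    {"pool": "HDP01", "tier": "SSD",     "backend_ops": 52_000},
    {"pool": "HDP01", "tier": "SAS 10K", "backend_ops": 180_000},
    {"pool": "HDP02", "tier": "SAS 10K", "backend_ops": 95_000},
]

ops_by_pool_and_tier = defaultdict(int)
for record in parsed_records:
    ops_by_pool_and_tier[(record["pool"], record["tier"])] += record["backend_ops"]

for (pool, tier), ops in sorted(ops_by_pool_and_tier.items()):
    print(f"{pool} / {tier}: {ops} back-end operations in the interval")
```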

This does not provide information about the dynamic tiering process itself, however.    Just like for the other vendors, there is no information per logical volume on what portion of its data is stored on each drive tier. Nor are there any metrics available about the migration activity between the tiers.

Overall, we would like to see the following information in the RMF/SMF recording:

  • Configuration data about replication.   Right now, you need to issue console or Business Continuity Manager commands to determine replication status.  Since proper and complete replication is essential for any DR usage, the replication status should be recorded every RMF interval instead.
  • Performance information on Universal Replicator, Hitachi’s implementation of asynchronous mirroring.  Important metrics include the delay time for the asynchronous replication, the amount of write data yet to be copied, and the activity on the journal disks.
  • ShadowImage, FlashCopy and Business Copy activity metrics. These functions provide logical copies that can involve significant back-end activity which is currently not recorded separately.  This activity can easily cause hard-to-identify performance issues, hence it should be reflected in the measurement data.
  • HDP Tiering Policy definitions, tier usage and background migration activity.  From z/OS, you would want visibility into the migration activity, and you’d want to know the policies for a Pool and the actual drive tiers that each volume is using.

Unless IBM is going to provide an RMF framework for these functions, the best approach for Hitachi is to create custom SMF records from the mainframe component that Hitachi already uses to control the mainframe-specific functionality.

It is good to see that Hitachi is working to fit its data into the framework defined by RMF for the IBM DS8000. Yet we would like to see more information from the HDS VSP and HP XP P9500 reflected in the RMF or SMF records.

So when considering your next HDS VSP or HP XP P9500 purchase, also discuss the need to manage it with the tools that you use on the mainframe for this purpose: RMF and SMF.  If your commitment to the vendor is significant, they may be responsive.

What EMC VMAX Should Be Reporting in RMF/SMF – But Isn’t

By Gilbert Houtekamer, Ph.D.

This is the third in a series of four blogs on the status of RMF as a storage performance monitoring tool. This one is specifically about the EMC VMAX. The previous postings are “What RMF Should Be Telling You About Your Storage – But Isn’t” and “What IBM DS8000 Should Be Reporting in RMF/SMF – But Isn’t.”

RMF has been developed over the years by IBM based on its storage announcements – although even for IBM DS8000 not nearly all functions are covered, see this blog post. Other vendors will have to work with what IBM provides in RMF, or, like EMC does for some functionality, create their own SMF records.

EMC has supported IBM’s RMF 74.5 cache counters since they were introduced, and they’ve started using the ESS 74.8 records in the past several years to report on FICON host ports and Fibre replication ports.  However, with respect to back-end reporting, it hasn’t been that simple. Since the EMC Symmetrix RAID architecture is fundamentally different from IBM’s, the EMC RAID group statistics cannot be reported on through RMF.

For EMC’s asynchronous replication method SRDF/A, SMF records were defined that, among other things, track cycle time and size.  This is very valuable information for monitoring SRDF/A session load and health.  Since Enginuity version 5876, SRDF/A Write Pacing statistics are written to SMF records as well, allowing users to track potential application impact.   The 5876 release also provided very detailed SMF records for the TimeFinder/Clone Mainframe Snap Facility.
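As a sketch of how such records can be used, the snippet below flags intervals where the SRDF/A cycle time suggests the session is falling behind. The field names and the 30-second threshold are illustrative only and do not reflect the actual EMC SMF record layout.

```python
# Hedged sketch: flag intervals where the SRDF/A cycle time suggests the session is
# falling behind. Field names and the 30-second threshold are illustrative only and
# do not reflect the actual EMC SMF record layout.
srdfa_intervals = [
    {"time": "09:00", "cycle_time_sec": 12, "cycle_size_mb": 480},
    {"time": "09:15", "cycle_time_sec": 45, "cycle_size_mb": 2100},
    {"time": "09:30", "cycle_time_sec": 14, "cycle_size_mb": 510},
]

CYCLE_TIME_ALERT_SEC = 30  # illustrative threshold, tune to your recovery point objective

for rec in srdfa_intervals:
    if rec["cycle_time_sec"] > CYCLE_TIME_ALERT_SEC:
        print(f"{rec['time']}: cycle time {rec['cycle_time_sec']}s with "
              f"{rec['cycle_size_mb']} MB in the cycle - SRDF/A may be falling behind")
```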

Still, there are areas where information remains lacking, in particular on back-end drive performance and utilization. Before thin provisioning was introduced, each z/OS volume would be defined from a set of Hyper Volumes on a limited number of physical disks. EMC provided great flexibility with this mapping: you could pick any set of Hyper Volumes that you liked. While conceptually nice, this made it very hard to correlate workload and performance data for logical z/OS volumes with the workload on the physical disk drives. And, since the data on a z/OS volume was spread over a relatively small number of back-end drives, performance issues were quite common. Many customers needed to ask EMC to conduct a study if they suspected such back-end issues – and they still do.
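The sketch below illustrates why that correlation is so rough: without native back-end reporting, about the best you can do is spread each logical volume’s activity evenly across its backing drives. All names and the mapping itself are hypothetical; the real mapping comes from proprietary EMC configuration data, not from RMF or SMF.

```python
# Hedged sketch: estimating physical drive activity from logical volume statistics
# and a configuration mapping. Every name and the mapping format are hypothetical.
from collections import defaultdict

# Hypothetical mapping: each z/OS volume is built from Hyper Volumes on several drives.
volume_to_drives = {
    "PROD01": ["DRV-0001", "DRV-0002", "DRV-0003"],
    "PROD02": ["DRV-0002", "DRV-0004"],
    "TEST01": ["DRV-0003", "DRV-0004"],
}

# Hypothetical per-volume I/O rates (IO/s) from RMF 74.1-style device statistics.
volume_io_rate = {"PROD01": 900.0, "PROD02": 300.0, "TEST01": 150.0}

# Naive estimate: spread each volume's I/O evenly across its backing drives.
drive_io_estimate = defaultdict(float)
for volume, drives in volume_to_drives.items():
    for drive in drives:
        drive_io_estimate[drive] += volume_io_rate[volume] / len(drives)

for drive, rate in sorted(drive_io_estimate.items()):
    print(f"{drive}: ~{rate:.0f} IO/s (ignores cache hits, RAID overhead, and skew)")
```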

With the new thin provisioning and FAST auto-tiering options, the relationship between logical and physical disks has been defined through even more intermediate steps.  While EMC’s FAST implementation using the policy mechanism is very powerful, it may be hard to manage for z/OS users, since no instrumentation is provided on the mainframe.  On the positive side, since data tends to be spread over more disk drives because of the use of virtual pools rather than individual RAID groups, back-end performance issues are less likely than before. Still, more information on back-end activity is needed both to diagnose emerging problems and to make sure no hidden bottlenecks occur.

Information that should make it into RMF or SMF to uncover the hidden internals of the VMAX:

  • Configuration data about SRDF replication.   Right now, users need to issue SRDF commands to determine replication status.  Yet proper and complete replication is essential for any DR usage, so the replication status should be recorded every RMF interval.
  • Data that describes the logical to physical mapping, and physical disk drive utilizations. There is external configuration data available through proprietary EMC tools that can sometimes be used in combination with RMF to compute physical drive activity. This is no substitute for native reporting in RMF or SMF.
  • Snapshot-related backend activity. Snapshots provide immediate logical copies which can generate significant back-end activity that is currently not recorded.  Snapshots are a frequent player in hard-to-identify performance issues.
  • FAST-VP policy definitions, tier usage and background activity. FAST-VP will supposedly always give you great performance, but it cannot do magic: you still need enough spindles and/or Flash drives to handle your workload.  For automatic tiering to work well, history needs to repeat itself, as Lee LaFrese said in his recent blog post, “What Enterprise Storage system vendors won’t tell you about SSDs.”  From z/OS, you want visibility into the migration activity, along with the policies for each Pool and the actual tiers that each volume is using.

It will probably be easier for EMC to create more custom SMF records, as they did for SRDF/A, than it would be to try to get their data into RMF. Such SMF records would be fully under EMC’s control and could be designed to match the VMAX architecture, making it much easier to keep them up to date.

EMC does seem to respond to customer pressure to create reporting in SMF for important areas.  An example of this is the creation of SRDF/A records and the recent write pacing monitor enhancement.

When considering your next EMC VMAX purchase, also consider discussing the ability to manage it with the tools that you use on the mainframe for this purpose: RMF and SMF.   If your company’s order is big enough, EMC might consider adding even more mainframe-specific instrumentation.

What RMF Should Be Telling You About Your Storage – But Isn’t

By Gilbert Houtekamer, Ph.D.

With every new generation of storage systems, more advanced capabilities are provided that invariably simplify your life greatly – at least according to the announcements. In practice, however, these valuable new functions typically also introduce a new level of complexity, which tends to make performance less predictable and harder to manage.

Looking back at history, it is important to note that RMF reporting was designed in the early days of CKD (think 3330, 3350) and mainly showed the host perspective (74.1).  With the introduction of cached controllers, cache hit statistics came along that eventually made it into RMF (74.5).  When the IBM ESS was introduced, additional RMF reporting was defined to provide some visibility into the back-end RAID groups and ports (74.8), which is now used by both HDS and IBM.

Continue reading