z/OS Performance Monitors – Why Real-Time is Too Late

By Morgan Oats

Real-time z/OS performance monitors are often advertised as the top tier of performance management. Real-time monitoring means just that: system and storage administrators can view performance data and/or alerts indicating service disruptions continuously as they happen. In theory, this enables administrators to quickly fix the problem. For some companies, service disruptions may not be too serious if they are resolved quickly enough. Even though those disruptions could be costing them a lot more than they think, they believe a real-time monitor is the best they can do to meet their business needs.

For leading companies, optimal z/OS performance is essential for day-to-day operations: banks with billions of transactions per day; global retailers, especially on Black Friday or Cyber Monday; government agencies and insurance companies that need to support millions of customers at any given time; transportation companies with 24/7 online delivery tracking; the list goes on and on. For these organizations and many others, real-time performance information is, in fact, too late. They need information that enables them to prevent disruptions, not just a notification that something is already broken.

No Budget for an ITOA Performance Management Solution

By Morgan Oats

Every department in every industry has the same problem: how can I stretch my budget to get the necessary work done, make my team more effective, reduce costs, and stay ahead of the curve? This is equally true for performance and capacity planning teams. In many cases, it’s difficult to get budget approval to purchase the right software solution to help accomplish these goals. Management wants to stay under budget, while IT is concerned with getting a solution that solves its problems. When trying to get approval for the right solution, it’s important to be able to show how you will get a good return on investment.

All I Want for Christmas is…Time

By Jerry Street

With the holiday season upon us, I occasionally think of what might be waiting for me to unwrap. Will it be another gift card? I hope not. Gift cards are someone’s way of saying, “I appreciate you so much that you should get your own present.” There are many things that I would enjoy getting as a present, but the one thing that would actually make my life better would be a couple of extra hours in my day. I need more time! Unfortunately, I can’t get the earth to slow down and make a full rotation in 26 hours instead of 24. So I need tools to save me time within the 24 hours that I’m scripted to have.

As IT Performance professionals, we are continually asked to do more. Systems grow more complex, analyses need to be delivered faster, and dollars have to be spent more wisely than ever. When professional demands require more time, you can either give up your personal time or let the quality of your work suffer. I don’t want to do either of those things, so I would choose to do my job both faster and better. A tool that helps me accomplish both goals is IntelliMagic Vision.

z/OS Petabyte Capacity Enablement


By Dave Heggen

We work with many large z/OS customers and have seen only one requiring more than a petabyte (PB) of primary disk storage in a single sysplex. Additional z/OS environments of that size may exist, but we’ve not yet seen them (if you are that site, we’d love to hear from you!). The larger environments are 400-750 TB per sysplex and growing, so it’s likely those will reach a petabyte requirement soon.

IBM has already stated that the 64K device limitation will not be lifted. Customers requiring more than 64K devices have gotten relief by migrating to larger devices (3390-54 and/or Extended Address Volumes) and by exploiting Multiple Subchannel Sets (MSS) for PAV aliases, Metro Mirror (PPRC) secondary devices, and FlashCopy target devices.

This blog discusses strategies for positioning existing and future technologies to allow for this required growth.

IBM TS7700 Replication – Is Your Data Safe? (Part 2 of 2)


By Burt Loper


One of the challenges in IT is getting your data replicated to a remote location for fail-over and data recovery if your main operations center is compromised. It is not sufficient to set up replication; you also have to watch closely whether your replication goals are met at all times.

Part 1 of this blog explored the various TS7700 replication modes. Part 2 explores how IntelliMagic Vision can be used to monitor the health of the TS7700 replication process.

TS7700 Replication Monitoring

The TS7700 keeps track of many performance statistics about its operation. A constant watch of these metrics is needed to make sure that performance and replication goals are being met. IntelliMagic Vision performs fully automated daily interpretation of all relevant performance statistics. It applies built-in intelligence about the hardware and workloads to rate the health of the clusters and flag exceptions in dashboards and charts. The enhanced metrics are put in a database that can also be used for ad-hoc reporting with easy-to-use graphical views.

You Can’t Do Performance Analysis with SMI-S and Other Myths


By Brett Allison

Some vendors are perpetuating the myth that SMI-S is not designed for performance management. Recently some of our customers asked a vendor to surface additional performance metrics through SMI-S. They received a response along the lines of: “SMI-S is not supposed to handle performance metrics; it is mainly for management. If you want performance metrics, you should buy our proprietary tool.”

While SMI-S has some limitations, the SMI-S Block Server Performance (BSP) subprofile defines a very rich set of storage system components for which metrics can be defined: System, Peer Controller, Front-end Adapter, Front-end Port, Back-end Adapter, Back-end Port, Replication Adapter, Volume, and Disks. The BSP further defines counters for reads, writes, read throughput, write throughput, read response time, and write response time for each of the components. Coupled with the comprehensive configuration information available within the standard, SMI-S provides a rich canvas for a client consuming SMI-S data to paint the performance profile of a vendor’s hardware.

IBM TS7700 Performance – What can you do if your Cache Hit Percent drops too low and affects your tape performance?


By Burt Loper

What can you do if some of your tape jobs seem to be running too long and you observe that your Cache Hit Percent for your TS7740 virtual tape system is low?

First, let’s define Cache Hit Percent.  When host systems mount a tape, there are 3 possibilities:

  1. A scratch mount – this is always a cache hit as long as the TS7700 fast ready categories are configured correctly.  These are called Fast Ready mounts by the TS7700.
  2. A specific mount where the volser being requested is already in the TS7700’s cache.  These are called Cache Read Hit mounts by the TS7700.
  3. A specific mount where the volser being requested is not in the TS7700’s cache.  These are called Cache Read Miss mounts by the TS7700.

The first two mount types usually result in very quick tape mount times on the order of 1-2 seconds.  The mount time for a Cache Read Miss mount is much longer since the data from the volser must be retrieved from a tape cartridge back into the cache before the host mount can complete.  Cache Hit Percent for an interval is the sum of the Fast Ready mounts and the Cache Read Hit mounts divided by the Total virtual mounts (i.e. Fast Ready + Cache Read Hit + Cache Read Miss).

So, a low Cache Hit Percent means that more tape mounts are encountering the longer mount times associated with Cache Read Miss mounts, which may mean that jobs take longer. A general rule of thumb is that the Cache Hit Percent should be at least 80%, but greater than 90% is preferable.
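
As a rough illustration of the Cache Hit Percent calculation, here is a minimal Python sketch. The function names, the sample mount counts, and the rating labels are made up for this example; they are not TS7700 field names or IntelliMagic Vision output. The 80% and 90% thresholds are simply the rule of thumb given above.

```python
def cache_hit_percent(fast_ready: int, cache_read_hit: int, cache_read_miss: int) -> float:
    """Cache Hit Percent for one interval, computed from the three mount counts."""
    total_mounts = fast_ready + cache_read_hit + cache_read_miss
    if total_mounts == 0:
        # No virtual mounts in the interval, so the percentage is undefined.
        raise ValueError("no virtual mounts in this interval")
    return 100.0 * (fast_ready + cache_read_hit) / total_mounts


def rate_cache_hit_percent(percent: float) -> str:
    """Apply the rule of thumb: at least 80%, preferably greater than 90%."""
    if percent > 90.0:
        return "preferred"
    if percent >= 80.0:
        return "acceptable"
    return "low"


# Hypothetical interval: 600 Fast Ready, 250 Cache Read Hit, 150 Cache Read Miss mounts.
percent = cache_hit_percent(600, 250, 150)
print(percent, rate_cache_hit_percent(percent))
```

With these sample counts, 850 of the 1,000 virtual mounts are cache hits, so the interval comes out at 85%, which meets the 80% minimum but falls short of the preferred 90%.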

Where art thou, Standards?


By Brett Allison

Enterprise-wide reporting across the IT stack and IT domains is a nightmare to develop and maintain.

In 2009, I developed a storage chargeback system for a customer using 17 different data sources. The data collection was mostly automated, but admittedly it wasn’t very robust. The data came from different sources, including native CLI, manual input, database queries, and automatically generated reports. The system was fairly modest but effective; however, it took a part-time developer 20 to 40 hours per month to generate the reports and maintain the system after I turned it over.

Even within the storage domain there wasn’t one single interface for communicating with the devices. Other systems, such as change management and asset management databases, had standard SQL query interfaces, but some systems were inaccessible and required someone to update a flat file.