Break Open Your VSM Black Box and Expose Internal Tape Processing

By John Ticic

 

When virtual tape systems run properly, it’s great. But when there are problems, or you need to examine detailed tape information, the virtualization makes it hard to see what is really going on inside the black box.

Luckily, with z/OS we have SMF, and the virtual tape hardware vendors can define a custom record to provide measurements on the internals that can help see what is happening inside. Oracle STK VSM, for instance, generates detailed SMF data in a user record that allows us to examine tape processing in fine detail using the intelligent post-processing from the enhanced Oracle Tape support in IntelliMagic Vision.

Some of the questions that you may want answered are:

    • Why are the tape mounts taking so long?
    • How many virtual tape mounts need to be staged from real tapes?
    • Are my virtual tapes being replicated in a timely fashion?

Let’s look at an example of investigating virtual tape mounts that take a long time.

[Figure: activity per VSM system]

We see the activity per VSM system, but this doesn’t tell us how long virtual tape mounts are taking. Fortunately, there is a wealth of data available for us to dig deeper, with just a few clicks.

[Figure: maximum mount time per VSM system]

The maximum mount time graph shows us a significant peak of around 800 seconds to mount a tape for VSM system PZRW95E. Why is it taking so long? And which Job or task is affected?

Drilling down through the data to the affected VTVs (Virtual Tape Volumes) is easy and reveals the following:

[Figure: affected VTVs]

VTV 0EPZWE is being requested by DFHSM. This is a mount for an existing VTV, and it is taking 830 seconds – over 13 minutes! Why is it taking so long to mount a VTV?

By clicking on identify in IntelliMagic Vision, detailed information concerning all activity for this volume is displayed. This includes not only the detailed VSM SMF data but also the native z/OS SMF information obtained from the INPUT/OUTPUT and Mount SMF records (SMF 14, 15 and 21).

[Figure: detailed activity for VTV 0EPZWE]

Note: No replication information is available since no new data was written to this tape volume.

Looking closer at the original VSM mount, we see that it was received at 4:47 PM and that a recall was necessary. The recall was initiated at 4:57 PM and completed at 5:01 PM. The tape volume that DFHSM is mounting needs to be staged from a real tape, which means that a mount on a real tape drive (RTD) needs to occur. The recall information above indicates that RTD number 8 was used to mount physical volume 192347, which contained the VTV.
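To make the timeline concrete, here is a minimal sketch that splits the mount into its waiting and recall phases. It uses only the three timestamps quoted above, at minute resolution; the variable names are illustrative and are not fields from the VSM SMF record or from IntelliMagic Vision.

```python
from datetime import datetime

# Timestamps quoted in the text, written in 24-hour form.
# Only the time of day matters here; strptime fills in a placeholder date.
FMT = "%H:%M"
mount_received = datetime.strptime("16:47", FMT)    # 4:47 PM
recall_started = datetime.strptime("16:57", FMT)    # 4:57 PM
recall_completed = datetime.strptime("17:01", FMT)  # 5:01 PM


def minutes_between(start, end):
    """Elapsed time between two timestamps, in minutes."""
    return (end - start).total_seconds() / 60


wait_before_recall = minutes_between(mount_received, recall_started)
recall_duration = minutes_between(recall_started, recall_completed)
total = minutes_between(mount_received, recall_completed)

print(f"waiting for the recall to start: {wait_before_recall:.0f} min")
print(f"recall itself:                   {recall_duration:.0f} min")
print(f"total:                           {total:.0f} min")
```

At this resolution the total is 14 minutes, in line with the reported 830 seconds, and most of it (10 minutes) passes before the recall even begins.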

So why is it taking so long before the recall mount on RTD number 8 starts?

Taking a closer look at the activity for the real stacked tape, MVC (Multiple Volume Cartridge) 192347, reveals the contention.

[Figure: activity on MVC 192347]

The original DFHSM mount at 4:47 PM (previously highlighted in blue) is waiting because other VTVs on the same MVC are being recalled first. The recall of VTV 0EPAXM starts at 4:45 PM, followed by the recall of 0ERTTW at 4:48 PM and finally the recall of 0EPBAY at 4:52 PM; all three are processed before VTV 0EPZWE can be mounted. DFHSM happens to be requesting volumes that have been stacked onto the same MVC.

In this case, DFHSM has no awareness of where the logical volumes (VTVs) have been stacked, so DFHSM cannot optimize mount requests.
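The contention can be sketched in a few lines: given recall events tagged with the MVC they read from, list the recalls that start ahead of a given VTV on the same cartridge. The `Recall` record and the `recalls_ahead_of` helper are hypothetical, invented for this illustration; they do not reflect the actual VSM SMF record layout or how IntelliMagic Vision stores the data. The values are the ones quoted in this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Recall:
    """Hypothetical, simplified recall event (not the real SMF layout)."""
    vtv: str           # virtual tape volume being recalled
    mvc: str           # physical cartridge the VTV is stacked on
    started: datetime  # time the recall started


def at(hhmm):
    """Parse a 24-hour HH:MM time of day (placeholder date)."""
    return datetime.strptime(hhmm, "%H:%M")


# Recall start times quoted in the text, all on MVC 192347.
recalls = [
    Recall("0EPAXM", "192347", at("16:45")),
    Recall("0ERTTW", "192347", at("16:48")),
    Recall("0EPBAY", "192347", at("16:52")),
    Recall("0EPZWE", "192347", at("16:57")),
]


def recalls_ahead_of(recalls, target_vtv):
    """Return recalls on the same MVC that started before the target's recall."""
    target = next(r for r in recalls if r.vtv == target_vtv)
    ahead = [
        r for r in recalls
        if r.mvc == target.mvc and r.started < target.started
    ]
    return sorted(ahead, key=lambda r: r.started)


ahead = recalls_ahead_of(recalls, "0EPZWE")
for r in ahead:
    print(f"{r.started:%H:%M}  {r.vtv}  (MVC {r.mvc})")
```

Grouping by MVC rather than by VTV is the point of the sketch: the individual recalls look unremarkable on their own, and the delay only becomes visible once they are lined up on the shared cartridge.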

Without IntelliMagic Vision’s enrichment and presentation of the Oracle STK VSM SMF data, this kind of analysis is impossible.

One thought on “Break Open Your VSM Black Box and Expose Internal Tape Processing”

  1. Hi John,

    Congratulations on your article. It is indeed very good and helps us understand how to use Vision to keep track of our performance issues.

    I tried to follow you using our data, but I didn’t find the last report (one about MVC activity). How did you get it? What is the path?

    Regards,

    Fábio
