Eliminating Data Set Contention

By Joe Hyde

 

Sometimes a disk storage system can seemingly be humming along nicely with good average response times, but there may still be particular workloads that suffer from significant delays. This happened recently with one of IntelliMagic’s customers.

If you care about your applications and users, it is important to not only look at overall average response times, but also at response times on a more granular level. Performance by SMS Storage Group is an appropriate level to check whether particular applications or users experience issues, since SMS Storage Groups often correspond to applications. In the IntelliMagic Vision dashboard below we noted that the Pending component of response time was flagged as problematically high for two important storage groups.

blog DB Delay chart #2 V1.2

Pending time is one of the four classic components of response time and directly affects application performance, so we used IntelliMagic Vision to find the root cause.

In the classic FICON definition, Pending time itself consists of three components: “Command Response Delay”, “Device Busy Delay”, and “Other”. The “Other” component is what remains when the measured subcomponents (Command Response Delay and Device Busy Delay) are subtracted from the total measured Pending time. It is a bit different for zHPF. I won’t bore you with the details, but I will ask you to take it on faith that Pending time covers roughly the same part of I/O response time whether the I/O uses FICON or zHPF; it is just measured differently. So even though this customer used zHPF, we can use the above definition.
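As a minimal sketch of the arithmetic in the classic definition, the “Other” component can be computed as a residual. The function and parameter names are hypothetical (they are not RMF or IntelliMagic Vision field names), and the inputs are assumed to be interval averages in milliseconds.

```python
def other_pending_ms(pending_ms, cmd_response_delay_ms, device_busy_delay_ms):
    """Return the residual "Other" part of Pending time.

    All names are illustrative; inputs are assumed to be per-interval
    averages expressed in milliseconds.
    """
    return pending_ms - cmd_response_delay_ms - device_busy_delay_ms


# Made-up numbers, chosen only to show the subtraction.
print(other_pending_ms(1.0, 0.25, 0.5))  # 0.25
```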

When looking at the components of Pending time in IntelliMagic Vision, I switch gears a bit and plot the data for each individual fifteen-minute RMF interval over the seven-day period.

blog DB Delay chart #3 V1.2

You can see from this chart that the “Other” component of Pending time is relatively constant and that there is only some slight fluctuation in command response delay. The real spiky behavior is in the device busy delay.

One way to get device busy delay is for the disk storage system to actually return a “device busy” status when the first command in the chain is processed. Predecessors of System z have been around since the 1960s, and storage interaction has evolved from the old days when the channel did a lot of work to satisfy the I/O. Today, with zHPF, most of the protocol is encapsulated into a much shorter dialog encompassing only one or two round trips. Device busy is generally not presented unless the I/O command is using the FICON protocol and the device is presently reserved by another system. So this was an unlikely cause of device busy delay in the situation that we are looking at. Nevertheless, I checked the device reserve metrics in IntelliMagic Vision for these two SMS storage groups, and indeed there had been no reserve activity whatsoever.

With reserves eliminated as a potential root cause, we will look at the other mechanism that can cause device busy delay. I call it “extent conflicts,” which is a technical way of saying “data set update contention.” The background to how extent conflicts come into play has to do with System z having five decades of history, combined with IBM’s commitment to allow existing customer-written applications to run without changes as the architecture evolves.

Traditionally, only one I/O operation at a time could execute on a volume. In the late 1990s, multiple allegiance and parallel access volumes were invented, allowing more than one I/O to be processed concurrently on a volume, both from one LPAR (parallel access volumes) and across different LPARs (multiple allegiance). For security reasons, IBM had already implemented in the ECKD architecture a means to limit the extent of access of an I/O chain: a low and a high track address bound the tracks the chain may access. Concurrency was allowed when these extents did not overlap, which eliminated the fear of introducing errors in existing customer applications. To further reduce the risk of extent conflicts, the media manager access method, used today by such file systems as VSAM, DB2 and zFS, will “shrink wrap” the extents sent to the disk storage subsystem around the tracks referenced in the I/O, so that even accesses to the same data set can often run concurrently. Later, IBM added a flag that lets the caller of media manager tell the disk storage subsystem to ignore extent conflicts, thus fully maximizing I/O parallelism. In this case, the caller takes responsibility for data integrity when I/Os are executed concurrently.
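The extent check described above can be sketched as a simple range-overlap test. This is a deliberately simplified model and not the actual ECKD or media manager logic: the `Extent` class, the function names, and the track numbers are all hypothetical, and the model ignores any distinction between reads and writes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Hypothetical stand-in for the low/high track limits of an I/O chain."""
    low_track: int
    high_track: int


def extents_overlap(a, b):
    # Two inclusive track ranges overlap when each starts before the other ends.
    return a.low_track <= b.high_track and b.low_track <= a.high_track


def shrink_wrap(tracks_referenced):
    # Narrow the extent to just the tracks the I/O actually touches.
    return Extent(min(tracks_referenced), max(tracks_referenced))


def can_run_concurrently(a, b, ignore_extent_conflicts=False):
    # With the ignore flag set, the caller takes responsibility for integrity.
    return ignore_extent_conflicts or not extents_overlap(a, b)


# Two I/Os against the same (made-up) data set occupying tracks 0-999.
whole_data_set = Extent(0, 999)
io_a = shrink_wrap([10, 11, 12])
io_b = shrink_wrap([500, 501])

print(can_run_concurrently(whole_data_set, whole_data_set))  # False
print(can_run_concurrently(io_a, io_b))                      # True
print(can_run_concurrently(whole_data_set, whole_data_set,
                           ignore_extent_conflicts=True))    # True
```

In this toy model, shrink-wrapped extents only conflict when the two I/Os touch overlapping track ranges, while the ignore flag removes the check altogether.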

In the case we were investigating, the applications on the two storage groups were sophisticated applications that use media manager and also set the ignore extent conflicts flag, so there shouldn’t have been any significant device busy delay. To understand why there was still device busy delay, you need to remember that not only is the System z host obligated to keep existing applications working, but so is the attached storage. In this case, the disk storage system was a DS8870. Prior to Release 7.2 firmware, the DS8870 did not honor the ignore extent conflicts flag when the volume was in a peer-to-peer remote copy (PPRC) relationship. The reason for this is that the order of updates to the secondary may not match what was done on the primary volume, unless extra control information is sent to and acted on by the secondary volume. Many thanks to the IBM development team for providing this information! This customer used Metro Global Mirror (a flavor of PPRC) and had a version of Release 7.1 firmware installed, so the DS8870 did not ignore extent conflicts. The device busy delay was therefore unnecessary: it was due to extent conflicts that should have been ignored.
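The behavior described here can be summarized as a small decision rule. This is only an illustration of the statements in this article, not DS8870 firmware logic; the function name and the `(major, minor)` release tuple are hypothetical conventions.

```python
def extent_conflicts_ignored(flag_set, in_pprc, release):
    """Return True if, per the description above, extent conflicts are ignored.

    flag_set: the caller set the ignore extent conflicts flag
    in_pprc:  the volume is in a PPRC relationship
    release:  firmware release as a (major, minor) tuple, e.g. (7, 1)
    """
    if not flag_set:
        return False
    if in_pprc and release < (7, 2):
        # Before Release 7.2 the flag was not honored for PPRC volumes.
        return False
    return True


print(extent_conflicts_ignored(True, True, (7, 1)))  # False: this customer's case
print(extent_conflicts_ignored(True, True, (7, 2)))  # True: after the upgrade
```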

After this investigation, the customer upgraded their DS8870 firmware to Release 7.2. Below is a chart with the left side showing a seven-day period in which the delays occurred and the right side showing the seven days directly after the upgrade to Release 7.2 firmware.

blog DB Delay chart #4 V1.2

Peak device busy delays went from more than 40 ms to zero, thus truly maximizing I/O concurrency. To emphasize just how much these two SMS storage groups were suffering, below is a before/after chart of the average number of concurrent requests stuck waiting for extent conflicts to be resolved.

blog DB Delay chart #5 V1.2

In some 15-minute intervals, there were up to 30 requests waiting for extent conflicts to be resolved on average for the entire interval! Hence the old antacid commercial tag line: Plop, plop. Fizz, fizz. Oh, what a relief it is.

Do your disk storage systems exhibit excess device busy delay times that could be resolved? Contact IntelliMagic today with your questions and to set up a customized demonstration.

3 thoughts on “Eliminating Data Set Contention”

  1. Hi Joe,

    Here is Fábio, from Banco do Brasil.
    First, congratulations on your articles. They are indeed very good. You really go deep in your technical discussions.

    You said: “I checked the device reserve metrics in IntelliMagic Vision”

    Where did you check that? What are those metrics? What are the reports?

    Thank you very much.

    1. Joe Hyde says:

      Thanks for your kind words.

      As for reports, I want to let you know that in Vision 8.2 there’s a new and improved search function. I used that to show you all the different reports available for what Vision labels “Reserved (%)”. That’s the percentage of samples where the volume has a hardware Reserve state.

      I used (Servers, Storage and Volume Groups > Storage Groups > Reserved (%)).

  2. George Dodson says:

    Joe, very nice article! Hope all is well with you and yours. Tell Sonny I said hi!

    George
