Beneficial Use of GDPS Copy Once Facility (Experimental Evidence)

By Dave Heggen

There’s no law that requires GDPS implementations to use the Copy Once Facility for Global Mirror, but in my opinion, there ought to be.

The Copy Once Facility incorporates a simple idea: Copy Once describes a group of volumes without critical data, that is, data that does not need to be continuously copied because an old version of the data on these volumes is sufficient for recovery. The beauty of the Copy Once Facility is that it is largely an act of omission: after the initial copies are completed, the volumes in the Copy Once group are suspended and withdrawn from the Global Mirror session. An additional feature of Copy Once is that you can periodically refresh the data in the DR Site if you want to. A refresh is only required if volumes move, if volumes are added or deleted, or if data is resized. Some installations perform a refresh once a quarter as a matter of policy to ensure they have a valid copy of the data.
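
To make the refresh triggers concrete, here is a minimal Python sketch of the decision logic described above; the group and volume attributes are hypothetical illustrations for this blog, not a GDPS interface:

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class CopyOnceGroup:
    """Hypothetical model of a Copy Once volume group (not a GDPS interface)."""
    volsers: set                  # volumes currently in the group
    volsers_at_last_copy: set     # membership when the last copy was taken
    sizes_gb: dict                # current size per VOLSER
    sizes_gb_at_last_copy: dict   # sizes when the last copy was taken
    last_refresh: date

def needs_refresh(group, today, policy_interval=timedelta(days=90)):
    """A refresh is needed if volumes were added, deleted, or resized,
    or if the (illustrative) quarterly policy interval has elapsed."""
    membership_changed = group.volsers != group.volsers_at_last_copy
    resized = any(group.sizes_gb.get(v) != group.sizes_gb_at_last_copy.get(v)
                  for v in group.volsers)
    policy_due = today - group.last_refresh >= policy_interval
    return membership_changed or resized or policy_due
```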

Some examples of good candidates for Copy Once are volumes that provide data set allocation for data that will be overwritten during recovery; volumes for which an old version of the data is perfectly acceptable in a recovery, such as my TSO data; and volumes for which only the VOLSER is needed at the recovery site, such as Work/Temp/Sortwk volumes.
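
As a rough illustration of how such candidates might be flagged, the sketch below selects volumes by naming convention; the VOLSER prefixes are purely hypothetical examples, and a real selection would follow your installation's own standards:

```python
# Hypothetical VOLSER prefixes; a real selection would follow your own naming standards.
COPY_ONCE_PREFIXES = ("TSO", "WRK", "TMP", "SRT")   # e.g. TSO, work, temp, sortwk pools

def is_copy_once_candidate(volser: str) -> bool:
    """True if only an old copy (or just the VOLSER) is needed at the recovery site."""
    return volser.upper().startswith(COPY_ONCE_PREFIXES)

volumes = ["TSO001", "SRTW01", "DB2P01", "WRK005"]
print([v for v in volumes if is_copy_once_candidate(v)])   # ['TSO001', 'SRTW01', 'WRK005']
```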

Recently, one of our customers asked us to conduct an experiment using IntelliMagic Vision for z/OS to demonstrate the bandwidth benefits of the Copy Once Facility that they were already using. In this study, we examined their IBM GDPS implementation with Global Mirror and compared the Global Mirror Send Rate with and without Copy Once. The ‘Copy Once’ activity was measured on Monday December 8th and Tuesday December 9th, and what we called ‘All Active’ measurements were taken exactly a week later, on Monday December 15th and Tuesday December 16th, after temporarily setting all Copy Once volumes to normal copy mode so that they were all mirrored continuously.

This environment has its Primary volumes at site ‘A’, a Secondary Metro Mirror copy at site ‘B’ for High Availability, and a Tertiary Global Mirror copy at site ‘C’ for Disaster Recovery. To record the copy activity, we made sure SMF record type 74.8 (PPRC Link Statistics) was being collected and brought one volume on the DSS containing the Secondary volumes online to z/OS. This allowed us to capture the Link Statistics for the synchronous Metro Mirror write activity (A->B) and the asynchronous Global Mirror write activity (B->C) for the same intervals. We wanted to compare the used bandwidth (MB/sec) on the remote copy links in the Copy Once situation to the used bandwidth when the Copy Once facility was not used. Note that the Copy Once facility only applies to the Global Mirror part, that is, the Tertiary copies. There is no comparable feature for the Secondary copies under Metro Mirror: all data must be copied all the time.
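
As an illustration of this kind of comparison, the following minimal Python sketch averages per-interval send rates from a hypothetical CSV export with columns named leg and mb_per_sec; the column names and file layout are assumptions for illustration, not the actual IntelliMagic Vision export format:

```python
import csv
from collections import defaultdict

def average_send_rate(csv_path):
    """Average MB/sec per replication leg over all intervals in the export."""
    totals, counts = defaultdict(float), defaultdict(int)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            leg = row["leg"]                     # 'A->B' (Metro Mirror) or 'B->C' (Global Mirror)
            totals[leg] += float(row["mb_per_sec"])
            counts[leg] += 1
    return {leg: totals[leg] / counts[leg] for leg in totals}

# Compare the Global Mirror leg on a 'Copy Once' day against an 'All Active' day (file names hypothetical):
# copy_once  = average_send_rate("sendrates_monday_dec08.csv")
# all_active = average_send_rate("sendrates_monday_dec15.csv")
# saving = 1 - copy_once["B->C"] / all_active["B->C"]
```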

The following table describes the status of the remote copy setup between the A, B, and C sites on the two pairs of dates in the study.

[Table: remote copy status between the A, B, and C sites on the ‘Copy Once’ and ‘All Active’ dates]

In the table below you can see the 24-hour averages of the MB/sec that was being sent from the Primary to the Secondary and from the Secondary to the Tertiary on the days we investigated. These values were taken from an export we made from IntelliMagic Vision.

DSS Group 1: [table of 24-hour average send rates]

DSS Group 2: [table of 24-hour average send rates]

DSS Group 3: [table of 24-hour average send rates]

Remember that the Metro Mirror Send Rate between A and B is independent of whether Copy Once is used, since Copy Once applies only to the Global Mirror relationship between B and C. Thus, the A->B Send Rate comparison was included only to gauge how similar the primary workload was during the ‘All Active’ and ‘Copy Once’ time frames. The first observation was that on the Tuesday during the Copy Once time frame, the write workload for DSS Group 1 was much higher than during the ‘All Active’ Tuesday. We had expected the A->B Send Rates to be roughly equal, say within 10-15% of each other, given that we are comparing each weekday with the same day in the following week. When we discussed this observation with the customer, the discrepancy between the Tuesdays for DSS Group 1 was not considered serious enough to exclude that day from the comparison (chalk it up to an unexpectedly busy day).

To show the bandwidth savings from Copy Once, we first looked at the B->C Send Rate, which measures the bandwidth used for the asynchronous copy in both scenarios. The difference column in the table shows a negative number when the bandwidth in the Copy Once situation is lower than in the ‘All Active’ situation. You can see that the decrease is consistent and significant; the only case where the savings were in the single digits was the anomalous Tuesday for DSS Group 1 mentioned above.

To express the savings more clearly, and independently of day-to-day primary workload changes, we created a “Percent sent to Tertiary” calculation that shows what percentage of the synchronous copy activity to the Secondary is also copied from the Secondary to the Tertiary site. During the “All Active” times, when every volume is actively copied under Global Mirror, we would expect the B->C Send Rate to be quite close to the A->B Send Rate; and indeed the table shows values around 80%. During “Copy Once” times, this percentage drops significantly, sometimes even to under 40%. The “Relative Difference” value shows how much Secondary-to-Tertiary bandwidth can be saved simply by starting to use the Copy Once facility.
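
Expressed as a small worked example (with illustrative round numbers that roughly match the ~80% and ~40% figures above, not the customer’s measured values):

```python
def percent_sent_to_tertiary(ab_mb_sec, bc_mb_sec):
    """Share of the synchronous A->B traffic that is also sent on to the tertiary site."""
    return 100.0 * bc_mb_sec / ab_mb_sec

# Illustrative round numbers only; they are not the customer's measured values.
pct_all_active = percent_sent_to_tertiary(ab_mb_sec=100.0, bc_mb_sec=80.0)   # 80.0
pct_copy_once  = percent_sent_to_tertiary(ab_mb_sec=100.0, bc_mb_sec=40.0)   # 40.0

# 'Relative Difference': how much B->C bandwidth Copy Once saves relative to All Active.
relative_difference = 100.0 * (pct_all_active - pct_copy_once) / pct_all_active  # 50.0
```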

On the whole, the use of the Copy Once Facility provided savings of between 28% and 50% of the Disaster Recovery bandwidth that would have been required if the Copy Once Facility did not exist and all volumes were continuously mirrored. One could argue that the higher bandwidth must still be provisioned for the initial copy and refresh activity, but the counter to that argument is that the refresh can occur at a time of your choosing, when overall bandwidth requirements are low.

Based on the case above, I believe that using the Copy Once Facility of GDPS is good practice. The Copy Once Facility is not expensive to implement or maintain, and it can save up to 50% of the bandwidth that would otherwise be required. These results can also be extrapolated to zGM, SRDF/A, or HUR environments, since Copy Once concerns the characteristics of the data in the data center and not the replication technique selected.

If you would like a better understanding of the potential bandwidth savings from implementing GDPS Copy Once in your environment, please contact us to schedule a discussion with one of our technical experts.

4 thoughts on “Beneficial Use of GDPS Copy Once Facility (Experimental Evidence)”

  1. Ron Hawkins says:

    Dave,

    A very good write-up. May I suggest that you embellish your examples with the peak data rate, rather than the 24-hour average, as this is often where customers have to spend some big bucks.

    Back when I was doing Remote Copy sizing in APAC for z/OS, the temporary data sets used by batch would generally account for around 30% of the bandwidth required for remote copy, especially the CFW activity to SORTWKnn data sets.

    While HDS remote copy defaults to suppressing Cache Fast Write data from being sent to the S-VOL, customers were often more comfortable with a Storage Group for all temporary data sets, and the volumes in that storage group were managed as Copy Once.

    There are requirements for GDPS HyperSwap to remote copy all writes to effect local HyperSwap recovery, but for GDPS, IBM, EMC, and HDS solutions that do not use HyperSwap, I think you may be able to extend the Copy Once bandwidth saving to the A->B leg as well as B->C.

    Again, a good write-up.

    Ron

    1. Dave Heggen says:

      Thank you for your comments Ron,

      When this data was presented to the customer(s), we included comparison charts at 15-minute intervals and data from GDPS SMF type 105 records at 1-minute intervals. These reports were produced natively by IntelliMagic Vision rather than from exported .CSV files that were subsequently processed by Excel (which is how the tables in the blog were created).

      You could contact me via email and we could discuss some details if needed.

      You are absolutely correct that the purpose of the A->B leg was (in all cases) to create a high availability/HyperSwap environment.

      Thanks Again, …Dave

  2. Hi, I’m the chief architect / program manager of the GDPS solution. I thought the Beneficial Use of GDPS Copy Once Facility (Experimental Evidence) article was well written, and I appreciate the assessment of the benefit of the GDPS/GM Copy Once function. Thanks.

    1. Dave Heggen says:

      Thank you for your kind words. …Dave
