Application Design Issues Cause Low Throughput to Virtual Tape

By Dave Heggen

Our story begins as our stories usually do: somewhere in the middle, after the customer has been working on a problem for a while, in this case low throughput to virtual tape, and is just about to give up.

The customer had a group of tape jobs that frequently would not finish on time. On time means the jobs complete within the batch window; not on time means they run beyond the batch window and compete with the online activity. Some days a job would run in less than an hour, while on other days the same job would run for 10 to 15 hours.

Low Throughput to VSM Tape Systems

The customer investigated the issue when the jobs ran long and found that throughput to the VSM tape systems was low for the jobs in question. A joint investigation with Oracle was started under the assumption that the problem was caused by the VSM tape systems. My intel tells me that six people on the customer's staff worked on the investigation for a whole month, and Oracle was unable to find anything wrong from a hardware or software perspective.

Continue reading

Easy z/OS Application Performance Testing: New Webinar

By Brent Phillips

The z/OS platform excels at efficiently executing transactions and is a critically important component for many business applications. Yet, application development teams rarely have an understanding of how application program changes impact the performance and cost efficiency of the underlying z/OS infrastructure.

Nearly all z/OS performance and capacity experts have stories about an application being launched into production and causing major service level delivery problems and/or unexpected z/OS infrastructure costs. These production problems happened despite existing DevOps test procedures for new application versions.

Last year IBM announced that z/OS mainframe clients could triple the capacity of their development environment with no increase in MLC. Being able to afford a properly sized Dev/Test environment is extremely valuable, but it is not enough to prevent these performance and cost issues.

Our webinar, Easy z/OS Application Performance Testing Services from the Infrastructure Perspective, discusses an easy way to see variances in the “application performance signature” from an infrastructure perspective.

The webinar covers:

  • Application Infrastructure Performance Signatures
  • How to Automate the Calculation of Variances between Releases
  • How to Increase Collaboration between DevOps areas
  • How to Achieve Fewer Unexpected Infrastructure Cost Increases
  • How to Have Fewer Production Performance Problems

In the webinar, we will examine the historical reasons why the performance and financial cost impact on the z/OS infrastructure has been difficult to test in DevOps organizations. We will also show how a modernized approach that uses artificial intelligence can significantly improve release quality as it relates to the z/OS infrastructure’s ability to efficiently deliver the service levels required by specific applications.
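As a rough illustration of the kind of release-over-release variance calculation the webinar refers to (the metric names, values, and 20% threshold below are hypothetical examples of mine, not the webinar's method), a comparison of two application performance signatures might look like this:

```python
# Hypothetical sketch: compare an application's "performance signature"
# (a set of per-release infrastructure metrics) and flag large variances.
# Metric names, values, and the 20% threshold are illustrative assumptions.

baseline  = {"cpu_seconds": 1250.0, "io_requests": 4.2e6, "avg_response_ms": 1.8}
candidate = {"cpu_seconds": 1610.0, "io_requests": 4.3e6, "avg_response_ms": 2.9}

def signature_variances(base, new, threshold=0.20):
    """Return the metrics whose relative change exceeds the threshold."""
    flagged = {}
    for metric, old_value in base.items():
        change = (new[metric] - old_value) / old_value
        if abs(change) > threshold:
            flagged[metric] = round(change * 100, 1)   # percent change
    return flagged

print(signature_variances(baseline, candidate))
# {'cpu_seconds': 28.8, 'avg_response_ms': 61.1}
```

In practice the signature would be derived from SMF measurements for each release rather than typed in by hand; the interesting part is automating this comparison for every release.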

View the webinar on-demand

An Effective Solution to the Mainframe Skills Shortage for z/OS Performance and Capacity Professionals

By Todd Havekost

The mainframe skills shortage for z/OS performance analysts and capacity planners has left many organizations struggling to ensure availability. Current experts are often overworked and lack the manpower, resources, or tools necessary to effectively perform their jobs. This is often caused by a reliance on manual processes and the limitations of in-house developed solutions, rather than leveraging the built-in, automated capabilities provided by an effective performance solution.

This can put the availability of the infrastructure and applications at risk. Many enterprises are finding it difficult to replace or supplement z/OS performance skills that are becoming increasingly scarce.

In his blog, “Bridging the z/OS Performance & Capacity Skills Gap,” Brent Phillips wrote about the availability and efficiency benefits that can be gained from modernizing the analysis of the mainframe infrastructure using processes that leverage artificial intelligence.

Modernized analytics can also help solve the skills shortage by making current staff more productive and getting newer staff up to speed more rapidly. An effective analytics solution that will expedite the acquisition of skills for z/OS performance analysts and capacity planners needs five key attributes. These attributes are covered in detail, with illustrations, in the paper linked at the bottom. In this blog I will briefly introduce three of the key attributes.

Continue reading

Which Workloads Should I Migrate to the Cloud?

By Brett Allison

The cloud is the ultimate in infrastructure commoditization, reducing costs to their bare minimum and having end users pay only for what they use. CIOs and directors are asking for workloads to move to the cloud, primarily for cost savings.

Most organizations have private clouds, and some have moved workloads into public clouds. For the purpose of this conversation, I will focus on the public cloud. According to this TechTarget article, “A public cloud is one based on the standard cloud computing model, in which a service provider makes resources, such as applications and storage, available to the general public over the internet. Public cloud services may be free or offered on a pay-per-usage model.”

The cloud provides an economic model for computing that may work well for some workloads, so the trick is to figure out which ones are a good fit.
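As a purely hypothetical first-pass screen (the criteria, weights, and scores below are my own illustration, not a method from this post), you might rank workloads on a few fit factors before doing any detailed analysis:

```python
# Hypothetical first-pass screen for public-cloud fit.
# Criteria, weights, and scores are illustrative assumptions only.

WEIGHTS = {
    "demand_variability": 0.4,   # spiky demand benefits from pay-per-use
    "data_gravity": 0.3,         # large, tightly coupled data sets are harder to move
    "latency_sensitivity": 0.3,  # sub-millisecond requirements favor staying on premises
}

def cloud_fit_score(workload):
    """Weighted score from 0 (poor fit) to 1 (good fit).

    Each criterion is rated 0..1, where 1 means 'favors the public cloud'.
    """
    return sum(WEIGHTS[name] * workload[name] for name in WEIGHTS)

batch_reporting = {"demand_variability": 0.9, "data_gravity": 0.6, "latency_sensitivity": 0.8}
core_oltp       = {"demand_variability": 0.2, "data_gravity": 0.1, "latency_sensitivity": 0.1}

print(round(cloud_fit_score(batch_reporting), 2))  # 0.78 -> worth a closer look
print(round(cloud_fit_score(core_oltp), 2))        # 0.14 -> likely stays put
```

A real assessment would of course go much deeper into I/O profiles, data transfer costs, and compliance constraints.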

Continue reading

HDS G1000 and Better Protecting Availability of z/OS Disk Storage

By Brent Phillips

If your job includes avoiding service disruptions on z/OS infrastructure, you may not have everything you need to do your job.

Disk storage in particular is typically the least visible part of the z/OS infrastructure. It is largely a black box. You can see what goes in and what comes out, but not what happens inside. Storage arrays these days are very complex devices with internal components that may become overloaded and introduce unacceptable service time delays for production work without providing an early warning.
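As a minimal sketch of what such an early warning could look like, assuming per-component utilization samples can be extracted from RMF/SMF data (the component names, numbers, and thresholds below are hypothetical, not how any particular product does it):

```python
# Minimal sketch of an early-warning check on internal storage components.
# Assumes per-component utilization samples are available (e.g. extracted
# from RMF/SMF intervals); component names and thresholds are hypothetical.

from statistics import mean, stdev

history = {
    "backend_raid_group_07": [22, 25, 24, 23, 26, 24],  # percent busy, recent intervals
    "host_adapter_2A":       [35, 38, 36, 70, 82, 88],
}

def early_warnings(samples, absolute_limit=80.0, sigma=3.0):
    """Flag components near an absolute limit or far above their own baseline."""
    alerts = []
    for component, series in samples.items():
        baseline, spread = mean(series[:-1]), stdev(series[:-1])
        latest = series[-1]
        if latest >= absolute_limit or latest > baseline + sigma * max(spread, 1.0):
            alerts.append((component, latest, round(baseline, 1)))
    return alerts

for component, latest, baseline in early_warnings(history):
    print(f"{component}: {latest}% busy (baseline {baseline}%)")
```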

Consequently, in the 10+ years I have been with IntelliMagic, I have yet to meet a single mainframe site that (prior to using IntelliMagic) automatically monitors for threats to availability due to storage component overload.

Continue reading

The Attack of the Disk Clones

By Lee LaFrese

When I first started working in the storage industry, the only way data was backed up was via tape. In the days of $10,000 per gigabyte of disk, there was no way any sane person would propose disk backup. The best practice for Disaster Recovery (DR) in those days was to create a nightly tape backup and then use PTAM (pick-up truck access method) to store it offsite. Applications were unavailable during the dump, and the odds of successfully restarting after a disaster were typically not in your favor. DR testing was piecemeal at best and ignored at worst. Statistics from that era suggest that many enterprises that experienced a major data loss due to a disaster simply went out of business.

Today, it is a different world. Cheaper disk, combined with the realization that most businesses need continuous availability, has led to replication schemes designed to avoid both data loss and downtime in the event of an unexpected outage. Point in time copy is used to make local disk clones to facilitate functions such as data mining, business intelligence, and rigorous DR testing. Straightforward backups are still done, but they now often use “tapeless tape” systems that rely on spinning disk instead of on magnetic tape. The net result is that instead of two copies of data (one on disk, one on tape), many enterprises now have more copies of their data than they can keep track of. Indeed, this proliferation of copies has been a major influence on the explosion of storage capacity that has been going on. Although there are good reasons for all of these copies, it seems that our data centers are under siege by disk clones!

Continue reading

HDS Pools Hit the Target in RMF for Hitachi Dynamic Tiering

By Gilbert Houtekamer, Ph.D.

In previous blogs we talked about metrics that we would like to see added to RMF and SMF records. We also discussed the challenges that EMC and HDS face in fitting their measurement data into the IBM-defined RMF instrumentation.

A good example of what can be achieved given the constraints is what HDS did for their Hitachi Dynamic Provisioning (HDP) pools. HDP pools are the basis for thin provisioning and dynamic tiering in the Hitachi architecture. An HDP pool consists of a number of array groups. Arrays with different drive technologies can be combined in a pool with dynamic tiering, in a mix that you feel is appropriate for your workload.

Continue reading

z/OS Petabyte Capacity Enablement

By Dave Heggen

We work with many large z/OS customers and have seen only one requiring more than a petabyte (PB) of primary disk storage in a single sysplex. Additional z/OS environments may exist, but we’ve not yet seen them (if you are that site, we’d love to hear from you!). The larger environments are 400-750 TB per sysplex and growing, so it’s likely those will reach a petabyte requirement soon.

IBM has already stated that the 64K device limitation will not be lifted. Customers requiring more than 64K devices have gotten relief by migrating to larger devices (3390-54 and/or Extended Address Volumes) and by exploiting Multiple Subsystems (MSS) for PAV aliases and for Metro Mirror (PPRC) secondary and FlashCopy target devices.
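To put the 64K limit in perspective, here is a back-of-the-envelope sketch of my own (not from the original post) showing roughly how many volumes a petabyte requires at different volume sizes. Keep in mind that base devices also compete with PAV aliases and copy-target devices for unit addresses unless those are moved to alternate subchannel sets via MSS.

```python
# Back-of-the-envelope sketch: how many 3390 volumes does 1 PB of primary
# z/OS disk require, and does that fit under the 64K device-number limit?
# Cylinder counts are the commonly quoted maximums; treat them as approximate.

import math

BYTES_PER_CYLINDER = 849_960   # 3390 geometry: 15 tracks x 56,664 bytes
DEVICE_LIMIT = 65_536          # "64K" device numbers per subchannel set (slightly fewer usable)

volume_models = {
    "3390-3":        3_339,
    "3390-9":       10_017,
    "3390-54":      65_520,
    "EAV (max)": 1_182_006,
}

required_bytes = 1_000_000_000_000_000   # 1 PB (decimal)

for model, cylinders in volume_models.items():
    volume_bytes = cylinders * BYTES_PER_CYLINDER
    volumes = math.ceil(required_bytes / volume_bytes)
    fits = "fits" if volumes < DEVICE_LIMIT else "exceeds 64K"
    print(f"{model:>10}: {volumes:>7} volumes ({fits} before aliases and copy devices)")
```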

The purpose of this blog is to discuss strategies for positioning existing and future technologies to allow for this required growth.

Continue reading

Does everybody know what time it is? Tool Time!

By Lee LaFrese

Home Improvement was a popular TV show from the ’90s that lives on forever in re-runs. Usually, one of the funniest segments was the show within a show, “Tool Time”. During Tool Time, Tim Taylor (played by Tim Allen) would demonstrate how to use various power tools, often with disastrous and hilarious results. If things were not working right, he would often exclaim “more power,” as if that would make everything right. Unfortunately, when you are not using the right tool in the right way, more power usually does more harm than good. The expression “when all you have is a hammer, everything looks like a nail” comes to mind.

Some of our prospective customers in the z/OS space use IBM DS8000. They often ask why they would want IntelliMagic Vision if they have IBM Tivoli Storage Productivity Center for Disk (TPC). The answer is very simple – for z/OS environments IntelliMagic Vision is clearly the right tool to use. There is no reason to pull a Tim Taylor and force fit something else for the job. Here are some of the reasons why IntelliMagic Vision is the best choice for this environment.
Continue reading

Does your Disaster Recovery Plan meet its objectives? Analyzing TS7700 Tape Replication (Part 1 of 2)

By Burt Loper

This blog is the first in a series of two blogs on the topic of Mainframe Virtual Tape Replication.

One of the challenges in IT is getting your data replicated to another location so that you have a recovery capability if your main operations center is compromised. IBM TS7700 Series Virtualization Engines support the copying of your tape data to other locations.

This article explores the various TS7700 replication modes.

TS7700 Terminology

The IBM TS7700 Virtualization Engine is commonly known as a cluster. When you connect two or more clusters together, that is called a grid or composite library. The information here applies to both the TS7740 model (which uses backend tape drives and cartridges to store tape data) as well as the TS7720 model (which uses a large disk cache to store tape data).

In a multi-cluster grid, the clusters are interconnected with each other via a set of 1 Gb or 10 Gb Ethernet links. The TS7700s use TCP/IP communication protocols to communicate with each other and copy tape data from one cluster to another.
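To get a feel for what those link speeds mean for copy times, here is a back-of-the-envelope sketch of my own (the volume size and link efficiency are assumptions; real grids overlap many copies and may compress data):

```python
# Rough sketch: time to copy one logical volume between clusters over the grid links.
# Ignores TCP/IP overhead, compression, and concurrent copies; purely illustrative.

def copy_time_seconds(volume_gb, link_gbits_per_sec, efficiency=0.8):
    """Estimate transfer time for one logical volume.

    efficiency is an assumed fraction of raw link bandwidth actually achieved.
    """
    volume_bits = volume_gb * 8e9
    return volume_bits / (link_gbits_per_sec * 1e9 * efficiency)

for link in (1, 10):                      # 1 Gb and 10 Gb Ethernet grid links
    t = copy_time_seconds(volume_gb=25, link_gbits_per_sec=link)
    print(f"{link:>2} Gb link: ~{t:.0f} s to copy a 25 GB logical volume")
```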

Continue reading

nil sub sole novum – Vision 7 Support for IBM APAR OA39993

By Dave Heggen

IBM APAR OA39993, available for the zEC12 processor, introduced what could be considered a new service time component for devices: Interrupt Delay Time. This value measures the time from when the I/O completes to when z/OS issues Test Subchannel (TSCH) to view the results.

Not new, but previously Uncaptured

I selected the title for this blog as “nil sub sole novum”. It’s a common Latin phrase whose literal translation is ‘Nothing under the Sun is new’, meaning ‘everything has been done before’. This is very true for Interrupt Delay Time: the component is new, but the activity has always been with us. This activity was previously without description; it came from what you could call ‘uncaptured I/O time’. Even my spell checker rebels against the use of the word uncaptured.
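As a simplified illustration of the accounting (the numbers below are made up), the newly reported component simply shrinks the remainder that used to go unexplained:

```python
# Simplified illustration (made-up numbers) of device response time accounting.
# With OA39993 on a zEC12, Interrupt Delay Time is reported as its own component;
# previously that time was simply part of the 'uncaptured' remainder.

components_ms = {
    "IOSQ":            0.05,
    "Pending":         0.10,
    "Connect":         0.40,
    "Disconnect":      0.25,
    "Interrupt Delay": 0.08,   # newly reported: I/O completion until z/OS issues TSCH
}

measured_elapsed_ms = 0.92     # what application-level elapsed time might show

captured = sum(components_ms.values())
print(f"captured components: {captured:.2f} ms")
print(f"still uncaptured:    {measured_elapsed_ms - captured:.2f} ms")
```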

Continue reading

Less is More – Why 32 HyperPAVs are Better than 128

By Gilbert Houtekamer, Ph.D.

When HyperPAV was announced, the extinction of IOSQ was expected to follow shortly. And indeed for most customers IOSQ time is more or less an endangered species. Yet in some cases a bit of IOSQ remains, and even queuing on HyperPAV aliases may be observed. The reflexive reaction from a good performance analyst is to interpret the queuing as a shortage that can be addressed by adding more Hypers. But is this really a good idea? Adding aliases will only increase overhead and will decrease WLM’s ability to handle important I/Os with priority. Let me explain why.

HyperPAV, like many I/O related things in z/OS, works on an LCU basis. LCUs are a management concept in z/OS: each LCU can support up to 8 channels for data transfer, and up to 256 device addresses. With HyperPAV, some of the 256 addresses are used for regular volumes (“base addresses”), and some are available as “aliases”. You do not need to use all 256 addresses; it is perfectly valid to have no more than 64 base addresses and 32 aliases in an LCU.
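To get a feel for why a modest number of aliases goes a long way, here is a rough illustration using a standard Erlang-C queuing model (my own sketch, not the article's analysis); the I/O rate and service time are made-up numbers, and real I/O arrivals are burstier than Poisson:

```python
# Rough Erlang-C sketch: probability that an I/O must wait for a HyperPAV alias.
# Treats the LCU's aliases as parallel servers; the I/O rate and service time
# are illustrative assumptions only.

from math import factorial

def erlang_c_wait_probability(offered_load, servers):
    """Probability of queuing with 'servers' aliases and 'offered_load' in erlangs."""
    if offered_load >= servers:
        return 1.0
    top = (offered_load ** servers / factorial(servers)) * (servers / (servers - offered_load))
    bottom = sum(offered_load ** k / factorial(k) for k in range(servers)) + top
    return top / bottom

io_rate_per_sec = 4000            # I/Os per second to one LCU (assumed)
service_time_sec = 0.0005         # 0.5 ms average service time (assumed)
offered_load = io_rate_per_sec * service_time_sec   # = 2 erlangs

for aliases in (8, 16, 32, 128):
    p = erlang_c_wait_probability(offered_load, aliases)
    print(f"{aliases:>3} aliases: P(wait for alias) = {p:.2e}")
```

Under this toy model the probability of waiting for an alias is already negligible at 32, so going to 128 buys nothing while consuming device addresses and adding overhead.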

Continue reading

What IBM DS8000 Should Be Reporting in RMF/SMF – But Aren’t

By Gilbert Houtekamer, Ph.D.

This is the second in a series of four blogs by Dr. Houtekamer on the status of RMF as a storage performance monitoring tool. This installment is specifically based on experience using the available instrumentation for IBM DS8000. What RMF Should Be Telling You About Your Storage – But Isn’t is the first in the series.

While more advanced capabilities are added with every new generation of the DS8000, these also introduce extra levels of ‘virtualization’ between the host and the disk drives. One good example is FlashCopy, which delivers an instantly usable second copy but also causes a hard-to-predict back-end copy workload. Other examples are Global Mirror and EasyTier, both of which lag behind when it comes to measurement instrumentation.

Although users value the added functionality, it is wise to be mindful of the inevitable performance impact caused by increased complexity. Ironically, despite the rush to add more functionality, we have not seen a major update to RMF since the ESS was first introduced and the 74.8 records were added to provide back-end RAID group and port statistics.

Continue reading

What RMF Should Be Telling You About Your Storage – But Isn’t

By Gilbert Houtekamer, Ph.D.

With every new generation of storage systems, more advanced capabilities are provided that invariably greatly simplify your life – at least according to the announcements. In practice, however, these valuable new functions typically also introduce a new level of complexity. This tends to make performance less predictable and harder to manage.

Looking back at history, it is important to note that RMF reporting was designed in the early days of CKD (think 3330, 3350) and mainly showed the host perspective (74.1). With the introduction of cached controllers, cache hit statistics came along and eventually made it into RMF (74.5). When the IBM ESS was introduced, additional RMF reporting was defined to provide some visibility into the back-end RAID groups and ports (74.8), which is now used by both HDS and IBM.

Continue reading