Category Archives: z/OS Environments

No Budget for a Storage Management Solution

By Morgan Oats

Every department in every industry has the same problem: how can I stretch my budget to get the necessary work done, make my team more effective, reduce costs, and stay ahead of the curve? This is equally true for performance and capacity planning teams. In many cases, it’s difficult to get budget approval to purchase the right software solution to help accomplish these goals. Management wants to stay under budget while IT is concerned with getting a solution that solves their problems. When trying to get approval for the right solution, it’s important to be able to show how you will get a good return on investment.

Continue reading

Bridging the z/OS Mainframe Performance & Capacity Skills Gap

By Brent Phillips

Many, if not most, organizations that depend on mainframes are experiencing the effects of the mainframe skills gap, or shortage. This gap is a result of the largely baby-boomer workforce that is now retiring without a new generation of experts in place who have the same capabilities. At the same time, the scale, complexity, and rate of change in the mainframe environment continue to accelerate. Performance and capacity teams are a mission-critical function, and this performance skills gap represents a great risk to ongoing operations. It demands both immediate attention and a new, more effective approach to bridging the gap.

Continue reading

How Much Flash Do I Need Part 2: Proving the Configuration

By Jim Sedgwick

Before making a costly Flash purchase, it’s always a good idea to use some science to forecast whether the new storage hardware configuration, and especially the costly Flash you purchase, is going to be able to handle your workload. Does your planned purchase provide more performance capacity than you actually need, so that you aren’t getting your money’s worth? Or, even worse, is your planned hardware purchase too little?

In Part 1 of this blog, we discovered that our customer just might be planning to purchase more Flash capacity than their unique workload requires. In Part 2, we will demonstrate how we were able to use modeling techniques to further understand how the proposed new storage configuration will handle their current workload. We will also project how this workload will affect response times when the workload increases in the future, as workloads tend to do.
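
The full modeling study is in the post itself, but as a rough illustration of the principle, here is a minimal sketch (not the actual model used for the customer, and with invented numbers) that projects response time under workload growth using a simple M/M/1 queueing approximation:

    # Rough illustration only: project response time as the workload grows,
    # using a simple M/M/1 approximation (R = S / (1 - U)). This is not the
    # modeling approach used in the study; all numbers are invented.

    def projected_response_time(service_time_ms, utilization):
        """M/M/1 response time estimate for a given utilization."""
        if utilization >= 1.0:
            return float("inf")  # the device saturates
        return service_time_ms / (1.0 - utilization)

    service_time_ms = 0.5   # hypothetical average back-end service time
    current_util = 0.40     # hypothetical current utilization
    annual_growth = 0.20    # assumed workload growth of 20% per year

    for year in range(4):
        util = current_util * (1 + annual_growth) ** year
        rt = projected_response_time(service_time_ms, util)
        print(f"year {year}: utilization {util:.0%}, projected response time {rt:.2f} ms")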

Continue reading

How Much Flash Do I Need? Part 1

By Jim Sedgwick

Flash, Flash, Flash. It seems that every storage manager has a new favorite question to ask about Flash storage. Do we need to move to Flash? How much of our workload can we move to Flash? Can we afford to move to Flash? Can we afford NOT to move to Flash?

Whether or not Flash is going to magically solve all our problems (it’s not), it’s here to stay. We know Flash has super-fast response times as well as other benefits, but for a little while yet, it’s still going to end up costing you more money. If you subscribe to the notion that it’s good to make sure you only purchase as much Flash as your unique workload needs, read on.

Continue reading

How to Measure the Impact of a Zero RPO Strategy

By Merle Sadler

Have you ever wondered about the impact of zero RPO on Mainframe Virtual Tape for business continuity or disaster recovery? This blog focuses on the impact on jobs that use the Oracle/STK VSM Enhanced Synchronous Replication capability to deliver an RPO of 0.

A recovery point objective, or “RPO”, is defined by business continuity planning. It is the maximum targeted time period in which data might be lost from an IT service due to a major incident.
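
As a purely hypothetical illustration of what an RPO target means in practice, the snippet below compares the age of the last replicated copy against an RPO target; the timestamps and values are invented for the example:

    from datetime import datetime, timedelta

    # Hypothetical RPO check: the potential data loss at any moment is the age
    # of the newest data already safe at the recovery site. Values are invented.
    rpo_target = timedelta(seconds=0)                  # a zero-RPO objective
    last_replicated = datetime(2017, 3, 1, 12, 0, 0)   # invented timestamp
    now = datetime(2017, 3, 1, 12, 0, 5)

    exposure = now - last_replicated   # data written since this point is at risk
    print(f"current exposure: {exposure}, RPO target: {rpo_target}")
    print("within RPO" if exposure <= rpo_target else "RPO exceeded")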

Continue reading

The High Cost of “Unpredictable” IT Outages and Disruptions

By Curtis Ryan

It is no secret that IT service outages and disruptions can cost companies anywhere from thousands to millions of dollars per incident – plus significant damage to company reputation and customer satisfaction. In the most high-profile cases, such as the recent IT outages at Delta and Southwest Airlines, the costs can soar to over $150 million per incident (Delta Cancels 280 Flights Due to IT Outage). Quite suddenly, IT infrastructure performance can become a CEO-level issue (Unions Want Southwest CEO Removed After IT Outage).

While those kinds of major incidents make the headlines, there are thousands of lesser-known service level disruptions and outages, just as disruptive to the business, happening daily in just about every sizeable enterprise.

The costs of these frequent incidents, like an unexpected slowdown in the response time of a key business application during prime shift, can have a significant cumulative financial impact that may not be readily visible in the company’s accounting system.

Continue reading

What’s Using Up All My Tapes? – Using Tape Management Catalog Data

By Dave Heggen

Most of the data processed by IntelliMagic Vision for z/OS Tape is performance, event, or activity driven, obtained from SMF and the virtual tape hardware. Did you know that in addition to the SMF and TS7700 BVIR data, IntelliMagic Vision can also process information from a Tape Management Catalog (TMC)? Having this type of data available and processing it correctly is critical to answering the question “What’s using up all my tapes?”.

We’ve all set up and distributed scratch lists. This is a necessary (and generally manual) part of maintaining a current tape library, and it requires participation for compliance. Expiration dates, catalog management, and cycle management also have their place in automating the expiration end of the tape volume cycle. This blog is intended to address issues that neither compliance nor automation address.
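
To make the question concrete, here is a hypothetical sketch of the kind of summary that TMC data enables: grouping the non-expired volumes by owning dataset to see where the tapes are going. The field names and records are invented, not an actual TMC layout:

    from collections import Counter

    # Hypothetical, simplified TMC extract: one dict per tape volume.
    # Field names are invented for illustration only.
    tmc_records = [
        {"volser": "T00001", "dataset": "PROD.BACKUP.DAILY", "expired": False},
        {"volser": "T00002", "dataset": "PROD.BACKUP.DAILY", "expired": False},
        {"volser": "T00003", "dataset": "TEST.EXPORT.WEEKLY", "expired": True},
        {"volser": "T00004", "dataset": "PROD.ARCHIVE.GDG", "expired": False},
    ]

    # Which datasets hold the most non-expired (still "in use") volumes?
    in_use = Counter(r["dataset"] for r in tmc_records if not r["expired"])
    for dataset, count in in_use.most_common():
        print(f"{dataset}: {count} volume(s)")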

Continue reading

SRM: The “Next” As-a-Service

By Brett Allison

You may have seen this article published by Forbes, stating that Storage Resource Management (SRM) is the “Next as-a-Service.” The benefits cited include the simplicity and visibility provided by as-a-service dashboards and the increasing sophistication through predictive analytics.

IntelliMagic Vision is used as-a-Service for some of the world’s largest companies, and has been since 2013. Although we do much more than your standard SRM by embedding deep expert knowledge into our software, SRM, SPM, and ITOA all fall under our umbrella of capabilities. So, while we couldn’t agree more with the benefits of as-a-service offerings for SRM software, the word “Next” in the article seems less applicable. We might even say: “We’ve been doing that for years!”

Continue reading

Getting the Most out of zEDC Hardware Compression

By Todd Havekost

One of the challenges our customers tell us they face with their existing SMF reporting is keeping up with emerging z/OS technologies. Whenever a new element is introduced in the z infrastructure, IBM adds raw instrumentation for it to SMF. This is of course very valuable, but the existing SMF reporting toolset, often a custom SAS-based program, subsequently needs to be enhanced to support these new SMF metrics in order to properly manage the new technology.

z Enterprise Data Compression (zEDC) is one of those emerging technologies that is rapidly gaining traction with many of our customers, and for good reasons:

  • It is relatively straightforward and inexpensive to implement.
  • It can be leveraged by numerous widely used access methods and products.
  • It reduces disk storage requirements and I/O elapsed times by delivering good compression ratios.
  • The CPU cost is minimal since almost all of the processing is offloaded to the hardware.
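
As a quick back-of-the-envelope illustration of the disk-space benefit, here is a small sketch; the 3:1 compression ratio and dataset size are assumed values, not measurements:

    # Back-of-the-envelope estimate of the space saved by compressing a dataset
    # with an assumed ratio. The 3:1 ratio and 500 GB size are illustrative only;
    # actual zEDC results depend entirely on the data.

    def compressed_size_gb(original_gb, ratio):
        """Size after compression for a given ratio (e.g. 3.0 means 3:1)."""
        return original_gb / ratio

    original_gb = 500.0   # hypothetical dataset size
    ratio = 3.0           # assumed compression ratio
    saved = original_gb - compressed_size_gb(original_gb, ratio)
    print(f"{original_gb:.0f} GB at {ratio:.0f}:1 saves {saved:.0f} GB "
          f"({saved / original_gb:.0%} of the original space)")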

Continue reading

Game Changer for Transaction Reporting

By Todd Havekost

Periodically, a change comes to an industry that introduces a completely new and improved way to accomplish an existing task that had previously been difficult, if not daunting. Netflix transformed the home movie viewing industry by offering video streaming that was convenient, affordable, and technically feasible – a change so far-reaching that it ultimately led to the closing of thousands of Blockbuster stores. We feel that IBM recently introduced a similar “game changer” for transaction reporting for CICS, IMS and DB2.

Continue reading

The Circle of (Storage) Life

By Lee LaFrese

Remember The Lion King? Simba starts off as a little cub, and his father, Mufasa, is king. Over time, Simba goes through a lot of growing pains but eventually matures to take over his father’s role despite the best efforts of his Uncle Scar to prevent it. This is the circle of life. It kind of reminds me of the storage life cycle, only without the Elton John score!

Hardware Will Eventually Fail and Software Will Eventually Work

New storage technologies are quickly maturing and replacing legacy platforms. But will they be mature enough to meet your high availability, high performance IT infrastructure needs?

Continue reading

What Good is a zEDC Card?

By Dave Heggen

(Vintage Informatics ad: “You Need Our Shrink!”)

The technologies involving compression have been looking for a home on z/OS for many years. There have been numerous implementations to perform compression, all with the desired goal of reducing the number of bits needed to store or transmit data. Host-based implementations ultimately trade MIPS for MB. Outboard hardware implementations avoid this issue.
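
The “MIPS for MB” trade-off can be seen in miniature with any host-based compressor. The sketch below simply times zlib on some repetitive sample data to show that CPU time is spent to shrink the bytes; it is illustrative only and not specific to any z/OS product:

    import time
    import zlib

    # Illustration of the host-based trade-off: CPU cycles spent to reduce bytes.
    # Highly repetitive sample data compresses very well; real data varies widely.
    data = b"CUSTOMER RECORD 0001 " * 50_000

    start = time.perf_counter()
    compressed = zlib.compress(data, 6)
    elapsed = time.perf_counter() - start

    ratio = len(data) / len(compressed)
    print(f"original: {len(data):,} bytes, compressed: {len(compressed):,} bytes")
    print(f"ratio {ratio:.1f}:1, CPU time spent: {elapsed * 1000:.1f} ms")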

Examples of Compression Implementations

The first commercial product I remember was from Informatics, named Shrink, sold in the late 1970s and early 1980s. It used host cycles to perform compression, could generally get about a 2:1 reduction in file size and, in the case of the IMS product, worked through exits so programs didn’t require modification. Sharing data compressed in this manner required accessing it with the same software that had compressed it, in order to expand it.

Continue reading

How’s Your Flash Doing?

By Joe Hyde

Assessing Flash Effectiveness

How’s your Flash doing? Admittedly, this is a bit of a loaded question. It could come from your boss, a colleague, or someone trying to sell you the next storage widget. Since most customers are letting the vendors’ proprietary storage management algorithms optimize their enterprise storage automatically, you may not have had the time or tools to quantify how your Flash is performing.

The Back-end Activity

First, let’s use the percentage of back-end activity to Flash as the metric to answer this question. Digging a little deeper, we can look at back-end response times for Flash and spinning disks (let’s call these HDD, for Hard Disk Drives). I’ll also look at the amount of sequential activity over the day to help explain the back-end behavior.
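
Here is a minimal sketch of that first metric, assuming the back-end I/O counts and response times per tier are already available; the numbers below are invented, not the customer data discussed next:

    # Minimal sketch of the "percentage of back-end activity to Flash" metric.
    # The per-tier I/O counts and response times below are invented examples.
    flash_ops = 420_000   # back-end I/Os served from Flash in the interval
    hdd_ops = 780_000     # back-end I/Os served from HDD in the interval
    flash_resp_ms = 0.4   # average back-end response time on Flash
    hdd_resp_ms = 6.5     # average back-end response time on HDD

    total_ops = flash_ops + hdd_ops
    flash_pct = flash_ops / total_ops
    overall_resp = (flash_ops * flash_resp_ms + hdd_ops * hdd_resp_ms) / total_ops

    print(f"back-end activity to Flash: {flash_pct:.0%}")
    print(f"ops-weighted back-end response time: {overall_resp:.2f} ms")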

Below is 5 weekdays’ worth of data from an IBM DS8870 installed at a Fortune 500 company. Although it’s possible to place data statically on Flash storage in the IBM DS8870, in this case, IBM’s Easy Tier is used for the automatic placement of data across Flash and HDD storage tiers. Let’s refer to this scheme generically as auto-tiering. For this IBM DS8870, Flash capacity was roughly 10% of the total storage capacity.

Continue reading

Flash Performance in High-End Storage

By Dr. Cor Meenderinck

This is a summary of the white paper of the same title, which won the Best Paper Award at the 2016 CMG imPACt conference. It is a great example of the research that we do that leads to the expert knowledge we embed in our products.

Flash based storage is revolutionizing the storage world. Flash drives can sustain a very large number of operations and are extremely fast. It is for those reasons that manufacturers eagerly embraced this technology to be included in high-end storage systems. As the price per gigabyte of flash storage is rapidly decreasing, experts predict that flash will soon be the dominant medium in high-end storage.

But how well are they really performing inside your high-end storage systems? Does the actual performance when deployed within a storage array live up to the advertised Flash latencies of around 0.1 milliseconds?

Continue reading

Beat the Annual MLC Software Price Increase

By Todd Havekost

In August, IBM announced their annual 4% increase in z Systems Monthly License Charge (MLC) software prices. The announcement letter indicated that the timing is designed to give customers sufficient lead time to adjust their budgets for the following year. This cost increase may put additional strain on a lot of already tight budgets and force some shops to make unpleasant decisions. We at IntelliMagic think you have a better alternative to the MLC expense increases.
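
To put the 4% in perspective, here is a hypothetical example of how the increase compounds over a few years; the starting monthly charge is invented:

    # Hypothetical illustration of how an annual 4% MLC increase compounds.
    # The starting monthly charge is invented for the example.
    monthly_mlc = 500_000.0   # assumed current monthly MLC spend, in dollars
    increase = 0.04           # the announced annual increase

    for year in range(1, 4):
        monthly_mlc *= 1 + increase
        print(f"after year {year}: ${monthly_mlc:,.0f} per month "
              f"(${monthly_mlc * 12:,.0f} per year)")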

For several months, IntelliMagic has been delivering free MLC Reduction Assessments showing mainframe sites ways that MLC expenses can be reduced. These assessments apply the visibility IntelliMagic Vision provides into the SMF data from your environment, exploring several potential areas of opportunity for savings. For the majority of mainframe sites, we have been able to help identify significant potential MLC reductions through these assessments.
Continue reading

Which Workloads Should I Migrate to the Cloud?

By Brett Allison

By now, we have just about all heard it from our bosses, “Alright folks we need to evaluate our workloads and determine which ones are a good fit for the cloud.” After feeling a tightening in your chest, you remember to breathe and ask yourself, “How the heck do I accomplish this task as I know very little about the cloud and to be honest it seems crazy to move data to the cloud!”

According to this TechTarget article, “A public cloud is one based on the standard cloud computing model, in which a service provider makes resources, such as applications and storage, available to the general public over the internet. Public cloud services may be free or offered on a pay-per-usage model.” Most organizations have private clouds, and some have moved workloads into public clouds. For the purpose of this conversation, I will focus on the public cloud.

Continue reading

5 Reasons Why All-Flash Arrays Won’t Magically Solve All Your Problems

By Brett Allison

In the last few years, flash storage has turned from very expensive into quite affordable. Vendors that sell all-flash arrays advertise extremely low latencies, and those are indeed truly impressive. So it may feel like all-flash systems will solve all your performance issues. But the reality is that even with game-changing technological advances like flash, the complexity of the entire infrastructure ensures there are still plenty of problems to run into.

Continue reading

Mainframe Capacity “Through the Looking Glass”

By Todd Havekost

With the recent release of “Alice Through the Looking Glass” (my wife is a huge Johnny Depp fan), it seems only appropriate to write on a subject epitomized by Alice’s famous words:

“What if I should fall right through the center of the earth … oh, and come out the other side, where people walk upside down?”  (Lewis Carroll, Alice in Wonderland)

Along with the vast majority of the mainframe community, I had long embraced the perspective that running mainframes at high levels of utilization was essential to operating in the most cost-effective manner. Based on carefully constructed capacity forecasts, our established process involved implementing just-in-time upgrades designed to ensure peak utilizations remained slightly below 90%.

It turns out we’ve all been wrong.

Continue reading

Achieving Significant Software Cost Reduction on the IBM z13

By Brent Phillips

While most mainframe shops have explored how to reduce mainframe software costs, at IntelliMagic we are finding that significant latent savings opportunities still exist at even the best-run sites.

Since software cost reduction is always important, we thought it would be helpful to pass along a valuable resource from the March 2016 SHARE Conference for mainframe users, which included a new session by Todd Havekost of USAA, a Fortune 100 financial services company.

Mr. Havekost’s presentation, ‘Achieving Significant Capacity Improvements on the IBM z13’, outlined the results of their software cost optimization initiatives. Part of the story is that some of the historical capacity planning assumptions no longer apply, and how lowering RNI reduced both MIPS and the cost of IBM Monthly License Charge (MLC) software. The session was recognized as outstanding and won the SHARE Best Session Award.

Continue reading

This is alarming

By Stuart Plotkin

Don’t Ignore that Alarm!

Ignore an alarm? Why would someone do that? Answer: because some tools send too many!

To avoid getting overloaded with meaningless alarms, it is important to implement best practices. The first best practice is to implement a software solution that is intelligent. It should:

  • Understand the limitations of your hardware
  • Take into consideration your particular workload
  • Let you know that you are heading for a problem before the problem begins
  • Eliminate useless alarms
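
As a toy illustration of the third point, the sketch below projects a linear trend through recent utilization samples and warns when the projection crosses a limit; the samples, limit, and look-ahead are invented, and real products use far more sophisticated, workload-aware models:

    # Toy illustration of "warn before the problem begins": fit a linear trend
    # to recent utilization samples and alert if the projection crosses a limit.
    # Samples, limit, and look-ahead are invented; real tools use richer models.
    samples = [0.62, 0.64, 0.67, 0.69, 0.72, 0.74]   # utilization per interval
    limit = 0.85                                      # assumed safe ceiling
    look_ahead = 6                                    # intervals to project ahead

    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
             / sum((x - mean_x) ** 2 for x in xs))
    projected = mean_y + slope * ((n - 1 + look_ahead) - mean_x)

    if projected >= limit:
        print(f"warning: projected utilization {projected:.0%} exceeds {limit:.0%}")
    else:
        print(f"ok: projected utilization {projected:.0%}")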

If you have followed this first best practice, congratulations! You are headed in the right direction.

Continue reading

Is Your Car or Mainframe Better at Warning You?

By Jerry Street

Imagine driving your car when, without warning, all of the dashboard lights come on at the same time. Yellow lights, red lights. Some blinking, others with audible alarms. You would be unable to identify the problem because you’d have too many warnings, too much input, too much display. You’d probably panic!

That’s not likely, but if your car’s warning systems did operate that way, would it make any sense to you? Conversely, if your car didn’t have any dashboard at all, how would you determine if your car was about to have a serious problem like very low oil pressure or low antifreeze/coolant? Could you even operate it safely without an effective dashboard? Even the least expensive cars include sophisticated monitoring and easy interpretation of metrics into good and bad indicators on the dashboard.

You need a similar dashboard for your z/OS mainframe to warn you. When any part of the infrastructure starts to be at risk of not performing well, you need to know it, and sooner is better. By being warned of a risk in an infrastructure component’s ability to handle your peak workload, you can avoid the problem before it impacts production users, or fix what is minor before the impact becomes major. The only problem is that the dashboards and reporting you’re using today for your z/OS infrastructure, and most monitoring tools, do not provide this type of early warning.

Continue reading

Break Open Your VSM Black Box and Expose Internal Tape Processing

By John Ticic

When virtual tape systems run properly, it’s great. But when there are problems, or you need to examine detailed tape information, the virtualization makes it hard to see what is really going on inside the black box.

Luckily, with z/OS we have SMF, and the virtual tape hardware vendors can define a custom record to provide measurements on the internals that can help see what is happening inside. Oracle STK VSM, for instance, generates detailed SMF data in a user record that allows us to examine tape processing in fine detail using the intelligent post-processing from the enhanced Oracle Tape support in IntelliMagic Vision.

Some of the questions that you may want answered are:

    • Why are the tape mounts taking so long?
    • How many virtual tape mounts need to be staged from real tapes?
    • Are my virtual tapes being replicated in a timely fashion?
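
The vendor’s SMF user record layout is beyond the scope of this excerpt, but once the records are decoded, answering the first two questions can start with a simple summary like the hypothetical sketch below, which assumes the mounts have already been parsed into (mount_seconds, staged_from_real_tape) pairs:

    # Hypothetical summary of virtual tape mount times. Assumes the SMF user
    # records have already been decoded into (mount_seconds, staged_from_real_tape).
    mounts = [
        (0.8, False), (1.1, False), (0.9, False), (45.2, True),
        (1.3, False), (62.7, True), (1.0, False), (0.7, False),
    ]

    mount_times = sorted(seconds for seconds, _ in mounts)
    staged = sum(1 for _, from_real_tape in mounts if from_real_tape)
    average = sum(mount_times) / len(mount_times)

    print(f"mounts: {len(mounts)}, staged from real tape: {staged}")
    print(f"average mount time: {average:.1f}s, worst: {mount_times[-1]:.1f}s")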

Continue reading

HDS G1000 and Better Protecting Availability of z/OS Disk Storage

By Brent Phillips

If your job includes avoiding service disruptions on z/OS infrastructure, you may not have everything you need to do your job.

Disk storage in particular is typically the least visible part of the z/OS infrastructure. It is largely a black box. You can see what goes in and what comes out, but not what happens inside. Storage arrays these days are very complex devices with internal components that may become overloaded and introduce unacceptable service time delays for production work without providing an early warning.

Consequently, in the 10+ years I have been with IntelliMagic, I have yet to meet a single mainframe site that (prior to using IntelliMagic) automatically monitors for threats to availability due to storage component overload.

Continue reading

IBM z/OS’s Microscope – GTF

By Joe Hyde

Remember the first time you looked at pond water under a microscope? Who knew such creatures even existed, let alone in a drop of water that appears clear to the naked eye.

IBM z/OS also provides a microscope. It’s called the Generalized Trace Facility, GTF for short. With a GTF I/O summary trace you can look deeply into the inner world of your storage systems. What appears innocuous at the RMF level can have some surprising characteristics when put under the GTF microscope. However, GTF contains so much data that it is not easy to “focus” this microscope and get out all the information. Fortunately, IntelliMagic has now created software to process and analyze GTF I/O summary traces, so that you can focus on the gems hidden in GTF I/O traces using IntelliMagic Vision.
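
To give a feel for why trace-level granularity matters, here is a toy illustration (not how IntelliMagic Vision processes GTF records, and with invented values) of how an innocuous-looking average can hide a population of slow I/Os:

    # Toy illustration of why per-I/O trace data matters: two very different
    # I/O populations can hide behind one innocuous-looking average.
    # The response times below are invented.
    io_times_ms = [0.3] * 90 + [12.0] * 10   # 90 fast I/Os, 10 slow outliers

    average = sum(io_times_ms) / len(io_times_ms)
    slow = sum(1 for t in io_times_ms if t > 5.0)

    print(f"average response time: {average:.2f} ms (looks fine at a summary level)")
    print(f"but {slow} of {len(io_times_ms)} I/Os took {max(io_times_ms):.0f} ms each")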

To illustrate the value of analyzing GTF data, here is a story on how I used this feature to show something otherwise invisible. I recently blogged about DB2 work volumes that were exhibiting high device busy delay times. A DS8870 firmware upgrade eliminated this problem, but only when I analyzed the GTF I/O summary trace using the new feature could I really explain why the firmware upgrade made such a marked improvement.

Continue reading

Shelfware, IT’s version of Home Exercise Equipment

By Brett Allison

Many years ago, I conducted an IT software asset audit for an insurance company. The results were surprising, to say the least. They had a large number of tools, many with overlapping functionality.

But the biggest surprise was that they had several useful tools that had never been installed. The teams didn’t even know that they owned these licenses! This shocked me at the time, but over the years it became apparent to me that this was far from unique. For example, an IT executive at a Fortune 100 company told us, “I believe your software does what you say it will do, but what I don’t believe is that our IT staff will get it implemented.”

Continue reading

How am I doin’?

By Stuart Plotkin

Ed Koch was mayor of New York City for 12 years. He was famous for stopping people on the streets and asking them, “How am I doin’?” I have met many IT professionals who have asked the same question about their disk storage systems. “How are they doing? Are my users getting the best possible performance? Are we about to have a catastrophe? Do I need to order more? Am I ordering too much? Am I making the most use of what I have? Tell me my level of ‘risk’.”

Ed measured how well he was doing by how people felt about how he was doing. A mayor should try to make people feel good, but there are other metrics as well, like a city’s financial solvency, to name just one. When it comes to storage systems, what are the metrics that will tell us how well our storage systems are doing?

Continue reading

Pend Time Haiku

By Joe Hyde

Have you ever been stuck waiting with no real explanation as to why? Here’s an RMF pend time haiku that tells the “what” but not the “why” of pend time delay:

What, pend time delay?

CMR, device, other

Why so much wait time…

The I/O infrastructure for IBM z Systems is second to none in the enterprise space, and that includes the detailed metrics delineating where time is spent in I/O processing. One such metric is pending time (pend), which has three components:

  1. Command Response (CMR) Delay
  2. Device Busy Delay
  3. “Other” – derived by subtracting the first two from the total pending time
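
Since “other” is simply a remainder, the arithmetic is straightforward; a quick example with invented values:

    # "Other" pend time is whatever remains after the measured components.
    # All values below are invented, in milliseconds.
    pend_total_ms = 0.85
    cmr_delay_ms = 0.20
    device_busy_ms = 0.40

    other_ms = pend_total_ms - cmr_delay_ms - device_busy_ms
    print(f"other pend time: {other_ms:.2f} ms")   # 0.25 ms in this example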

Continue reading

The Attack of the Disk Clones

By Lee LaFrese

When I first started working in the storage industry, the only way data was backed up was via tape. In the days of $10,000 per gigabyte of disk, there was no way any sane person would propose disk backup. The best practice for Disaster Recovery (DR) in those days was to create a nightly tape backup and then use PTAM (pick-up truck access method) to store it offsite. Applications were unavailable during the dump, and the odds of successfully restarting after a disaster were typically not in your favor. DR testing was piecemeal at best and ignored at worst. Back then, statistics suggested that many enterprises that experienced a major data loss due to a disaster simply went out of business.

Today, it is a different world. Cheaper disk, combined with the realization that most businesses need continuous availability, has led to replication schemes designed to avoid both data loss and downtime in the event of an unexpected outage. Point-in-time copy is used to make local disk clones to facilitate functions such as data mining, business intelligence, and rigorous DR testing. Straightforward backups are still done, but they now often use “tapeless tape” systems that rely on spinning disk instead of magnetic tape. The net result is that instead of two copies of data (one on disk, one on tape), many enterprises now have more copies of their data than they can keep track of. Indeed, this proliferation of copies has been a major contributor to the ongoing explosion of storage capacity. Although there are good reasons for all of these copies, it seems that our data centers are under siege by disk clones!

Continue reading

Beneficial Use of GDPS Copy Once Facility (Experimental Evidence)

By Dave Heggen

There’s no law that requires GDPS implementations to use the Copy Once Facility for Global Mirror, but in my opinion, there ought to be.

The Copy Once Facility incorporates a simple idea: Copy Once describes a group of volumes without critical data, that is, data that does not need to be continuously copied. An old version of the data on these volumes is sufficient for recovery. The beauty of the Copy Once Facility is that it is largely an act of omission: the volumes in the Copy Once group are suspended and withdrawn from the Global Mirror session after the initial copies are completed. An additional feature of Copy Once is that you can periodically refresh the data in the DR site if you want to. A refresh is only required if volumes move, if volumes are added or deleted, or if data gets resized. Some installations perform a refresh once a quarter as a matter of policy to ensure they have a valid copy of the data.

Some examples of good candidates for Copy Once are volumes that provide Data Set Allocation for data to be overwritten in recovery, volumes for which an old version of the data is just fine in case of recovery, such as my TSO data, and volumes for which only the VOLSER is needed at the recovery site, such as Work/Temp/Sortwk volumes.

Continue reading