Noisy Neighbors: Finding Root Cause of Performance Issues in IBM SVC Environments

By Jim Sedgwick

At some point or another, we have probably all experienced noisy neighbors, either at home, at work, or at school. There are just some people who don’t seem to understand the negative effect their loudness has on everyone around them.

Our storage environments also have these “noisy neighbors” whose presence or actions disrupt the performance of the rest of the storage environment. In this case, we’re going to take a look at an SVC all-flash storage pool called EP-FLASH_3. Just a few bad LUNs can have a profound effect on the I/O experience of the entire IBM Spectrum Virtualize (SVC) environment.

Continue reading

How to Prevent an “Epic” EMR System Outage

By Curtis Ryan

Protecting the availability of your IT storage is vital for performance, but it can also be critical for life. No one knows this better than the infrastructure departments of major healthcare providers. Application slowdowns or outages in Electronic Medical Record (EMR) or Electronic Health Record (EHR) systems, such as Epic, Meditech, or Cerner, can put patient care at risk, expose hospitals to lawsuits, and cost hundreds of thousands of dollars.

Nobody working in IT Storage in any industry wants to get a call about a Storage or SAN service outage, but even minor service disruptions can halt business operations until the root cause of the issue can be diagnosed and resolved. This kind of time cannot always be spared in the ‘life and death’ environment of the users of EMR systems in healthcare providers.

Continue reading

The Circle of (Storage) Life

By Lee LaFrese

Remember The Lion King? Simba starts off as a little cub, and his father, Mufasa, is king. Over time, Simba goes through a lot of growing pains but eventually matures to take over his father’s role despite the best efforts of his Uncle Scar to prevent it. This is the circle of life. It kind of reminds me of the storage life cycle, only without the Elton John score!

Hardware Will Eventually Fail and Software Will Eventually Work

New storage technologies are quickly maturing and replacing legacy platforms. But will they be mature enough to meet your high availability, high performance IT infrastructure needs?

Continue reading

5 Reasons Why All-Flash Arrays Won’t Magically Solve All Your Problems

By Brett Allison

In the last few years, flash storage has gone from very expensive to quite affordable. Vendors that sell all-flash arrays advertise extremely low latencies, and those are indeed truly impressive. So it may feel like all-flash systems will solve all your performance issues. But the reality is that even with game-changing technological advances like flash, the complexity of the entire infrastructure ensures that there are still plenty of problems to run into.

Continue reading

How to Diagnose IBM SVC/Storwize V7000 (Spectrum Virtualize) Replication Performance Issues: Part 2 Diagnostics

By Brett Allison

In part 1 of this blog series, we talked about how to select the SVC/V7000 replication technology that matches your business requirements or, more likely, your budget.

Now we need to think about how you can monitor and diagnose SVC/V7000 performance issues that may be caused by replication. I run into SVC/V7000 replication issues quite frequently, and have found that not all monitoring and diagnostic tools provide a comprehensive picture of SVC/V7000 replication. Further complicating matters, the nature of the technology you have selected will influence expectations and approach to problem determination.

Continue reading

This is alarming

By Stuart Plotkin

Don’t Ignore that Alarm!

Ignore an alarm? Why would someone do that? Answer: because some tools send too many!

To avoid getting overloaded with meaningless alarms, it is important to implement best practices. The first best practice is to implement a software solution that is intelligent. It should:

  • Understand the limitations of your hardware
  • Take into consideration your particular workload
  • Let you know that you are heading for a problem before the problem begins
  • Eliminate useless alarms

If you have followed this first best practice, congratulations! You are headed in the right direction.

Continue reading

Shelfware, IT’s version of Home Exercise Equipment

By Brett Allison

Many years ago, I conducted an IT software asset audit for an insurance company. The results were surprising, to say the least. They had a large number of tools, many with overlapping functionality.

But the biggest surprise was that they had several useful tools that had never been installed. The teams didn’t even know that they owned these licenses! This shocked me at the time, but over the years it became apparent to me that this was far from unique. For example, an IT executive at a Fortune 100 company told us, “I believe your software does what you say it will do, but what I don’t believe is that our IT staff will get it implemented.”

Continue reading

Today’s Forecast: Cloudy with a Chance of Sudden Alert Storms

By Lee LaFrese

I live in Tucson, AZ. The joke here is that most of the year the weather is so stable you can publish the forecast on a billboard – sunny, clear skies. Sounds nice, doesn’t it? But every July we have monsoon season and the weather turns “interesting”. I suppose that you could still put the forecast on a billboard – sunny with a chance of wild and crazy storms! These storms are unpredictable and sometimes quite violent. Unfortunately, weather science is not to the point where it can clear up the uncertainty. Thankfully, in the world of storage performance we can do better.

Typical IT shops have various real-time monitors designed to raise alerts when something goes wrong. In theory, this sounds like a good arrangement. If you get an alert, you can take action and fix things quickly, right? But in reality, this won’t always be as effective as it would seem. On the one hand, alerts may point to symptoms only after you have already felt the impact. Do you want to hear that it is raining when you are already soaked to the bone? It would be much preferable to have advance warning before the problem manifests itself. On the other hand, sometimes you get more alerts than you know what to do with. It may be unclear whether these alerts indicate a real problem or whether they are just a bunch of false alarms. This is the classic “alert storm”, and it can sometimes be as disruptive as a real problem. You don’t want your team to scramble because of a bunch of false positives. However, you can’t just discount alerts, because if there really is a problem, you ignore them at your own peril.

Continue reading

IBM TS7700 Performance – What can you do if your Cache Hit Percent drops too low and affects your tape performance?

By Burt Loper

What can you do if some of your tape jobs seem to be running too long and you observe that your Cache Hit Percent for your TS7740 virtual tape system is low?

First, let’s define Cache Hit Percent. When host systems mount a tape, there are three possibilities:

  1. A scratch mount – this is always a cache hit as long as the TS7700 fast ready categories are configured correctly. These are called Fast Ready mounts by the TS7700.
  2. A specific mount where the volser being requested is already in the TS7700’s cache. These are called Cache Read Hit mounts by the TS7700.
  3. A specific mount where the volser being requested is not in the TS7700’s cache. These are called Cache Read Miss mounts by the TS7700.

The first two mount types usually result in very quick tape mount times, on the order of 1-2 seconds. The mount time for a Cache Read Miss mount is much longer, since the data from the volser must be retrieved from a tape cartridge back into the cache before the host mount can complete. Cache Hit Percent for an interval is the sum of the Fast Ready mounts and the Cache Read Hit mounts divided by the total virtual mounts (i.e. Fast Ready + Cache Read Hit + Cache Read Miss), expressed as a percentage.

So, a low Cache Hit Percent means that more tape mounts are encountering the longer mount times associated with a Cache Read Miss mount, which may mean that jobs take longer. A general rule of thumb is that the Cache Hit Percent should be at least 80%, but greater than 90% is preferable.
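
As a rough illustration of the arithmetic, here is a minimal Python sketch. The function names and the sample mount counts are hypothetical and not part of any TS7700 interface; it assumes you have already obtained the three mount counts for an interval from your own statistics source. The labels returned by the second function are simply one way to express the 80% and 90% rule of thumb.

```python
def cache_hit_percent(fast_ready: int, read_hit: int, read_miss: int) -> float:
    """Cache Hit Percent for one interval.

    (Fast Ready + Cache Read Hit) / total virtual mounts, as a percentage.
    The three counts are assumed inputs, gathered elsewhere.
    """
    total = fast_ready + read_hit + read_miss
    if total == 0:
        raise ValueError("no virtual mounts in this interval")
    return 100.0 * (fast_ready + read_hit) / total


def rate_cache_hit_percent(pct: float) -> str:
    """Apply the rule of thumb: at least 80%, above 90% preferable.

    The labels are illustrative only.
    """
    if pct > 90.0:
        return "preferable"
    if pct >= 80.0:
        return "acceptable"
    return "low"


# Made-up interval: 600 Fast Ready, 250 Cache Read Hit, 150 Cache Read Miss
pct = cache_hit_percent(600, 250, 150)
print(f"{pct:.1f}% ({rate_cache_hit_percent(pct)})")
```

With these made-up counts, 850 of 1,000 mounts are cache hits, so the interval meets the 80% minimum but falls short of the preferred 90%.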

Continue reading

Where art thou, Standards?

By Brett Allison

Enterprise-wide reporting across the IT stack and IT domains is a nightmare to develop and maintain.

In 2009, I developed a storage chargeback system for a customer using 17 different data sources. The data collection was mostly automated, but admittedly it wasn’t very robust. The data came from different sources, including native CLI, manual input, database queries, and automatically generated reports. The system was fairly modest but effective; however, it took a part-time developer 20 to 40 hours per month to generate the reports and maintain the system after I turned it over.

Even within the storage domain, there wasn’t a single interface for communicating with the devices. Other systems, such as change management and asset management databases, had standard SQL query interfaces, but some systems were inaccessible and required someone to update a flat file.

Continue reading