Category Archives: SAN Environments

IntelliMagic Vision and Splunk: The Best of Both Worlds for your Infrastructure Data Intelligence

By Brett Allison

Many companies are investing in Splunk® as a modern way to analyze machine-generated big data. Splunk provides a data analytics infrastructure that can load data from any source. Splunk is a versatile platform for data mining, and it provides a development platform to create reports and dashboards from raw data. In theory it is an excellent place to integrate all of your IT infrastructure and application data, but this is not without its challenges.

Splunk is optimized to parse and process unstructured log files and lets the user create any type of report. But since it does not contain built-in knowledge of your IT infrastructure, deriving deep insights from your data is largely up to you: as a Splunk user, it is your job to provide the context and interpretation. If you use Splunk to read infrastructure measurement data from your distributed infrastructure, or raw SMF and RMF data from your z/OS mainframe, it will not provide any interpretation out of the box.

This is where IntelliMagic Vision comes in: with its deep knowledge of storage, SAN, and mainframe infrastructures, it is designed to bring out both in-depth insights and relationships within your data. By turning raw data into rich information based on embedded expert knowledge, IntelliMagic Vision enriches your data, which can then be sent to Splunk if so desired.

This way you can benefit from the enhanced data to create in-depth and meaningful insight within Splunk.
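As an illustration of what that hand-off could look like, enriched records can be forwarded to Splunk over its HTTP Event Collector (HEC) interface. The sketch below is a minimal, hypothetical Python example; the endpoint URL, token, and field names are assumptions for illustration, not IntelliMagic Vision’s actual integration:

```python
import json
import urllib.request

def build_hec_event(record, source="intellimagic", sourcetype="storage:enriched"):
    """Wrap one enriched measurement record in Splunk HEC event format."""
    return {"event": record, "source": source, "sourcetype": sourcetype}

def send_to_splunk(events, hec_url, token):
    """POST a batch of events to a Splunk HTTP Event Collector endpoint,
    e.g. https://splunk.example.com:8088/services/collector/event
    (hypothetical host). The token is a HEC token created in Splunk."""
    payload = "\n".join(json.dumps(e) for e in events)
    req = urllib.request.Request(
        hec_url,
        data=payload.encode("utf-8"),
        headers={"Authorization": f"Splunk {token}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Once the events carry enriched fields (for example, a derived health score alongside the raw response time), the usual Splunk search and dashboard tooling can work with them directly.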

Continue reading

Making Use of Artificial Intelligence for IT Operations Analytics / AIOps

By Brent Phillips

Enterprise computing systems and storage operations teams have a difficult job: manage the IT infrastructure so that application availability is always efficiently maintained. But this is virtually impossible due to the complexity and disparity of the meta-data and reporting tools for all the various infrastructure components. A lack of information is not the problem; rather, the great need is to derive meaningful intelligence from all of the information.

But the cloud, for example, will not work for all applications due to performance and security requirements. And outsourcing doesn’t make infrastructure performance problems go away, in fact it can make them harder to resolve. So most enterprise organizations will still benefit from and require deep infrastructure performance analysis capabilities.

In recent years, a new class of products initially called IT Operations Analytics (ITOA) have come on the market with the design objective of providing a single interface into all the data generated from disparate devices, and more importantly, helping interpret what it really means for performance, availability, and efficiency.

The idea is to employ the computer to do more of the work of deriving meaningful intelligence out of all the data. If designed correctly, this is a type of artificial intelligence which is done by the machine and enables human IT operations teams to be more effective. In 2017 Gartner coined the term AIOps which is a nice nomenclature for the capability.

Continue reading

Platform-Specific Views: Multi-Vendor SAN Infrastructure Part 2

By Brett Allison

Each distributed system platform has unique nuances. In Part 1 of this blog, I demonstrated how a single view for managing your multi-vendor SAN infrastructure helps you ensure performance and understand overall health and capacity. Equally important to these common views is a solution capable of collecting the detailed performance data needed to support vendor-specific architectures.

New storage system platforms are popping up every year, and it’s impossible to stay ahead of all of them and provide the detailed, intelligent, performance views necessary to manage your SAN infrastructure and prevent incidents. However, IntelliMagic Vision supports a wide variety of SAN platforms for which we provide our end-to-end capabilities.

Continue reading

A Single View: Multi-Vendor SAN Infrastructure Part 1

By Brett Allison

One of the benefits of a SAN system is the fact that it is an open system. It’s always ready to communicate with other systems, and you can add storage and infrastructure from many different vendors as it suits your business and performance needs. However, just like a calculated job interview response, this strength can also be a weakness. Even if your distributed systems can communicate with each other, it’s likely that your performance management solution is less “open” in this regard.

To properly manage the performance, connections, and capacity of your distributed system, you need something better than a bunch of vendor point solutions. You need to be able to manage your entire SAN infrastructure in a single view – otherwise the cost and hassle of having different performance solutions is not worth the benefits.

Continue reading

Finding Hidden Time Bombs in Your VMware Connectivity

By Brett Allison

Do you have any VMware connectivity risks? Chances are you do. Unfortunately, there is no easy way to see them. That’s because seeing the real end-to-end risks from the VMware guest through the SAN fabric to the storage LUN is difficult in practice: it requires correlating many relationships from a variety of sources.

A complete end to end picture requires:

  • VMware guests to ESX hosts
  • ESX host initiators to targets
  • ESX hosts and datastores, VM guests and datastores, and ESX datastores to LUNs
  • Zone sets
  • Target ports to host adapters, LUNs, and storage ports

For seasoned SAN professionals, none of this information is very difficult to comprehend. The trick is tying it all together in a cohesive way so you can visualize these relationships and quickly identify any asymmetry.
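Tying the relationships together is essentially a graph problem. As a hedged sketch (the data model here is invented for illustration, not IntelliMagic Vision’s actual implementation), one simple asymmetry check compares each ESX host’s set of reachable storage target ports against the majority pattern among its peers:

```python
from collections import Counter

def find_asymmetries(host_paths):
    """host_paths: dict mapping each ESX host name to the set of storage
    target ports it can reach (as resolved from zoning and LUN masking).
    Returns the hosts whose reachable-target set differs from the
    majority pattern -- candidates for a connectivity 'time bomb'."""
    signatures = Counter(frozenset(paths) for paths in host_paths.values())
    majority = signatures.most_common(1)[0][0]
    return {host for host, paths in host_paths.items()
            if frozenset(paths) != majority}
```

A host flagged this way is not necessarily broken today, but it is the one that loses redundancy first when a path or port fails.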

Why is asymmetry important? Let’s look at an actual example:

Continue reading

No Budget for an ITOA Performance Management Solution

By Morgan Oats

Every department in every industry has the same problem: how can I stretch my budget to get the necessary work done, make my team more effective, reduce costs, and stay ahead of the curve? This is equally true for performance and capacity planning teams. In many cases, it’s difficult to get budget approval to purchase the right software solution to help accomplish these goals. Management wants to stay under budget while IT is concerned with getting a solution that solves their problems. When trying to get approval for the right solution, it’s important to be able to show how you will get a good return on investment.

Continue reading

How Much Flash Do I Need Part 2: Proving the Configuration

By Jim Sedgwick

Before making a costly Flash purchase, it’s always a good idea to use some science to forecast whether the new storage hardware configuration, and especially the costly Flash you purchase, will be able to handle your workload. Does your planned purchase provide more performance capacity than you need, so that you aren’t getting your money’s worth? Or, even worse, is your planned hardware purchase too little?

In Part 1 of this blog, we discovered that our customer just might be planning to purchase more Flash capacity than their unique workload requires. In part 2 we will demonstrate how we were able to use modeling techniques to further understand how the proposed new storage configuration will handle their current workload. We will also project how this workload will affect response times when the workload increases into the future, as workloads tend to do.

Continue reading

How Much Flash Do I Need? Part 1

By Jim Sedgwick

Flash, Flash, Flash. It seems that every storage manager has a new favorite question to ask about Flash storage. Do we need to move to Flash? How much of our workload can we move to Flash? Can we afford to move to Flash? Can we afford NOT to move to Flash?

Whether or not Flash is going to magically solve all our problems (it’s not), it’s here to stay. We know Flash has super-fast response times as well as other benefits, but for a little while yet, it’s still going to end up costing you more money. If you subscribe to the notion that it’s good to make sure you only purchase as much Flash as your unique workload needs, read on.

Continue reading

The High Cost of “Unpredictable” IT Outages and Disruptions

By Curtis Ryan

It is no secret that IT service outages and disruptions can cost companies anywhere from thousands up to millions of dollars per incident – plus significant damage to company reputation and customer satisfaction. In the most high profile cases, such as recent IT outages at Delta and Southwest Airlines, the costs can soar to over $150 million per incident (Delta Cancels 280 Flights Due to IT Outage). Quite suddenly, IT infrastructure performance can become a CEO level issue (Unions Want Southwest CEO Removed After IT Outage).

While those kinds of major incidents make the headlines, there are thousands of lesser-known, but equally disruptive, service level disruptions and outages happening daily in just about every sizeable enterprise.

The costs of these often-daily incidents, like an unexpected slowdown in the response time of a key business application during prime shift, can have a significant cumulative financial impact that may not be readily visible in the company’s accounting system.

Continue reading

Clogged Device Drain? Use Your Data Snake!

By Lee LaFrese

Have you ever run into high I/O response times that simply defy explanation? You can’t find anything wrong with your storage to explain why performance is degraded. It could be a classic “slow drain device” condition. Unfortunately, you can’t just call the data plumbers to clean it out! What is a storage handyman to do?

Continue reading

SRM: The “Next” As-a-Service

By Brett Allison

You may have seen this article published by Forbes, stating that Storage Resource Management (SRM) is the “Next as-a-Service.” The benefits cited include the simplicity and visibility provided by as-a-service dashboards and the increasing sophistication through predictive analytics.

IntelliMagic Vision is used as-a-Service for some of the world’s largest companies, and has been since 2013. Although we do much more than your standard SRM by embedding deep expert knowledge into our software, SRM, SPM, and ITOA all fall under our umbrella of capabilities. So, while we couldn’t agree more with the benefits of as-a-service offerings for SRM software, the word “Next” in the article seems less applicable. We might even say: “We’ve been doing that for years!”

IntelliMagic Software as a Service

IntelliMagic Software as a Service (or Cloud Delivery)

Continue reading

Noisy Neighbors: Finding Root Cause of Performance Issues in IBM SVC Environments

By Jim Sedgwick

At some point or another, we have probably all experienced noisy neighbors, either at home, at work, or at school. There are just some people who don’t seem to understand the negative effect their loudness has on everyone around them.

Our storage environments also have these “noisy neighbors” whose presence or actions disrupt the performance of the rest of the storage environment. In this case, we’re going to take a look at an SVC all-flash storage pool called EP-FLASH_3, where just a few bad LUNs have a profound effect on the I/O experience of the entire IBM Spectrum Virtualize (SVC) environment.

Continue reading

How to Prevent an “Epic” EMR System Outage

By Curtis Ryan

Protecting the availability of your IT storage is vital for performance, but it can also be critical for life. No one knows this better than the infrastructure department of major healthcare providers. Application slowdowns or outages in Electronic Medical Record (EMR) Systems or Electronic Health Record (EHR) Systems – such as Epic, Meditech, or Cerner – can risk patient care, open hospitals up to lawsuits, and cost hundreds of thousands of dollars.

Nobody working in IT Storage in any industry wants to get a call about a Storage or SAN service outage, but even minor service disruptions can halt business operations until the root cause of the issue can be diagnosed and resolved. This kind of time cannot always be spared in the ‘life and death’ environment of the users of EMR systems in healthcare providers.

Continue reading

The Circle of (Storage) Life

By Lee LaFrese

Remember the Lion King? Simba starts off as a little cub, and his father, Mufasa, is king. Over time, Simba goes through a lot of growing pains but eventually matures to take over his father’s role despite the best efforts of his Uncle Scar to prevent it. This is the circle of life. It kind of reminds me of the storage life cycle only without the Elton John score!

Hardware Will Eventually Fail and Software Will Eventually Work

New storage technologies are quickly maturing and replacing legacy platforms. But will they be mature enough to meet your high availability, high performance IT infrastructure needs?

Continue reading

Compressing Wisely with IBM Spectrum Virtualize

By Brett Allison

Compression of data in an IBM SVC Spectrum Virtualize environment may be a good way to gain back capacity, but there can be hidden performance problems if compressible workloads are not first identified. Visualizing these workloads is key to determining when and where to successfully use compression. In this blog, we help you identify the right workloads so that you can achieve capacity savings in your IBM Spectrum Virtualize environments without compromising performance.

Today, all vendors have compression capabilities built into their hardware. The advantage of compression is that you need less real capacity to service the needs of your users. Compression reduces your managed capacity, directly reducing your storage costs.

Continue reading

How’s Your Flash Doing?

By Joe Hyde

Assessing Flash Effectiveness

How’s your Flash doing? Admittedly, this is a bit of a loaded question. It could come from your boss, a colleague or someone trying to sell you the next storage widget. Since most customers are letting the vendors’ proprietary storage management algorithms optimize their enterprise storage automatically, you may not have had the time or tools to quantify how your Flash is performing.

The Back-end Activity

First, let’s use the percentage of back-end activity to Flash as the metric to answer this question. Digging a little deeper we can look at back-end response times for Flash and spinning disks (let’s call these HDD for Hard Disk Drives). I’ll also look at the amount of sequential activity over the day to help explain the back-end behavior.

Below are 5 weekdays’ worth of data from an IBM DS8870 installed at a Fortune 500 company. Although it’s possible to place data statically on Flash storage in the IBM DS8870, in this case IBM’s Easy Tier is used for the automatic placement of data across Flash and HDD storage tiers. Let’s refer to this scheme generically as auto-tiering. For this IBM DS8870, Flash capacity was roughly 10% of the total storage capacity. Continue reading
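The two metrics described above are simple to compute once you have per-tier back-end counters. This is a minimal sketch in Python with invented numbers, not the DS8870 data from the study itself:

```python
def flash_activity_share(flash_ops, hdd_ops):
    """Percentage of back-end I/O operations served by Flash."""
    total = flash_ops + hdd_ops
    return 100.0 * flash_ops / total if total else 0.0

def backend_avg_response(flash_ops, flash_ms, hdd_ops, hdd_ms):
    """Activity-weighted average back-end response time in milliseconds,
    combining the Flash and HDD tiers."""
    total = flash_ops + hdd_ops
    return (flash_ops * flash_ms + hdd_ops * hdd_ms) / total
```

For example, if Flash serves 900 of every 1,000 back-end operations at 0.5 ms while HDD serves the rest at 8 ms, the Flash share is 90% and the weighted average response is 1.25 ms, showing how a small HDD fraction can still dominate the average.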

Which Workloads Should I Migrate to the Cloud?

By Brett Allison

The cloud is the ultimate in infrastructure commoditization, reducing costs to their bare minimum and having end users pay for what they use. CIOs and Directors are asking for workloads to be moved to the cloud, primarily for cost savings reasons.

Most organizations have private clouds, and some have moved workloads into public clouds. For the purpose of this conversation, I will focus on the public cloud. According to this TechTarget article, “A public cloud is one based on the standard cloud computing model, in which a service provider makes resources, such as applications and storage, available to the general public over the internet. Public cloud services may be free or offered on a pay-per-usage model.”

The cloud provides an economic model for computing that may work well for some workloads, so the trick is to figure out which ones are a good fit.

Continue reading

5 Reasons Why All-Flash Arrays Won’t Magically Solve All Your Problems

By Brett Allison

In the last few years, flash storage has turned from very expensive into quite affordable. Vendors that sell all-flash arrays advertise the extremely low latencies, and those are indeed truly impressive. So it may feel like all-flash systems will solve all your performance issues. But the reality is that even with game-changing technological advances like flash, the complexity of the entire infrastructure ensures that there are still plenty of problems to run into. Continue reading

Performance Virtual Reality – Seeking the Truth in Storage Benchmarks

By Lee LaFrese

Performance analysts like myself have a love/hate relationship with benchmarks. On the one hand, benchmarks are perceived as a great way to quantify the ‘feeds and speeds’ of storage hardware. However, it is very difficult for benchmarks to be truly representative of how real applications work. Thus, I consider benchmarks a form of ‘virtual reality’; and like virtual reality, benchmarks may seem very realistic, but they can deceive you. Therefore, I’ve written this article with the aim of expanding your knowledge about how benchmarks work so you stay rooted in the real world.

Continue reading

How Effective is Your Adaptive Flash Cache?

By Brett Allison

Have you ever wondered whether you should enable Adaptive Flash Cache on your HPE 3PAR?

Adaptive Optimization (AO) is HPE 3PAR’s automatic tiering solution. It provides the user with several performance- and capacity-related parameters for influencing the behavior of the automatic tiering. I covered this in detail in a recent whitepaper about HPE 3PAR AO. One of the findings from that study was that in this particular customer’s environment there were too many I/Os on the 450 GB 10K RPM drives and not enough I/Os on the SSDs. The result was that the 450 GB 10K RPM drives were running at nearly 100% busy all the time. My suggestion was to enable Adaptive Flash Cache (AFC) by allocating some of the under-utilized SSD capacity. AFC supplements DRAM with NAND flash devices to cache small (<64 KB) frequently accessed read blocks and ultimately improve read response time. Continue reading
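Since AFC only caches small (<64 KB) read blocks, a first-order way to judge whether it could help is to measure what fraction of your read I/Os fall under that size. The sketch below uses an invented I/O trace format purely for illustration; it is not an HPE tool:

```python
AFC_MAX_BLOCK_KB = 64  # AFC targets small (<64 KB) frequently accessed read blocks

def afc_eligible_share(ios):
    """ios: iterable of (operation, transfer_size_kb) tuples, where
    operation is 'read' or 'write'. Returns the fraction of read I/Os
    small enough to be AFC candidates (by size alone; access frequency
    still matters in practice)."""
    read_sizes = [size for op, size in ios if op == "read"]
    if not read_sizes:
        return 0.0
    small = sum(1 for size in read_sizes if size < AFC_MAX_BLOCK_KB)
    return small / len(read_sizes)
```

A workload dominated by large sequential reads would score low here and likely see little benefit from AFC, whereas a small-block random read workload would score high.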

How to Diagnose IBM SVC/Storwize V7000 (Spectrum Virtualize) Replication Performance Issues: Part 2 Diagnostics

By Brett Allison

In part 1 of this blog series we talked about how to select the SVC/V7000 replication technology that matches your business requirements, or more likely, your budget.

Now we need to think about how you can monitor and diagnose SVC/V7000 performance issues that may be caused by replication. I run into SVC/V7000 replication issues quite frequently, and have found that not all monitoring and diagnostic tools provide a comprehensive picture of SVC/V7000 replication. Further complicating matters, the nature of the technology you have selected will influence expectations and approach to problem determination.

Continue reading

How to Choose the Best IBM SVC/Storwize V7000 (Spectrum Virtualize) Replication Technology: Part I Introduction

By Brett Allison

Disaster Recovery Plan

Choosing the wrong V7000/SVC replication technology can put your entire availability strategy at risk.

For most customers, there seems to be a bit of a mystery in how replication works. On the surface, it is simple. Data is written to a primary copy and either synchronously or asynchronously copied to a secondary location with the expectation that a loss of data at the primary site would result in minimal data loss and a very minimal recovery effort.

There are several types of replication, and each type has its nuances. Each of these technologies should be evaluated in light of the following business requirements:

1. Recovery Point Objective (RPO): This is the amount of data loss, expressed in time units (typically minutes), that you would incur should there be a failover to the secondary site. Continue reading
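For asynchronous replication, a rough back-of-the-envelope estimate of the data-loss window is the un-replicated backlog divided by the write rate. The sketch below is illustrative only; actual RPO behavior depends on the specific replication technology and its cycle or journaling design:

```python
def estimated_rpo_minutes(write_mb_per_min, backlog_mb):
    """Rough RPO estimate for asynchronous replication: the un-replicated
    backlog expressed as minutes of recent writes that would be lost if
    a failover happened right now."""
    if write_mb_per_min <= 0:
        return 0.0
    return backlog_mb / write_mb_per_min
```

For example, a 500 MB backlog against a 100 MB/min write rate corresponds to roughly 5 minutes of potential data loss.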

This is alarming

By Stuart Plotkin

Don’t Ignore that Alarm!

Ignore an alarm? Why would someone do that? Answer: because some tools send too many!

To avoid getting overloaded with meaningless alarms, it is important to implement best practices. The first best practice is to implement a software solution that is intelligent. It should:

  • Understand the limitations of your hardware
  • Take into consideration your particular workload
  • Let you know that you are heading for a problem before the problem begins
  • Eliminate useless alarms
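One common way to implement this kind of intelligent alarming is an adaptive threshold: rather than a fixed limit, alarm only when a sample deviates strongly from its own recent baseline. The sketch below is one possible approach (the three-standard-deviation rule is an illustrative choice, not a prescription):

```python
import statistics

def should_alarm(history, current, k=3.0):
    """Adaptive threshold: alarm only when the current sample exceeds the
    recent baseline (mean of history) by more than k standard deviations.
    This suppresses alarms on workloads that are merely busy as usual."""
    if len(history) < 2:
        return False  # not enough history to establish a baseline
    baseline = statistics.fmean(history)
    spread = statistics.stdev(history)
    return current > baseline + k * spread
```

A metric that hovers around 10 ms with little variation would alarm on a jump to 30 ms but stay quiet at 12 ms, whereas a fixed threshold would either miss the former or spam on the latter.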

If you have followed this first best practice, congratulations! You are headed in the right direction. Continue reading

How to Detect and Resolve “State in Doubt” Errors

By Brett Allison


One of our customers recently came across a problem in their environment that I think warrants some attention. The VMware administrator had gone to the storage team and asked if they saw any issues on the fabric or IBM SVC storage environment, because the infamous “state in doubt” message was popping up in the /var/log/vmkernel log file. The messages were similar to what is shown below:

<YYYY-MM-DD>T<TIME> esx12 vmkernel: 116:03:44:19.039 cpu4:4100)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe: NMP device “sym.029010111831353837” state in doubt; requested fast path state update…

The error indicated that there was a time-out by the HBA because the command took longer than 5 seconds to complete.
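If you want to quantify how often the message occurs and which devices it affects, the warnings can be tallied straight from the vmkernel log. A simple sketch (the regular expression assumes the standard message format shown above, with straight quotes as they appear in the actual log file):

```python
import re
from collections import Counter

# Matches the device name inside the standard NMP "state in doubt" warning.
STATE_IN_DOUBT = re.compile(r'NMP device "(?P<device>[^"]+)" state in doubt')

def doubtful_devices(log_lines):
    """Count 'state in doubt' warnings per NMP device across vmkernel log lines."""
    counts = Counter()
    for line in log_lines:
        match = STATE_IN_DOUBT.search(line)
        if match:
            counts[match.group("device")] += 1
    return counts
```

A count concentrated on a handful of devices points toward a path or LUN problem, while warnings spread across every device suggest a fabric-wide or HBA-level issue.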

Continue reading

New Visibility into FAST/FAST-VP for Large Scale EMC Environments

By Brett Allison

Chances are you don’t drive blindfolded. However, we often run our complicated storage environments with very little visibility. Not by choice, but because the status quo in the industry is to find out about problems only after they are already impacting production users.

Why is this the case?  Well, have you ever tried to get an enterprise view of the health risks in your EMC FAST-VP environment using vendor tools? It’s not easy.

EMC’s FAST-VP (Fully Automated Storage Tiering – Virtual Provisioning) technology is designed to improve overall performance and availability by placing the right blocks of data on the right drive technology. This is enabled by policies, created by storage administrators, that apply tier capacity constraints to storage groups. While there has been a significant amount of effort by EMC to simplify and generalize approaches for managing FAST-VP, and it can make a significant impact on performance and overall cost, it can still be a confusing solution. Continue reading

Shelfware, IT’s version of Home Exercise Equipment

By Brett Allison

Many years ago, I conducted an IT software asset audit for an insurance company. The results were surprising to say the least.  They had a large number of tools, many with overlapping functionality.

But the biggest surprise was that they had several useful tools that had never been installed.  The teams didn’t even know that they owned these licenses!  This shocked me at the time. But over the years it became apparent to me that this was far from unique. For example, an IT executive at a Fortune 100 company told us “I believe your software does what you say it will do, but what I don’t believe is that our IT staff will get it implemented.” Continue reading

How are my Disk Storage Systems doin’?

By Stuart Plotkin

Ed Koch was mayor of New York City for 12 years. He was famous for stopping people on the streets and asking them, “How am I doin’?” I have met many IT professionals who have asked the same question about their disk storage systems.

“How are they doing? Are my users getting the best possible performance? Are we about to have a catastrophe? Do I need to order more? Am I ordering too much? Am I making the most use of what I have? Tell me my level of ‘risk’.”

Ed was defining the metric to measure how well he was doing by how people felt about how he was doing. A mayor should try to make people feel good, but there are other metrics as well, like a city’s financial solvency as just one example. When it comes to storage systems, what are the metrics that will tell us how well our storage systems are doing? Continue reading

Four Steps You Should Take to Identify, Resolve and Prevent IBM SVC Front-End Imbalance

By Brett Allison

Did you know you could be at risk of a performance meltdown while you still have plenty of front-end bandwidth?

An imbalanced front-end can cripple the performance of your IBM SVC system. An imbalanced front-end is another way of saying that too much workload is handled by too few ports. This leads to buffer credit shortages, increases in latency, and low throughput. It is very easy to create imbalances within an IBM SVC system’s front-end, and it can be fairly difficult to see it happening without the proper tools. Continue reading
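A quick way to put a number on front-end imbalance is the coefficient of variation (standard deviation divided by mean) of per-port load. This is an illustrative heuristic, not IntelliMagic Vision’s actual method, and the 0.5 threshold is an arbitrary example:

```python
import statistics

def port_imbalance(port_iops, threshold=0.5):
    """Coefficient of variation (population stdev / mean) of per-port load.
    Returns (cv, imbalanced), where imbalanced flags a cv above threshold:
    a high cv means a few ports carry most of the workload."""
    mean = statistics.fmean(port_iops)
    if mean == 0:
        return 0.0, False
    cv = statistics.pstdev(port_iops) / mean
    return cv, cv > threshold
```

Four ports at 100 IOPS each score a cv of 0 (perfectly balanced), while a split of 400/50/50/100 scores well above the example threshold, even though total front-end bandwidth is identical in both cases.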

Performance Modeling for Disk Storage Systems – Is it for You?

By Lee LaFrese

In social situations, people sometimes bring up what they do for a living. When I say, “I am a Storage Performance consultant,” I usually get blank stares. When I am asked for more details, I usually reply “I do a lot of modeling.” This often elicits snickers which is entirely understandable. Anyone that has met me knows that I don’t have the physique of a model! When I add that it is MATHEMATICAL modeling that I am talking about it usually clears up the confusion.

In fact, folks are typically impressed, and I have to convince them that what I do is not rocket science. Of course, a lot of rocket science is not “rocket science” either, if you use the term as a euphemism for something very complex and challenging to understand. In this article, I will try to help you understand how computer system performance modeling is done, specifically for disk storage systems. Hopefully, you will have a better appreciation of performance modeling after reading this and know where it can be used and what its limitations are. Continue reading

All I Want for Christmas is…Time

By Jerry Street

With the holiday season upon us, I occasionally think of what might be waiting for me to unwrap. Will it be another gift card? I hope not. Gift cards are someone’s way of saying, “I appreciate you so much that you should get your own present.” There are many things that I would enjoy getting as a present, but the one thing that would actually make my life better would be a couple of extra hours in my day. I need more time! Unfortunately, I can’t get the earth to slow down and make a full revolution in 26 hours instead of 24. So I need tools to save me time within the 24 hours that I’m scripted to have.

As IT Performance professionals, we are continually asked to do more.  Systems grow more complex, analyses need to be delivered faster, and dollars have to be spent more wisely than ever. When professional life demands require more time, you can either give up your personal time or let the quality of your work suffer. I don’t want to do either of those things so I would choose to do my job both faster and better. A tool that helps me accomplish both goals is IntelliMagic Vision. Continue reading