Modeling – Is it for You?


By Lee LaFrese


In social situations, people sometimes bring up what they do for a living. When I say, “I am a Storage Performance consultant,” I usually get blank stares. When I am asked for more details, I usually reply, “I do a lot of modeling.” This often elicits snickers, which is entirely understandable: anyone who has met me knows that I don’t have the physique of a model! When I add that I am talking about MATHEMATICAL modeling, the confusion usually clears up. In fact, folks are typically impressed, and I have to convince them that what I do is not rocket science. Of course, a lot of rocket science is not “rocket science” either, if you use the term as a euphemism for something very complex and hard to understand. In this article, I will explain how computer system performance modeling is done, specifically for disk storage systems. Hopefully, after reading it you will have a better appreciation of performance modeling, know where it can be used, and understand its limitations.

Performance modeling is based on queueing theory, the branch of mathematics that describes how long you have to wait in line. Queueing theory has many practical applications in areas such as traffic planning, communications networks, and computer systems. In graduate school, I did a modeling project that studied the utilization of bathrooms at the university theater during performances. One notable finding was that the times between arrivals at the men’s room followed an exponential distribution, suggesting the arrivals were random. Conversely, the gaps between visitors to the women’s room were hyper-exponential, suggesting that they arrived in bunches. Thus, I was able to prove mathematically with measured data that women do go to the bathroom in groups. It is not just an urban legend!
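Arrival patterns like these are often compared using the coefficient of variation (CV) of the gaps between arrivals, that is, the standard deviation of the gaps divided by their mean. Exponential gaps have a CV of 1, while a hyper-exponential mix of many short gaps (people arriving together) and a few long gaps (between groups) has a CV above 1. The Python sketch below is illustrative only: it draws synthetic gaps, and the parameters `p_short`, `mean_short`, and `mean_long` are hypothetical choices, not values from the theater study.

```python
import random
import statistics


def coefficient_of_variation(samples):
    """Standard deviation divided by mean; 1.0 for an exponential distribution."""
    return statistics.pstdev(samples) / statistics.fmean(samples)


def exponential_gaps(n, mean, rng):
    """Gaps between purely random (Poisson) arrivals."""
    return [rng.expovariate(1.0 / mean) for _ in range(n)]


def hyperexponential_gaps(n, p_short, mean_short, mean_long, rng):
    """Two-phase hyper-exponential gaps: with probability p_short the next
    arrival follows closely (same group); otherwise a long gap starts a new group.
    The parameter values used below are hypothetical."""
    gaps = []
    for _ in range(n):
        mean = mean_short if rng.random() < p_short else mean_long
        gaps.append(rng.expovariate(1.0 / mean))
    return gaps


rng = random.Random(42)  # fixed seed so the sketch is repeatable
cv_exp = coefficient_of_variation(exponential_gaps(100_000, 1.0, rng))
cv_hyper = coefficient_of_variation(
    hyperexponential_gaps(100_000, p_short=0.8, mean_short=0.1, mean_long=5.0, rng=rng)
)
print(f"exponential CV:       {cv_exp:.2f}")
print(f"hyper-exponential CV: {cv_hyper:.2f}")
```

With these made-up parameters the theoretical CV of the hyper-exponential mix is about 2.75, so the grouped arrivals stand out clearly against the exponential baseline near 1.0.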

When it comes to computer system modeling, there are several approaches that can be taken. The most detailed is discrete event simulation, which is often driven by Monte Carlo (random sampling) methods. An accurate discrete event simulation requires very detailed knowledge of the hardware, the software, and their interactions. Such simulations tend to be compute-intensive and not very reusable. But when implemented correctly, a discrete event simulation can be very accurate and can provide insight into things like the expected variability of certain performance metrics.
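To make the idea concrete, here is a minimal discrete event simulation of a single disk modeled as one FIFO server. It assumes Poisson arrivals and exponentially distributed service times (an M/M/1 queue), a simplification rather than a description of any real drive; the random draws are what give it its Monte Carlo flavor. Because it records every I/O’s response time, it can report percentiles as well as the mean, which is the kind of variability insight an average-only model cannot give. The function name and parameters are hypothetical.

```python
import heapq
import random
import statistics
from collections import deque


def simulate_single_server(arrival_rate, mean_service, n_ios, seed=1):
    """Discrete event simulation of one FIFO server, such as a single disk.

    Assumes Poisson arrivals and exponential service times (an M/M/1 queue).
    Returns the response time (queue time + service time) of each I/O.
    """
    rng = random.Random(seed)
    events = []   # min-heap of (time, tie_breaker, kind, io_id)
    counter = 0
    now = 0.0
    for io in range(n_ios):  # pre-schedule every arrival
        now += rng.expovariate(arrival_rate)
        heapq.heappush(events, (now, counter, "arrive", io))
        counter += 1

    arrived_at = {}
    waiting = deque()
    busy = False
    response_times = []
    while events:
        now, _, kind, io = heapq.heappop(events)
        if kind == "arrive":
            arrived_at[io] = now
            if busy:              # server occupied: join the queue
                waiting.append(io)
                continue
            busy = True
            next_io = io
        else:                     # "depart": this I/O has completed
            response_times.append(now - arrived_at.pop(io))
            if not waiting:
                busy = False
                continue
            next_io = waiting.popleft()
        # Start serving next_io and schedule its completion.
        done = now + rng.expovariate(1.0 / mean_service)
        heapq.heappush(events, (done, counter, "depart", next_io))
        counter += 1
    return response_times


response_times = simulate_single_server(arrival_rate=50.0, mean_service=0.010, n_ios=100_000)
mean_ms = statistics.fmean(response_times) * 1000
p95_ms = statistics.quantiles(response_times, n=20)[-1] * 1000
print(f"mean {mean_ms:.1f} ms, 95th percentile {p95_ms:.1f} ms")
```

At 50 IO/sec with a 10 ms service time the server is 50% busy, and M/M/1 theory predicts a mean response time of 0.01 / (1 - 0.5) = 20 ms and a 95th percentile near 60 ms, so the simulated values should land close to those.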

Another common approach is what I call “back of the envelope” modeling. This is really just common sense supplemented by a few simple calculations. For example, if a disk drive has an average base latency (seek + rotational latency + overhead + data transfer) of 10 ms, or 0.01 sec, how many IO/sec can it run before it hits 100% utilization? Since a drive can only do one I/O at a time, the answer is simply one divided by the latency: 1 / 0.01 sec = 100 IO/sec. This is fine as a first approximation, but modern disk drives include seek optimization algorithms that reorder queued operations. When operations are queued, the average seek time goes down, so the real maximum IO/sec will be higher. Back of the envelope modeling is useful, but typically rough.
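The arithmetic above fits in a few lines. This Python sketch assumes, as the text does, a device that serves exactly one I/O at a time, so it deliberately ignores the extra throughput that seek optimization delivers under queueing. It also applies the utilization law (utilization = I/O rate times time per I/O) to check how busy the drive is at a given load; the 60 IO/sec load and the function names are hypothetical.

```python
def max_iops(base_latency_s: float) -> float:
    """Saturation I/O rate of a device that performs one I/O at a time."""
    return 1.0 / base_latency_s


def utilization(iops: float, base_latency_s: float) -> float:
    """Utilization law: U = X * S (I/O rate times time per I/O)."""
    return iops * base_latency_s


latency = 0.010  # 10 ms average base latency, as in the example above
print(f"max ~{max_iops(latency):.0f} IO/sec")
print(f"at 60 IO/sec the drive is {utilization(60, latency):.0%} busy")
```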

A more rigorous approach is analytic modeling. This uses queueing networks to estimate things like how long a job will take or the response time of an I/O operation. The figure below illustrates how a queueing network may be applied to model storage performance metrics.

The workload is abstracted in terms of I/O rate (IO/sec), I/O size, read and write %, cache hit %, and so on. The workload is then parsed and assigned to the various hardware components, such as adapters, internal processors, channels, drives, and internal connections. Each component is modeled based on its capabilities, including how many operations per second it can handle and how long each operation takes. The component models calculate queue lengths, queue times, and utilizations. Adding up the service times and queue times of the individual component models gives the overall response time of the queueing network. Note that this approach models only the average behavior of the system; gaining insight into distributions requires a simulation model. However, an accurate measure of average behavior can be very valuable for capacity planning or for comparing hardware alternatives.
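The process just described can be sketched as a small open queueing network. This is a hypothetical illustration, not IntelliMagic’s model: every component is treated as an M/M/1 queue, the component names, service times, and drive count are made up, and writes are assumed to complete in cache, so destages add drive utilization but not host response time. Each unit’s utilization is U = X * V * S / units, and each synchronous visit costs S / (1 - U) of queue plus service time; summing those residence times gives the average response time.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    service_time_s: float      # time one operation occupies one unit
    sync_visits: float         # ops per host I/O that the host waits for
    background_visits: float   # ops per host I/O that only consume capacity
    units: int = 1             # identical units sharing the load evenly


def network_response_time(iops, components):
    """Approximate host I/O response time, treating each unit as an M/M/1 queue."""
    total = 0.0
    for c in components:
        ops_per_sec = iops * (c.sync_visits + c.background_visits) / c.units
        util = ops_per_sec * c.service_time_s
        if util >= 1.0:
            raise ValueError(f"{c.name} is saturated (utilization {util:.2f})")
        # Each synchronous visit waits in queue and is then served: S / (1 - U).
        residence = c.sync_visits * c.service_time_s / (1.0 - util)
        print(f"{c.name:<14} U={util:5.1%}  adds {residence * 1000:.3f} ms")
        total += residence
    return total


# Hypothetical workload: 2000 IO/sec, 70% reads, 80% read cache hits.
iops, read_frac, read_hit = 2000, 0.70, 0.80
read_misses = read_frac * (1 - read_hit)   # disk reads per host I/O (on the response path)
destages = 1 - read_frac                   # disk writes per host I/O (no RAID penalty modeled)

components = [
    Component("host adapter", 0.0001, 1.0, 0.0),
    Component("processor", 0.00005, 1.0, 0.0),
    Component("disk drives", 0.010, read_misses, destages, units=20),
]
print(f"response time ~ {network_response_time(iops, components) * 1000:.2f} ms")
```

With these made-up numbers the drives run at 44% utilization and contribute about 2.5 ms of a total near 2.7 ms, so they are where a capacity planner would look first. Raising `iops` until the function reports saturation shows how quickly headroom disappears.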

IntelliMagic Direction is an example of an analytic storage performance model. IntelliMagic has been developing storage models for over 20 years, based on deep knowledge of internal hardware capabilities. These models cover storage hardware from many of the leading vendors, including HDS, EMC, IBM and others. If you are looking at new hardware acquisitions, or just want a good sense of how much performance capability is left in your current storage systems, an IntelliMagic Direction study can be very valuable. With IntelliMagic Direction, your measured peak workload is used as input, and the response times and utilizations of your storage components are estimated for the expected I/O growth. Hardware configuration changes or migrations to new storage platforms can be easily evaluated. IntelliMagic also leverages this built-in hardware knowledge in IntelliMagic Vision, a solution focused on proactive monitoring and infrastructure analytics.
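To see why growth projections matter, this hypothetical sketch takes a measured peak (60 IO/sec on a drive with a 10 ms service time, both made-up numbers) and scales it by several growth rates, again assuming a single M/M/1 queue. It is not how IntelliMagic Direction works internally, only an illustration of the general what-if idea: response time grows slowly at first and then sharply as utilization approaches 100%.

```python
def growth_projection(peak_iops, service_time_s, growth_rates):
    """Project utilization and response time of one M/M/1 resource as I/O grows.

    Returns (growth, utilization, response_time_s) tuples; response time is
    None once the resource would be saturated.
    """
    results = []
    for growth in growth_rates:
        util = peak_iops * (1 + growth) * service_time_s
        rt = service_time_s / (1 - util) if util < 1 else None
        results.append((growth, util, rt))
    return results


for growth, util, rt in growth_projection(60, 0.010, [0.0, 0.25, 0.5, 0.6, 0.7]):
    shown = f"{rt * 1000:.0f} ms" if rt is not None else "saturated"
    print(f"+{growth:.0%} growth: utilization {util:.0%}, response time {shown}")
```

With these inputs, 50% growth pushes utilization to 90% and the response time to about 100 ms, four times the 25 ms at today’s peak, and 70% growth saturates the drive.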

IntelliMagic Direction studies are available as a service from IntelliMagic. As part of an IntelliMagic Direction study, IntelliMagic’s experienced consultants use your data to create models and recommend changes that may help optimize your storage performance. Although some storage vendors offer similar services, only IntelliMagic can provide an unbiased, vendor-independent viewpoint on what will best meet your performance needs.

Do you see modeling in your future? If so, please contact us at and we will introduce you to a “supermodel!”

2 thoughts on “Modeling – Is it for You?”

  1. Ralph Hennen says:

    Your opening discussion brings to mind another concern when looking at storage performance modeling from an engineering perspective. Too often, I think, we see performance engineering as troubleshooting alone. To my mind this is akin to missing the path of the hurricane while being focused only on evacuating as the storm hits. Granted, troubleshooting is the most adrenaline-producing part of performance engineering, but there are upstream characteristics of storage performance that can alert us in advance to the oncoming storm. We can see these in anomaly patterns, and also in slight changes to characteristic performance profiles. I’ve used these many times to show the onslaught of performance storms. Populating these models requires correct data (adequate sample times, validated condensation/calculation of variables). As we go up the performance report chain, from 5 sec response/service time data charts through anomaly charts to profile charts, it is important to simplify the information for easy understanding and to provide a drill-down method that takes an observed anomaly into the troubleshooting data to find the source.

    I have also found a lack of organizational commitment to early warning alerts. These companies seem to provide change procedures after performance goes down the tubes, but have no means of getting changes made in advance of the inevitable performance problem. That’s a serious, costly issue when not addressed.

  2. Ralph Hennen says:

    Modeling?? I can’t imagine a life without it.

    As a young child I watched my mother consistently prepare dinners for a family of 5 using a single burner on a stove that had 4 burners. When I asked why she didn’t use the others and save time, she said she was saving electricity. Humm, I said. Talk about performance modeling where it counts: for a 7-year-old waiting for food (clearly a missed SLA), it was like punishment. From then on I saw the world as a series of metaphors and models, and came to adopt the idea: data -> information -> model -> question => [data -> information …](i=1 -> N).

    The thing that has always attracted me to storage performance modeling is the lack of good models, though that depends on what is meant by “good.” These are finite state machines for which there are decades of reasonable models, few of which are in play, especially in the OpenSystems world. But it’s not cheap to do real performance modeling and do it well. Even the quality of the data seems, at times, to be in question on OpenSystems platforms. Storage vendors frequently produce more and more data and call it a performance model. Slowly but surely, though, packages are appearing that produce reasonable models with the analytics to support them.

    However, when in doubt I come up with my own performance models that describe the observed performance phenomena, good or bad. (Mom, why are you using 4 fibre ports for 200 servers when you have 64 fibre ports? And you ask why the application is so slow? Maybe you’re saving electricity or light.)

    My follow-on question is: what is the value of a simulation, given a model?
