Automating Analysis of z/OS Alerts

By Jerry Street

You are nowhere near your workstation, and you receive an urgent text that z/OS has an unexpected increase in CPU utilization. While the alert is beneficial, it would be more helpful if you also had in hand all the information you needed to diagnose the issue. We discussed the pains of bad alerts in part 1 of this blog. The remainder of this blog discusses how you can integrate your z/OS alerting with the rich artificial intelligence of IntelliMagic Vision.

Rather than just another alarm in a sea of alarms, you should be able to expect more from your alerts. Especially with something as important as your z/OS, alerts should provide actionable recommendations for the problems they are flagging. IntelliMagic Vision does just that. IntelliMagic Vision can be configured to send detailed root cause analysis reports based on pre-defined z/OS alerts, allowing you to understand the impact and urgency of the alert and guiding your subsequent investigation.

Continue reading

Do Not Settle for Bad z/OS Alerts

By Jerry Street

When I was growing up, long car rides were a bit challenging due to our car’s alerting system: smoke, steam, horrible clunking noises, or dead silence. Everything was great until Betsy (my mom always named our car Betsy) did not move anymore. Then we had to get the car to a mechanic who was an expert at making us feel ignorant and took a lot of our money to fix something simple (usually).

Then cars started getting better at alerting the operator about simple problems, but you still had to take the car to a mechanic to fix the problem. Today, between YouTube, Google, and Internet forums, you can often get the steps it takes to resolve a lot of these alerts for a whole lot less money; however, there is still more that needs to be done between getting an alert from your car and solving/fixing the issue.

What if your car could alert you to an issue, do an Internet search for you, and send a fixit video to your smartphone before you could even get to a safe place to check your smartphone? That kind of intelligence would be convenient. The same principle applies to alerts you get from your z/OS Operating System.

When I started working in Operations, when we still called it “MVS”, an Operator would see an alert and call me (usually at night). I would sometimes have to drive into the office or call another Systems Programmer, analyze the alert, and act upon it. What if now, the alert could automatically perform root cause analysis and send supporting reports to your smartphone?

One of the major problems with alerts in IT is that systems generate so many of them that Operators become desensitized. Projects to “clean up” alerts may end up filtering out necessary ones. I know of one project that was started to reduce alerts and improve alerting, and it created more problems than it solved. Many customers even ask for a single pane of glass to contain alerts and want them to be smarter. This can wind up being a single glass of pain that adds no value if alerts don’t lead to actionable solutions.

Continue reading

Impact of z14 on Processor Cache and MLC Expenses

By Todd Havekost

Expense reduction initiatives among IT organizations typically prioritize efforts to reduce IBM Monthly License Charge (MLC) software expense, which commonly represents the single largest line item in the mainframe budget.

On current (z13 and z14) mainframe processors, at least one-third and often more than one-half of all machine cycles are spent waiting for instructions and data to be staged into level one processor cache so that they can be executed. Since such a significant portion of CPU consumption is dependent on processor cache efficiency, awareness of your key cache metrics and the actions you can take to improve cache efficiency are both essential.
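
To make that claim concrete, here is a minimal sketch (with invented numbers, not data from the article) of how the waiting share can be estimated from CPU MF-style counters: total CPI (cycles per instruction) is split into an instruction-execution portion and a "finite" portion attributable to cache and memory misses.

```python
# Illustrative sketch only: estimating what share of machine cycles is spent
# waiting on processor cache. The input values below are assumptions, not
# measurements from this article.

def waiting_cycle_share(total_cpi: float, finite_cpi: float) -> float:
    """Fraction of all cycles spent resolving cache/memory misses."""
    return finite_cpi / total_cpi

total_cpi = 3.0    # assumed total cycles per instruction
finite_cpi = 1.4   # assumed portion of CPI spent waiting on cache misses

share = waiting_cycle_share(total_cpi, finite_cpi)
print(f"~{share:.0%} of machine cycles are spent waiting on cache")  # ~47%
```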

This is the final article in a four-part series focusing on this vital but often overlooked subject area. (You can read Article 1, Article 2, and Article 3.) This article examines the changes in processor cache design for the z14 processor model. The z14 reflects evolutionary changes in processor cache from the z13, in contrast to the revolutionary changes that occurred between the zEC12 and z13. The z14 cache design changes were aimed particularly at workloads that place high demands on processor cache. These “high RNI” workloads frequently experienced a negative impact when migrating from the zEC12 to the z13.

Continue reading

Optimizing MLC Software Costs with Processor Configurations

By Todd Havekost

This is the third article in a four-part series focusing largely on a topic that has the potential to generate significant cost savings, but which has not received the attention it deserves, namely processor cache optimization. (Read part one here and part two here.) Without an understanding of the vital role processor cache plays in CPU consumption and clear visibility into the key cache metrics in your environment, significant opportunities to reduce CPU consumption and MLC expense may not be realized.

This article highlights how optimizing physical hardware configurations can substantially improve processor cache efficiency and thus reduce MLC costs. Three approaches to maximizing work executing on Vertical High (VH) logical CPs through increasing the number of physical CPs will be considered. Restating one of the key findings of the first article, work executing on VHs optimizes processor cache effectiveness, because its 1-1 relationship with a physical CP means it will consistently access the same processor cache.
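
The relationship between LPAR weight, physical CPs, and vertical polarization can be sketched as follows. This is a simplified approximation of the HiperDispatch assignment rules (the actual PR/SM algorithm has additional nuances) with made-up weights, intended only to show how adding physical CPs converts Vertical Medium capacity into Vertical Highs.

```python
# Simplified sketch of HiperDispatch vertical polarization (not the exact PR/SM
# algorithm): an LPAR's guaranteed share of the shared physical CPs determines
# how many of its logical CPs are Vertical High (VH), Medium (VM), or Low (VL).

def vertical_polarization(weight, total_weight, physical_cps, logical_cps):
    share = weight * physical_cps / total_weight      # guaranteed share, in CPs
    vh = int(share)
    remainder = share - vh
    if remainder == 0:
        vm = 0
    elif remainder >= 0.5 or vh == 0:
        vm = 1                        # one VM carries the fractional share
    else:
        vh, vm = vh - 1, 2            # split so each VM has at least a 50% share
    vl = max(logical_cps - vh - vm, 0)
    return vh, vm, vl, round(share, 2)

# Example: an LPAR with 60% of the weight on an 8-way, 10 logical CPs defined.
print(vertical_polarization(600, 1000, physical_cps=8, logical_cps=10))
# -> (4, 1, 5, 4.8): 4 VHs, 1 VM, 5 VLs

# Adding two physical CPs (same weights) raises the share to 6.0, so all of the
# LPAR's guaranteed capacity now runs on VHs:
print(vertical_polarization(600, 1000, physical_cps=10, logical_cps=10))
# -> (6, 0, 4, 6.0)
```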

Continue reading

Reduce MLC Software Costs by Optimizing LPAR Configurations

By Todd Havekost

A prominent theme among IT organizations today is an intense focus on expense reduction. For mainframe departments, this routinely involves seeking to reduce IBM Monthly License Charge (MLC) software expense, which commonly represents the single largest line item in their budget.

This is the second article in a four-part series focusing largely on a topic that has the potential to generate significant cost savings but which has not received the attention it deserves, namely processor cache optimization. (Read part one here.) Without an understanding of the vital role processor cache plays in CPU consumption and clear visibility into the key cache metrics in your environment, significant opportunities to reduce CPU consumption and MLC expense may not be realized.

This article focuses on changes to LPAR configurations that can improve cache efficiency, as reflected in lower RNI values. The two primary aspects covered will be optimizing LPAR topology and increasing the amount of work executing on Vertical High (VH) CPs through optimizing LPAR weights. Restating one of the key findings of the first article, work executing on VHs optimizes processor cache effectiveness, because its 1-1 relationship with a physical CP means it will consistently access the same processor cache.

Continue reading

Lower MLC Software Costs with Processor Cache Optimization

By Todd Havekost

It is common in today’s challenging business environments to find IT organizations intensely focused on expense reduction. For mainframe departments, this typically results in a high priority expense reduction initiative for IBM Monthly License Charge (MLC) software, which usually represents the single largest line item in their budget.

This article begins a four-part series focusing largely on a topic that has the potential to generate significant cost savings but which has not received the attention it deserves, namely processor cache optimization. The magnitude of the potential opportunity to reduce CPU consumption and thus MLC expense available through optimizing processor cache is unlikely to be realized unless you understand the underlying concepts and have clear visibility into the key metrics in your environment.

Subsequent articles in the series will focus on ways to improve cache efficiency through optimizing LPAR weights and processor configurations, and finally on the value of additional visibility into the data commonly viewed only through the IBM Sub-Capacity Reporting Tool (SCRT) report. Insights into the potential impact of various tuning actions will be brought to life with data from numerous real-life case studies, gleaned from analyzing detailed processor cache data from 45 sites across 5 countries.

Processor cache utilization plays a significant role in CPU consumption for all z processors, but that role is more prominent than ever on z13 and z14 models. Achieving the rated 10% capacity increase on a z13 processor versus its zEC12 predecessor (despite a clock speed that is 10% slower) is very dependent on effective utilization of processor cache. This article will begin by introducing the key processor cache concepts and metrics that are essential for understanding the vital role processor cache plays in CPU consumption.
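
A quick back-of-the-envelope calculation, using only the rounded figures quoted above, shows why: capacity scales roughly with clock rate divided by cycles per instruction (CPI), so a 10% slower clock can only deliver a 10% capacity increase if CPI improves substantially, and most of that improvement has to come from better cache behavior.

```python
# Back-of-the-envelope check using the rounded figures above (illustrative only).
clock_ratio = 0.90      # z13 clock relative to zEC12 (~10% slower)
capacity_ratio = 1.10   # rated capacity relative to zEC12 (~10% higher)

# capacity ~ clock_rate / CPI, so the required CPI ratio is clock / capacity
required_cpi_ratio = clock_ratio / capacity_ratio
print(f"CPI must drop to ~{required_cpi_ratio:.2f}x of the zEC12 value "
      f"(roughly a {1 - required_cpi_ratio:.0%} improvement)")
# -> ~0.82x, roughly an 18% improvement
```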

Continue reading

Making Use of Artificial Intelligence for IT Operations Analytics / AIOps

By Brent Phillips

Enterprise computing systems and storage operations teams have a difficult job: manage the IT infrastructure so that application availability is always efficiently maintained. But this is virtually impossible due to the complexity and disparity of the metadata and reporting tools for all the various infrastructure components. A lack of information is not the problem; rather, the great need is to derive meaningful intelligence from all the information.

But the cloud, for example, will not work for all applications due to performance and security requirements. And outsourcing doesn’t make infrastructure performance problems go away; in fact, it can make them harder to resolve. So most enterprise organizations will still benefit from and require deep infrastructure performance analysis capabilities.

In recent years, a new class of products initially called IT Operations Analytics (ITOA) has come on the market with the design objective of providing a single interface into all the data generated from disparate devices and, more importantly, helping interpret what it really means for performance, availability, and efficiency.

The idea is to employ the computer to do more of the work of deriving meaningful intelligence out of all the data. If designed correctly, this is a type of artificial intelligence performed by the machine that enables human IT operations teams to be more effective. In 2017, Gartner coined the term AIOps, which is a fitting name for the capability.

Continue reading

DB2 for z/OS Buffer Pool Simulation

By Jeff Berger

For many years the price of z/OS memory has been decreasing, and IBM has been pushing the idea of large amounts of memory. DB2® for z/OS has virtually eliminated its virtual storage constraints.

DB2 performs best when it has lots of memory (i.e. real memory). Memory is still not free, but large memory can save money by reducing CPU consumption while at the same time reducing DB2 transaction response time. More memory also increases DB2 availability in cases where it is necessary to dump the DB2 address space, because if dumping causes paging to occur, the dump will take longer, and DB2 is not available during that time.

DB2 Buffer Pool Analyzer for z/OS

The first thing that comes to mind for the use of large memory is to increase the size of DB2 buffer pools. This can reduce the number of synchronous I/Os by increasing the buffer hit ratio. Furthermore, reducing the number of synchronous I/Os will reduce CPU consumption, because I/Os cost CPU time.
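
As a rough illustration of that chain of reasoning, the sketch below (all numbers are assumptions, not measurements) estimates how many synchronous reads disappear when a larger buffer pool raises the hit ratio, and what that is worth in CPU time.

```python
# Illustrative sketch with made-up numbers: how a better buffer pool hit ratio
# reduces synchronous I/O and the CPU those I/Os consume.

getpages_per_sec = 50_000     # assumed buffer pool getpage rate
cpu_ms_per_sync_io = 0.03     # assumed CPU cost per synchronous read, in ms

def sync_io_rate(getpages, hit_ratio):
    """Synchronous reads per second for a given buffer hit ratio."""
    return getpages * (1.0 - hit_ratio)

before = sync_io_rate(getpages_per_sec, 0.90)   # 90% hit ratio
after = sync_io_rate(getpages_per_sec, 0.97)    # 97% after enlarging the pool

print(f"sync I/O: {before:,.0f}/s -> {after:,.0f}/s")
print(f"CPU saved: ~{(before - after) * cpu_ms_per_sync_io:.0f} ms per second")
# -> 5,000/s drops to 1,500/s, saving ~105 ms of CPU per elapsed second
```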

Continue reading

IntelliMagic Sessions at SHARE Sacramento

IntelliMagic will be at SHARE Sacramento, March 11 – 16, hosting a Lunch & Learn and presenting four performance and capacity sessions you won’t want to miss! Join us at booth #205 to see the latest in intelligent analytics for performance and capacity planners.

Continue reading

Credit Card Transaction Timeouts – IOSQ Analysis

By Joe Hyde

Black Friday is one of the busiest transaction days of the year, and it often seems like an easy payday for most participating companies. But have you ever wondered what performance preparations must be made to accommodate the overly inflated volume of credit card transactions?

A large global bank was struggling because their latest version of a credit card swipe application was failing at high volume load testing. In preparations for Black Friday they needed the application to handle a much higher number of credit card swipes, but periodically their credit card transactions were timing out.

When we became involved, they had spent weeks and thousands of man-hours on the issue and had incurred significant financial penalties because of the delays. They had spent the past two weeks on day-long conference calls with over 100 people on the phone (often forcing some off the line so others could join), all pointing fingers at one another. The performance team, application team, storage team, and the vendor all blamed one another for the timeouts.

You see, the delays had a significant revenue impact on their business, as any credit card approval that timed out had to be sent over a competitor’s exchange, incurring significant fees. After two weeks of conference calls proved unsuccessful in determining the root cause of the problem, they called us in. We took a deep dive into some of the key storage metrics and, with a few days of research and additional data acquisition, were able to provide the key insight that determined the root cause of the timeouts.
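
The kind of metric-level reasoning involved can be illustrated with a simple sketch. RMF breaks z/OS disk response time into IOSQ, pending, disconnect, and connect components; when IOSQ dominates, the I/O is queueing inside the LPAR before it ever reaches the storage subsystem. The volumes and numbers below are invented for illustration, not the actual data from this engagement.

```python
# Hedged sketch: decompose disk response time and flag volumes where IOSQ
# (queueing before the I/O starts) dominates. All values are invented.

volumes = {
    # volser: (iosq_ms, pend_ms, disc_ms, conn_ms)
    "PRD001": (4.2, 0.2, 0.3, 0.4),
    "PRD002": (0.1, 0.2, 0.9, 0.5),
}

for volser, (iosq, pend, disc, conn) in volumes.items():
    total = iosq + pend + disc + conn
    if iosq / total > 0.5:
        print(f"{volser}: {total:.1f} ms total, IOSQ is {iosq/total:.0%} "
              f"-> queueing in the LPAR, not in the storage subsystem")
    else:
        print(f"{volser}: {total:.1f} ms total, IOSQ is {iosq/total:.0%}")
```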

Continue reading

zHyperLink: The Holy Grail of Mainframe I/O?

By Gilbert Houtekamer, Ph.D.

Now that it has become harder and harder to make the processor faster, IBM is looking for other ways to make its mainframes perform better.

This has resulted in new co-processors for compression and encryption and now also, with the z14 processor, in a new technology called zHyperLink. This new I/O connectivity aims to significantly reduce I/O response time while at the same time not increasing processor (CP) load.

This new technology comes with a set of promises and restrictions that will cause you to rethink the design of your storage and replication infrastructure. The days of distance limitations are back, which has big implications for synchronous replication in particular.
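
To see why distance limitations matter again, consider the propagation delay alone: light in fiber travels at roughly 200,000 km/s, about 5 microseconds per kilometer each way. The sketch below (simple physics, not zHyperLink specifications) shows how quickly that adds up for a synchronously mirrored write once response times are measured in tens of microseconds rather than milliseconds.

```python
# Rough propagation-delay arithmetic (illustration only, not zHyperLink specs).
US_PER_KM_ONE_WAY = 5.0   # ~5 microseconds per km for light in fiber

def sync_write_penalty_us(distance_km: float, round_trips: int = 1) -> float:
    """Added latency (microseconds) per synchronous write due to site distance."""
    return 2 * US_PER_KM_ONE_WAY * distance_km * round_trips

for km in (0.1, 1, 10, 50):
    print(f"{km:>5} km -> +{sync_write_penalty_us(km):,.0f} us per write")
# 0.1 km -> +1 us, 1 km -> +10 us, 10 km -> +100 us, 50 km -> +500 us
```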

Continue reading

AI and z/OS Performance and Capacity Analysis: 2018 Predictions

By Brent Phillips

2018 is gearing up to be a watershed year for z/OS performance and capacity professionals.

Industry analysts have been talking for some years now about Artificial Intelligence (AI) and the role it will play in our work. But what that truly means, and its value in day-to-day operations has not yet been understood or realized by most professionals in this field.

There are many different types of AI, and not all of them are useful for making the computer do the kind of infrastructure performance and availability health assessment work that is no longer feasible for human analysts to do proactively every day. When properly designed and deployed, however, automated AI-driven decision making about what all the data means has proven very effective at identifying current or near-term performance problems and their root causes.

Continue reading

Health Check for Your z/OS Systems Should Include Coupling Facility Activity

As another New Year begins, most of us are open to some new practices that improve life both at home and at work. Probably the most common challenge we face personally in the new year is our weight. I put on my share, and it’s never easy to get back on track! Reducing calories on the intake side is a good start; however, coupled with a regular regimen of exercise, diet has a multiplying effect. Suddenly, the effects of exercise go far beyond the goal that drove the change in activity, and they pervade other areas.

It’s no different in IT.

(Here’s a reference of the exercise multiplier and an example article showing the many other exercise benefits.)

Whether you have just completed your year-end budgets or not, it won’t be long before cost pressures, time pressures and all the other pressures of new projects squeeze many of us into ‘surviving the next day’ versus moving along a progressive path of improvement. I’ve been there.

Continue reading

What is AIOps? The Benefits Explained

By Morgan Oats

August 2017 ushered in a new term heralded by Gartner in the form of AIOps: Artificial Intelligence for IT Operations. The term has certainly generated a lot of market hype, but what exactly is AIOps, and how can it help support your business operations?

Gartner’s official definition for AIOps is:

“AIOps platforms utilize big data, modern machine learning and other advanced analytics technologies to directly and indirectly enhance IT operations (monitoring, automation and service desk) functions with proactive, personal and dynamic insight. AIOps platforms enable the concurrent use of multiple data sources, data collection methods, analytical (real-time and deep) technologies, and presentation technologies.”

AIOps: Artificial Intelligence for IT Operations

Source: Gartner [https://blogs.gartner.com/andrew-lerner/2017/08/09/aiops-platforms/]

Continue reading

IntelliMagic Vision in GDPS Environments

By Gilbert Houtekamer, Ph.D.

GDPS is IBM’s most advanced Business Continuity solution for the zSeries platform. Its design leverages the availability of enough data processing components in multiple sites, such that IT services may continue even if one site suffers a complete failure. All critical data, disk or tape resident, is mirrored between sites using synchronous and/or asynchronous remote copy.

GDPS is about maximizing availability. However, as those in both performance and business can attest, availability is not always a binary event (such as a disaster). For example, if critical data is available on your Disk Subsystems but poor I/O response times are impacting your applications, are you meeting your availability objectives? The data storage and data replication involved in these types of environments are complex, and this is magnified by the difficulty of seeing how they are operating.

Continue reading

An Effective Solution to the Mainframe Skills Shortage for z/OS Performance and Capacity Professionals

By Todd Havekost

The mainframe skills shortage for z/OS performance analysts and capacity planners has left many organizations struggling to ensure availability. Current experts are often overworked and lack the manpower, resources, or tools necessary to effectively perform their jobs. This is often caused by a reliance on manual processes and the limitations of in-house developed solutions, rather than leveraging the built-in, automated capabilities provided by an effective performance solution.

This can put the availability of the infrastructure and applications at risk. Many enterprises are finding it to be difficult to replace or supplement z/OS performance skills that are becoming increasingly scarce.

In his blog, “Bridging the z/OS Performance & Capacity Skills Gap,” Brent Phillips wrote about the availability and efficiency benefits that can be gained from modernizing the analysis of the mainframe infrastructure using processes that leverage artificial intelligence.

Modernized analytics can also help solve the skills shortage by making current staff more productive and getting newer staff up to speed more rapidly. An effective analytics solution that will expedite the acquisition of skills for z/OS performance analysts and capacity planners needs 5 key attributes. These attributes are covered in detail with illustrations in the paper at the link at the bottom. In this blog I will briefly introduce 3 of the key attributes.

Continue reading

Root Cause Analysis for z Systems Performance – Down the Rabbit Hole

By Morgan Oats

Finding the root cause of z Systems performance issues may often feel like falling down a dark and endless rabbit hole. There are many paths you can take, each leading to further possibilities, but clear indicators as to where you should really be heading to resolve the problem are typically lacking. Performance experts tend to rely on experience to judge where the problem most likely is, but this may not always be adequate, and in the case of disruptions, time is money.

Performance experts with years of experience are more likely able to resolve problems faster than newer members of the performance team. But with the performance and capacity skills gap the industry is experiencing, an approach is needed that doesn’t require decades of experience.

Rather than aimlessly meandering through mountains of static reports, charts, and alerts that do more to overwhelm our senses than assist in root cause analysis, performance experts need a better approach. An approach that not only shines a light down the rabbit hole, but tells us which path will lead us to our destination. Fortunately, IntelliMagic Vision can be your guide.

Continue reading

5 Reasons IBM z/OS Infrastructure Performance & Capacity Problems are Hard to Predict and Prevent

By Brent Phillips

Solving z/OS infrastructure performance and capacity problems is difficult. Getting ahead of performance and capacity problems before they occur and preventing them is more difficult still. This is why it takes years, and decades even, for performance analysts and capacity planners to become experts.

Together with the rapid retirement of the current experts, the difficulty of becoming an expert is why the performance and capacity discipline for the mainframe is experiencing a significant skills gap. It is simply too difficult and time consuming to understand what the data means for availability, let alone derive predictive intelligence about upcoming production problems within the complex IBM z Systems infrastructure.

The primary root causes of this performance and capacity management problem are:

Continue reading

Platform-Specific Views: Multi-Vendor SAN Infrastructure Part 2

By Brett Allison

Each distributed system platform has unique nuances. In Part 1 of this blog, I demonstrated how having a single view to manage your multi-vendor SAN infrastructure helps you ensure performance and understand overall health, performance, and capacity. Equally important to these common views is a solution capable of collecting the detailed performance data needed to support vendor-specific architectures.

New storage system platforms are popping up every year, and it’s impossible to stay ahead of all of them and provide the detailed, intelligent, performance views necessary to manage your SAN infrastructure and prevent incidents. However, IntelliMagic Vision supports a wide variety of SAN platforms for which we provide our end-to-end capabilities.

Continue reading

A Single View: Multi-Vendor SAN Infrastructure Part 1

By Brett Allison

One of the benefits of a SAN system is the fact that it is an open system. It’s always ready to communicate with other systems, and you can add storage and infrastructure from many different vendors as it suits your business and performance needs. However, just like a calculated job interview response, this strength can also be a weakness. Even if your distributed systems can communicate with each other, it’s likely that your performance management solution is less “open” in this regard.

To properly manage the performance, connections, and capacity of your distributed system, you need something better than a bunch of vendor point solutions. You need to be able to manage your entire SAN infrastructure in a single view – otherwise the cost and hassle of having different performance solutions is not worth the benefits.

Continue reading

The Roots and Evolution of the RMF and SMF for Mainframe Performance Data (Part 2)

By George Dodson

This is part 2 of this blog. If you haven’t read the first section, you can read that here.

After being announced as a product in 1974, RMF was further expanded to provide more capabilities such as RMF Monitor 2 and RMF Monitor 3. These provided real time insight into the internal workings of z/OS to help understand and manage the performance of the z/OS infrastructure. The value of the RMF performance measurement data has been proven over the decades as it, or a compatible product from BMC named CMF, is used in every mainframe shop today. Many new record types have been added in recent years as the z/OS infrastructure capabilities continue to evolve.

A related product – Systems Management Facility or SMF – was originally created to provide resource usage information for chargeback purposes. SMF captured application usage statistics, but was not always able to capture the entire associated system overhead. Eventually, SMF and RMF were expanded to capture detailed statistics about all parts of the mainframe workloads and infrastructure operation, including details about third party vendor devices such as storage arrays. RMF and SMF now generate what is likely the most robust and detailed performance and configuration data of any commercial computing environment in the data center.

As the data sources to report on the performance of the workloads and the computer infrastructure grew, different performance tools were created to display and analyze the data. The information in the data is very complex and the total amount of data captured is overwhelming, creating challenges in identifying performance problems. Typically, this requires analysts who have extensive insight into the specific infrastructure areas being analyzed and an understanding of how they respond to different application workloads. As applications have grown more complex, more real-time, and involve more platforms and components, the performance analysis task has also become more difficult.

Continue reading

The Roots and Evolution of the RMF and SMF for Mainframe Performance Data (Part 1)

By George Dodson

This blog originally appeared as an article in Enterprise Executive.

For more than 50 years, computer professionals have been interested in how to make computer applications run faster and in determining the causes of slow-running applications. In the early days, computer performance was in some ways easy because electronic components were soldered in place. To understand what was happening at any point in the circuitry, we simply attached a probe and examined the electronic wave information on an oscilloscope.

Eventually, we were able to measure activity at key points in the computer circuitry to determine things like CPU utilization, channel utilization, and input/output response times. However, this method still had many shortcomings. First, the number of probes was very small, usually less than 40. Second, this method gave no insight into operating system functions or application operations that may be causing tremendous overhead. And of course, when integrated circuits were developed, the probe points went away.

In 1966 I joined an IBM team that was focusing on a better way to conduct benchmarks in what was then named an IBM Systems Center. Customers considering computer upgrades would come to our data center to determine how their programs would operate on newly released hardware. But it was simply not possible to host every customer in this way.

Continue reading

RPO Replication for TS7700 Disaster Recovery

By Merle Sadler

This blog is on the impact of a zero Recovery Point Objective (RPO) for mainframe virtual tape replication, focusing on the IBM TS7700 replication capability.

Have you ever thought about how much money you will need to save for retirement? I was talking with my financial advisor the other day and decided that whatever you think you need, you should double. You can plan on having Social Security, but if Social Security fails, then retirement plans start to look not so rosy.

Budget trade-offs for RTO and RPO

The same thing applies to computer systems. Customers spend a lot of time and money on disk replication, reducing both RPO and RTO. But what if an application corrupts the data or a virus is uploaded? Corrupted or infected data is replicated just as easily as good data. This leads to making offline backup copies of disk files, which also need to be replicated.

Continue reading

6 Signs You Already Have a Skills Gap for z/OS Performance and Capacity Planning

By Brent Phillips

The mainframe skills gap is a well-known issue, but most of the focus is on mainframe application development. A large z/OS mainframe organization may have thousands of application developers but only 20 or fewer performance & capacity planning staff. Even though fewer in number, these IT staff have an outsized impact on the organization.

The problem, however, is not just about recruiting new IT staff members to the team. The road to becoming a true z/OS performance and capacity (perf/cap) expert is far longer and more difficult than what is necessary for a programmer to learn to code in a mainframe programming language like COBOL. Consequently, it is not feasible to fill the performance and capacity planning gap with new recruits, and recruiting experienced staff from the short supply is difficult. Even teams that have all the headcount positions filled very often exhibit at least some of the signs that they are being negatively impacted by insufficient levels of expert staff.

A primary contributor to the problem is the antiquated way of understanding the RMF and SMF performance data that most sites still use. The way this data is processed and interpreted not only makes it difficult for new IT staff to learn the job, but it also makes the job for the existing experts more difficult and time consuming.

Here are six signs that indicate your z/OS performance and capacity team would benefit by modernizing analytics for your infrastructure performance and configuration data.

Continue reading

z/OS Performance Monitors – Why Real-Time is Too Late

By Morgan Oats

Real-time z/OS performance monitors are often advertised as the top tier of performance management. Real-time monitoring means just that: system and storage administrators can view performance data and/or alerts indicating service disruptions continuously as they happen.

In theory, this enables administrators to quickly fix the problem. For some companies, service disruptions may not be too serious if they are resolved quickly enough. Even though those disruptions could be costing them a lot more than they think, they believe a real-time monitor is the best they can do to meet their business needs.

For other companies, optimal z/OS performance is essential for day-to-day operations: banks with billions of transactions per day; global retailers, especially on Black Friday or Cyber Monday; government agencies and insurance companies that need to support millions of customers at any given time; transportation companies with 24/7 online delivery tracking; the list goes on and on.

For these organizations and many others, real-time performance information is, in fact, too late. They need information that enables them to prevent disruptions – not simply tell them when something is already broken.

Continue reading

Dragging the Right Information Out of SMF/RMF/CMF for z/OS Disk Performance Analysis

By Dave Heggen

Internal processing in IntelliMagic Vision is performed on a Sysplex Boundary. We want the SMF data from all LPARS in a Sysplex, and if multiple Sysplexes attach to the same hardware, then we want these Sysplexes together in the same interest group. By processing the data in this manner, an interest group will provide an accurate representation of the hardware’s perspective of activity and allow an evaluation of whether this activity is below, equal to, or above the hardware’s capability. It’s also true that the shorter the interval, the more accurate the data will be in showing peaks and lulls. The shortest interval you can define is 1 minute. This would typically be the average of 60 samples (1 cycle per second). It’s always a balancing act between the accuracy of the data and the size/cost of storing and processing the data.
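
The accuracy side of that balancing act is easy to demonstrate. The synthetic example below (not SMF data) summarizes the same one-hour activity stream at 1-minute and 15-minute intervals: the longer interval produces far fewer records, but a short burst that stands out clearly at 1-minute granularity is largely averaged away at 15 minutes.

```python
# Synthetic illustration of the interval-length trade-off described above.
import random

random.seed(42)
# One hour of 1-second samples: a quiet workload with a 2-minute burst.
samples = [100 + random.random() * 20 for _ in range(3600)]
for s in range(1800, 1920):          # burst between minute 30 and minute 32
    samples[s] += 400

def interval_averages(data, seconds_per_interval):
    return [sum(data[i:i + seconds_per_interval]) / seconds_per_interval
            for i in range(0, len(data), seconds_per_interval)]

one_min = interval_averages(samples, 60)
fifteen_min = interval_averages(samples, 900)

print(f"peak at  1-minute intervals: {max(one_min):.0f}")     # close to the burst level
print(f"peak at 15-minute intervals: {max(fifteen_min):.0f}") # burst mostly averaged away
```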

Continue reading

Finding Hidden Time Bombs in Your VMware Connectivity

By Brett Allison

Do you have any VMware connectivity risks? Chances are you do. Unfortunately, there is no way to see them. That’s because seeing the real end-to-end risks from the VMware guest through the SAN fabric to the storage LUN is difficult to do in practice, as it requires correlating many relationships from a variety of sources.

A complete end-to-end picture requires:

  • VMware guests to the ESX Hosts
  • ESX hosts initiators to targets
  • ESX hosts and datastores, VM guests and datastores, and ESX datastores to LUNs.
  • Zone sets
  • Target ports to host adapters and LUNs and storage ports.

For seasoned SAN professionals, none of this information is very difficult to comprehend. The trick is tying it all together in a cohesive way so you can visualize these relationships and quickly identify any asymmetry.
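
As a toy illustration of what "tying it all together" can look like (the data structures and names below are invented for this sketch, not IntelliMagic Vision internals), the relationships can be joined and each guest's storage checked for an unexpectedly low path count:

```python
# Hedged sketch: join VM -> datastore -> LUN -> fabric-path relationships and
# flag guests whose storage is reachable over fewer paths than the standard.

vm_to_datastore = {"vm01": ["ds_a"], "vm02": ["ds_b"]}
datastore_to_lun = {"ds_a": "lun_100", "ds_b": "lun_200"}
# Zoned initiator->target paths per LUN, as discovered from fabric and array data.
lun_paths = {"lun_100": ["hba1->sp_a", "hba2->sp_b"], "lun_200": ["hba1->sp_a"]}

EXPECTED_PATHS = 2   # assumed site standard

for vm, datastores in vm_to_datastore.items():
    for ds in datastores:
        paths = lun_paths[datastore_to_lun[ds]]
        if len(paths) < EXPECTED_PATHS:
            print(f"{vm}: {ds} has only {len(paths)} path(s) {paths} "
                  f"-> asymmetric, a single point of failure")
```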

Why is asymmetry important? Let’s look at an actual example:

Continue reading

No Budget for an ITOA Performance Management Solution

By Morgan Oats

Every department in every industry has the same problem: how can I stretch my budget to get the necessary work done, make my team more effective, reduce costs, and stay ahead of the curve? This is equally true for performance and capacity planning teams. In many cases, it’s difficult to get budget approval to purchase the right software solution to help accomplish these goals. Management wants to stay under budget while IT is concerned with getting a solution that solves their problems. When trying to get approval for the right solution, it’s important to be able to show how you will get a good return on investment.

Continue reading

Bridging the z/OS Mainframe Performance & Capacity Skills Gap

By Brent Phillips

Many, if not most, organizations that depend on mainframes are experiencing the effects of the mainframe skills gap, or shortage. This gap is a result of the largely baby-boomer workforce that is now retiring without a new generation of experts in place who have the same capabilities. At the same time, the scale, complexity, and rate of change in the mainframe environment continue to accelerate. Performance and capacity teams are a mission-critical function, and this performance skills gap represents a great risk to ongoing operations. It demands both immediate attention and a new, more effective approach to bridging the gap.

Continue reading