Finding Hidden Time Bombs in Your VMware Connectivity

By Brett Allison

Do you have any VMware connectivity risks? Chances are you do. Unfortunately, they are hard to see. Tracing the real end-to-end risks from the VMware guest through the SAN fabric to the storage LUN is difficult in practice because it requires combining many relationships from a variety of sources.

A complete end-to-end picture requires:

  • VMware guests to ESX hosts
  • ESX host initiators to targets
  • ESX hosts and datastores, VM guests and datastores, and ESX datastores to LUNs
  • Zone sets
  • Target ports to host adapters, LUNs, and storage ports

For seasoned SAN professionals, none of this information is very difficult to comprehend. The trick is tying it all together in a cohesive way so you can visualize these relationships and quickly identify any asymmetry.
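As a rough illustration of what tying it together means, the sketch below joins a few small lookup tables into complete VM-to-LUN chains. Every name and value here (`vm_to_hosts`, `hba-a0`, `lun7`, and so on) is invented for the example; in a real environment each table would come from a different source such as vCenter, the fabric switches, and the storage array.

```python
# Hypothetical inventory extracts; all names and values are made up for illustration.
vm_to_hosts = {"vm01": ["esx-a", "esx-b"], "vm02": ["esx-b"]}
host_to_initiators = {"esx-a": ["hba-a0", "hba-a1"], "esx-b": ["hba-b0"]}
initiator_to_switch = {"hba-a0": "sw1", "hba-a1": "sw2", "hba-b0": "sw1"}
# Zoning: which target ports each initiator is allowed to reach.
zoned_targets = {"hba-a0": ["spa-0"], "hba-a1": ["spb-0"], "hba-b0": ["spa-0"]}
target_to_luns = {"spa-0": ["lun7"], "spb-0": ["lun7"]}


def end_to_end_paths(vm):
    """Return every (vm, host, initiator, switch, target, lun) chain for a VM."""
    paths = []
    for host in vm_to_hosts.get(vm, []):
        for initiator in host_to_initiators.get(host, []):
            switch = initiator_to_switch.get(initiator)
            for target in zoned_targets.get(initiator, []):
                for lun in target_to_luns.get(target, []):
                    paths.append((vm, host, initiator, switch, target, lun))
    return paths


for vm in vm_to_hosts:
    print(vm, len(end_to_end_paths(vm)), "path(s)")
```

Once the chains exist in one place, counting connections per layer becomes a simple grouping exercise rather than a manual cross-reference between tools.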

Why is asymmetry important? Let’s look at an actual example:


Figure 1: Single Path ESX Host

Symmetry in this context means that each node at a given layer of the storage connectivity stack should have the same number of connections as every other node in that layer.

Notice the ESX Hosts layer, where I have highlighted the asymmetry. This layer consists of four hosts: php10404, php00201, php00203 and php10203. Host php10404 has a single path to the DCX13 switch, while the other three ESX hosts each have two paths through the fabric: one through DCX13 and one through DCX14.

You probably also noticed that the logical switch DCX14 has only three ports while its counterpart, DCX13, has four.
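The symmetry rule can be sketched in a few lines. The example below uses the host-to-switch links described for Figure 1 and flags any node with fewer connections than the best-connected node in its layer. The function names `under_connected` and `invert` are my own for this illustration, and treating the highest count in the layer as the expected count is an assumption, not a description of any particular tool.

```python
# Host-to-switch links as described for Figure 1.
host_links = {
    "php10404": ["DCX13"],
    "php00201": ["DCX13", "DCX14"],
    "php00203": ["DCX13", "DCX14"],
    "php10203": ["DCX13", "DCX14"],
}


def under_connected(links):
    """Return nodes with fewer distinct connections than the layer's maximum."""
    counts = {node: len(set(peers)) for node, peers in links.items()}
    best = max(counts.values(), default=0)
    return {node: n for node, n in counts.items() if n < best}


def invert(links):
    """Flip a node -> peers mapping so it can be checked from the other layer."""
    inverted = {}
    for node, peers in links.items():
        for peer in peers:
            inverted.setdefault(peer, []).append(node)
    return inverted


print(under_connected(host_links))          # {'php10404': 1}
print(under_connected(invert(host_links)))  # {'DCX14': 3}
```

Checking the same links from both directions shows the one missing connection twice: once as a host with a single path and once as a switch with one port fewer than its counterpart.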

Because a VM has relationships with multiple ESX hosts, it is not surprising that many other VMs had the same issue in this case. Should the single path from php10404 fail, any VM residing on that ESX host at the time of failure would lose connectivity to its data on the SAN fabric.
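To make the exposure concrete, here is a minimal sketch that lists the VMs currently sitting on a host without path redundancy. The path counts match the Figure 1 description; the VM names and their placement are hypothetical, as is the `minimum=2` threshold.

```python
# Path counts per ESX host, as described for Figure 1.
host_path_count = {"php10404": 1, "php00201": 2, "php00203": 2, "php10203": 2}

# Hypothetical placement of VMs on hosts at a point in time (names are invented).
vm_current_host = {
    "vm-app01": "php10404",
    "vm-db01": "php00201",
    "vm-web01": "php10404",
}


def vms_without_redundancy(placement, path_count, minimum=2):
    """Return VMs whose current host has fewer than `minimum` fabric paths."""
    return sorted(
        vm for vm, host in placement.items()
        if path_count.get(host, 0) < minimum
    )


print(vms_without_redundancy(vm_current_host, host_path_count))
# ['vm-app01', 'vm-web01']
```

Since VMs can move between hosts, a check like this only describes one moment; it would need to be rerun whenever placement changes.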

Further investigation revealed a problem with a switch port on DCX14.

Let’s take a look at another example:


Figure 2: Zoning and Port Issue

In this example, we see a couple of issues. I have highlighted the asymmetry at the ESX host level. Host xvd086 has three active paths to the fabric, while the other ESX hosts associated with this VM have four paths (one through each logical switch at the next level). Note that DCX11 has only three ports. This is the switch that does not have a connectivity relationship with host xvd086.

The challenge with this scenario is that when switch ports fail or misbehave, the host may not be aware of the failure until the ESX host is restarted or a rescan for hardware occurs, leaving a huge blind spot for VMware administrators.

Lastly, we see that there is asymmetry at the Target Ports level. SP-A has four storage target ports but SP-B has only two. This turned out to be a zoning issue where some of the target ports for SP-B were not included in the zone set. This was easily resolved.
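A zoning gap of this kind can be found with a set difference between the ports the array presents and the ports that appear in the zone set. In the sketch below the port names are invented, and the assumption that SP-B physically has four target ports is mine; the example only mirrors the observed counts of four visible ports on SP-A and two on SP-B.

```python
# Hypothetical port names; assumes each storage processor has four target ports.
array_ports = {
    "SP-A": {"spa-p0", "spa-p1", "spa-p2", "spa-p3"},
    "SP-B": {"spb-p0", "spb-p1", "spb-p2", "spb-p3"},
}

# Target ports that are members of the active zone set (also hypothetical).
zoned_ports = {"spa-p0", "spa-p1", "spa-p2", "spa-p3", "spb-p0", "spb-p1"}


def unzoned_ports(array_ports, zoned_ports):
    """Return, per storage processor, the target ports missing from the zone set."""
    missing = {}
    for sp, ports in array_ports.items():
        gap = ports - zoned_ports
        if gap:
            missing[sp] = sorted(gap)
    return missing


print(unzoned_ports(array_ports, zoned_ports))
# {'SP-B': ['spb-p2', 'spb-p3']}
```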

Free SAN Connectivity Audit

For a free connectivity audit of your SAN environment, please send me an email at brett.allison@intellimagic.com.

You can view the capabilities described in this blog in the video below:
