Manage VSAN 5.5 with RVC Part 4 – Troubleshooting

Part 4 of the "Manage VSAN with RVC" series covers commands that are useful for troubleshooting VSAN configurations. These commands can report object health, measure performance metrics and fix configuration issues:

  • vsan.obj_status_report
  • vsan.check_state
  • vsan.fix_renamed_vms
  • vsan.reapply_vsan_vmknic_config
  • vsan.vm_perf_stats

To keep the commands readable, I created marks for a cluster, a virtual machine and an ESXi host. This allows me to use ~cluster, ~vm and ~esx in my examples:

/localhost/DC> mark cluster ~/computers/VSAN-Cluster/
/localhost/DC> mark vm ~/vms/vma.virten.lab
/localhost/DC> mark esx ~/computers/VSAN-Cluster/hosts/esx1.virten.lab/

Troubleshooting VSAN

vsan.obj_status_report
Provides information about objects and their health status. With this command, you can verify that all object components are healthy, which means that the witness and all mirrors are available and in sync. It also identifies possibly orphaned objects.

Usage from help page:

/localhost/DC> help vsan.obj_status_report
Print component status for objects in the cluster.
  cluster_or_host: Path to a ClusterComputeResource or HostSystem
           --print-table, -t:   Print a table of object and their status, default all objects
      --filter-table, -f:   Filter the obj table based on status displayed in histogram, e.g. 2/3
           --print-uuids, -u:   In the table, print object UUIDs instead of vmdk and vm paths
  --ignore-node-uuid, -i:   Estimate the status of objects if all comps on a given host were healthy.
                  --help, -h:   Show this message

Example 1 - Simple component status histogram.
We can see 45 objects with 3/3 healthy components and 23 objects with 7/7 healthy components. With default policies, 3/3 objects are virtual disks (2 mirrors + witness) and 7/7 objects are namespace directories. This example also shows one possibly orphaned object.

/localhost/DC> vsan.obj_status_report ~cluster
2014-01-03 19:10:13 +0000: Querying all VMs on VSAN ...
2014-01-03 19:10:13 +0000: Querying all objects in the system from esx1.virten.lab ...
2014-01-03 19:10:14 +0000: Querying all disks in the system ...
2014-01-03 19:10:15 +0000: Querying all components in the system ...
2014-01-03 19:10:15 +0000: Got all the info, computing table ...

Histogram of component health for non-orphaned objects

+-------------------------------------+------------------------------+
| Num Healthy Comps / Total Num Comps | Num objects with such status |
+-------------------------------------+------------------------------+
| 3/3                                 |  45                          |
| 7/7                                 |  23                          |
+-------------------------------------+------------------------------+
Total non-orphans: 68

Histogram of component health for possibly orphaned objects

+-------------------------------------+------------------------------+
| Num Healthy Comps / Total Num Comps | Num objects with such status |
+-------------------------------------+------------------------------+
| 1/3                                 |  1                           |
+-------------------------------------+------------------------------+
Total orphans: 1
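
If you capture this output in a script, the histogram rows are easy to check automatically. The following is only a minimal Python sketch, not part of RVC: the function name degraded_rows and the REPORT sample are my own, and it assumes the table layout stays exactly as printed above.

```python
import re

# Sample text copied from the histogram output above.
REPORT = """\
Histogram of component health for non-orphaned objects

+-------------------------------------+------------------------------+
| Num Healthy Comps / Total Num Comps | Num objects with such status |
+-------------------------------------+------------------------------+
| 3/3                                 |  45                          |
| 7/7                                 |  23                          |
+-------------------------------------+------------------------------+
Total non-orphans: 68

Histogram of component health for possibly orphaned objects

+-------------------------------------+------------------------------+
| Num Healthy Comps / Total Num Comps | Num objects with such status |
+-------------------------------------+------------------------------+
| 1/3                                 |  1                           |
+-------------------------------------+------------------------------+
Total orphans: 1
"""

# Matches data rows such as "| 3/3 |  45 |"; header and border rows do not match.
ROW = re.compile(r"^\|\s*(\d+)/(\d+)\s*\|\s*(\d+)\s*\|$")


def degraded_rows(report):
    """Return (healthy, total, object_count) for rows where healthy < total."""
    rows = []
    for line in report.splitlines():
        match = ROW.match(line.strip())
        if match:
            healthy, total, count = map(int, match.groups())
            if healthy < total:
                rows.append((healthy, total, count))
    return rows


print(degraded_rows(REPORT))
```

For the sample above, only the 1/3 row is reported. An empty list would mean that every object has all of its components healthy.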

Example 2 - Add a table with all objects and their status to the command output. The table reveals which object is actually orphaned.

/localhost/DC> vsan.obj_status_report ~cluster -t
2014-01-03 19:42:13 +0000: Querying all VMs on VSAN ...
2014-01-03 19:42:13 +0000: Querying all objects in the system from esx1.virten.lab ...
2014-01-03 19:42:14 +0000: Querying all disks in the system ...
2014-01-03 19:42:15 +0000: Querying all components in the system ...
2014-01-03 19:42:16 +0000: Got all the info, computing table ...

Histogram of component health for non-orphaned objects

+-------------------------------------+------------------------------+
| Num Healthy Comps / Total Num Comps | Num objects with such status |
+-------------------------------------+------------------------------+
| 3/3                                 |  45                          |
| 7/7                                 |  23                          |
+-------------------------------------+------------------------------+
Total non-orphans: 68

Histogram of component health for possibly orphaned objects

+-------------------------------------+------------------------------+
| Num Healthy Comps / Total Num Comps | Num objects with such status |
+-------------------------------------+------------------------------+
| 1/3                                 |  1                           |
+-------------------------------------+------------------------------+
Total orphans: 1

+-----------------------------------------------------------------------------+---------+---------------------------+
| VM/Object                                                                   | objects | num healthy / total comps |
+-----------------------------------------------------------------------------+---------+---------------------------+
| perf9                                                                       | 1       |                           |
|    [vsanDatastore] 735ec152-da7c-64b1-ebfe-eca86bf99b3f/perf9.vmx           |         | 7/7                       |
| perf8                                                                       | 1       |                           |
|    [vsanDatastore] 195ec152-92a3-491b-801d-eca86bf99b3f/perf8.vmx           |         | 7/7                       |
[...]
+-----------------------------------------------------------------------------+---------+---------------------------+
| Unassociated objects                                                        |         |                           |
[...]
|    795dc152-6faa-9ae7-efe5-001b2193b9a4                                     |         | 3/3                       |
|    d068ab52-b882-c6e8-32ca-eca86bf99b3f                                     |         | 1/3*                      |
|    ce5dc152-c5f6-3efb-9a44-001b2193b3b0                                     |         | 3/3                       |
+-----------------------------------------------------------------------------+---------+---------------------------+

+------------------------------------------------------------------+
| Legend: * = all unhealthy comps were deleted (disks present)     |
|         - = some unhealthy comps deleted, some not or can't tell |
|         no symbol = We cannot conclude any comps were deleted    |
+------------------------------------------------------------------+

Example 3 - Add a filtered table that shows unhealthy objects only. Here the table is filtered to objects with 1/3 healthy components.

/localhost/DC> vsan.obj_status_report ~cluster -t -f 1/3 
[...]
+-----------------------------------------+---------+---------------------------+
| VM/Object                               | objects | num healthy / total comps |
+-----------------------------------------+---------+---------------------------+
| Unassociated objects                    |         |                           |
|    d068ab52-b882-c6e8-32ca-eca86bf99b3f |         | 1/3*                      |
+-----------------------------------------+---------+---------------------------+
[...]

Example 4 - Print object UUIDs instead of VMDK and VM paths in the table.

/localhost/DC> vsan.obj_status_report ~cluster -t -u
[...]
+-----------------------------------------+---------+---------------------------+
| VM/Object                               | objects | num healthy / total comps |
+-----------------------------------------+---------+---------------------------+
| perf9                                   | 1       |                           |
|    735ec152-da7c-64b1-ebfe-eca86bf99b3f |         | 7/7                       |
| perf8                                   | 1       |                           |
|    195ec152-92a3-491b-801d-eca86bf99b3f |         | 7/7                       |
| perf11                                  | 1       |                           |
|    d65ec152-7d27-f441-39bb-eca86bf99b3f |         | 7/7                       |
[...]

vsan.check_state
Checks the state of VMs and VSAN objects. This command can also re-register VMs whose objects are out of sync. I have not been able to reproduce a situation where VMs get out of sync, so I could not test that part yet. I will update this post when I have further information.

Example 1 - Check state
/localhost/DC> vsan.check_state ~cluster
2014-01-03 19:53:36 +0000: Step 1: Check for inaccessible VSAN objects
Detected 1 objects to not be inaccessible
Detected d068ab52-b882-c6e8-32ca-eca86bf99b3f on esx2.virten.lab to be inaccessible

2014-01-03 19:53:37 +0000: Step 2: Check for invalid/inaccessible VMs

2014-01-03 19:53:37 +0000: Step 3: Check for VMs for which VC/hostd/vmx are out of sync
Did not find VMs for which VC/hostd/vmx are out of sync
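
To feed the result of Step 1 into a monitoring script, the "to be inaccessible" lines can be picked out of the captured output. This is a minimal Python sketch under the assumption that the line format matches the output above. The function name inaccessible_objects is hypothetical and not part of RVC. The summary line is ignored, because it contains a count rather than a UUID and a host.

```python
import re

# Sample text copied from the vsan.check_state output above.
OUTPUT = """\
2014-01-03 19:53:36 +0000: Step 1: Check for inaccessible VSAN objects
Detected 1 objects to not be inaccessible
Detected d068ab52-b882-c6e8-32ca-eca86bf99b3f on esx2.virten.lab to be inaccessible

2014-01-03 19:53:37 +0000: Step 2: Check for invalid/inaccessible VMs

2014-01-03 19:53:37 +0000: Step 3: Check for VMs for which VC/hostd/vmx are out of sync
Did not find VMs for which VC/hostd/vmx are out of sync
"""

# "Detected <uuid> on <host> to be inaccessible"
INACCESSIBLE = re.compile(r"^Detected (\S+) on (\S+) to be inaccessible$")


def inaccessible_objects(output):
    """Group inaccessible object UUIDs by the host that reported them."""
    found = {}
    for line in output.splitlines():
        match = INACCESSIBLE.match(line.strip())
        if match:
            uuid, host = match.groups()
            found.setdefault(host, []).append(uuid)
    return found


print(inaccessible_objects(OUTPUT))
```

The UUID found here is the same object that vsan.obj_status_report listed as 1/3 in the table of unassociated objects.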

vsan.fix_renamed_vms
This command fixes VMs that vCenter has renamed after their storage became inaccessible. In that case, a VM is renamed to its vmx file path.
This is a best-effort command, as the real VM name is unknown.

Example 1 - Fix a renamed VM

/localhost/DC> vsan.fix_renamed_vms ~/vms/%2fvmfs%2fvolumes%2fvsanDatastore%2fvma.virten.lab%2fvma.virten.lab.vmx/
Continuing this command will rename the following VMs:
%2fvmfs%2fvolumes%2fvsanDatastore%2fvma.virten.lab%2fvma.virten.lab.vmx -> vma.virten.lab
Do you want to continue [y/N]?
y
Renaming...
Rename %2fvmfs%2fvolumes%2fvsanDatastore%2fvma.virten.lab%2fvma.virten.lab.vmx: success
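
The renamed VM is simply the URL-encoded path of the vmx file, which is why the command can only guess the original name. The Python sketch below illustrates one way such a guess could be derived from the file name. It is not RVC's actual implementation, and guess_vm_name is a hypothetical helper.

```python
from posixpath import basename, splitext
from urllib.parse import unquote


def guess_vm_name(renamed):
    """Guess a VM name from a URL-encoded vmx path (best effort)."""
    path = unquote(renamed)        # "%2f" becomes "/"
    filename = basename(path)      # last path element, e.g. "vma.virten.lab.vmx"
    stem, ext = splitext(filename)
    # Only strip the extension when it really is a vmx file.
    return stem if ext == ".vmx" else filename


renamed = "%2fvmfs%2fvolumes%2fvsanDatastore%2fvma.virten.lab%2fvma.virten.lab.vmx"
print(guess_vm_name(renamed))
```

In this example the directory name and the vmx file name are identical, so both lead to vma.virten.lab. If a VM was renamed after its creation, the file name no longer matches the display name, which is why the result is only a best effort.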

vsan.reapply_vsan_vmknic_config
Re-enables VSAN on vmk ports. This can be useful when you have network configuration problems in your VSAN cluster.

Example 1 - Unbind and rebind VSAN on a host

/localhost/DC> vsan.reapply_vsan_vmknic_config ~esx
Host: esx1.virten.lab
  Reapplying config of vmk1:
    AgentGroupMulticastAddress: 224.2.3.4
    AgentGroupMulticastPort: 23451
    IPProtocol: IPv4
    InterfaceUUID: 776ca852-6660-c6d8-c9f4-001b2193b9a4
    MasterGroupMulticastAddress: 224.1.2.3
    MasterGroupMulticastPort: 12345
    MulticastTTL: 5
  Unbinding VSAN from vmknic vmk1 ...
  Rebinding VSAN to vmknic vmk1 ...

vsan.vm_perf_stats
Displays performance statistics from a virtual machine. The following metrics are supported:

  • IOPS (read/write)
  • Throughput in KB/s (read/write)
  • Latency in ms (read/write)

Example 1 - Display performance stats (default: 20-second interval)

/localhost/DC> vsan.vm_perf_stats ~vm
2014-01-03 20:23:44 +0000: Querying info about VMs ...
2014-01-03 20:23:44 +0000: Querying VSAN objects used by the VMs ...
2014-01-03 20:23:45 +0000: Fetching stats counters once ...
2014-01-03 20:23:46 +0000: Sleeping for 20 seconds ...
2014-01-03 20:24:06 +0000: Fetching stats counters again to compute averages ...
2014-01-03 20:24:07 +0000: Got all data, computing table
+-----------+--------------+------------------+--------------+
| VM/Object | IOPS         | Tput (KB/s)      | Latency (ms) |
+-----------+--------------+------------------+--------------+
| win7      | 325.1r/80.6w | 20744.8r/5016.8w | 2.4r/12.2w   |
+-----------+--------------+------------------+--------------+
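
Each cell combines the read and write value in one string such as 325.1r/80.6w. If you want to log or graph the numbers, they have to be split first. Here is a minimal Python sketch, assuming the cell and row format shown above. ReadWrite, parse_cell and parse_row are names I made up for this example.

```python
import re
from typing import NamedTuple


class ReadWrite(NamedTuple):
    read: float
    write: float


# Matches cells such as "325.1r/80.6w".
CELL = re.compile(r"^\s*([\d.]+)r/([\d.]+)w\s*$")


def parse_cell(cell):
    """Split a '<read>r/<write>w' cell into two floats."""
    match = CELL.match(cell)
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    return ReadWrite(float(match.group(1)), float(match.group(2)))


def parse_row(row):
    """Parse one data row of the vsan.vm_perf_stats table."""
    name, iops, tput, latency = [c.strip() for c in row.strip().strip("|").split("|")]
    return {
        "name": name,
        "iops": parse_cell(iops),
        "tput_kbps": parse_cell(tput),
        "latency_ms": parse_cell(latency),
    }


stats = parse_row("| win7      | 325.1r/80.6w | 20744.8r/5016.8w | 2.4r/12.2w   |")
print(stats["name"], stats["iops"].read, stats["latency_ms"].write)
```

Rows without values, such as the VM name row in the --show-objects output, raise a ValueError and have to be skipped by the caller.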

Example 2 - Display performance stats for all objects of a VM with an interval of 5 seconds.

/localhost/DC> vsan.vm_perf_stats ~vm --show-objects --interval=5
2014-01-03 20:29:08 +0000: Querying info about VMs ...
2014-01-03 20:29:08 +0000: Querying VSAN objects used by the VMs ...
2014-01-03 20:29:09 +0000: Fetching stats counters once ...
2014-01-03 20:29:09 +0000: Sleeping for 5 seconds ...
2014-01-03 20:29:14 +0000: Fetching stats counters again to compute averages ...
2014-01-03 20:29:15 +0000: Got all data, computing table
+-----------------------------------------------------+-------------+----------------+--------------+
| VM/Object                                           | IOPS        | Tput (KB/s)    | Latency (ms) |
+-----------------------------------------------------+-------------+----------------+--------------+
| win7                                                |             |                |              |
|    ddc4a952-c91c-a777-771f-001b2193b9a4/win7.vmx    | 0.0r/0.3w   | 0.0r/0.2w      | 0.0r/26.1w   |
|    ddc4a952-c91c-a777-771f-001b2193b9a4/win7.vmdk   | 166.6r/3.9w | 10597.2r/13.2w | 5.2r/23.5w   |
|    ddc4a952-c91c-a777-771f-001b2193b9a4/win7_1.vmdk | 0.0r/72.6w  | 0.0r/4583.7w   | 0.0r/13.1w   |
+-----------------------------------------------------+-------------+----------------+--------------+
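
Throughput and IOPS together give a rough idea of the I/O size a workload uses, because throughput divided by IOPS is the average amount of data per I/O. The short Python sketch below does this calculation for the two virtual disks from the table. The helper avg_io_size_kb is my own, and since the table values are rounded averages over the interval, the result is only an estimate.

```python
def avg_io_size_kb(tput_kbps, iops):
    """Average KB per I/O, or None when there was no I/O in the interval."""
    if iops == 0:
        return None
    return tput_kbps / iops


# (throughput in KB/s, IOPS) per direction, copied from the table above.
objects = {
    "win7.vmdk": {"read": (10597.2, 166.6), "write": (13.2, 3.9)},
    "win7_1.vmdk": {"read": (0.0, 0.0), "write": (4583.7, 72.6)},
}

for name, directions in objects.items():
    for direction, (tput, iops) in directions.items():
        size = avg_io_size_kb(tput, iops)
        text = "no I/O" if size is None else f"{size:.1f} KB per I/O"
        print(f"{name} {direction}: {text}")
```

Reads on win7.vmdk and writes on win7_1.vmdk both come out at a little over 63 KB per I/O, which suggests a workload with an I/O size of about 64 KB.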
