In a vSAN ESXi environment, one of the hosts kept having issues. The virtual machines showed as 'disconnected' in vCenter and the host showed as unavailable in the console. The guests were actually still running, I could SSH to the host, and the screen (via iLO) showed everything was OK. Here's what to do to troubleshoot:
Check for network issues. Can vCenter see the problem host, and are all the connection paths up?
If DRS is enabled, try putting the host into maintenance mode to trigger a vMotion of the guests to another host.
-Check health status in vCenter under vSAN Cluster > Monitor > vSAN > Health
-If the ESXi host is not responding in vCenter (and guests show as 'disconnected', although they may still be running):
iLO to the host (via IE), then F2 -> Troubleshooting Options -> Restart Management Agents
-SSH to restart management agents:
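A minimal sketch of this step, assuming the standard ESXi init scripts `/etc/init.d/hostd` and `/etc/init.d/vpxa` are present. The function name `restart_mgmt_agents` and the `DRY_RUN` switch are illustrative only and not part of ESXi.

```shell
# Restart the ESXi management agents (hostd, then vpxa) from an SSH session.
# Assumption: standard ESXi init scripts. DRY_RUN=1 only prints the commands.
restart_mgmt_agents() {
    for agent in hostd vpxa; do
        if [ -n "${DRY_RUN:-}" ]; then
            echo "/etc/init.d/$agent restart"
        else
            "/etc/init.d/$agent" restart
        fi
    done
}

# Usage on the host:  restart_mgmt_agents
```

Restarting these agents should not power off running guests, but the host can take a minute or two to reappear in vCenter.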
-SSH to Host to check disk status on RAID array:
Use the following commands:
/opt/hp/hpssacli/bin/hpssacli ctrl all show detail
/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 pd all show status
/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 pd all show detail
/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 ld all show status
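With many drives, the status output is easier to read if only the problem lines are shown. The filter below is a sketch that assumes each status line ends in `: OK` for a healthy drive. The function name `failed_drives` is hypothetical.

```shell
# Print only the drive lines whose status is not "OK".
# Reads hpssacli "show status" output on stdin.
# Exit status is 0 when at least one non-OK drive line was found.
failed_drives() {
    grep -E 'physicaldrive|logicaldrive' | grep -Ev ': OK[[:space:]]*$'
}

# Usage on the host:
#   /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 pd all show status | failed_drives
```

No output means every listed drive reported OK.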
-Log location on host:
Check these to see if anything obvious is going on.
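As a rough starting point, the sketch below scans the tail of the usual ESXi logs for suspicious lines. The paths are the default ESXi locations as far as I know (`clomd.log` only exists on vSAN hosts). The search pattern and the function name `scan_logs` are assumptions to adjust.

```shell
# Default ESXi log locations (assumed; files that are missing are skipped).
ESXI_LOGS="/var/log/vmkernel.log /var/log/hostd.log /var/log/vpxa.log /var/log/vobd.log /var/log/clomd.log"

# Show suspicious lines from the last N lines of each log file.
# $1 = number of trailing lines per file, remaining args = log files
scan_logs() {
    lines=$1
    shift
    for f in "$@"; do
        [ -r "$f" ] || continue
        echo "== $f"
        tail -n "$lines" "$f" | grep -iE 'error|fail|timeout|lost' || echo "(nothing matched)"
    done
}

# Usage on the host:  scan_logs 500 $ESXI_LOGS
```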
-Use the VMware Ruby vSphere Console (RVC) (probably best set up on vCenter) to check the status of things like vSAN:
The filesystem works like a basic menu system, so navigate like this (use 'ls' to see the options):
cd 1 (localhost)
cd 1 (Computers)
Use the following commands to show stats of services:
help vsan (to get other commands and help info.)
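The same kind of check can be scripted rather than typed interactively. This sketch assumes RVC's `-c` option for passing commands and the `vsan.check_state`, `vsan.disks_stats` and `vsan.resync_dashboard` commands, which I believe ship with RVC. The login and cluster path are placeholders, and `rvc_vsan_checks` and `DRY_RUN` are illustrative names.

```shell
# Run a few read-only vSAN checks through RVC in one non-interactive call.
# Assumptions: rvc is on PATH, $1 is an RVC login such as user@vcenter-host,
# $2 is the cluster path inside RVC. DRY_RUN=1 prints the RVC commands only.
rvc_vsan_checks() {
    login=$1
    cluster=$2
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "vsan.check_state $cluster" \
            "vsan.disks_stats $cluster" \
            "vsan.resync_dashboard $cluster"
    else
        rvc -c "vsan.check_state $cluster" \
            -c "vsan.disks_stats $cluster" \
            -c "vsan.resync_dashboard $cluster" \
            -c "quit" "$login"
    fi
}

# Usage (placeholder names):
#   rvc_vsan_checks 'administrator@vsphere.local@localhost' '/localhost/MyDC/computers/MyCluster'
```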
-Check whether basic esxcli commands are working on the host:
# esxcli vm process list
World ID: 73506118
Process ID: 0
VMX Cartel ID: 73547074
UUID: 42 1c fc 60 92 12 85 3c-ec 69 1f c7 06 e7 ec 92
Display Name: examplevm
Config File: /vmfs/volumes/3334a421-c1470e95/examplevm/examplevm.vmx
Power off the VM using a soft shutdown:
# esxcli vm process kill --type soft --world-id 73506118
Or a hard shutdown:
# esxcli vm process kill --type hard --world-id 73506118
Then confirm the VM is no longer listed:
# esxcli vm process list | grep -B1 73506118
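Looking up the World ID by hand is error-prone when many guests are running. The helper below is a sketch that pulls it out of the list output by display name. It assumes ESXi 5.x or later, where the namespace is `esxcli vm process`, and that `World ID` precedes `Display Name` in each record, as in the sample output above. The function name `world_id_for` is hypothetical.

```shell
# Print the World ID of the VM whose Display Name equals $1.
# Reads `esxcli vm process list` style output on stdin.
world_id_for() {
    awk -v name="$1" '
        /^[[:space:]]*World ID:/ { wid = $NF }
        /^[[:space:]]*Display Name:/ {
            sub(/^[[:space:]]*Display Name:[[:space:]]*/, "")
            if ($0 == name) print wid
        }'
}

# Usage on the host:
#   wid=$(esxcli vm process list | world_id_for examplevm)
#   esxcli vm process kill --type soft --world-id "$wid"
```

Check that `$wid` is non-empty before passing it to the kill command.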
Finally, you may need to force a reboot of the server with a cold boot from iLO.
Once everything is back up: check vSAN health in vCenter.
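A host-side check can complement the vCenter view. This sketch summarises `esxcli vsan cluster get`, assuming its output contains the `Local Node State`, `Local Node Health State` and `Sub-Cluster Member Count` fields. The function name `vsan_node_summary` is illustrative.

```shell
# Summarise `esxcli vsan cluster get` output read from stdin.
vsan_node_summary() {
    awk -F': *' '
        /Local Node Health State/  { health = $2 }
        /Local Node State/         { state = $2 }
        /Sub-Cluster Member Count/ { count = $2 }
        END { printf "health=%s state=%s members=%s\n", health, state, count }'
}

# Usage on the host:  esxcli vsan cluster get | vsan_node_summary
```

The member count should match the number of hosts in the vSAN cluster. A lower number suggests the host is partitioned from the rest.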