I have been wanting to put together a series of vVol (I still have a hard time typing vVol instead of VVol) troubleshooting posts, tips, and things I have been seeing. I've gotten a couple of those out already, but I should be publishing more, and hopefully they actually help someone out.
One of the more common issues I have seen when getting started with vVols is an ESXi host that is unable to authenticate with the VASA Providers, even though the vCenter Server has registered the Storage Providers successfully. There can be a few reasons for this, so I wanted to dig into some of them, how to troubleshoot them, and hopefully how to fix them.
The Setup and Issue
Let’s dive into what we want to look at. In this example I am just getting vVols set up with vCenter. I have just registered the storage providers and everything there is healthy, the protocol endpoint is connected to all of the hosts in the ESXi cluster, and I just created the vVol datastore. Now here is where I see the issue: the vVol datastore is reported as “inaccessible” on one or two of the hosts in the cluster. Well, that’s strange, it’s fine on all of the other hosts in the cluster, so why is it broken on this one?
The first place I generally look is the vvold log on the ESXi host that reports the vVol datastore as inaccessible.
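If you haven't dug through it before, the vvold log lives at /var/log/vvold.log on the ESXi host. Something like this (a quick sketch, adjust the grep pattern as needed) will pull out the interesting VASA session lines:

# Follow the vvold log live on the affected host
tail -f /var/log/vvold.log

# Or just pull out the VASA session and SSL errors
grep -iE "VasaSession|Empty VP URL|SSL Exception" /var/log/vvold.log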
The Empty VP URL Issue
2019-08-12T22:59:18.124Z info vvold[2298793] [Originator@6876 sub=Default] VendorProviderMgr::AddOrRef VP (sn1-m20-c12-25-ct0) Uid () URL (https://10.21.88.113:8084/version.xml)
2019-08-12T22:59:18.124Z info vvold[2298793] [Originator@6876 sub=Default] DlgSoapCtxPool: Created new DlgSoapCtxPool ([14DlgSoapCtxPool:0x000000c29694e690]) for VP (sn1-m20-c12-25-ct0) session (000000c29694e520)
2019-08-12T22:59:18.124Z info vvold[2298793] [Originator@6876 sub=Default] VendorProviderMgr:UpdateFirewallRules: Firewall operation add on rule id:VVOLD_8084 success
2019-08-12T22:59:18.124Z info vvold[2298793] [Originator@6876 sub=Default] VasaSession::GetEndPoint: with url https://10.21.88.113:8084/version.xml
2019-08-12T22:59:18.128Z warning vvold[2298793] [Originator@6876 sub=Default] VasaSession::GetEndPoint: failed to get endpoint, err=SSL Exception: Verification parameters:
--> PeerThumbprint: 0B:2F:E1:52:F3:85:EB:CD:BE:B4:FF:E3:B2:64:1B:FB:F5:BA:55:F4
--> ExpectedThumbprint:
--> ExpectedPeerName: 10.21.88.113
--> The remote host certificate has these problems:
-->
--> * unable to get local issuer certificate, using default
2019-08-12T22:59:18.128Z info vvold[2298793] [Originator@6876 sub=Default] VasaSession::Initialize url is empty
2019-08-12T22:59:18.128Z warning vvold[2298793] [Originator@6876 sub=Default] VasaSession::DoSetContext: Empty VP URL for VP (sn1-m20-c12-25-ct0)!
2019-08-12T22:59:18.128Z info vvold[2298793] [Originator@6876 sub=Default] Initialize: Failed to establish connection https://10.21.88.113:8084/version.xml
2019-08-12T22:59:18.128Z error vvold[2298793] [Originator@6876 sub=Default] Initialize: Unable to init session to VP sn1-m20-c12-25-ct0 state: 0
Here I see that when the host attempts to authenticate with the VP (VASA Provider), it is unable to verify the VP's issuer certificate. Because that issuer certificate can't be verified, the VP URL is never populated during initialization, and the host is unable to authenticate with the VASA Provider.
A Look at a Good Authentication
One of my favorite troubleshooting workflows is to compare the problem against a healthy, working example. So next I want to take a look at the vvold log from a host where everything is working just fine. There I see that the GetEndPoint request is made and succeeds right away.
2019-08-12T22:59:18.086Z info vvold[2571765] [Originator@6876 sub=Default] VendorProviderMgr::AddOrRef VP (sn1-m20-c12-25-ct0) Uid () URL (https://10.21.88.113:8084/version.xml)
2019-08-12T22:59:18.086Z info vvold[2571765] [Originator@6876 sub=Default] DlgSoapCtxPool: Created new DlgSoapCtxPool ([14DlgSoapCtxPool:0x0000002177da0d40]) for VP (sn1-m20-c12-25-ct0) session (0000002177da0bd0)
2019-08-12T22:59:18.087Z info vvold[2571765] [Originator@6876 sub=Default] VendorProviderMgr:UpdateFirewallRules: Firewall operation add on rule id:VVOLD_8084 success
2019-08-12T22:59:18.087Z info vvold[2571765] [Originator@6876 sub=Default] VasaSession::GetEndPoint: with url https://10.21.88.113:8084/version.xml
2019-08-12T22:59:18.098Z info vvold[2571765] [Originator@6876 sub=Default] VasaSession::Parse xml: id=4, loc=/vasa, myVer=4, ver=0
2019-08-12T22:59:18.098Z info vvold[2571765] [Originator@6876 sub=Default] VasaSession::ParseVersionXml: Effective VP endpoint is https://10.21.88.113:8084/vasa (version 4)
2019-08-12T22:59:18.098Z info vvold[2571765] [Originator@6876 sub=Default] VasaSession::DoSetContext: Setting GUID: 'D2CCA827-514D-E611-0000-78654321006B' VC UID: '1143a5e7-f7de-4311-b0d8-781e84d1c908' (lastEventId was -1)
2019-08-12T22:59:18.323Z warning vvold[2571765] [Originator@6876 sub=Default] VasaSession::SetState: VP sn1-m20-c12-25-ct0 [OutOfSync -> Connected], state change locked!
2019-08-12T22:59:18.323Z info vvold[2571765] [Originator@6876 sub=Default] VasaSession::DoSetContext: master soap's original keep_alive = 1
2019-08-12T22:59:18.324Z info vvold[2571765] [Originator@6876 sub=Default] VasaSession::DoSetContext: VP sn1-m20-c12-25-ct0 url https://10.21.88.113:8084/vasa: success eventTimer 5000000 maxBatchsize 1 maxConcurrentRequestsPerSession 5 _sessionId [440680165843882267] _genId 1
Notice the difference between the two when the VP URL is looked up with GetEndPoint:
2019-08-12T22:59:18.087Z info vvold[2571765] [Originator@6876 sub=Default] VasaSession::GetEndPoint: with url https://10.21.88.113:8084/version.xml
2019-08-12T22:59:18.098Z info vvold[2571765] [Originator@6876 sub=Default] VasaSession::Parse xml: id=4, loc=/vasa, myVer=4, ver=0
2019-08-12T22:59:18.098Z info vvold[2571765] [Originator@6876 sub=Default] VasaSession::ParseVersionXml: Effective VP endpoint is https://10.21.88.113:8084/vasa (version 4)

===== vs =====

2019-08-12T22:59:18.124Z info vvold[2298793] [Originator@6876 sub=Default] VasaSession::GetEndPoint: with url https://10.21.88.113:8084/version.xml
2019-08-12T22:59:18.128Z warning vvold[2298793] [Originator@6876 sub=Default] VasaSession::GetEndPoint: failed to get endpoint, err=SSL Exception: Verification parameters:
--> PeerThumbprint: 0B:2F:E1:52:F3:85:EB:CD:BE:B4:FF:E3:B2:64:1B:FB:F5:BA:55:F4
--> ExpectedThumbprint:
--> ExpectedPeerName: 10.21.88.113
--> The remote host certificate has these problems:
-->
--> * unable to get local issuer certificate, using default
The immediate failure is the SSL exception. More often than not, one of two causes is at play here. The first is that vvold has the local trusted root certs cached from before the storage provider was registered; I've seen this more often with earlier builds of 6.5, but with 6.5 U3 it shouldn't be an issue anymore. The second is that vCenter hasn't pushed the trusted certs to the ESXi host.
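If you want to figure out which of the two you're dealing with, one thing you can do from the ESXi shell is compare the issuer on the certificate the VP presents against the trusted roots the host actually has. This is just a sketch using the same VP IP as the logs above; the castore.pem path is where the vCenter-pushed trusted roots normally land on 6.x hosts, but double-check it on your build:

# What certificate (and issuer) is the VASA Provider presenting?
echo | openssl s_client -connect 10.21.88.113:8084 -showcerts 2>/dev/null | \
  openssl x509 -noout -subject -issuer -fingerprint -sha1

# Which root CAs does the ESXi host currently trust (pushed down from vCenter)?
openssl crl2pkcs7 -nocrl -certfile /etc/vmware/ssl/castore.pem | \
  openssl pkcs7 -print_certs -noout | grep -i subject

If the issuer from the first command doesn't show up as a subject in the second, the host simply doesn't trust the CA that signed the VP's certificate, which lines up with the "unable to get local issuer certificate" error.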
What can I do to fix this?
If vvold is holding stale or cached certificate information, we want to clear that out. I will usually start off by issuing the following on the affected ESXi host:
/etc/init.d/vvold ssl_reset
/etc/init.d/vvold restart
Running both of these makes sure there aren't any cached cert locations or details left in vvold. Here is an example of running through this:
2019-08-12T23:14:34.663Z info vvold[2298782] [Originator@6876 sub=Default] VasaSession::GetEndPoint: with url https://10.21.88.113:8084/version.xml
2019-08-12T23:14:34.666Z warning vvold[2298782] [Originator@6876 sub=Default] VasaSession::GetEndPoint: failed to get endpoint, err=SSL Exception: Verification parameters:
--> PeerThumbprint: 0B:2F:E1:52:F3:85:EB:CD:BE:B4:FF:E3:B2:64:1B:FB:F5:BA:55:F4
--> ExpectedThumbprint:
--> ExpectedPeerName: 10.21.88.113
--> The remote host certificate has these problems:
-->
--> * unable to get local issuer certificate, using default
2019-08-12T23:14:34.667Z info vvold[2298782] [Originator@6876 sub=Default] VasaSession::Initialize url is empty
2019-08-12T23:14:34.667Z warning vvold[2298782] [Originator@6876 sub=Default] VasaSession::DoSetContext: Empty VP URL for VP (sn1-m20-c12-25-ct0)!
2019-08-12T23:14:34.667Z info vvold[2298782] [Originator@6876 sub=Default] Initialize: Failed to establish connection https://10.21.88.113:8084/version.xml
2019-08-12T23:14:34.667Z error vvold[2298782] [Originator@6876 sub=Default] Initialize: Unable to init session to VP sn1-m20-c12-25-ct0 state: 0

========== /etc/init.d/vvold ssl_reset is run ==========

2019-08-12T23:14:34.817Z [/etc/init.d/vvold[2299926]: /etc/init.d/vvold ssl_reset, PID 2299926]
2019-08-12T23:14:34.821Z info vvold[2298809] [Originator@6876 sub=Default] Reading from pipe returned 1
2019-08-12T23:14:34.821Z warning vvold[2298809] [Originator@6876 sub=Default] VVold caught signal 1
2019-08-12T23:14:34.821Z warning vvold[2298793] [Originator@6876 sub=Default] ReloadSslCert: Resetting all VASA sessions on receiving SIGHUP
2019-08-12T23:14:34.821Z info vvold[2298793] [Originator@6876 sub=Default] VendorProviderMgr::ResetVPSessions: Resetting only SSL sessions
2019-08-12T23:14:34.821Z warning vvold[2298793] [Originator@6876 sub=Default] VVolServiceInstance::SetSSLContext: Skipping CRL check as env variable VVOLD_DO_CRL_CHECK is not set!
2019-08-12T23:14:34.822Z info vvold[2298793] [Originator@6876 sub=Default] VendorProviderMgr::ResetVPSessions: Skipping non-SSL session to VP (sn1-405-c12-21-ct0)
2019-08-12T23:14:34.822Z info vvold[2298793] [Originator@6876 sub=Default] VendorProviderMgr::ResetVPSessions: Skipping non-SSL session to VP (sn1-m20-c12-25-ct0)
2019-08-12T23:14:34.822Z [/etc/init.d/vvold[2299926]: vvold signalled for SSL reset]
2019-08-12T23:14:34.975Z info vvold[2298809] [Originator@6876 sub=Default] VasaSession::GetEndPoint: with url https://10.21.88.117:8084/version.xml
2019-08-12T23:14:34.988Z info vvold[2298809] [Originator@6876 sub=Default] VasaSession::Parse xml: id=4, loc=/vasa, myVer=4, ver=0
2019-08-12T23:14:34.988Z info vvold[2298809] [Originator@6876 sub=Default] VasaSession::ParseVersionXml: Effective VP endpoint is https://10.21.88.117:8084/vasa (version 4)
2019-08-12T23:14:34.988Z info vvold[2298809] [Originator@6876 sub=Default] VasaSession::DoSetContext: Setting GUID: 'D2CCA827-514D-E611-0000-78654321009B' VC UID: '1143a5e7-f7de-4311-b0d8-781e84d1c908' (lastEventId was -1)
2019-08-12T23:14:34.988Z info vvold[2298809] [Originator@6876 sub=Default] GetVasaHostInitiators: Adding FC HBA wwpn: 20:00:00:25:b5:22:00:0d wwnn: 20:00:00:25:b5:44:00:47
2019-08-12T23:14:34.988Z info vvold[2298809] [Originator@6876 sub=Default] GetVasaHostInitiators: Adding FC HBA wwpn: 20:00:00:25:b5:22:00:3d wwnn: 20:00:00:25:b5:44:00:47
2019-08-12T23:14:34.988Z info vvold[2298809] [Originator@6876 sub=Default] GetVasaHostInitiators: Adding FC HBA wwpn: 20:00:00:25:b5:66:00:0d wwnn: 20:00:00:25:b5:44:00:47
2019-08-12T23:14:34.988Z info vvold[2298809] [Originator@6876 sub=Default] GetVasaHostInitiators: Adding FC HBA wwpn: 20:00:00:25:b5:66:00:3d wwnn: 20:00:00:25:b5:44:00:47
2019-08-12T23:14:34.988Z info vvold[2298809] [Originator@6876 sub=Default] GetVasaHostInitiators: Adding iSCSI HBA [iqn:iqn.1998-01.com.vmware:esxi-1]
2019-08-12T23:14:34.988Z info vvold[2298809] [Originator@6876 sub=Default] VasaSession::InitSoap: Master soap client created successfully
2019-08-12T23:14:34.988Z info vvold[2298809] [Originator@6876 sub=Default] VasaSession::InitSoap: _masterSoap=000000c2e0724010, iomode=33558544
2019-08-12T23:14:34.988Z info vvold[2298809] [Originator@6876 sub=Default] VasaSession::InitSoap VVold using 15 secs for soap connect timeout
2019-08-12T23:14:34.988Z info vvold[2298809] [Originator@6876 sub=Default] VasaSession::InitSoap VVold using 75 secs for soap receive timeout
2019-08-12T23:14:35.240Z warning vvold[2298809] [Originator@6876 sub=Default] vvol_ssl_auth_init: Will skip CRL check as env variable VVOLD_DO_CRL_CHECK is not set!
2019-08-12T23:14:35.240Z warning vvold[2298809] [Originator@6876 sub=Default] VasaSession:DoSetContext: Setting VASAVERSION cookie to "3.0"
2019-08-12T23:14:35.240Z info vvold[2298809] [Originator@6876 sub=Default] VasaSession::KillAllConnections VP (sn1-405-c12-21-ct0), purged 0 connections, 0 currently active, new genId (1) (broadcast wakeup to all threads waiting for free connection)
2019-08-12T23:14:35.334Z warning vvold[2298809] [Originator@6876 sub=Default] VasaSession::SetState: VP sn1-405-c12-21-ct0 [OutOfSync -> Connected], state change locked!
2019-08-12T23:14:35.335Z info vvold[2298809] [Originator@6876 sub=Default] VasaSession::DoSetContext: master soap's original keep_alive = 1
2019-08-12T23:14:35.335Z info vvold[2298809] [Originator@6876 sub=Default] VasaSession::DoSetContext: VP sn1-405-c12-21-ct0 url https://10.21.88.117:8084/vasa: success eventTimer 5000000 maxBatchsize 1 maxConcurrentRequestsPerSession 5 _sessionId [-1520153752992228692] _genId 1
2019-08-12T23:14:35.335Z info vvold[2298809] [Originator@6876 sub=Default] VasaSession::CheckForceParams: Environment variable VVOLD_FORCE_MAX_CONNECTIONS_PER_SESSION not set, using 5 conns/session!
2019-08-12T23:14:35.335Z info vvold[2298809] [Originator@6876 sub=Default] --> VasaOp::SetPEContextInt [#1]: ===> Issuing 'setPEContext' to VP sn1-405-c12-21-ct0 (#outstanding 0/5) [session state: Connected]
2019-08-12T23:14:35.335Z info vvold[2298809] [Originator@6876 sub=Default] VasaSession::GetFreeConn: VP (sn1-405-c12-21-ct0), added new connection ([14VasaConnection:0x000000c2969a0c10]) to the pool (size now 1/5), _genId 1!
2019-08-12T23:14:35.335Z info vvold[2298784] [Originator@6876 sub=Default] --> VasaOp::EventPollerCB [#2]: ===> Issuing 'getEvents' to VP sn1-405-c12-21-ct0 (#outstanding 1/5) [session state: Connected]
2019-08-12T23:14:35.336Z info vvold[2298784] [Originator@6876 sub=Default] VasaSession::GetFreeConn: VP (sn1-405-c12-21-ct0), added new connection ([14VasaConnection:0x000000c296966eb0]) to the pool (size now 2/5), _genId 1!
2019-08-12T23:14:35.348Z error vvold[2298784] [Originator@6876 sub=Default] --> VasaOp::ThrowFromSessionError [#2]: ===> FINAL SUCCESS getEvents VP (sn1-405-c12-21-ct0) Container (sn1-405-c12-21-ct0) timeElapsed=12 msecs (#outstanding 1)
Right after the ssl_reset completes, we see that GetEndPoint completes without a failure and the VP is successfully initialized. If I recall correctly, you only need to run the ssl_reset, but I tend to run both in tandem to be safe.
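Besides watching the log, you can also sanity-check things from the host itself with esxcli; if the reset worked, the provider should show as online and the storage container should be listed again (output details vary a bit between releases):

# Check the VASA Provider state as the host sees it
esxcli storage vvol vasaprovider list

# Make sure the vVol storage container backing the datastore is visible
esxcli storage vvol storagecontainer list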
Restarting vvold didn’t fix it, what next?
What if that didn't actually fix anything, the problem persists, and the same error shows up in vvold? This could be because the trusted CA certs aren't actually getting pushed from vCenter to the ESXi hosts. I've seen this most often when a new trusted CA cert is imported into vCenter after the ESXi hosts have already been added; the newly added trusted certs don't automatically make it down to the hosts.
There are a couple of ways to manually push the trusted root certs to the ESXi hosts. One way is to right-click the host in the GUI, navigate to Certificates, and choose Refresh CA Certificates; this pushes all certificates in the TRUSTED_ROOTS store to the ESXi host. Another way is to do the same thing with PowerCLI. Here is a workflow I have been using in these cases with PowerShell and PowerCLI:
## Connect to the vCenter Server ##
Connect-VIServer -Server prod-vcsa

## Get the ESXi hosts and set them to a variable ##
$hosts = Get-VMHost

## Start the Service Instance ##
$si = Get-View ServiceInstance

## Start the Certificate Manager view ##
$certMgr = Get-View -Id $si.Content.CertificateManager

## Using the Cert Manager, refresh the ESXi hosts' certs ##
## This pushes all certificates in the TRUSTED_ROOTS store in the vCenter Server VECS store to the hosts. ##
$certMgr.CertMgrRefreshCACertificatesAndCRLs($hosts.ExtensionData.MoRef)

## Now in vCenter the vVol datastore should be accessible for each of those hosts. No need to do the ssl_reset and restart on vvold. ##
And here is an example of actually running through it:
PS C:> Connect-VIServer -Server dev-vcsa

Name                           Port  User
----                           ----  ----
dev-vcsa                       443   ALEX\Carver

PS C:> Get-Cluster -Name "Dev Cluster"

Name             HAEnabled  HAFailoverLevel  DrsEnabled  DrsAutomationLevel
----             ---------  ---------------  ----------  ------------------
Dev Cluster      True       1                True        FullyAutomated

PS C:> $ESXi_Cluster = Get-Cluster -Name "Dev Cluster"
PS C:> $ESXi_Cluster | Get-VMHost

Name               ConnectionState PowerState NumCpu CpuUsageMhz CpuTotalMhz MemoryUsageGB MemoryTotalGB Version
----               --------------- ---------- ------ ----------- ----------- ------------- ------------- -------
esxi-7.alex.pures… Connected       PoweredOn      16         151       38304        14.586       255.897   6.7.0
esxi-6.alex.pures… Connected       PoweredOn      20         141       43880        16.166       255.892   6.7.0
esxi-4.alex.pures… Connected       PoweredOn      20          94       43880         8.945       255.892   6.7.0

PS C:> $hosts = $ESXi_Cluster | Get-VMHost
PS C:> $hosts

Name               ConnectionState PowerState NumCpu CpuUsageMhz CpuTotalMhz MemoryUsageGB MemoryTotalGB Version
----               --------------- ---------- ------ ----------- ----------- ------------- ------------- -------
esxi-7.alex.pures… Connected       PoweredOn      16         151       38304        14.586       255.897   6.7.0
esxi-6.alex.pures… Connected       PoweredOn      20         141       43880        16.166       255.892   6.7.0
esxi-4.alex.pures… Connected       PoweredOn      20          94       43880         8.945       255.892   6.7.0

PS C:> $si = Get-View ServiceInstance
PS C:> $certMgr = Get-View -Id $si.Content.CertificateManager
PS C:> $certMgr.CertMgrRefreshCACertificatesAndCRLs($hosts.ExtensionData.MoRef)
PS C:>
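Before going back to the logs, you can also confirm from PowerCLI that the vVol datastore reports as accessible on every host again. This is just a rough sketch and the datastore name is a placeholder, so substitute your own:

## Grab the vVol datastore (name here is a placeholder) ##
$ds = Get-Datastore -Name "vVol-Datastore"

## Each host mount entry records whether the datastore is accessible from that host ##
foreach ($mount in $ds.ExtensionData.Host) {
    $esx = Get-View -Id $mount.Key -Property Name
    [PSCustomObject]@{
        Host       = $esx.Name
        Accessible = $mount.MountInfo.Accessible
    }
}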
Then check the vvold log on the host that had the issue and you should see that everything is good again. I would also recommend checking the Tasks pane to make sure the root cert refresh completed correctly, because one issue that can happen is that vCenter fails to push the trusted certs to the ESXi host at all.
This can happen because of stale entries in the PSC/SSO, or some other issue where the communication between the ESXi host and vCenter is hitting an error. In the end, if vCenter is unable to push the trusted certs to the ESXi host, the host will be unable to authenticate with the VASA Provider. VMware has a KB that covers how to check for issues with VECS or expired/outdated certificates: https://kb.vmware.com/s/article/2146011
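If you suspect the vCenter side, it's also worth looking at what is actually sitting in the TRUSTED_ROOTS store that the refresh pushes from. On the VCSA that's vecs-cli (shown below from an appliance shell; the path is different on a Windows vCenter), and the KB above goes into more detail:

# List the trusted root certificates in the vCenter VECS store and spot-check their expiration dates
/usr/lib/vmware-vmafd/bin/vecs-cli entry list --store TRUSTED_ROOTS --text | \
  grep -iE "Alias|Subject:|Not After"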
There is the possibility that this symptom is indicative of another issue, but more often than not the VP URL being empty is because of one of the two issues above.