vVol Troubleshooting – What to do with an Empty VP URL

I have been wanting to put together a series of vVol (I still have a hard time typing vVol instead of VVol) troubleshooting posts covering tips and issues I’ve been seeing. I’ve published a couple of those already, and I should be getting more out that will hopefully help someone.

One of the more common issues I have seen when getting started with vVols is an ESXi host that is unable to authenticate with the VASA Providers, even though the vCenter Server has registered the Storage Providers successfully. There can be a few reasons for this happening, so I wanted to dig into some of those reasons, how to troubleshoot them, and hopefully how to fix them.

The Setup and Issue

Let’s dive into what we are looking at. In this example I am just getting vVols set up with the vCenter. In this vCenter I have just registered the storage providers and everything there is healthy. The protocol endpoint is connected to all of the hosts in the ESXi Cluster and I just created the vVol Datastore. Here is where I see the issue: I get a report that the vVol Datastore is “inaccessible” for one or two of the hosts in the ESXi Cluster. Well, that’s strange, it’s fine on all of the other hosts in the cluster, so why is it broken on this host?

The first place I generally look is the vvold log on the ESXi host that reports the vVol Datastore as inaccessible.
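
If you want to follow along on the host, the vvold log lives at /var/log/vvold.log, and esxcli can also show you how the host itself currently sees the VASA Providers and the vVol storage container. Here is a quick sketch of the host-side checks I usually start with (nothing here is specific to my array, just standard ESXi commands):

# See how this host currently sees the registered VASA Providers
esxcli storage vvol vasaprovider list

# See whether the vVol storage container behind the datastore is visible
esxcli storage vvol storagecontainer list

# Follow vvold while it tries to establish the VASA session
tail -n 200 /var/log/vvold.log | grep -i vasasession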

The Empty VP URL Issue

2019-08-12T22:59:18.124Z info vvold[2298793] [Originator@6876 sub=Default] VendorProviderMgr::AddOrRef VP (sn1-m20-c12-25-ct0) Uid () URL (https://10.21.88.113:8084/version.xml)
 2019-08-12T22:59:18.124Z info vvold[2298793] [Originator@6876 sub=Default] DlgSoapCtxPool: Created new DlgSoapCtxPool ([14DlgSoapCtxPool:0x000000c29694e690]) for VP (sn1-m20-c12-25-ct0) session (000000c29694e520)
 2019-08-12T22:59:18.124Z info vvold[2298793] [Originator@6876 sub=Default] VendorProviderMgr:UpdateFirewallRules: Firewall operation add on rule id:VVOLD_8084 success

 2019-08-12T22:59:18.124Z info vvold[2298793] [Originator@6876 sub=Default] VasaSession::GetEndPoint: with url https://10.21.88.113:8084/version.xml
 2019-08-12T22:59:18.128Z warning vvold[2298793] [Originator@6876 sub=Default] VasaSession::GetEndPoint: failed to get endpoint, err=SSL Exception: Verification parameters:
 --> PeerThumbprint: 0B:2F:E1:52:F3:85:EB:CD:BE:B4:FF:E3:B2:64:1B:FB:F5:BA:55:F4
 --> ExpectedThumbprint:
 --> ExpectedPeerName: 10.21.88.113
 --> The remote host certificate has these problems:
 -->
 --> * unable to get local issuer certificate, using default
 2019-08-12T22:59:18.128Z info vvold[2298793] [Originator@6876 sub=Default] VasaSession::Initialize url is empty
 2019-08-12T22:59:18.128Z warning vvold[2298793] [Originator@6876 sub=Default] VasaSession::DoSetContext: Empty VP URL for VP (sn1-m20-c12-25-ct0)!
 2019-08-12T22:59:18.128Z info vvold[2298793] [Originator@6876 sub=Default] Initialize: Failed to establish connection https://10.21.88.113:8084/version.xml
 2019-08-12T22:59:18.128Z error vvold[2298793] [Originator@6876 sub=Default] Initialize: Unable to init session to VP sn1-m20-c12-25-ct0 state: 0

I see that when the host attempts to authenticate with the VP (VASA Provider), it is unable to verify the VP’s certificate because it cannot locate the issuer certificate. Since the SSL verification fails, GetEndPoint never returns an endpoint, the VP URL used for initialization ends up empty, and the host is unable to authenticate with the VASA Provider.
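
If you want to see exactly which certificate the VP is presenting (and which CA issued it), you can ask it directly from the ESXi shell with openssl. This is just a quick sketch using the VASA Provider address from my logs, so swap in your own IP and port:

# Pull the certificate the VASA Provider presents on its VASA port
# and print who it was issued to, who issued it, and when it expires
openssl s_client -connect 10.21.88.113:8084 </dev/null 2>/dev/null | \
  openssl x509 -noout -subject -issuer -enddate

The issuer shown here is the CA certificate that the ESXi host needs to trust for the verification above to succeed.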

A Look at a Good Authentication

One of my favorite troubleshooting workflows is to compare the problem with a healthy, working example. So next I want to take a look at the vvold log on a host where everything is working just fine. There I’ll see that the request to GetEndPoint is made and succeeds right away.

2019-08-12T22:59:18.086Z info vvold[2571765] [Originator@6876 sub=Default] VendorProviderMgr::AddOrRef VP (sn1-m20-c12-25-ct0) Uid () URL (https://10.21.88.113:8084/version.xml)
 2019-08-12T22:59:18.086Z info vvold[2571765] [Originator@6876 sub=Default] DlgSoapCtxPool: Created new DlgSoapCtxPool ([14DlgSoapCtxPool:0x0000002177da0d40]) for VP (sn1-m20-c12-25-ct0) session (0000002177da0bd0)
 2019-08-12T22:59:18.087Z info vvold[2571765] [Originator@6876 sub=Default] VendorProviderMgr:UpdateFirewallRules: Firewall operation add on rule id:VVOLD_8084 success
 2019-08-12T22:59:18.087Z info vvold[2571765] [Originator@6876 sub=Default] VasaSession::GetEndPoint: with url https://10.21.88.113:8084/version.xml
 2019-08-12T22:59:18.098Z info vvold[2571765] [Originator@6876 sub=Default] VasaSession::Parse xml: id=4, loc=/vasa, myVer=4, ver=0
 2019-08-12T22:59:18.098Z info vvold[2571765] [Originator@6876 sub=Default] VasaSession::ParseVersionXml: Effective VP endpoint is https://10.21.88.113:8084/vasa (version 4)
 2019-08-12T22:59:18.098Z info vvold[2571765] [Originator@6876 sub=Default] VasaSession::DoSetContext: Setting GUID: 'D2CCA827-514D-E611-0000-78654321006B'  VC UID: '1143a5e7-f7de-4311-b0d8-781e84d1c908' (lastEventId was -1)
 2019-08-12T22:59:18.323Z warning vvold[2571765] [Originator@6876 sub=Default] VasaSession::SetState: VP sn1-m20-c12-25-ct0 [OutOfSync -> Connected], state change locked!
 2019-08-12T22:59:18.323Z info vvold[2571765] [Originator@6876 sub=Default] VasaSession::DoSetContext: master soap's original keep_alive = 1
 2019-08-12T22:59:18.324Z info vvold[2571765] [Originator@6876 sub=Default] VasaSession::DoSetContext: VP sn1-m20-c12-25-ct0 url https://10.21.88.113:8084/vasa: success eventTimer 5000000  maxBatchsize 1 maxConcurrentRequestsPerSession 5 _sessionId [440680165843882267] _genId 1

Notice the difference between the two when the VP URL is looked up via GetEndPoint.

2019-08-12T22:59:18.087Z info vvold[2571765] [Originator@6876 sub=Default] VasaSession::GetEndPoint: with url https://10.21.88.113:8084/version.xml
 2019-08-12T22:59:18.098Z info vvold[2571765] [Originator@6876 sub=Default] VasaSession::Parse xml: id=4, loc=/vasa, myVer=4, ver=0
 2019-08-12T22:59:18.098Z info vvold[2571765] [Originator@6876 sub=Default] VasaSession::ParseVersionXml: Effective VP endpoint is https://10.21.88.113:8084/vasa (version 4)

=====
vs
=====


 2019-08-12T22:59:18.124Z info vvold[2298793] [Originator@6876 sub=Default] VasaSession::GetEndPoint: with url https://10.21.88.113:8084/version.xml
 2019-08-12T22:59:18.128Z warning vvold[2298793] [Originator@6876 sub=Default] VasaSession::GetEndPoint: failed to get endpoint, err=SSL Exception: Verification parameters:
 --> PeerThumbprint: 0B:2F:E1:52:F3:85:EB:CD:BE:B4:FF:E3:B2:64:1B:FB:F5:BA:55:F4
 --> ExpectedThumbprint:
 --> ExpectedPeerName: 10.21.88.113
 --> The remote host certificate has these problems:
 -->
 --> * unable to get local issuer certificate, using default

The immediate failure is the SSL exception. More often than not, one of two things is at play here. The first is that vvold has the local trusted root certs cached from before the storage provider was registered; I’ve seen this more often with earlier builds of 6.5, but with 6.5 U3 it should not be an issue anymore. The second is that the vCenter hasn’t pushed the Trusted Certs to the ESXi host.
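
Either way, it’s worth looking at which CA certificates the host actually has in its trust store. On my 6.x hosts the CA bundle pushed down from vCenter lives at /etc/vmware/ssl/castore.pem (treat that path as an assumption and adjust for your build), and openssl can dump the subject and issuer of everything in it:

# When was the CA bundle on this host last updated?
ls -l /etc/vmware/ssl/castore.pem

# List subject/issuer for every CA certificate in the bundle
openssl crl2pkcs7 -nocrl -certfile /etc/vmware/ssl/castore.pem | \
  openssl pkcs7 -print_certs -noout

If the CA that issued the VASA Provider’s certificate isn’t in that list, the “unable to get local issuer certificate” error is exactly what you would expect to see.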

What can I do to fix this?

With stale or cached certificate information in vvold, we want to go ahead and clear that out. I will usually start off by issuing the following on the affected ESXi host:

/etc/init.d/vvold ssl_reset
/etc/init.d/vvold restart

Running both of those makes sure that there aren’t any cached cert locations or details left in vvold. Here is an example of running through this:

2019-08-12T23:14:34.663Z info vvold[2298782] [Originator@6876 sub=Default] VasaSession::GetEndPoint: with url https://10.21.88.113:8084/version.xml
 2019-08-12T23:14:34.666Z warning vvold[2298782] [Originator@6876 sub=Default] VasaSession::GetEndPoint: failed to get endpoint, err=SSL Exception: Verification parameters:
 --> PeerThumbprint: 0B:2F:E1:52:F3:85:EB:CD:BE:B4:FF:E3:B2:64:1B:FB:F5:BA:55:F4
 --> ExpectedThumbprint:
 --> ExpectedPeerName: 10.21.88.113
 --> The remote host certificate has these problems:
 -->
 --> * unable to get local issuer certificate, using default
 2019-08-12T23:14:34.667Z info vvold[2298782] [Originator@6876 sub=Default] VasaSession::Initialize url is empty
 2019-08-12T23:14:34.667Z warning vvold[2298782] [Originator@6876 sub=Default] VasaSession::DoSetContext: Empty VP URL for VP (sn1-m20-c12-25-ct0)!
 2019-08-12T23:14:34.667Z info vvold[2298782] [Originator@6876 sub=Default] Initialize: Failed to establish connection https://10.21.88.113:8084/version.xml
 2019-08-12T23:14:34.667Z error vvold[2298782] [Originator@6876 sub=Default] Initialize: Unable to init session to VP sn1-m20-c12-25-ct0 state: 0

==========
/etc/init.d/vvold ssl_reset is run
==========

 2019-08-12T23:14:34.817Z [/etc/init.d/vvold[2299926]: /etc/init.d/vvold ssl_reset, PID 2299926]
 2019-08-12T23:14:34.821Z info vvold[2298809] [Originator@6876 sub=Default] Reading from pipe returned 1
 2019-08-12T23:14:34.821Z warning vvold[2298809] [Originator@6876 sub=Default] VVold caught signal 1
 2019-08-12T23:14:34.821Z warning vvold[2298793] [Originator@6876 sub=Default] ReloadSslCert: Resetting all VASA sessions on receiving SIGHUP
 2019-08-12T23:14:34.821Z info vvold[2298793] [Originator@6876 sub=Default] VendorProviderMgr::ResetVPSessions: Resetting only SSL sessions
 2019-08-12T23:14:34.821Z warning vvold[2298793] [Originator@6876 sub=Default] VVolServiceInstance::SetSSLContext: Skipping CRL check as env variable VVOLD_DO_CRL_CHECK is not set!
 2019-08-12T23:14:34.822Z info vvold[2298793] [Originator@6876 sub=Default] VendorProviderMgr::ResetVPSessions: Skipping non-SSL session to VP (sn1-405-c12-21-ct0)
 2019-08-12T23:14:34.822Z info vvold[2298793] [Originator@6876 sub=Default] VendorProviderMgr::ResetVPSessions: Skipping non-SSL session to VP (sn1-m20-c12-25-ct0)
 2019-08-12T23:14:34.822Z [/etc/init.d/vvold[2299926]: vvold signalled for SSL reset]
 2019-08-12T23:14:34.975Z info vvold[2298809] [Originator@6876 sub=Default] VasaSession::GetEndPoint: with url https://10.21.88.117:8084/version.xml
 2019-08-12T23:14:34.988Z info vvold[2298809] [Originator@6876 sub=Default] VasaSession::Parse xml: id=4, loc=/vasa, myVer=4, ver=0
 2019-08-12T23:14:34.988Z info vvold[2298809] [Originator@6876 sub=Default] VasaSession::ParseVersionXml: Effective VP endpoint is https://10.21.88.117:8084/vasa (version 4)
 2019-08-12T23:14:34.988Z info vvold[2298809] [Originator@6876 sub=Default] VasaSession::DoSetContext: Setting GUID: 'D2CCA827-514D-E611-0000-78654321009B'  VC UID: '1143a5e7-f7de-4311-b0d8-781e84d1c908' (lastEventId was -1)
 2019-08-12T23:14:34.988Z info vvold[2298809] [Originator@6876 sub=Default] GetVasaHostInitiators: Adding FC HBA  wwpn: 20:00:00:25:b5:22:00:0d wwnn: 20:00:00:25:b5:44:00:47
 2019-08-12T23:14:34.988Z info vvold[2298809] [Originator@6876 sub=Default] GetVasaHostInitiators: Adding FC HBA  wwpn: 20:00:00:25:b5:22:00:3d wwnn: 20:00:00:25:b5:44:00:47
 2019-08-12T23:14:34.988Z info vvold[2298809] [Originator@6876 sub=Default] GetVasaHostInitiators: Adding FC HBA  wwpn: 20:00:00:25:b5:66:00:0d wwnn: 20:00:00:25:b5:44:00:47
 2019-08-12T23:14:34.988Z info vvold[2298809] [Originator@6876 sub=Default] GetVasaHostInitiators: Adding FC HBA  wwpn: 20:00:00:25:b5:66:00:3d wwnn: 20:00:00:25:b5:44:00:47
 2019-08-12T23:14:34.988Z info vvold[2298809] [Originator@6876 sub=Default] GetVasaHostInitiators: Adding iSCSI HBA [iqn:iqn.1998-01.com.vmware:esxi-1]
 2019-08-12T23:14:34.988Z info vvold[2298809] [Originator@6876 sub=Default] VasaSession::InitSoap: Master soap client created successfully
 2019-08-12T23:14:34.988Z info vvold[2298809] [Originator@6876 sub=Default] VasaSession::InitSoap: _masterSoap=000000c2e0724010, iomode=33558544
 2019-08-12T23:14:34.988Z info vvold[2298809] [Originator@6876 sub=Default] VasaSession::InitSoap VVold using 15 secs for soap connect timeout
 2019-08-12T23:14:34.988Z info vvold[2298809] [Originator@6876 sub=Default] VasaSession::InitSoap VVold using 75 secs for soap receive timeout
 2019-08-12T23:14:35.240Z warning vvold[2298809] [Originator@6876 sub=Default] vvol_ssl_auth_init: Will skip CRL check as env variable VVOLD_DO_CRL_CHECK is not set!
 2019-08-12T23:14:35.240Z warning vvold[2298809] [Originator@6876 sub=Default] VasaSession:DoSetContext: Setting VASAVERSION cookie to "3.0"
 2019-08-12T23:14:35.240Z info vvold[2298809] [Originator@6876 sub=Default] VasaSession::KillAllConnections VP (sn1-405-c12-21-ct0), purged 0 connections, 0 currently active, new genId (1) (broadcast wakeup to all threads waiting for free connection)
 2019-08-12T23:14:35.334Z warning vvold[2298809] [Originator@6876 sub=Default] VasaSession::SetState: VP sn1-405-c12-21-ct0 [OutOfSync -> Connected], state change locked!
 2019-08-12T23:14:35.335Z info vvold[2298809] [Originator@6876 sub=Default] VasaSession::DoSetContext: master soap's original keep_alive = 1
 2019-08-12T23:14:35.335Z info vvold[2298809] [Originator@6876 sub=Default] VasaSession::DoSetContext: VP sn1-405-c12-21-ct0 url https://10.21.88.117:8084/vasa: success eventTimer 5000000  maxBatchsize 1 maxConcurrentRequestsPerSession 5 _sessionId [-1520153752992228692] _genId 1
 2019-08-12T23:14:35.335Z info vvold[2298809] [Originator@6876 sub=Default] VasaSession::CheckForceParams: Environment variable VVOLD_FORCE_MAX_CONNECTIONS_PER_SESSION not set, using 5 conns/session!
 2019-08-12T23:14:35.335Z info vvold[2298809] [Originator@6876 sub=Default]
 --> VasaOp::SetPEContextInt [#1]: ===> Issuing 'setPEContext' to VP sn1-405-c12-21-ct0 (#outstanding 0/5) [session state: Connected]
 2019-08-12T23:14:35.335Z info vvold[2298809] [Originator@6876 sub=Default] VasaSession::GetFreeConn: VP (sn1-405-c12-21-ct0), added new connection ([14VasaConnection:0x000000c2969a0c10]) to the pool (size now 1/5), _genId 1!
 2019-08-12T23:14:35.335Z info vvold[2298784] [Originator@6876 sub=Default]
 --> VasaOp::EventPollerCB [#2]: ===> Issuing 'getEvents' to VP sn1-405-c12-21-ct0 (#outstanding 1/5) [session state: Connected]
 2019-08-12T23:14:35.336Z info vvold[2298784] [Originator@6876 sub=Default] VasaSession::GetFreeConn: VP (sn1-405-c12-21-ct0), added new connection ([14VasaConnection:0x000000c296966eb0]) to the pool (size now 2/5), _genId 1!
 2019-08-12T23:14:35.348Z error vvold[2298784] [Originator@6876 sub=Default]
 --> VasaOp::ThrowFromSessionError [#2]: ===> FINAL SUCCESS getEvents VP (sn1-405-c12-21-ct0) Container (sn1-405-c12-21-ct0) timeElapsed=12 msecs (#outstanding 1)

Right after the ssl_reset completes, we see that GetEndPoint is able to complete without a failure and the VP session is successfully initialized. If I recall correctly, you only need to run the ssl_reset, but I tend to just run both in tandem to be safe.
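
Once the sessions come back, I like to confirm from the host itself that things really are healthy again. A couple of quick checks from the ESXi shell:

# The VASA Provider should now report as connected from the host's point of view
esxcli storage vvol vasaprovider list

# The vVol datastore should show up as a mounted VVOL filesystem again
esxcli storage filesystem list | grep -i vvol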

Restarting vvold didn’t fix it, what next?

What if that didn’t actually fix anything, the problem still persists, and the same error shows up in vvold? This could be due to the Trusted CA Certs not actually getting pushed from vCenter to the ESXi hosts. I’ve seen this most often when importing a new Trusted CA Cert to vCenter after the ESXi hosts are already in vCenter. Essentially, the trusted certs aren’t automatically getting pushed out to the ESXi hosts after being added to vCenter.

There are a couple of ways that you can manually push the Trusted Root certs to the ESXi hosts. One way is to right click on the host in the GUI, navigate to Certificates, and choose the option to Refresh CA Certificates. This pushes all certificates in the TRUSTED_ROOTS store to the ESXi host. Another way is to leverage PowerCLI. Here is the workflow that I have been using in these cases with PowerShell and PowerCLI:

## Connect to the vCenter Server ##
Connect-VIServer -Server prod-vcsa

## Get the ESXi hosts and save them to a variable ##
$hosts = Get-VMHost

## Get the ServiceInstance view ##
$si = Get-View ServiceInstance

## Get the CertificateManager view ##
$certMgr = Get-View -Id $si.Content.CertificateManager

## Using the Cert Manager, refresh the ESXi hosts' CA certs ##
## This pushes all certificates in the TRUSTED_ROOTS store in the vCenter Server VECS store to the hosts. ##
$certMgr.CertMgrRefreshCACertificatesAndCRLs($hosts.ExtensionData.MoRef)

## Now in vCenter the vVol datastore should be accessible for each of those hosts. No need to do the ssl_reset and restart of vvold ##

And here is an example of actually running through it.

PS C:> Connect-VIServer -Server dev-vcsa
 Name                           Port  User                          
 ----                           ----  ----                          
 dev-vcsa                       443   ALEX\Carver                   
 PS C:> Get-Cluster -Name "Dev Cluster"
 Name                           HAEnabled  HAFailover DrsEnabled DrsAutomationLevel  
                                           Level                                     
 ----                           ---------  ---------- ---------- ------------------  
 Dev Cluster                    True       1          True       FullyAutomated      
 PS C:> $ESXi_Cluster = Get-Cluster -Name "Dev Cluster"
 PS C:> $ESXi_Cluster | Get-VMHost
 Name                 ConnectionState PowerState NumCpu CpuUsageMhz CpuTotalMhz   MemoryUsageGB   MemoryTotalGB Version
 ----                 --------------- ---------- ------ ----------- -----------   -------------   ------------- -------
 esxi-7.alex.pures… Connected       PoweredOn      16         151       38304          14.586         255.897   6.7.0
 esxi-6.alex.pures… Connected       PoweredOn      20         141       43880          16.166         255.892   6.7.0
 esxi-4.alex.pures… Connected       PoweredOn      20          94       43880           8.945         255.892   6.7.0
 PS C:> $hosts = $ESXi_Cluster | Get-VMHost
 PS C:> $hosts
 Name                 ConnectionState PowerState NumCpu CpuUsageMhz CpuTotalMhz   MemoryUsageGB   MemoryTotalGB Version
 ----                 --------------- ---------- ------ ----------- -----------   -------------   ------------- -------
 esxi-7.alex.pures… Connected       PoweredOn      16         151       38304          14.586         255.897   6.7.0
 esxi-6.alex.pures… Connected       PoweredOn      20         141       43880          16.166         255.892   6.7.0
 esxi-4.alex.pures… Connected       PoweredOn      20          94       43880           8.945         255.892   6.7.0
 PS C:> $si = Get-View ServiceInstance
 PS C:> $certMgr = Get-View -Id $si.Content.CertificateManager
 PS C:> $certMgr.CertMgrRefreshCACertificatesAndCRLs($Hosts.ExtensionData.MoRef)
 PS C:> 

Then check the vvold log on the host that had the issue, and you should see that everything is healthy again. I would also recommend checking in the Tasks pane to make sure that the CA certificate refresh completed successfully for each host, because one issue that can come up is vCenter failing to push the Trusted Certs to the ESXi host.
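
You can also confirm from the host itself that the refresh actually landed and that vvold picked it up. A minimal sketch, again assuming the castore.pem location mentioned earlier:

# The CA bundle's timestamp should reflect the refresh you just ran
ls -l /etc/vmware/ssl/castore.pem

# vvold should log the VASA session moving to Connected once verification succeeds
grep -i "SetState" /var/log/vvold.log | tail -n 5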

A failed push can happen because of stale entries in the PSC/SSO, or some other problem in the communication between the ESXi host and vCenter. In the end, if vCenter is unable to push the Trusted Certs to the ESXi host, then the host will be unable to authenticate with the VASA Provider. VMware has updated their KB covering how to check for issues with the VECS store or expired/outdated certificates: https://kb.vmware.com/s/article/2146011
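
While you’re in certificate-troubleshooting mode, it also doesn’t hurt to sanity check the ESXi host’s own certificate dates, since expired or regenerated host certs can cause trouble between vCenter and the host. A quick check, assuming the standard rui.crt location:

# Print the validity window of the ESXi host's own certificate
openssl x509 -in /etc/vmware/ssl/rui.crt -noout -subject -startdate -enddate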

There is the possibility that this symptom is indicative of another issue, but more often than not the VP URL being empty is because of one of the two issues above.
