This article covers how to update the keep alive value for existing NVMe-TCP target controller connections. Unfortunately, you cannot just update an active controller connection; you have to disconnect and then reconnect that controller with the updated values. This mostly applies to NVMe-TCP controller connections created in vSphere 8.0 U2 or earlier, because the default keep alive value changed in 8.0 U3 from 30 to 10 seconds.
Context and Background
Over the past couple of years I’ve dabbled a bit with NVMe-oF on vSphere. I didn’t particularly love how NVMe-TCP worked in 7.0 U3, and it wasn’t much better with the 8.0 GA release. With 8.0 U1, though, things got a lot better: VMware implemented end-to-end NVMe support, from guest to storage, with no more SCSI translation. The overall reliability of NVMe-TCP felt much better with 8.0 U1 as well.
However, during path failover testing my team and I noticed that failover timing was worse than it had been with iSCSI. Specifically, failover was slow when the switch ports facing the array were shut down, as opposed to disabling the array Ethernet ports or the ESXi uplinks. When I initially tested these failure scenarios I focused on the array ports or the host ports going down, rather than shutting down the ports at the switch level. I should have learned my lesson during my iSCSI failure testing, but that was several years ago.
How to use PowerShell to update NVMe-TCP Controller Keep Alive
The easiest way that I found to update the keep alive setting for existing NVMe-TCP connections was to do it with PowerShell and PowerCLI. The process is straightforward enough:
- Put the ESXi host into maintenance mode
  - This ensures that there won’t be any impact to active data paths.
- Run the PowerShell workflow that will disconnect and then connect the NVMe-TCP controller connections.
- Take the ESXi host out of maintenance mode
- Repeat these steps with the next hosts
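The maintenance mode bookends of the steps above can also be done from PowerCLI. A minimal sketch, assuming an active `Connect-VIServer` session and substituting your own host name (this is an outline only; the controller reconnect workflow from the next section goes in the middle):

```powershell
## Assumes you are already connected to vCenter with Connect-VIServer ##
$VMHost = Get-VMHost -Name 'esxi-4.alex.purestorage.com'

## Step 1: enter maintenance mode so there is no impact to active data paths ##
Set-VMHost -VMHost $VMHost -State Maintenance -Evacuate:$true | Out-Null

## Step 2: run the disconnect/reconnect workflow for this host here ##
# ...

## Step 3: take the host back out of maintenance mode ##
Set-VMHost -VMHost $VMHost -State Connected | Out-Null
```

Whether `-Evacuate` makes sense depends on your DRS configuration; with fully automated DRS the VMs will be moved for you anyway.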
I could have scripted out steps 1, 3, and 4, but I did not for this initial example. I might come back and show how to do that later, though. Here is the PowerShell method for updating the NVMe-TCP target controller connections, with comments on each step:
```powershell
## Connecting to the vCenter Server that the hosts are in ##
Connect-VIServer -Server dr-vcsa.alex.purestorage.com

## Setting the variable for the ESXi host I want to redo the controller connections on ##
$VMHost = Get-VMHost -Name 'esxi-4.alex.purestorage.com'

## In order to do this, you have to end up using esxcli on the host, which you can pass through PowerCLI ##
$esxcli = Get-EsxCli -VMHost $VMHost -V2

## Getting the controller output for the host. You could add a filter here to only look for the Pure NQN;
## I didn't bother since all controllers were already Pure. You should skip the vVol connections though ##
$nvmeCtrlList = $esxcli.nvme.controller.list.CreateArgs()
$nvmeCtrlList.skipvvols = $true
$controllerOutput = $esxcli.nvme.controller.list.Invoke($nvmeCtrlList)

## Making sure we only grab TCP controller connections. The controller list only allows one arg, thanks esxcli ##
$tcpControllerOutput = $controllerOutput | Where-Object TransportType -like TCP

## For each controller on the list: first get the variables I'll need from the controller name,
## then disconnect that controller connection, then reconnect it with the desired values.
## Here the timeout is 10, but you can change it to 5 or 15 for testing.
## 10 is the default in 8.0 U3 and the recommendation right now ##
forEach ($ctrlOutput in $tcpControllerOutput) {
    $nqn,$vmhba,$ipPort = ($ctrlOutput.Name.Split("#"))
    $ip,$port = $ipPort.Split(":")
    $ctrl = $ctrlOutput.ControllerNumber
    $nvme_timeout = '10'
    $nvmeDisconnect = $esxcli.nvme.fabrics.disconnect.CreateArgs()
    $nvmeConnect = $esxcli.nvme.fabrics.connect.CreateArgs()
    $nvmeDisconnect.adapter = $vmhba
    $nvmeDisconnect.controllernumber = $ctrl
    $esxcli.nvme.fabrics.disconnect.Invoke($nvmeDisconnect)
    $nvmeConnect.ipaddress = $ip
    $nvmeConnect.portnumber = $port
    $nvmeConnect.keepalivetimeout = $nvme_timeout
    $nvmeConnect.adapter = $vmhba
    $nvmeConnect.subsystemnqn = $nqn
    $esxcli.nvme.fabrics.connect.Invoke($nvmeConnect)
}

## Then you can just change the host value and run through this again. Sure, you could loop over
## all hosts in the cluster and also put each host into maintenance mode from the script,
## but that can also just be done in the vCenter GUI ##
```
Overall, I had a lot of fun dipping back into problem solving with PowerShell. I’ll try to come back and add some more steps for automating this at a larger scale, but this should at least get you started if you can’t upgrade to 8.0 U3.