This evening I thought I’d update my homelab to the latest vSphere 7.0U1d. I had some issues updating my VCSA via the VAMI; ultimately those were all resolved by performing the update via the CLI without issue. However, when I came to update my hosts, two of them seemingly updated fine but the third one got stuck.
It didn’t take a whole lot of digging to discover that the 10GbE interfaces were completely missing from the updated hosts, and as a result my vMotion kernel ports had no physical uplinks to use!
My 10GbE NICs are Dell-badged Broadcom BCM57402 cards which I got for a bargain price but which have a history of causing issues. Officially they haven’t been supported since vSphere 6.7, but they’d been working so far with bnxtnet driver version 184.108.40.206 (that is, after I’d discovered that the firmware version originally on them was stripping VLAN headers from packets!).
The update to vSphere 7.0U1d, however, bumped the bnxtnet driver to version 220.127.116.11. I had a good hunt through the release notes and couldn’t find any documented reason this driver should stop my NICs from working, but I know these cards are pretty old now, and updates for them within the driver bundle stopped some time ago.
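Before changing anything, it’s worth confirming which driver version a host is actually running. A quick check over SSH looks something like this (vmnic4 is just a stand-in for whichever vmnic your 10GbE card presents as on your host):

```shell
# List the installed bnxt VIBs and their versions
esxcli software vib list | grep bnxt

# Show the driver and firmware versions bound to a specific NIC
# (replace vmnic4 with the relevant 10GbE interface on your host)
esxcli network nic get -n vmnic4
```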
So my first hunch was to revert to the previous driver version, which thankfully did solve the problem. Here’s how I did that:
First off, put a host into maintenance mode and enable SSH. Then connect to the host via SSH and run the following:
esxcli software vib remove -n bnxtnet
esxcli software vib remove -n bnxtroce
After that, reboot the host. When it comes back up, confirm in vCenter that the bnxt packages are gone from the host.
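The same check can be done from the host’s shell; after the reboot, listing the VIBs again should come back empty for the bnxt packages:

```shell
# Should return nothing if both bnxtnet and bnxtroce were removed
esxcli software vib list | grep bnxt
```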
Now, as I had updated from vSphere 6.7, I was still using traditional baselines in Lifecycle Manager. So first I grabbed the driver I wanted (using the relevant link in the HCL), unzipped it, and imported it into Lifecycle Manager.
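As an aside, if you’d rather not go through Lifecycle Manager at all, the same offline bundle can be installed straight from the host’s shell. The path below is purely illustrative; copy the downloaded zip to somewhere the host can reach, such as a datastore:

```shell
# Install the driver offline bundle directly on the host;
# -d expects the full path to the depot zip (path shown is an example)
esxcli software vib install -d /vmfs/volumes/datastore1/bnxtnet-offline_bundle.zip

# Reboot so the reinstalled driver is loaded
reboot
```

The baseline route below has the advantage that Lifecycle Manager tracks the driver across the whole cluster, whereas the direct install is per-host.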
I then added this extension to an extension baseline and attached the baseline to my cluster.
Finally, I ran a compliance check on one host, which found it non-compliant with my extension baseline.
Remediating the specific extension baseline on the host installs the required driver and reboots the host. Then, on checking, the NICs are back.
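Once the host is back up, a quick look over SSH confirms the 10GbE uplinks are present again and shows which driver they’re bound to:

```shell
# All physical NICs, their link state, and the driver in use
esxcli network nic list

# Double-check the installed bnxtnet VIB version
esxcli software vib list | grep bnxtnet
```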
However, here lies a small problem (at least for those of us with a modicum of OCD!). Not all our baselines are now compliant, as the predefined “Non-Critical Host Patches” baseline wants us to update to the version 18.104.22.168 driver again. I’m pretty certain this behaviour is different from vSphere 6.7.
As I understand it, switching to Single Image updates for my cluster will help with this, as I should be able to pick and choose the components installed on each host. I’ll detail that in another post, though.