During a recent rebuild of my home lab I stumbled onto a problem which means I need to update the VMware Workstation \ Fusion ESXi build instructions I have previously issued. I had to rebuild the hosts one at a time to ensure that the USB drives used MBR partitioning instead of GPT, so I followed my previous instructions and used a VM in VMware Workstation to rebuild all three USB drives. All was good with the first host running, but as soon as I added hosts 2 and 3 the hosts stopped working well with each other. Each host was able to ping components outside of itself but was unable to ping the other hosts. Stranger still, I couldn’t add the hosts to vCenter either.
Having done the obvious and made sure I hadn’t made any mistakes with the networking (correct IP addressing, subnet masks, DNS servers etc.), I put a post out on Twitter asking for some advice. This wasn’t a new build; it had been used previously to build out the Shuttle servers, so I knew that the ISO image was fine. Someone then asked what happened with only a single server online. Powering down two of the hosts, I found I could ping the remaining host without issue; however, as soon as I powered up another host I saw dropped pings all over the place.
Thinking that perhaps I had a faulty NIC (it happens, and is the reason I have a spare one sitting around), I started playing with all three hosts to see which one could be causing the issue. As it turns out, none of them and all of them. Individually, each host performed as expected: it was 100% pingable, able to join vCenter and worked as it should. Adding any other host to the environment, however, caused the entire environment to break down and become unresponsive.
Knowing that the ISO was a known good file, I had to dig a little deeper to discover what the issue was. Going back to my Windows days, the symptoms sounded like those of machines with duplicate IP addresses. However, I had installed and addressed each host individually, so I knew that not only were they named uniquely but the networking on each host was unique as well (at this stage they weren’t even part of the vCenter environment, had a single NIC configured and were using a vSS rather than a vDS).
At this stage the only thing common to every host build was the ISO image (or so I thought), so I downloaded a fresh copy of the latest 5.5u2 image from the VMware site and proceeded to create a new USB image via my VMware Workstation VM.
With everything powered off, I powered up the freshly created host, configured it and rebooted it to ensure everything was working. Then I powered up a second host. As soon as the networking stack on the second host loaded, the first host started dropping ICMP packets and became unresponsive again. Well, that ruled out the ISO being at fault.
At this stage I had only one common denominator left: the VMware Workstation VM. Now this VM doesn’t actually have any VMDK associated with it; the 40GB drive that was initially created was deleted because I knew I was going to use this VM to create my USB instances. That should have been fine. Little did I know that each individual VM carries something unique to it, and that somehow gets written to the USB key during the installation process. How do I know this? Well, I decided to see what would happen if I created a new USB key on my MBP. I created the VM, installed the OS onto the USB drive and, having left host 1 powered up, built up host 2 with the new image. Pings to host 1 were 100% successful, pings to host 2 were also 100% successful, and not only that but both hosts could ping each other and could be added to vCenter as well.
Going back to my laptop (rather than the MBP), I created a new ESXi 5.5 VM and built the last USB key from the ISO file, inserted it into the last host and powered it up. Hosts 1 and 2 were still at 100% success on the ping front, and host 3 worked as expected as well.
It appears that during the build process some unique ID is written to the USB key. Building a number of USB drives from the same VM creates multiple instances of that same ID, so every host booted from those drives shares it, which causes problems as soon as more than one host is powered up.
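As a rough illustration of the duplicate-ID theory, a check like the following minimal Python sketch would flag any identifier that shows up on more than one host. I have not pinned down which identifier is actually duplicated, so the host names and ID values here are made-up placeholders noted down by hand, and `find_shared_ids` is a hypothetical helper rather than anything VMware provides.

```python
from collections import defaultdict


def find_shared_ids(host_ids):
    """Return {identifier: [hosts]} for every identifier used by more than one host.

    host_ids maps a host name to whatever identifier was recorded for it.
    Values are compared case-insensitively with surrounding whitespace removed.
    """
    groups = defaultdict(list)
    for host, ident in host_ids.items():
        groups[ident.strip().lower()].append(host)
    return {ident: sorted(hosts) for ident, hosts in groups.items() if len(hosts) > 1}


if __name__ == "__main__":
    # Placeholder data only: host1 and host2 stand in for two USB keys
    # built from the same VM, host3 for a key built from a fresh VM.
    recorded = {
        "host1": "ID-AAAA",
        "host2": "id-aaaa",
        "host3": "ID-BBBB",
    }
    for ident, hosts in find_shared_ids(recorded).items():
        print(f"{ident} is shared by: {', '.join(hosts)}")
```

An empty result would suggest that whatever was compared is unique per host, in which case the clash lies in some other identifier.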
Why hadn’t I experienced this in the past? Well, I purchased my hosts over a period of time rather than all at once. Because of that I had always created a new VM every time I needed to build my next host, and so never ran into this issue. In fact, colleagues I talked to on Twitter also use the VMware Workstation (or Fusion) method to build out home lab environments and have never experienced this either.
Due to this newly discovered anomaly I will be updating my USB \ VMware Workstation page to ensure that no one experiences this in the future.