One of my colleagues reported an error in the Host Preparation tab of NSX. When I looked at the tab, the VXLAN column was showing an error for the Compute and Edge clusters. When I hovered the cursor over the error, it read “VTEP was assigned link local IPv4 address”, as shown below.
Before I write about how I resolved the issue, let me explain the NSX configuration and setup. This NSX environment is built on VPLEX stretched clusters. The two datacenters (Site A and Site B) are located 10 miles apart. All the hosts are Cisco UCS B-Series servers. Each cluster in the NSX environment, as seen in the screenshot above, is a stretched cluster made up of a few servers from each of the two sites, and VPLEX stretched storage is presented to these hosts. The NSX version currently running is 6.2.7, and vSphere and vCenter are at version 6.0.x. That covers the configuration and architecture of this setup, which should help map the environment details to the troubleshooting steps below.
As the error was about VXLAN and the VTEP IPs, I decided to check Logical Network Preparation -> VXLAN Transport for the VMKNic IP address assignment of each host in the NSX-managed clusters. The first thing I noticed was that a couple of hosts had APIPA addresses (the 169.254.0.0/16 link-local range), while the rest of the hosts had proper IP assignments.
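The same check can be done in a script once the VTEP addresses have been collected. The following is only a minimal sketch: the host names and addresses are hypothetical placeholders, and it assumes the addresses have already been exported from NSX by some other means. It relies on Python's standard `ipaddress` module to flag link-local (APIPA) addresses.

```python
import ipaddress


def find_link_local_vteps(vtep_ips):
    """Return the hosts whose VTEP address is link-local (169.254.0.0/16)."""
    return sorted(
        host
        for host, ip in vtep_ips.items()
        if ipaddress.ip_address(ip).is_link_local
    )


# Hypothetical host names and addresses, for illustration only.
vtep_ips = {
    "esx-a-01": "192.0.2.11",
    "esx-a-02": "192.0.2.12",
    "esx-b-01": "169.254.10.20",
    "esx-b-02": "169.254.33.7",
}

# Hosts listed here did not get a lease and fell back to APIPA.
print(find_link_local_vteps(vtep_ips))
```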
This gave me a first clue about the cause of the error. In this setup, no static IP pool was created for VTEP assignment. Instead, the VTEPs use DHCP: the address pools are defined on the DHCP server, and the hosts obtain their IPs from it. One more important point is that there are two DHCP servers, one for each site.
The hosts in the screenshot above that have valid IPs from their VLAN segment are located in the Site A datacenter, and they received those IPs from the Site A DHCP server. The hosts with APIPA addresses, however, all belong to the Site B datacenter. This pattern made me check the status of the DHCP servers. I tested reachability with the ping command, and the Site B DHCP server was not reachable.
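The reasoning here, that all affected hosts share one site and therefore one DHCP server, can be sketched as a small grouping step. This is an illustrative sketch only: the host names, site labels, and addresses are hypothetical, and the function simply reports the sites where every host ended up with a link-local VTEP address.

```python
import ipaddress
from collections import defaultdict


def sites_without_leases(hosts):
    """Return the sites where every host has a link-local VTEP address.

    `hosts` is an iterable of (host_name, site, vtep_ip) tuples.
    """
    flags_by_site = defaultdict(list)
    for _host, site, ip in hosts:
        flags_by_site[site].append(ipaddress.ip_address(ip).is_link_local)
    return sorted(site for site, flags in flags_by_site.items() if all(flags))


# Hypothetical inventory, for illustration only.
hosts = [
    ("esx-a-01", "Site A", "192.0.2.11"),
    ("esx-a-02", "Site A", "192.0.2.12"),
    ("esx-b-01", "Site B", "169.254.10.20"),
    ("esx-b-02", "Site B", "169.254.33.7"),
]

# A site listed here points at that site's DHCP server as the first suspect.
print(sites_without_leases(hosts))
```

If only some hosts at a site were affected, the result would be empty, which would suggest looking at the individual hosts rather than at the DHCP server.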
This led me to check the DHCP server itself. The DHCP service runs on Red Hat Linux deployed as a VM. When I looked at the VM's status, its VM network was showing as inactive.
Here is the issue: the Site B DHCP server was not connected to its VM network. The port group had been created earlier, but it was probably not re-created during a host reconfiguration phase. As a result, the Site B hosts were not getting VTEP VLAN IPs from the address range defined on the DHCP server.
Once I re-created the VM network port group for the Site B DHCP server and mapped the VM to it, the DHCP server started communicating again and became reachable on its network.
Later I verified in the Logical Network Preparation tab that all the hosts in Site B had started receiving proper IPs from the Site B DHCP server. Once all the hosts had received their IPs, the VXLAN error disappeared automatically, and the NSX Manager health returned to normal.