Vagrant/DHCP problem
-
Hi folks,
Long post...
So, we have a DevOps colleague who has set up automation with Vagrant against one of our Hyper-V hosts. I am not familiar with Vagrant at all, but it allows his code to build VMs on that host. I am trying to rule out DHCP being the issue.
Most of the time the VMs are built successfully and get a DHCP address. Every once in a while, one of the newly generated VMs ends up without a DHCP IP (well, it does actually get one at first, as the capture below shows).
We set up Wireshark to capture traffic for one of these failed events. When this fails I can see:
1 - the broadcast from the mac of the VM asking for any DHCP servers on the subnet
2 - the DHCP offer with the MAC of the client and the IP available
3 - the actual DHCP request from the client MAC with the IP which is available from step 2
4 - the DHCP Ack from the DHCP server confirming allocation complete
I have checked the DHCP server and can see the record. Correct IP, MAC, and a machine name. At this point I believe the entire flow has been successful. Now, this is where it goes wrong.
I connect to his VM in Hyper-V, log in with the password he gave, and type ipconfig /all
I get no IP, no subnet, but I do see DHCP = Yes, Gateway and DNS servers. I now type 'hostname', and the hostname is different to what is in DHCP!
What I suspect:
- Vagrant makes the machine, it boots, gets an IP and is written to DHCP
- At some point, the code written gets the VM to change its host name and the VM reboots
- The VM (with its new name) asks DHCP server for the same IP
- DHCP server refuses as the hostname does not match the IP in its records
- Vagrant VM is left in a state of having no IP
Does this sound reasonable? I assumed if this were the case though that the DHCP server would send a NACK or something refusing the IP renewal/request, but do not see the traffic in Wireshark.
If I restart the VM or do ipconfig /renew, it does get the correct IP, and DHCP updates with the new name of the machine.
This happens once in a while to both his Linux VM and his Windows Server VM, which are made via Vagrant, leading me to believe the issue is Vagrant.
The fact that DHCP has a record of the IP, name and MAC of the host from before the name change makes me think the issue is with Vagrant/his code rather than the DHCP server.
Cheers
-
Just for understanding, why Vagrant on the remote machine and not Terraform? Vagrant in my experience had been for local dev. Not saying it doesn't work and I thought I saw recently about remote systems with Vagrant, but terraform would most likely work much better.
-
@jimmy9008 Is this happening with a Windows VM or Linux?
If Windows, edit the base VM and go to the registry:
HKLM\System\CurrentControlSet\Services\Tcpip\Parameters
Add a DWORD entry by the name of "ArpRetryCount" and set the value to 0.
That will help the DHCP issue.
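For anyone wanting to bake that registry change into the base image rather than click through regedit, here is a minimal sketch. It only builds the standard `reg add` command line for the value named above and runs it when executed on Windows with admin rights. The function name is made up for illustration, and my understanding that `ArpRetryCount` governs the gratuitous ARP duplicate-address probes is an assumption worth verifying before rolling it out.

```python
import subprocess
import sys

# Registry key holding the TCP/IP parameters (value name taken from the post above).
KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"


def arp_retry_count_command(value=0):
    """Build the `reg add` argument list that sets ArpRetryCount (hypothetical helper)."""
    return [
        "reg", "add", KEY,
        "/v", "ArpRetryCount",   # value name
        "/t", "REG_DWORD",       # value type
        "/d", str(value),        # value data
        "/f",                    # overwrite without prompting
    ]


if __name__ == "__main__" and sys.platform == "win32":
    # Only touches the registry when run directly on a Windows machine.
    subprocess.run(arp_retry_count_command(), check=True)
```

A reboot (or at least a restart of the network adapter) is probably needed before the change has any effect.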
-
You might be confused by the hostname without it being a problem.
Normally it's the DHCP client that tells the server its hostname in the DHCPREQUEST packet.
So the hostname is set in the OS during installation and it's communicated to the DHCP server. The DHCP server stores this information in its lease table. The hostname and IP can then be communicated from the DHCP server to DNS as well.
But it's also possible for the DHCP client to change the OS hostname based on a hostname the DHCP server sends - usually when having static DHCP reservations.
It's however not a requirement for the DHCP client to send its hostname to the DHCP server, and it's not a requirement that the DHCP client changes the hostname based on the hostname the DHCP server provided either. These are options that can be enabled or not.
Whatever the setup is, you can have the hostname inside the VM and the hostname on DHCP/DNS be different without anything being wrong.
I think you're on the right track using Wireshark to figure out what is happening. I would have a close look at the MAC addresses to see which VM is doing what.
I don't know Vagrant but I don't see any reason for the DHCP server to supply a hostname to the VM when Vagrant is perfectly capable of setting the VM's hostname itself.
I would have a look at the DHCP server settings. Are you for instance using static reservations and are you setting hostnames from the DHCP server?
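To make the point about the hostname concrete: in a DHCP message the hostname is just one more option (code 12, Host Name) sitting next to the message type (code 53). Below is a rough sketch of walking the options field as code/length/value triples. It assumes well-formed input that starts right after the magic cookie, and the sample bytes and the name in them are hand-built for illustration, not taken from a real capture.

```python
HOST_NAME = 12      # DHCP option code for the client's host name
MESSAGE_TYPE = 53   # DHCP option code for the message type
PAD, END = 0, 255   # single-byte options without a length field


def parse_dhcp_options(raw):
    """Return {option_code: value_bytes} for a well-formed options field."""
    options = {}
    i = 0
    while i < len(raw):
        code = raw[i]
        if code == END:
            break
        if code == PAD:
            i += 1
            continue
        length = raw[i + 1]
        options[code] = raw[i + 2:i + 2 + length]
        i += 2 + length
    return options


# Hand-built sample: message type 3 (DHCPREQUEST) plus a made-up host name.
sample = bytes([53, 1, 3, 12, 11]) + b"vagrant-123" + bytes([255])
opts = parse_dhcp_options(sample)
print(opts[MESSAGE_TYPE][0], opts[HOST_NAME].decode("ascii"))
```

In Wireshark the same field shows up under the DHCP options of the request, so comparing it across the requests from one MAC shows which name the client announced each time.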
-
@stacksofplates said in Vagrant/DHCP problem:
Just for understanding, why Vagrant on the remote machine and not Terraform? Vagrant in my experience had been for local dev. Not saying it doesn't work and I thought I saw recently about remote systems with Vagrant, but terraform would most likely work much better.
Interesting question. I actually am not sure. The host and VM are development hardware. The tool made by the DevOps guy is to spin up development environments of the various software we make, so I assume that's why he uses Vagrant. I do not know specifically though, as I'm not in the development team.
-
@dafyre said in Vagrant/DHCP problem:
@jimmy9008 Is this happening with a Windows VM or Linux?
If Windows, edit the base VM and go to the registry:
HKLM\System\CurrentControlSet\Services\Tcpip\Parameters
Add a DWORD entry by the name of "ArpRetryCount" and set the value to 0.
That will help the DHCP issue.
Both. Randomly. Maybe 1 in 50/100 runs.
-
Sorry. I wrote a lot. The DHCP server is not controlling the hostname of the client VM in any way.
What I meant was vagrant creates a VM with a vNIC, and a hostname called say 'vagrant-123', it boots, gets DHCP successfully, talks to the vagrant server/control/source (I'm not sure how that works). That orchestration 'thing' rolls out whatever is needed to the VM, then changes its name to say 'DevEnv15'.
Changing its host name makes it reboot. At that point, it sometimes no longer has its DHCP address. DHCP lists the original name.
Upon ipconfig /renew, the VM gets its IP back.
-
Okay. Well, if you have a standard DHCP server and are not using static reservations, then the hostname has no influence on DHCP. It's the MAC address that determines what IP you are given.
If you want to check for the same VM making several attempts at DHCP, you should look for the same MAC address in Wireshark. It is highly unlikely that Vagrant changes the MAC address after the VM has been created.
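If the capture gets long, grouping the DHCP messages per MAC makes repeated attempts easy to spot. A small sketch, assuming the packets have already been exported to (timestamp, client MAC, message type) tuples by whatever means is convenient. The function names and the sample rows are invented for illustration.

```python
from collections import defaultdict


def dhcp_exchanges_by_mac(events):
    """Group DHCP message types by client MAC, ordered by timestamp.

    events: iterable of (timestamp_seconds, client_mac, message_type) tuples.
    """
    by_mac = defaultdict(list)
    for _timestamp, mac, message_type in sorted(events, key=lambda e: e[0]):
        by_mac[mac.lower()].append(message_type.upper())
    return dict(by_mac)


def followup_after_first_ack(message_types):
    """Return whatever the client and server exchanged after the first ACK."""
    if "ACK" not in message_types:
        return []
    return message_types[message_types.index("ACK") + 1:]


# Made-up sample rows: one complete exchange and nothing afterwards.
sample = [
    (10.0, "00:15:5D:00:00:01", "Discover"),
    (10.1, "00:15:5D:00:00:01", "Offer"),
    (10.2, "00:15:5D:00:00:01", "Request"),
    (10.3, "00:15:5D:00:00:01", "Ack"),
]
exchanges = dhcp_exchanges_by_mac(sample)
for mac, messages in exchanges.items():
    print(mac, messages, "follow-up:", followup_after_first_ack(messages))
```

An empty follow-up list for a VM that has rebooted since its first ACK would suggest the client never asked again, which points at the guest rather than the server.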
-
@stacksofplates said in Vagrant/DHCP problem:
Just for understanding, why Vagrant on the remote machine and not Terraform? Vagrant in my experience had been for local dev. Not saying it doesn't work and I thought I saw recently about remote systems with Vagrant, but terraform would most likely work much better.
Yeah exactly. Vagrant is the wrong tool here. It's great for testing locally. I use it to test automated scripting for immutable builds, but even in dev environments I use terraform to deploy resources that I expect to have around longer than a few hours.
-
@jimmy9008 said in Vagrant/DHCP problem:
@stacksofplates said in Vagrant/DHCP problem:
Just for understanding, why Vagrant on the remote machine and not Terraform? Vagrant in my experience had been for local dev. Not saying it doesn't work and I thought I saw recently about remote systems with Vagrant, but terraform would most likely work much better.
Interesting question. I actually am not sure. The host and VM are development hardware. The tool made by the DevOps guy is to spin up development environments of the various software we make, so I assume that's why he uses Vagrant. I do not know specifically though, as I'm not in the development team.
Neither is he, he's DevOps. That's infrastructure.
-
@irj said in Vagrant/DHCP problem:
@stacksofplates said in Vagrant/DHCP problem:
Just for understanding, why Vagrant on the remote machine and not Terraform? Vagrant in my experience had been for local dev. Not saying it doesn't work and I thought I saw recently about remote systems with Vagrant, but terraform would most likely work much better.
Yeah exactly. Vagrant is the wrong tool here. It's great for testing locally. I use it to test automated scripting for immutable builds, but even in dev environments I use terraform to deploy resources that I expect to have around longer than a few hours.
Yeah, and the #1 person who should know this and tell you it is wrong is... the DevOps guys!
-
Funnily enough, this environment is on its own dedicated VLAN with its own dedicated DHCP server. The DHCP server, the Vagrant system, and the VMs that are created are all on the same Hyper-V host.
Understood regarding the name. We can drop that as it sounds like a red herring. From the Wireshark logs, I can see the entire DHCP request, which shows as working successfully. When the built VM reboots following a name change (although the name change is a red herring), there are no follow-up requests from the VM to the DHCP server. Upon rebooting, should the VM poll the DHCP server to see if the IP that was assigned moments ago is still available? Or, as the lease is longer will the VM just use it as it still has a lease?
Either way, the fact ipconfig shows the correct DNS servers shows me that the first attempt worked prior to restart, otherwise the DNS settings would be blank because that is given via DHCP. So, the problem looks to be the Vagrant VM losing or not holding the DHCP IP and Subnet in its config upon restart - once in a while...
-
@jimmy9008 said in Vagrant/DHCP problem:
Upon rebooting, should the VM poll the DHCP server to see if the IP that was assigned moments ago is still available? Or, as the lease is longer will the VM just use it as it still has a lease?
It should ask the dhcp server again.
This is from the standard https://datatracker.ietf.org/doc/html/rfc2131 :
When clients should use DHCP
A client SHOULD use DHCP to reacquire or verify its IP address and network parameters whenever the local network parameters may have changed; e.g., at system boot time or after a disconnection from the local network, as the local network configuration may change without the client's or user's knowledge.
If a client has knowledge of a previous network address and is unable to contact a local DHCP server, the client may continue to use the previous network address until the lease for that address expires. If the lease expires before the client can contact a DHCP server, the client must immediately discontinue use of the previous network address and may inform local users of the problem.
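Roughly, the behaviour quoted above boils down to the decision below. This is a simplified sketch of my reading of the RFC text, not how any particular DHCP client is implemented, and the names are made up for illustration.

```python
from enum import Enum


class Action(Enum):
    DISCOVER = "start over with DHCPDISCOVER"
    USE_ADDRESS = "configure the acknowledged address"
    KEEP_OLD_ADDRESS = "keep the previous address until the lease expires"


def action_after_reboot(has_previous_lease, lease_expired, server_reply):
    """Pick the client's next step at boot (simplified reading of RFC 2131).

    server_reply is "ACK", "NAK" or None when the DHCPREQUEST for the
    previous address got no answer at all.
    """
    if not has_previous_lease or lease_expired:
        return Action.DISCOVER          # nothing valid to re-verify
    if server_reply == "ACK":
        return Action.USE_ADDRESS       # server confirmed the old address
    if server_reply == "NAK":
        return Action.DISCOVER          # address no longer valid here
    return Action.KEEP_OLD_ADDRESS      # no server reachable, lease still valid


print(action_after_reboot(True, False, None).value)
```

In this simplified model a client holding a valid lease either sends DHCP traffic or keeps its old address, so a VM with no address and no request on the wire does not fit any branch.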
-
@scottalanmiller said in Vagrant/DHCP problem:
@irj said in Vagrant/DHCP problem:
@stacksofplates said in Vagrant/DHCP problem:
Just for understanding, why Vagrant on the remote machine and not Terraform? Vagrant in my experience had been for local dev. Not saying it doesn't work and I thought I saw recently about remote systems with Vagrant, but terraform would most likely work much better.
Yeah exactly. Vagrant is the wrong tool here. It's great for testing locally. I use it to test automated scripting for immutable builds, but even in dev environments I use terraform to deploy resources that I expect to have around longer than a few hours.
Yeah, and the #1 person who should know this and tell you it is wrong is... the DevOps guys!
Exactly, even HashiCorp makes it clear that Vagrant is for dev environments.
-
This is how he should be deploying.