DNS Update Issue
-
@Dashrender said in DNS Update Issue:
And it doesn't matter that public is in use here. This applies equally to other internal servers, too. What if you failed to a slow DNS over a throttled WAN link and now are stuck with it because Windows never goes back to local primary?
OK - you do have a point here. Though trying each and every time does seem like overkill and lag-inducing. I could see checking once a minute or something.
It might seem like overkill, but it's not. It's the simplest, fastest solution. I think the crux here is that you perceive that delay as being far more dramatic and important than it is. And I suspect that you believe DNS failures are more common and long term than they typically are.
The impact of that "trying every time" is undetectable to normal users. Remember, their local systems cache lookups, so it's super trivial to have it do this in the real world. And normal failures for DNS are insanely short-lived, like seconds or a minute as a server reboots, typically.
In the real world, doing secondary lookups for a full minute when the server is already back is the actual overkill, on average.
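To make the "try the list top to bottom on every lookup" idea concrete, here is a rough Python sketch. It is a toy model of the behaviour described above, not how any real resolver is implemented: the `resolve_in_order` function, the server names, the address, and the fake `ask` callback are all made up for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple


def resolve_in_order(
    name: str,
    servers: Sequence[str],
    ask: Callable[[str, str], Optional[str]],
) -> Tuple[Optional[str], str, int]:
    """Try each server top to bottom; return (answer, server_used, failed_attempts).

    `ask(server, name)` is a hypothetical callback that returns an answer,
    or None when the server does not respond (a timeout in real life).
    """
    failed = 0
    for server in servers:
        answer = ask(server, name)
        if answer is not None:
            return answer, server, failed
        failed += 1  # each failed attempt costs one timeout
    return None, "", failed


# Toy environment: which (made-up) servers are currently answering.
up = {"dns-primary": True, "dns-secondary": True}


def fake_ask(server: str, name: str) -> Optional[str]:
    return "10.0.0.5" if up[server] else None


servers = ["dns-primary", "dns-secondary"]

healthy = resolve_in_order("fileserver", servers, fake_ask)
up["dns-primary"] = False   # primary reboots
during_outage = resolve_in_order("fileserver", servers, fake_ask)
up["dns-primary"] = True    # primary is back
after_outage = resolve_in_order("fileserver", servers, fake_ask)

print(healthy, during_outage, after_outage)
```

In this model the only lookup that pays a timeout is the one made while the primary is actually down, and the very next lookup after it returns goes back to the primary with no extra state to track.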
-
@scottalanmiller said in DNS Update Issue:
@Dashrender said in DNS Update Issue:
@scottalanmiller said in DNS Update Issue:
@wirestyle22 said in DNS Update Issue:
@scottalanmiller said in DNS Update Issue:
@wirestyle22 said in DNS Update Issue:
Does anyone know what event causes this in Windows?
Cause what, the NIC to flip? I've heard Windows people say that it's just a bug and it does it randomly. I know that it could happen from a DNS server being unavailable for a split second, just long enough to fail a lookup.
That was my initial thought. So what--Linux OSes are checking periodically to see if they are using the first entry and Windows doesn't care until there's a hiccup?
Linux checks every time, I believe. That's the expected behaviour. It always uses its list top to bottom, it doesn't "change" primary just because it wants to.
See this just seems odd to me - why add in that delay every time.
You said that it seemed odd to you, "why add in that delay every time."
It shouldn't be odd, it should be super obvious as by far the best way. And that "delay every time" is an imperceptible delay .001% of the time. It only seems like "Every time" if you assume random DNS choices like people keep saying that Windows makes (I'm not convinced of this). Since Linux DNS is deterministic, it only adds that minuscule delay under failure conditions which in this day and age are super, duper rare (unless, apparently, you have Windows then the desktop seems to inject a server-like failure condition on its own.)
You make it sound like this is a foolish approach, but it fixes the problems everyone is reporting with essentially no downsides.
Well, I've missed the recent posts where people had sorta messed up DNS configs (Wirestyle's were completely hosed, not just public as a secondary issue), so I'm not sure where the recent issue is coming from - I just must have missed them.
The Linux way is also assuming that the failure most likely was simply intermittent and that the primary will be back online nearly instantly, and frankly, using public DNS that totally makes sense. But we could hope that wouldn't be the case on a local network - and again, I'm not sure it still is a real issue.
Does the Linux way make things more transparent to the user? Sure does. And the cost, as you said, is pretty damned low... So fine - I'll give you all that, and if Windows changed to that method I definitely wouldn't complain.
-
@scottalanmiller said in DNS Update Issue:
@Dashrender said in DNS Update Issue:
And it doesn't matter that public is in use here. This applies equally to other internal servers, too. What if you failed to a slow DNS over a throttled WAN link and now are stuck with it because Windows never goes back to local primary?
OK - you do have a point here. Though trying each and every time does seem like overkill and lag-inducing. I could see checking once a minute or something.
It might seem like overkill, but it's not. It's the simplest, fastest solution. I think the crux here is that you perceive that delay as being far more dramatic and important than it is. And I suspect that you believe DNS failures are more common and long term than they typically are.
The impact of that "trying every time" is undetectable to normal users. Remember, their local systems cache lookups, so it's super trivial to have it do this in the real world. And normal failures for DNS are insanely short-lived, like seconds or a minute as a server reboots, typically.
In the real world, doing secondary lookups for a full minute when the server is already back is the actual overkill, on average.
You undoubtedly have data that shows DNS outages are that short-lived, I assume.
I know, I know - you'll ask me for data that shows that DNS outages are longer... tit for tat.
-
@Dashrender said in DNS Update Issue:
@scottalanmiller said in DNS Update Issue:
@Dashrender said in DNS Update Issue:
And it doesn't matter that public is in use here. This applies equally to other internal servers, too. What if you failed to a slow DNS over a throttled WAN link and now are stuck with it because Windows never goes back to local primary?
OK - you do have a point here. Though trying each and every time does seem like overkill and lag-inducing. I could see checking once a minute or something.
It might seem like overkill, but it's not. It's the simplest, fastest solution. I think the crux here is that you perceive that delay as being far more dramatic and important than it is. And I suspect that you believe DNS failures are more common and long term than they typically are.
The impact of that "trying every time" is undetectable to normal users. Remember, their local systems cache lookups, so it's super trivial to have it do this in the real world. And normal failures for DNS are insanely short-lived, like seconds or a minute as a server reboots, typically.
In the real world, doing secondary lookups for a full minute when the server is already back is the actual overkill, on average.
You undoubtedly have data that shows DNS outages are that short-lived, I assume.
I know, I know - you'll ask me for data that shows that DNS outages are longer... tit for tat.
The average DNS outage is a server reboot. Think about an AD environment with two AD servers. You do updates and reboot all of the time; that's an outage to the clients looking at that specific server. In the Linux case, it would only use the backup entry for the moments while the service is restarting. In Windows, apparently, it simply abandons that server until it has no choice but to return.
-
@Dashrender said in DNS Update Issue:
The Linux way is also assuming that the failure most likely was simply intermittent and that the primary will be back online nearly instantly, and frankly, using public DNS that totally makes sense. But we could hope that wouldn't be the case on a local network - and again, I'm not sure it still is a real issue.
Even private DNS, what kind of failure do you have where you assume that the outage will be a long time, but not so long that DHCP updates are in order? That's a pretty rare, small window of failures. DNS restarts (outages) are common. Total failures are once every 5-10 years if we are talking enterprise AD DNS setups. Typically it would be totally dead hardware - but only in a case where a backup and restore aren't an option.
DNS is something that restarts very quickly, and can be restored very quickly. And can normally be adjusted almost instantly via DHCP or state management, however you manage DNS in your environment.
So even in pretty extreme failures, a DNS failure is usually intermittent, even in a purely internal DNS situation.
-
@scottalanmiller said in DNS Update Issue:
@Dashrender said in DNS Update Issue:
The Linux way is also assuming that the failure most likely was simply intermittent and that the primary will be back online nearly instantly, and frankly, using public DNS that totally makes sense. But we could hope that wouldn't be the case on a local network - and again, I'm not sure it still is a real issue.
Even private DNS, what kind of failure do you have where you assume that the outage will be a long time, but not so long that DHCP updates are in order? That's a pretty rare, small window of failures. DNS restarts (outages) are common. Total failures are once every 5-10 years if we are talking enterprise AD DNS setups. Typically it would be totally dead hardware - but only in a case where a backup and restore aren't an option.
DNS is something that restarts very quickly, and can be restored very quickly. And can normally be adjusted almost instantly via DHCP or state management, however you manage DNS in your environment.
So even in pretty extreme failures, a DNS failure is usually intermittent, even in a purely internal DNS situation.
We both agree that Windows NEVER switching back is bad. Let's move past that. Now the question is: is it worth it to test on every single DNS query?
From a coding POV, it's probably much simpler to test every time than setting a time variable and waiting for that to expire before trying the primary again - so fine... you win.
-
@Dashrender said in DNS Update Issue:
@scottalanmiller said in DNS Update Issue:
@Dashrender said in DNS Update Issue:
The Linux way is also assuming that the failure most likely was simply intermittent and that the primary will be back online nearly instantly, and frankly, using public DNS that totally makes sense. But we could hope that wouldn't be the case on a local network - and again, I'm not sure it still is a real issue.
Even private DNS, what kind of failure do you have where you assume that the outage will be a long time, but not so long that DHCP updates are in order? That's a pretty rare, small window of failures. DNS restarts (outages) are common. Total failures are once every 5-10 years if we are talking enterprise AD DNS setups. Typically it would be totally dead hardware - but only in a case where a backup and restore aren't an option.
DNS is something that restarts very quickly, and can be restored very quickly. And can normally be adjusted almost instantly via DHCP or state management, however you manage DNS in your environment.
So even in pretty extreme failures, a DNS failure is usually intermittent, even in a purely internal DNS situation.
We both agree that Windows NEVER switching back is bad. Let's move past that. Now the question is: is it worth it to test on every single DNS query?
From a coding POV, it's probably much simpler to test every time than setting a time variable and waiting for that to expire before trying the primary again - so fine... you win.
A wait (called a "stand-off period") would be easy; not AS easy, but trivially easy. But I think in the real world, it's not as ideal. With how DNS works today (not in the 1990s), I think testing every time is what you would want. Having any stand-off period would introduce more overhead (on average) than it would resolve, because normal outages are so tiny and so much DNS is cached.
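To put rough numbers on the stand-off idea, here is a small Python model that counts wasted timeouts and lookups sent to the secondary for one client. Everything in it is an assumption for illustration: the `simulate` function, the query interval, the outage windows, and the 60-second stand-off are invented, and it ignores caching entirely.

```python
from typing import List, Tuple


def simulate(
    query_times: List[float],
    outage: Tuple[float, float],
    stand_off: float,
) -> Tuple[int, int]:
    """Return (timeouts, secondary_lookups) for one client.

    stand_off = 0 models "try the primary on every query".
    stand_off > 0 models skipping the primary for that many seconds
    after it fails to answer.
    """
    down_start, down_end = outage
    skip_until = float("-inf")  # primary is skipped while now < skip_until
    timeouts = 0
    secondary_lookups = 0
    for now in query_times:
        if now < skip_until:
            secondary_lookups += 1  # still standing off: go straight to secondary
            continue
        if down_start <= now < down_end:
            timeouts += 1           # waited on the primary for nothing
            secondary_lookups += 1
            skip_until = now + stand_off
        # otherwise the primary answered; nothing to count
    return timeouts, secondary_lookups


# Hypothetical numbers: one uncached lookup every 10 s for 2 minutes.
queries = [float(t) for t in range(0, 120, 10)]
short_outage = (8.0, 13.0)   # a 5 s service restart
long_outage = (8.0, 68.0)    # a full minute of downtime

every_time = simulate(queries, short_outage, stand_off=0)
one_minute = simulate(queries, short_outage, stand_off=60)
print("short outage, every time:    ", every_time)
print("short outage, 60 s stand-off:", one_minute)
print("long outage, every time:     ", simulate(queries, long_outage, stand_off=0))
print("long outage, 60 s stand-off: ", simulate(queries, long_outage, stand_off=60))
```

With these made-up numbers, a short restart costs one timeout either way, but the stand-off keeps the client on the secondary for five extra lookups after the primary is already back. The stand-off only saves timeouts when the outage is long compared with the query interval, which is exactly the part being debated here.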
-
@scottalanmiller said in DNS Update Issue:
In Windows, apparently, it simply abandons that server until it has no choice but to return.
I don't see any issue there. You're getting DNS either way, what's it matter what it's from if they are the same? If clients are getting DNS from the failover DNS server and you don't want it to, turn off the DNS service on that server then, and clients will fail back... if you even care.
-
@Obsolesce said in DNS Update Issue:
@scottalanmiller said in DNS Update Issue:
In Windows, apparently, it simply abandons that server until it has no choice but to return.
I don't see any issue there. You're getting DNS either way, what's it matter what it's from if they are the same? If clients are getting DNS from the failover DNS server and you don't want it to, turn off the DNS service on that server then, and clients will fail back... if you even care.
The problem happens when your secondary server isn't part of your internal network (assuming your primary is part of your internal network). When using the secondary you won't get resolution for internal network resources.
-
-
@Obsolesce said in DNS Update Issue:
@scottalanmiller said in DNS Update Issue:
In Windows, apparently, it simply abandons that server until it has no choice but to return.
I don't see any issue there. You're getting DNS either way, what's it matter what it's from if they are the same? If clients are getting DNS from the failover DNS server and you don't want it to, turn off the DNS service on that server then, and clients will fail back... if you even care.
That's a pretty awful process. I mean... horrendous. Kill a server just to get clients back to where you want them to be?
And since it is random, that doesn't even work.
-
@Dashrender said in DNS Update Issue:
@Obsolesce said in DNS Update Issue:
@scottalanmiller said in DNS Update Issue:
In Windows, apparently, it simply abandons that server until it has no choice but to return.
I don't see any issue there. You're getting DNS either way, what's it matter what it's from if they are the same? If clients are getting DNS from the failover DNS server and you don't want it to, turn off the DNS service on that server then, and clients will fail back... if you even care.
The problem happens when your secondary server isn't part of your internal network (assuming your primary is part of your internal network). When using the secondary you won't get resolution for internal network resources.
That's the BIG problem. But not the only one. Take a common manufacturing plant with one AD at one site, and the other one at a different site. If you can't choose primary or secondary, then failover means slow DNS over a WAN link - potentially for weeks or months at a time. Sometimes for no reason at all, or for something as simple as having rebooted the local one.
It's not just about wanting to use a public source; that clouds the issue. Lots of people don't want to use public ever, so ignore that. It's bad behaviour regardless.
-
@scottalanmiller said in DNS Update Issue:
@Dashrender said in DNS Update Issue:
@Obsolesce said in DNS Update Issue:
@scottalanmiller said in DNS Update Issue:
In Windows, apparently, it simply abandons that server until it has no choice but to return.
I don't see any issue there. You're getting DNS either way, what's it matter what it's from if they are the same? If clients are getting DNS from the failover DNS server and you don't want it to, turn off the DNS service on that server then, and clients will fail back... if you even care.
The problem happens when your secondary server isn't part of your internal network (assuming your primary is part of your internal network). When using the secondary you won't get resolution for internal network resources.
That's the BIG problem. But not the only one. Take a common manufacturing plant with one AD at one site, and the other one at a different site. If you can't choose primary or secondary, then failover means slow DNS over a WAN link - potentially for weeks or months at a time. Sometimes for no reason at all, or for something as simple as having rebooted the local one.
It's not just about wanting to use a public source; that clouds the issue. Lots of people don't want to use public ever, so ignore that. It's bad behaviour regardless.
@scottalanmiller is describing my setup because he has seen it.
-
@Donahue said in DNS Update Issue:
@scottalanmiller said in DNS Update Issue:
@Dashrender said in DNS Update Issue:
@Obsolesce said in DNS Update Issue:
@scottalanmiller said in DNS Update Issue:
In Windows, apparently, it simply abandons that server until it has no choice but to return.
I don't see any issue there. You're getting DNS either way, what's it matter what it's from if they are the same? If clients are getting DNS from the failover DNS server and you don't want it to, turn off the DNS service on that server then, and clients will fail back... if you even care.
The problem happens when your secondary server isn't part of your internal network (assuming your primary is part of your internal network). When using the secondary you won't get resolution for internal network resources.
That's the BIG problem. But not the only one. Take a common manufacturing plant with one AD at one site, and the other one at a different site. If you can't choose primary or secondary, then failover means slow DNS over a WAN link - potentially for weeks or months at a time. Sometimes for no reason at all, or for something as simple as having rebooted the local one.
It's not just about wanting to use a public source; that clouds the issue. Lots of people don't want to use public ever, so ignore that. It's bad behaviour regardless.
@scottalanmiller is describing my setup because he has seen it.
It's a common, real world setup that makes sense. But non-deterministic DNS behaviour from Windows would be less than ideal for use in that environment. Not a show stopper, especially with a Gig link between sites, but a silly problem to have that doesn't need to exist.
-
@scottalanmiller the only problem with Microsoft Windows DNS clients when used for authentication is that the choice of which DC to log in to is so random, which makes it unpredictable. I know that is outside the main scope of this discussion, but I wanted to clarify that.
-
@scottalanmiller said in DNS Update Issue:
@Dashrender said in DNS Update Issue:
@Obsolesce said in DNS Update Issue:
@scottalanmiller said in DNS Update Issue:
In Windows, apparently, it simply abandons that server until it has no choice but to return.
I don't see any issue there. You're getting DNS either way, what's it matter what it's from if they are the same? If clients are getting DNS from the failover DNS server and you don't want it to, turn off the DNS service on that server then, and clients will fail back... if you even care.
The problem happens when your secondary server isn't part of your internal network (assuming your primary is part of your internal network). When using the secondary you won't get resolution for internal network resources.
That's the BIG problem. But not the only one. Take a common manufacturing plant with one AD at one site, and the other one at a different site. If you can't choose primary or secondary, then failover means slow DNS over a WAN link - potentially for weeks or months at a time. Sometimes for no reason at all, or for something as simple as having rebooted the local one.
It's not just about wanting to use a public source; that clouds the issue. Lots of people don't want to use public ever, so ignore that. It's bad behaviour regardless.
I've never seen this. We have a DC at one site and a DC at another site. DNS at the computer's primary site is always preferred.
Perhaps the whole issue is because AD sites aren't set up properly.
I'll test this now in a properly set up AD environment.
-
@dbeato said in DNS Update Issue:
@scottalanmiller the only problem with Microsoft Windows DNS clients when used for authentication is that the choice of which DC to log in to is so random, which makes it unpredictable. I know that is outside the main scope of this discussion, but I wanted to clarify that.
Performance of the DNS itself and traffic on a limited WAN link are also concerns.
-
@Obsolesce said in DNS Update Issue:
@scottalanmiller said in DNS Update Issue:
@Dashrender said in DNS Update Issue:
@Obsolesce said in DNS Update Issue:
@scottalanmiller said in DNS Update Issue:
In Windows, apparently, it simply abandons that server until it has no choice but to return.
I don't see any issue there. You're getting DNS either way, what's it matter what it's from if they are the same? If clients are getting DNS from the failover DNS server and you don't want it to, turn off the DNS service on that server then, and clients will fail back... if you even care.
The problem happens when your secondary server isn't part of your internal network (assuming your primary is part of your internal network). When using the secondary you won't get resolution for internal network resources.
That's the BIG problem. But not the only one. Take a common manufacturing plant with one AD at one site, and the other one at a different site. If you can't choose primary or secondary, then failover means slow DNS over a WAN link - potentially for weeks or months at a time. Sometimes for no reason at all, or for something as simple as having rebooted the local one.
It's not just about wanting to use a public source; that clouds the issue. Lots of people don't want to use public ever, so ignore that. It's bad behaviour regardless.
I've never seen this. We have a DC at one site and a DC at another site. DNS at the computer's primary site is always preferred.
Perhaps the whole issue is because AD sites aren't set up properly.
I'll test this now in a properly set up AD environment.
I've not seen it either. But it is the reason behind most everyone's decision making around Windows DNS clients. So my point is that either that reason is false and Windows doesn't actually do this, or it's significant.
-
@scottalanmiller said in DNS Update Issue:
@Obsolesce said in DNS Update Issue:
@scottalanmiller said in DNS Update Issue:
@Dashrender said in DNS Update Issue:
@Obsolesce said in DNS Update Issue:
@scottalanmiller said in DNS Update Issue:
In Windows, apparently, it simply abandons that server until it has no choice but to return.
I don't see any issue there. You're getting DNS either way, what's it matter what it's from if they are the same? If clients are getting DNS from the failover DNS server and you don't want it to, turn off the DNS service on that server then, and clients will fail back... if you even care.
The problem happens when your secondary server isn't part of your internal network (assuming your primary is part of your internal network). When using the secondary you won't get resolution for internal network resources.
That's the BIG problem. But not the only one. Take a common manufacturing plant with one AD at one site, and the other one at a different site. If you can't choose primary or secondary, then failover means slow DNS over a WAN link - potentially for weeks or months at a time. Sometimes for no reason at all, or for something as simple as having rebooted the local one.
It's not just about wanting to use a public source; that clouds the issue. Lots of people don't want to use public ever, so ignore that. It's bad behaviour regardless.
I've never seen this. We have a DC at one site and a DC at another site. DNS at the computer's primary site is always preferred.
Perhaps the whole issue is because AD sites aren't set up properly.
I'll test this now in a properly set up AD environment.
I've not seen it either. But it is the reason behind most everyone's decision making around Windows DNS clients. So my point is that either that reason is false and Windows doesn't actually do this, or it's significant.
Okay, so... my conclusive results.
I did some testing the only way I could figure out... if there's more I can do, let me know.
(EDIT: I did some additional testing, noted below the numbered list.)
On a Windows machine, I:
1. Ping / test nslookup to a server. Make note of DNS server used for nslookup.
2. Block IP of primary DC/DNS/Domain server via firewall.
3. ipconfig /flushdns
4. Test ping / test nslookup a server. Make note of DNS server used.
5. Unblock IP of primary DC/DNS/Domain server.
6. Test ping / test nslookup a server. Make note of DNS server used.
Results:
While the IP of the primary DC/DNS/Domain server was blocked, an nslookup resulted in an error that the DNS request timed out. However, I was still able to ping all internal devices by name, as well as external websites. Then I unblocked it, and was still able to ping by name and nslookup was working normally again.
Additional Testing:
To test for real DNS blockage, I added all of our internal DNS servers to the block list and tested pinging stuff by name. Unable to ping any internal device by name.
So now I unblocked, let's call it DC/DNS-2, and left everything else blocked. I was now able to instantly ping things by name, guaranteed to be using the secondary DC/DNS server.
Upon unblocking the primary DC/DNS server, nslookup worked instantly and I was also able to continue pinging things. Same when blocking the other DNS server again.
Keep in mind that before each test, I performed an ipconfig /flushdns command to be sure.
I wasn't yet convinced, so to completely eliminate any unknowns, as a last step, I enabled Debug Logging on the primary and secondary DC/DNS servers, filtering on traffic from the workstation I'm testing from. I had two PowerShell windows open side by side, tailing the DNS log file from each DC/DNS server.
From there, I continued having the primary DC/DNS server blocked (all DC/DNS servers except the secondary). I removed the primary DC/DNS IP address from being blocked on the workstation, performed a flushdns, and pinged something.
Within a minute I flushed DNS then pinged something internal by name, and it used the primary DNS server, not the secondary one that it was using before I unblocked the primary DNS ip.
This proves it is instant (or nearly instant, as there was about a minute between the time I unblocked the IP and sent out a ping), at least on Windows 10 Pro 1803, in a MS AD Domain environment.
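For anyone repeating the block/unblock steps, a direct probe can confirm whether a given DNS server is reachable from the workstation at each step, independent of the Windows client cache. Below is a minimal Python sketch that hand-builds a standard A-record query and sends it over UDP to one server. Note the assumptions: `build_query` and `server_answers` are names invented here, and this only shows whether a server responds to the workstation; it does not show which server the Windows DNS client itself chooses (the debug logs are still needed for that).

```python
import socket
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def server_answers(server_ip: str, name: str, timeout: float = 2.0) -> bool:
    """Return True if server_ip sends back a DNS response carrying our query ID."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server_ip, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable-network errors
            return False
    return len(reply) >= 2 and reply[:2] == query[:2]
```

Calling something like server_answers("192.0.2.10", "fileserver.example.local") before and after each firewall change (both values are placeholders, substitute your own DC/DNS IP and an internal name) gives a yes/no on reachability to set next to the nslookup and log results.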
-
I'm confused. If you used /flushdns, how does it prove anything? No one is questioning what happens when DNS is manually manipulated. What would it do if an actual end user was using it, is the question. From what I can tell, you never tested the scenarios we are talking about.
Try it naturally, without flushing in between changes. Does DNS change instantly, on its own? Delayed, on its own? Never without intervention?