Restoring a domain controller
-
@Carnival-Boy said:
I have removed the other DNS server from the network adapter. But it shouldn't be necessary, should it? Isn't one of the features of having two DNS servers listed that if one can't be contacted, the other will be used?
Yep, it should, but that isn't always the case, at least in my experience. I've had Windows clients that took 20+ minutes to log into the domain because they had two DNS entries and the first one was offline. Once I changed the DNS order, the problem went away. (On a different occasion I saw a similar problem when the primary DNS entry on the only remaining DC wasn't pointing to itself, either via its own IP or 127.0.0.1.)
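To illustrate why an offline first DNS entry can hurt so much even though failover "works": every lookup may have to wait out a timeout on the dead server before the next one is tried, and a domain logon involves many lookups. The sketch below is a toy model under stated assumptions (servers tried strictly in order, a fixed per-server timeout, a hypothetical number of sequential lookups per logon). It is not the real Windows DNS Client algorithm, which has its own retry schedule and can reorder servers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DnsServer:
    address: str
    reachable: bool


def lookup_delay(servers: List[DnsServer], timeout_s: float = 1.0,
                 answer_s: float = 0.01) -> float:
    """Seconds until one lookup is answered, trying servers strictly in order.

    Toy model (an assumption, not the Windows DNS Client algorithm):
    an unreachable server costs a full timeout before the next one is tried,
    and a reachable server answers after answer_s seconds.
    """
    elapsed = 0.0
    for server in servers:
        if server.reachable:
            return elapsed + answer_s
        elapsed += timeout_s  # waited for a server that never replied
    raise RuntimeError("no DNS server answered")


def logon_delay(servers: List[DnsServer], lookups: int, **kwargs) -> float:
    """Total DNS wait for a hypothetical logon doing `lookups` sequential queries."""
    return lookups * lookup_delay(servers, **kwargs)


if __name__ == "__main__":
    dead = DnsServer("192.0.2.10", reachable=False)  # hypothetical offline DC
    live = DnsServer("192.0.2.11", reachable=True)   # hypothetical online DC
    print(logon_delay([dead, live], lookups=200))    # offline server listed first
    print(logon_delay([live, dead], lookups=200))    # online server listed first
```

In this model the extra wait grows linearly with the number of lookups, which is why simply swapping the order of the entries can make the delay disappear.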
@Carnival-Boy said:
I'm not sure what you mean by manually switching to look at the restored DC. On the live DC, DNS Manager only lists the DC on the left-hand side.
Which DC is it listing? To keep things clear, let's use some names: DC-01 and DC-02, assuming you only have two DCs. We'll also assume that you're restoring DC-01.
When you launch DNS Manager on DC-01, which server shows up there? FYI, it could be either DC-01 or DC-02. You can switch it to the other one by right-clicking DNS at the top and choosing Connect to DNS Server.
If, for example, you opened DNS Manager on DC-01 and pointed it at DC-02 before the backup of DC-01 was taken, then after the restore DNS Manager on the restored server will try to connect to DC-02, which in your case will fail because DC-02 isn't part of your temp network. This is why I suggest that once DNS Manager is open on the restored DC-01, you make sure it's pointed at itself, then close it and reopen it. It should open faster this time. If not, you have other DNS issues (probably the one noted above).
-
What are the chances that DC-01 doesn't hold all the FSMO roles? You're restoring into a vacuum and might be missing critical roles held by other servers.
-
Nope. That was one of the things I checked already.
-
Are you still having issues after you changed the DNS settings on the IP configuration page?
-
Yeah, still no go. DNS Manager on DC-01 was set to look at DC-01, so no issues there. It still hung and then errored looking for DC-02, but despite that error it was still using DC-01 as the primary DNS server. Removing DC-02 altogether means DNS Manager loads instantly. But AD is still screwed.
In the network settings, DC-01 had itself as the primary DNS server, and DC-02 as the secondary. I guess that should be the other way round, although I've read arguments for doing it that way. Either way, I've removed DC-02 as the secondary on the restored DC-01.
A bit more background. The guy who set all this up also tried to get DirectAccess working. He spent an unbelievable 5 days working on DirectAccess and failed completely. I suspect that during this process he hacked around with AD and as a result did something to break it. This is only a hunch, and doesn't really help me now. He's not on the scene anymore.
-
No you had it right. It should point to itself as the primary DNS and only go over the network if its own DNS server fails. This dramatically reduces latency and load on the network.
-
@Reid-Cooper said:
No you had it right. It should point to itself as the primary DNS and only go over the network if its own DNS server fails. This dramatically reduces latency and load on the network.
I suppose, but in the SMB space latency shouldn't be that big of an issue. I'd rather my DC boot faster by having it point to another DNS server as the primary and to itself as the secondary.
-
CB - if you can afford the downtime, take DC-01 offline and make an image of it using something like Clonezilla. Then restore that image into your test environment and see if you have the same issues.
-
I can shut the server down, back it up, and then restore it, and it works just fine. It's just backing it up whilst online that causes the problem.
-
Have you opened a case with Veeam? Since a cold image works it definitely sounds like an issue with the way Veeam is backing things up.
-
I'm not sure about that. If I shut it down, services are stopped cleanly. If I back it up live, it has to go through an AD non-authoritative restore when it first boots after the restore. My understanding is that this is a Windows process and not really anything to do with Veeam.
I'd rather hold off calling Veeam until I've explored a few more avenues. I could test it with another backup product like Unitrends, I suppose. That could eliminate Veeam being the cause.
-
Some success:
I restored the PDC and let it boot twice and do its non-authoritative restore thingy. As I mentioned in the OP, AD initially looks OK, but after a few minutes it fails and I can't open AD Users & Computers.
I then restored the second DC. This DC doesn't hold any FSMO roles.
After restoring the second DC, everything appears to be working. I can open AD users & computers on both DCs and I can add a PC to the domain.
I shouldn't have to restore the second DC, should I? The PDC should fix itself if it can't find it, shouldn't it?
So what do you think might be going on?
-
Correct, restoring a secondary DC is not recommended. Once a main DC is up and working, subsequent DCs should be built fresh rather than restored to avoid database issues.
-
Are you sure of the locations of all of the roles, including the Global Catalog?
-
I'm not sure of anything! Will check and report back...
-
netdom query fsmo shows that all roles are on the PDC.
AD Sites & Services shows that both DCs are Global Catalogs.
Anything else I should check?
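If this check needs repeating after each test restore, the netdom output can be parsed to confirm that all five FSMO roles sit on a single DC. A minimal sketch, assuming the usual `netdom query fsmo` layout (role name, two or more spaces, then the holder's FQDN on one line); the helper names and the sample domain are hypothetical, and `query_fsmo` only works on a Windows machine with the AD DS tools installed. It does not cover the Global Catalog check, which is done separately in AD Sites & Services.

```python
from __future__ import annotations

import re
import subprocess

# The five FSMO roles as labelled in `netdom query fsmo` output (assumed layout).
FSMO_ROLES = (
    "Schema master",
    "Domain naming master",
    "PDC",
    "RID pool manager",
    "Infrastructure master",
)


def parse_fsmo(output: str) -> dict[str, str]:
    """Map each FSMO role found in the output to the DC that holds it."""
    holders = {}
    for line in output.splitlines():
        for role in FSMO_ROLES:
            # Role label at the start of the line, 2+ spaces, then the holder's name.
            match = re.match(rf"^{re.escape(role)}\s{{2,}}(\S+)\s*$", line.strip())
            if match:
                holders[role] = match.group(1)
    return holders


def single_holder(holders: dict[str, str]) -> str | None:
    """Return the DC holding all five roles, or None if they are split."""
    missing = [role for role in FSMO_ROLES if role not in holders]
    if missing:
        raise ValueError(f"roles not found in output: {missing}")
    owners = set(holders.values())
    return owners.pop() if len(owners) == 1 else None


def query_fsmo() -> str:
    """Run netdom and return its text output (Windows with AD DS tools only)."""
    result = subprocess.run(["netdom", "query", "fsmo"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

On the restored DC, `single_holder(parse_fsmo(query_fsmo()))` would return the FQDN of the one DC holding every role, or `None` if the roles are split across servers.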
-
When the restored DC is failing, what does Active Directory Best Practices Analyzer tell you is going on?
-
On both the live and restored DC, BPA is only giving one error - "The PDC emulator operations master in this forest is not configured to correctly synchronize time from a valid time source"
Could time be an issue?
Other than that, there are two other warnings on both the live and restored DC - "All OUs in this domain should be protected from accidental deletion" and "The DC should comply with the recommended best practices guidelines because it is running on a VM"
I also get a few warnings on the restored DC relating to the fact that AD hasn't been backed up within the last 8 days, which I assume is because I'm restoring an old backup and can safely be ignored.
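On "could time be an issue?": the BPA warning on its own doesn't break anything, but Kerberos, which AD authentication relies on, rejects requests when the clocks of the client and the DC differ by more than the domain's tolerance (5 minutes by default). So the useful question is how far apart the clocks actually are. A minimal sketch of that comparison, assuming you have already collected a timezone-aware clock reading from each machine yourself; the function names are hypothetical, and whether a skew of exactly the tolerance is accepted is an assumption here.

```python
from datetime import datetime, timedelta, timezone

# Default Kerberos clock-skew tolerance in AD domain policy
# ("Maximum tolerance for computer clock synchronization") is 5 minutes.
DEFAULT_MAX_SKEW = timedelta(minutes=5)


def clock_skew(a: datetime, b: datetime) -> timedelta:
    """Absolute difference between two timezone-aware clock readings."""
    return abs(a - b)


def skew_breaks_kerberos(a: datetime, b: datetime,
                         max_skew: timedelta = DEFAULT_MAX_SKEW) -> bool:
    """True if the skew is larger than the tolerance (boundary case assumed OK)."""
    return clock_skew(a, b) > max_skew


if __name__ == "__main__":
    dc_time = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)      # hypothetical reading
    client_time = datetime(2024, 1, 1, 12, 7, tzinfo=timezone.utc)  # hypothetical reading
    print(clock_skew(dc_time, client_time), skew_breaks_kerberos(dc_time, client_time))
```

If the skew between the restored DC and the machines in the temp network stays well inside the tolerance, time is unlikely to be what's breaking AD here.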
-
Whoops. I ran BPA too soon and didn't give AD time to properly fail. I ran it again and got a load of errors beginning with "BPA is not able to collect data about...". The first one was "BPA is not able to collect data about the name of the forest from the domain controller DC-01." and so on and so on.
I guess it can't analyze AD if AD isn't working.
-
This is just odd.
I'm currently out of ideas. I'd say open a case with Veeam and/or Microsoft (yeah it will cost ya).