Restoring a domain controller
-
Nope. That was one of the things I checked already.
-
Are you still having issues after you changed the DNS settings on the IP configuration page?
-
Yeah, still no go. DNS Manager on DC-01 was set to look at DC-01, so no issues there. It still hung and then errored looking for DC-02, but despite that error it was still looking at DC-01 as the primary DNS server. Removing DC-02 altogether means DNS Manager loads instantly. But AD is still screwed.
In the network settings, DC-01 had itself as the primary DNS server, and DC-02 as the secondary. I guess that should be the other way round, although I've read arguments for doing it that way. Either way, I've removed DC-02 as the secondary on the restored DC-01.
A bit more background. The guy who set all this up also tried to get DirectAccess working. He spent an unbelievable 5 days working on DirectAccess and failed completely. I suspect that during this process he hacked around with AD and as a result did something to break it. This is only a hunch, and doesn't really help me now. He's not on the scene anymore.
-
No you had it right. It should point to itself as the primary DNS and only go over the network if its own DNS server fails. This dramatically reduces latency and load on the network.
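To make the ordering argument concrete, here is a minimal sketch of "try the preferred server first, fall back only on failure". It is a simplified model, not how the Windows DNS client is actually implemented (the real resolver has its own timeout and retry schedule). The function name `resolve_in_order`, the `fake_query` stand-in, and the address are all hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple


def resolve_in_order(
    name: str,
    servers: Sequence[str],
    query: Callable[[str, str], str],
) -> Optional[Tuple[str, str]]:
    """Ask each DNS server in list order; return (server, answer) from the first that replies."""
    for server in servers:
        try:
            return server, query(server, name)
        except TimeoutError:
            # This server did not answer in time, so move on to the next one.
            continue
    return None  # no server in the list answered


def fake_query(server: str, name: str) -> str:
    """Hypothetical stand-in for a real DNS lookup: DC-02 is unreachable."""
    if server == "DC-02":
        raise TimeoutError(f"{server} did not respond")
    return "10.0.0.10"  # made-up address for illustration


# Self first: answered locally, DC-02 is never contacted.
print(resolve_in_order("dc-01.example.local", ["DC-01", "DC-02"], fake_query))
# Other DC first: the lookup has to wait for DC-02 to time out before falling back.
print(resolve_in_order("dc-01.example.local", ["DC-02", "DC-01"], fake_query))
```

In this model the order only matters when the first server is down, which is the situation on a restored DC-01 with no DC-02 around.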
-
@Reid-Cooper said:
No you had it right. It should point to itself as the primary DNS and only go over the network if its own DNS server fails. This dramatically reduces latency and load on the network.
I suppose, but in an SMB, latency shouldn't be that big of an issue. I'd rather my DC boot faster by having it point to another DNS server as the primary and itself as the secondary.
-
CB - if you can afford the downtime, take DC-01 offline and make an image of it using something like Clonezilla. Then restore that image into your test environment and see if you have the same issues.
-
I can shut the server down, back it up, and then restore it, and it works just fine. It's just backing it up whilst online that causes the problem.
-
Have you opened a case with Veeam? Since a cold image works it definitely sounds like an issue with the way Veeam is backing things up.
-
I'm not sure about that. If I shut it down, services are shut down cleanly. If I back up live, it needs to boot into AD services non-authoritative restore mode. My understanding is that this is a Windows process and not really anything to do with Veeam.
I'd rather hold off calling Veeam until I've explored a few more avenues. I could test it with another backup product like Unitrends, I suppose. That could eliminate Veeam being the cause.
-
Some success:
I restored the PDC and let it boot twice and do its non-authoritative restore thingy. As I mentioned in the OP, AD initially looks OK, but after a few minutes it fails and I can't open AD Users & Computers.
I then restored the second DC. This DC doesn't have any primary roles.
After restoring the second DC, everything appears to be working. I can open AD Users & Computers on both DCs and I can add a PC to the domain.
I shouldn't have to restore the second DC, should I? The PDC should fix itself if it can't find it, shouldn't it?
So what do you think might be going on?
-
Correct, restoring a secondary DC is not recommended. Once a main DC is up and working, subsequent DCs should be built fresh rather than restored to avoid database issues.
-
Are you sure of the locations of all of the roles, including the Global Catalog?
-
I'm not sure of anything! Will check and report back.....
-
netdom query fsmo shows that all roles are on the PDC.
AD Sites & Services shows that both DCs are Global Catalogs.
Anything else I should check?
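For anyone who wants to script the role check, here is a rough sketch that parses the text output of `netdom query fsmo` and reports whether a single DC holds every role. The sample output is illustrative only: the host name `DC-01.example.local` is made up, and the exact column layout is an assumption about what the tool typically prints, so adjust the parsing if yours differs.

```python
import re
from typing import Dict, Optional

# Illustrative output only; the domain name is hypothetical.
SAMPLE_OUTPUT = """\
Schema master               DC-01.example.local
Domain naming master        DC-01.example.local
PDC                         DC-01.example.local
RID pool manager            DC-01.example.local
Infrastructure master       DC-01.example.local
The command completed successfully.
"""


def parse_fsmo(output: str) -> Dict[str, str]:
    """Map each role name to its holder, assuming 'role<2+ spaces>holder' lines."""
    roles = {}
    for line in output.splitlines():
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) == 2:  # skips blank lines and the trailing status line
            roles[parts[0]] = parts[1]
    return roles


def single_holder(roles: Dict[str, str]) -> Optional[str]:
    """Return the holder if one DC owns every role, otherwise None."""
    holders = {holder.lower() for holder in roles.values()}
    return holders.pop() if len(holders) == 1 else None


roles = parse_fsmo(SAMPLE_OUTPUT)
print(roles)
print("All roles on:", single_holder(roles))
```

This only covers the five FSMO roles. Global Catalog placement is not part of that output and still has to be checked separately, for example in AD Sites & Services.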
-
When the restored DC is failing, what does Active Directory Best Practices Analyzer tell you is going on?
-
On both the live and restored DC, BPA is only giving one error - "The PDC emulator operations master in this forest is not configured to correctly synchronize time from a valid time source"
Could time be an issue?
Other than that, there are two other warnings on both the live and restored DC - "All OUs in this domain should be protected from accidental deletion" and "The DC should comply with the recommended best practices guidelines because it is running on a VM"
I also get a few warnings on the restored DC relating to the fact that AD hasn't been backed up within the last 8 days, which I assume is because I'm restoring an old backup, and can be safely ignored.
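On the time question: Kerberos in AD tolerates a clock difference of 5 minutes by default, so one quick sanity check is to compare the restored DC's clock against a trusted reference. The sketch below only shows that comparison. The function name and the sample timestamps are hypothetical, and nothing so far in this thread shows that time is the actual cause.

```python
from datetime import datetime, timedelta, timezone

# Default maximum clock skew Kerberos accepts in an AD domain.
KERBEROS_DEFAULT_SKEW = timedelta(minutes=5)


def clock_skew_ok(
    dc_time: datetime,
    reference_time: datetime,
    tolerance: timedelta = KERBEROS_DEFAULT_SKEW,
) -> bool:
    """True if the DC clock is within the tolerance of the reference clock."""
    return abs(dc_time - reference_time) <= tolerance


# Hypothetical readings: the restored DC is 12 minutes behind the reference.
reference = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
restored_dc = reference - timedelta(minutes=12)
print(clock_skew_ok(restored_dc, reference))
```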
-
Whoops. I ran BPA too soon and didn't give AD time to properly fail. Ran it again and got a load of errors beginning with "BPA is not able to collect data about...". The first one is "BPA is not able to collect data about the name of the forest from the domain controller DC-01." and so on.
I guess it can't analyze AD if AD isn't working.
-
This is just odd.
I'm currently out of ideas. I'd say open a case with Veeam and/or Microsoft (yeah it will cost ya).
-
@Carnival-Boy said:
I can shut the server down, back it up, and then restore it, and it works just fine. It's just backing it up whilst online that causes the problem.
That's just how databases work. They can't be backed up live reliably. Typically they need to be taken offline to get a reliable backup.
-
@Dashrender said:
Have you opened a case with Veeam? Since a cold image works it definitely sounds like an issue with the way Veeam is backing things up.
Veeam doesn't handle the snapshot; the hypervisor does. Veeam backs up what it is given.