Domain Controller Down (VM)
- 
 @Dashrender said in Domain Controller Down (VM): I didn't know what kind of medical facility @wirestyle22 was in. If HA is fully thought out and is felt to be needed (don't forget about the power situation, and cooling, etc, etc, etc, - remember HA isn't a product, it's a process) then they should fully realize it. I'm guessing by the fact that the switches were 100 Mb that it really wasn't fully thought out, instead someone in the place of authority thought it sounded good and they tossed what they have in today in. Medical facilities with beds have generators and fuel. HVAC for something this small can be covered for redundancy with a spot cooler (I have this in my own house for my lab, so if I can afford it, you have to be a tiny outfit to not be able to afford it). I agree it's a process, and the biggest piece is having an MSP to back you up, and having 24/7 dispatched resources to help you with the persistent layer. Not having redundancy at the people level is the biggest issue to address. While I normally advocate some kind of offsite, ready-to-fire-off DR, in the case of a facility like this it's not actually as important (beyond BC reasons) because if the whole facility blows up the need for the system goes with it. Still, there are a bazillion Veeam/VCAN partners who can cover this piece for cheap, so why not.
- 
 @scottalanmiller said in Domain Controller Down (VM): @Dashrender said in Domain Controller Down (VM): If HA is fully thought out and is felt to be needed (don't forget about the power situation, and cooling, etc, etc, etc, - remember HA isn't a product, it's a process) then they should fully realize it. I'm guessing by the fact that the switches were 100 Mb that it really wasn't fully thought out, instead someone in the place of authority thought it sounded good and they tossed what they have in today in. It's as simple as "there was no HA and no attempt made at it." It would take me about 5 minutes to explain to a 3rd grader why the system he has isn't redundant and why that is bad. The fact that it continues to exist shows that either...
- Management has an intellectual capacity below that of a 3rd grader (possible).
- No one explained in non-jargon English how bad this configuration was (more likely).
 
- 
 @John-Nicholson said in Domain Controller Down (VM): It's a medical facility that has beds occupied 24/7 so yes. That doesn't mean that. We can equally say they didn't have 24x7 IT staff so they don't need it. What they need, we have no way of knowing. If we read back what we know about their environment, it tells us that they didn't think that they needed HA in any way whatsoever. But that's all we have to go on. They operate around the clock, but that isn't an HA concern. And they implemented something so far from HA that it is laughable. So all we know is that they implemented anti-HA and spent a lot to do it. That's it. We have no indication that HA is warranted in any way. Just because a shop is 24x7 medical doesn't tell us that a specific system is needed 24x7 or that it needs to be available at all times. Those are very different requirements.
- 
 @Dashrender said in Domain Controller Down (VM): As for the rest, I generally agree with you. It shows the real costs of DOING IT RIGHT - but as most of us know - few SMBs are really willing to do what's right in IT. 
 Hell, just look at all of the threads in SW talking about print shops that couldn't upgrade their XP machines because their 10K+ printers didn't support anything newer. It's a never-ending problem of knowing the real costs of doing something right. The real cost of doing IT right is cheaper. Simply not having an onsite FTE, and having an MSP manage this stuff is likely cheaper (FTEs are expensive!). This outage might have been embarrassing enough for them to lose a patient or two (or worse, someone dies and they get hit with a million dollar wrongful death lawsuit that spikes their premiums). Doing IT RIGHT includes understanding the capex and opex costs, and associated risks and external costs of doing IT right or wrong. Doing IT Wrong means wasting tons of money and getting an output that causes other costs. IT budgets do NOT exist in a vacuum to the rest of the operations and their output (Especially in 2016!).
- 
 @John-Nicholson said in Domain Controller Down (VM): A proper MSP is like having an enterprise support army in your back pocket for less than the cost of an FTE. Honestly, as an SMB you shouldn't hire an in-house resource before you hire an MSP, and any shop that doesn't want to pay for an MSP but will pay for an FTE is a GIANT red flag that they lack any level of competence in IT governance, budgeting, or common sense. I agree. Anyone going into an FTE role in an SMB should probably ask what their MSP ecosystem of support is like BEFORE accepting a position. That's something that we never talk about but is a great idea. They should either have a great answer (and the MSP should likely be part of the interview process) or they should be like "that's why we are bringing you in, to help us find those good resources."
- 
 @John-Nicholson said in Domain Controller Down (VM): @Dashrender said in Domain Controller Down (VM): As for the rest, I generally agree with you. It shows the real costs of DOING IT RIGHT - but as most of us know - few SMBs are really willing to do what's right in IT. 
 Hell, just look at all of the threads in SW talking about print shops that couldn't upgrade their XP machines because their 10K+ printers didn't support anything newer. It's a never-ending problem of knowing the real costs of doing something right. The real cost of doing IT right is cheaper. Simply not having an onsite FTE, and having an MSP manage this stuff is likely cheaper (FTEs are expensive!). This outage might have been embarrassing enough for them to lose a patient or two (or worse, someone dies and they get hit with a million dollar wrongful death lawsuit that spikes their premiums). Doing IT RIGHT includes understanding the capex and opex costs, and associated risks and external costs of doing IT right or wrong. Doing IT Wrong means wasting tons of money and getting an output that causes other costs. IT budgets do NOT exist in a vacuum to the rest of the operations and their output (Especially in 2016!). "IT Right" isn't even a thing. IT is just part of the business. It's "running the business right."
- 
 We actually did a video on that last night, it is being edited right now. 
- 
 @scottalanmiller said in Domain Controller Down (VM): @John-Nicholson said in Domain Controller Down (VM): It's a medical facility that has beds occupied 24/7 so yes. That doesn't mean that. We can equally say they didn't have 24x7 IT staff so they don't need it. What they need, we have no way of knowing. If we read back what we know about their environment, it tells us that they didn't think that they needed HA in any way whatsoever. But that's all we have to go on. They operate around the clock, but that isn't an HA concern. And they implemented something so far from HA that it is laughable. So all we know is that they implemented anti-HA and spent a lot to do it. That's it. We have no indication that HA is warranted in any way. Just because a shop is 24x7 medical doesn't tell us that a specific system is needed 24x7 or that it needs to be available at all times. Those are very different requirements. EMR's on the system, I've yet to meet a medical facility whose SLA accepts a 12 hour outage for that, that has 24/7 manned beds, and ultimately it's two things that drive medical standards and outcomes. -
In America, a jury in a wrongful death case is the arbiter of what was acceptable or not in medical spending and outcomes (lawsuits drive medical standards).
- 
The federal government's willingness to reimburse you for spending. Any patient care administered while the system was down and not recording is not paid for, and while people have downtime procedures, the risk of missing out on some juicy procedures or pills or other things means this can add up quickly.
 
- 
- 
 @John-Nicholson said in Domain Controller Down (VM): @scottalanmiller said in Domain Controller Down (VM): @John-Nicholson said in Domain Controller Down (VM): It's a medical facility that has beds occupied 24/7 so yes. That doesn't mean that. We can equally say they didn't have 24x7 IT staff so they don't need it. What they need, we have no way of knowing. If we read back what we know about their environment, it tells us that they didn't think that they needed HA in any way whatsoever. But that's all we have to go on. They operate around the clock, but that isn't an HA concern. And they implemented something so far from HA that it is laughable. So all we know is that they implemented anti-HA and spent a lot to do it. That's it. We have no indication that HA is warranted in any way. Just because a shop is 24x7 medical doesn't tell us that a specific system is needed 24x7 or that it needs to be available at all times. Those are very different requirements. EMR's on the system, I've yet to meet a medical facility whose SLA accepts a 12 hour outage for that, that has 24/7 manned beds, and ultimately it's two things that drive medical standards and outcomes. -
In America, a jury in a wrongful death case is the arbiter of what was acceptable or not in medical spending and outcomes (lawsuits drive medical standards).
- 
The federal government's willingness to reimburse you for spending. Any patient care administered while the system was down and not recording is not paid for, and while people have downtime procedures, the risk of missing out on some juicy procedures or pills or other things means this can add up quickly.
 That's fine, BUT the ONLY thing we know for certain is what they were willing to implement previously. We don't know what kind of medicine they work in, what risks there are, what EMR dependencies there are. Sure they can't bill for twelve hours, but that might cost them nothing while uptime costs something. All depends. What we DO know is that they didn't have the hardware, planning, documentation, staff or support organizations for anything other than what they got. So based on the sole information that we have, we can't assume that their business believes in uptime. Even during the outage, they made it VERY clear that getting it fixed was not a priority but that status updates, conversations and even other IT needs were the priority. We have a pretty uniform picture that uptime on this system is not perceived as important by the business decision makers, even during the panic fire of a real outage.
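 To put rough numbers on the "All depends" above, here is a minimal Python sketch of that comparison. Every figure in it is a made-up placeholder, not anything we know about this facility, and the function names are purely illustrative. It only compares the downtime cost that HA would avoid each year against what HA would cost to run.

```python
def annual_downtime_cost(outages_per_year, hours_per_outage, cost_per_hour):
    """Expected yearly cost of outages: frequency x duration x hourly impact."""
    return outages_per_year * hours_per_outage * cost_per_hour


def ha_pays_off(cost_without_ha, cost_with_ha, annual_ha_spend):
    """HA is only justified when the downtime cost it avoids exceeds its own cost."""
    return (cost_without_ha - cost_with_ha) > annual_ha_spend


# Hypothetical inputs (assumptions for illustration only).
ANNUAL_HA_SPEND = 20_000             # extra hardware, licensing, support per year
OUTAGES_NO_HA, HOURS_NO_HA = 1, 12   # one 12 hour outage a year without HA
OUTAGES_HA, HOURS_HA = 1, 1          # one 1 hour outage a year with HA

for cost_per_hour in (500, 5_000):
    without_ha = annual_downtime_cost(OUTAGES_NO_HA, HOURS_NO_HA, cost_per_hour)
    with_ha = annual_downtime_cost(OUTAGES_HA, HOURS_HA, cost_per_hour)
    verdict = ha_pays_off(without_ha, with_ha, ANNUAL_HA_SPEND)
    print(f"${cost_per_hour}/hour of downtime: HA pays off = {verdict}")
```

 With these placeholder numbers the answer flips depending only on what an hour of downtime actually costs the business, which is exactly the figure we don't have.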
- 
- 
 I totally understand that there are medical situations where high availability and high uptime are considered necessary and make sense in a business context. And I totally agree that this has the potential to be one of them. I'm only saying that it being possible doesn't make it so and that all indications from reading back their previous decisions, investments and behaviour suggest that they do not agree with that assessment. 
- 
 @John-Nicholson said in Domain Controller Down (VM): EMR's on the system, I've yet to meet a medical facility whose SLA accepts a 12 hour outage for that, that has 24/7 manned beds, and ultimately it's two things that drive medical standards and outcomes. -
In America, a jury in a wrongful death case is the arbiter of what was acceptable or not in medical spending and outcomes (lawsuits drive medical standards).
- 
The federal government's willingness to reimburse you for spending. Any patient care administered while the system was down and not recording is not paid for, and while people have downtime procedures, the risk of missing out on some juicy procedures or pills or other things means this can add up quickly.
 1 - take your word for it at this point 
 2 - what prevents you from documenting on paper and then entering it when the system comes up - everyone I know operates this way, and they do get paid for those things that are transcribed to electronic after the fact.
- 
- 
 @scottalanmiller said in Domain Controller Down (VM): @John-Nicholson said in Domain Controller Down (VM): @scottalanmiller said in Domain Controller Down (VM): @John-Nicholson said in Domain Controller Down (VM): It's a medical facility that has beds occupied 24/7 so yes. That doesn't mean that. We can equally say they didn't have 24x7 IT staff so they don't need it. What they need, we have no way of knowing. If we read back what we know about their environment, it tells us that they didn't think that they needed HA in any way whatsoever. But that's all we have to go on. They operate around the clock, but that isn't an HA concern. And they implemented something so far from HA that it is laughable. So all we know is that they implemented anti-HA and spent a lot to do it. That's it. We have no indication that HA is warranted in any way. Just because a shop is 24x7 medical doesn't tell us that a specific system is needed 24x7 or that it needs to be available at all times. Those are very different requirements. EMR's on the system, I've yet to meet a medical facility whose SLA accepts a 12 hour outage for that, that has 24/7 manned beds, and ultimately it's two things that drive medical standards and outcomes. -
In America, a jury in a wrongful death case is the arbiter of what was acceptable or not in medical spending and outcomes (lawsuits drive medical standards).
- 
The federal government's willingness to reimburse you for spending. Any patient care administered while the system was down and not recording is not paid for, and while people have downtime procedures, the risk of missing out on some juicy procedures or pills or other things means this can add up quickly.
 That's fine, BUT the ONLY thing we know for certain is what they were willing to implement previously. We don't know what kind of medicine they work in, what risks there are, what EMR dependencies there are. Sure they can't bill for twelve hours, but that might cost them nothing while uptime costs something. All depends. What we DO know is that they didn't have the hardware, planning, documentation, staff or support organizations for anything other than what they got. So based on the sole information that we have, we can't assume that their business believes in uptime. Even during the outage, they made it VERY clear that getting it fixed was not a priority but that status updates, conversations and even other IT needs were the priority. We have a pretty uniform picture that uptime on this system is not perceived as important by the business decision makers, even during the panic fire of a real outage. This type of argument is something I see you make all the time. Just because the system didn't perform in the manner that they wanted/needed - doesn't mean that they weren't trying to obtain it just the same. What it does mean is that whoever they hired to accomplish that goal lied to them (assuming that really was the goal). If you're the business owner and don't know squat about IT, so you hire George the IT consultant - how is the owner supposed to know that George did the job right or wrong? Unless you're telling me that the owner should be hiring a second consultant to look over George's work to make sure it was what the owner really wanted?
- 
- 
 @scottalanmiller said in Domain Controller Down (VM): I totally understand that there are medical situations where high availability and high uptime are considered necessary and make sense in a business context. And I totally agree that this has the potential to be one of them. I'm only saying that it being possible doesn't make it so and that all indications from reading back their previous decisions, investments and behaviour suggest that they do not agree with that assessment. Again - read my previous post - Assuming the owners aren't IT personnel - how are they SUPPOSED to know? It was like John asking why WS didn't refresh the iSCSI connection instead of rebooting the whole switch - if he's never done it before, how's he supposed to know? All they can do is trust those that they hire to do what was asked.
- 
 @Dashrender said in Domain Controller Down (VM): @scottalanmiller said in Domain Controller Down (VM): @John-Nicholson said in Domain Controller Down (VM): @scottalanmiller said in Domain Controller Down (VM): @John-Nicholson said in Domain Controller Down (VM): It's a medical facility that has beds occupied 24/7 so yes. That doesn't mean that. We can equally say they didn't have 24x7 IT staff so they don't need it. What they need, we have no way of knowing. If we read back what we know about their environment, it tells us that they didn't think that they needed HA in any way whatsoever. But that's all we have to go on. They operate around the clock, but that isn't an HA concern. And they implemented something so far from HA that it is laughable. So all we know is that they implemented anti-HA and spent a lot to do it. That's it. We have no indication that HA is warranted in any way. Just because a shop is 24x7 medical doesn't tell us that a specific system is needed 24x7 or that it needs to be available at all times. Those are very different requirements. EMR's on the system, I've yet to meet a medical facility whose SLA accepts a 12 hour outage for that, that has 24/7 manned beds, and ultimately it's two things that drive medical standards and outcomes. -
In America, a jury in a wrongful death case is the arbiter of what was acceptable or not in medical spending and outcomes (lawsuits drive medical standards).
- 
The federal government's willingness to reimburse you for spending. Any patient care administered while the system was down and not recording is not paid for, and while people have downtime procedures, the risk of missing out on some juicy procedures or pills or other things means this can add up quickly.
 That's fine, BUT the ONLY thing we know for certain is what they were willing to implement previously. We don't know what kind of medicine they work in, what risks there are, what EMR dependencies there are. Sure they can't bill for twelve hours, but that might cost them nothing while uptime costs something. All depends. What we DO know is that they didn't have the hardware, planning, documentation, staff or support organizations for anything other than what they got. So based on the sole information that we have, we can't assume that their business believes in uptime. Even during the outage, they made it VERY clear that getting it fixed was not a priority but that status updates, conversations and even other IT needs were the priority. We have a pretty uniform picture that uptime on this system is not perceived as important by the business decision makers, even during the panic fire of a real outage. This type of argument is something I see you make all the time. Just because the system didn't perform in the manner that they wanted/needed - doesn't mean that they weren't trying to obtain it just the same. What it does mean is that whoever they hired to accomplish that goal lied to them (assuming that really was the goal). I didn't say that it did. I said that it was the only information that we have and that every decision, both planned and triage, pointed to the same conclusion - that they don't care about uptime. That's it, period. ANYTHING other than this is someone here injecting personal opinion into the mix. Pushing HA where no HA is suggested. We have no reason to suspect that they ever felt that HA was going to happen. That's an assumption based on nothing at all. That doesn't mean that they didn't, it only means that there is zero evidence to suggest it. All evidence that we have points away. It's that simple.
They took no actions towards HA, they didn't state that they wanted HA, they didn't provide documentation as to why HA would be needed, they didn't behave in an HA way. 
- 
- 
 @Dashrender said in Domain Controller Down (VM): If you're the business owner and don't know squat about IT, so you hire George the IT consultant - how is the owner supposed to know that George did the job right or wrong? Unless you're telling me that the owner should be hiring a second consultant to look over George's work to make sure it was what the owner really wanted? You are only making an argument for why the evidence that they don't want HA is not very strong. I never said that it was. You are not even slightly making an argument that they wanted HA, only that we don't know much based on the evidence, given the assumption that their CEO is a moron and can't do his job. Other than that being a moderately safe assumption as it is generally the case in SMBs, it tells us nothing. I never stated anything to the contrary, so pointing this out doesn't dispute my point.
- 
 What that DOES suggest, however, is that whatever body is in charge of the CEO felt that the CEO was able to do their job. At any step like this, you are assuming that the investors are not holding the board accountable, that the board is not holding the CEO accountable, that the CEO is not able to manage and hire at a business-like level, that all decisions and oversight have been bad, that all management and results in real time are not what is wanted. Is all of that possible? Of course. But that is a lot of assumption to hoist onto a company based on zero evidence, just assumption that they "must want that." Given that all the evidence points away from them feeling that HA was warranted in any way, that even SA was not needed, and that they continued to act that way during the outage, de-prioritized the outage, basically ignored the outage and have never suggested otherwise, I think it's early to make such a sweeping assumption.
- 
 @scottalanmiller said in Domain Controller Down (VM): @Dashrender said in Domain Controller Down (VM): If you're the business owner and don't know squat about IT, so you hire George the IT consultant - how is the owner supposed to know that George did the job right or wrong? Unless you're telling me that the owner should be hiring a second consultant to look over George's work to make sure it was what the owner really wanted? You are only making an argument for why the evidence that they don't want HA is not very strong. I never said that it was. You are not even slightly making an argument that they wanted HA, only that we don't know much based on the evidence, given the assumption that their CEO is a moron and can't do his job. Other than that being a moderately safe assumption as it is generally the case in SMBs, it tells us nothing. I never stated anything to the contrary, so pointing this out doesn't dispute my point. These discussions always come into HA whenever a SAN is involved though - perhaps the discussion of HA should always be left at the door unless the OP specifically says something to the effect of - this was supposed to be an HA solution, why is it not - at which point you get to determine 1) was it really an HA solution 2) if it was, well then the answer to the "why is it not" is - even HA has a small % chance of failure, and unfortunately, you found that small chance.
- 
 @Dashrender said in Domain Controller Down (VM): These discussions always come into HA whenever a SAN is involved though No they don't, it's the opposite. The SAN isn't even remotely HA here, it's not even a standard server but a lower level (but still decent) commodity server. It's not HA in the slightest, but anti-HA. So that the SAN is there tells us that they were okay with less than Standard Availability only, it in no way suggests anything of the sort that you are assuming. Now, as I keep saying, that doesn't stop it from being the case that everyone involved was a total screw up and they hired con artists who sold them out, no one reviewed the design, they got taken for a ride, they never looked into this, etc. There is just zero evidence of it. You are making an assumption on their behalf that they have not said, suggested or acted as if it were true. ALL existing evidence is to the contrary. None of it proves the point, but simply forcing an unfounded and unsuggested assumption on them is incorrect here. That they even thought of HA as an idea is something we have made up and assigned to them ourselves, it does not come from them at all... at least up to this point. 
- 
 @Dashrender said in Domain Controller Down (VM): perhaps the discussion of HA should always be left at the door unless the OP specifically says something to the effect of - this was supposed to be an HA solution, why is it not - Perhaps. Often someone says something that leads us to believe that there might have been a reason for HA. But this is a case where just nothing suggests it. Not even the lack of panic or concern once things were down. They really seemed to have no care that things were down - which is fine, their design suggests that. In this case, with @wirestyle22 not having been there, he too would have to decide how the system should be viewed. But he needs the information to take back to management: "Oh, you thought that this would be HA? That's odd as it is designed exactly the opposite of that in every way and all of the policies and procedures go against that as well. How were you expecting HA without HA processes, staffing, technology, design or setup? What led you to believe something so contrary would achieve HA?"

