RAID 10, 20 Disks, How Many Hot Spares



  • This question was asked elsewhere and I can't get the details right now, but I figured that I would get the ball rolling on behalf of the person asking. The little that I know about the scenario is that there is a single array of 20 spinning disks in RAID 10, and the person asking wants to know how many hot spares would be recommended.



  • Zero with a couple of cold spares is what I'd suggest

    That's a lot of disks in a single array.



  • Generally, I would say that no hot spares would be needed. Mirrored RAID is insanely reliable, rebuilds very quickly, and does not experience the increased drive failure risks of parity RAID. Also, the risks of RAID 10 do not grow exponentially like those of parity RAID, but linearly. Each mirrored set is discrete. So with rare exception, I would stick with zero hot spares.
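To put rough numbers on the linear-growth point for mirrored sets, here is a minimal Python sketch. It assumes every mirrored pair is lost independently with the same probability over some period; `PAIR_LOSS` is a made-up figure for illustration, not measured data, and the function name is hypothetical.

```python
def raid10_loss_probability(pairs: int, pair_loss: float) -> float:
    """Chance that at least one mirrored pair loses both of its drives.

    Assumes pairs fail independently and share the same loss probability.
    """
    return 1.0 - (1.0 - pair_loss) ** pairs


# Hypothetical probability that a single mirrored pair is lost in the period.
PAIR_LOSS = 1e-4

for pairs in (1, 5, 10, 12):
    p = raid10_loss_probability(pairs, PAIR_LOSS)
    # The ratio to a single pair stays close to the pair count while
    # pair_loss is small, which is the "linear" behaviour described above.
    print(f"{pairs * 2:2d} drives: {p:.6f} ({p / PAIR_LOSS:.2f}x one pair)")
```

Under these assumptions the risk scales roughly with the number of pairs, so a bigger array is riskier, but only in proportion to its size.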



  • In a study of 160,000 RAID 1 array years, we lost zero arrays. Using hot spares is a crutch for people on parity RAID, where risks are much higher. One of the key benefits of going to mirrored RAID is not needing hot spares.



  • @MattSpeller said in RAID 10, 20 Disks, How Many Hot Spares:

    Zero with a couple of cold spares is what I'd suggest

    If you have cold spares and you have available drive bays, you want them to be hot spares.



  • @scottalanmiller said in RAID 10, 20 Disks, How Many Hot Spares:

    @MattSpeller said in RAID 10, 20 Disks, How Many Hot Spares:

    Zero with a couple of cold spares is what I'd suggest

    If you have cold spares and you have available drive bays, you want them to be hot spares.

    Why is that?

    Aren't they still running and thus adding to the MTBF number?



  • @BRRABill said in RAID 10, 20 Disks, How Many Hot Spares:

    @scottalanmiller said in RAID 10, 20 Disks, How Many Hot Spares:

    @MattSpeller said in RAID 10, 20 Disks, How Many Hot Spares:

    Zero with a couple of cold spares is what I'd suggest

    If you have cold spares and you have available drive bays, you want them to be hot spares.

    Why is that?

    Aren't they still running and thus adding to the MTBF number?

    MTBF is a total myth, it's essentially useless. And no, they are not running. They are just sitting there, so they are not spinning and wearing out. But even if they were, it doesn't work like that. Yes, they would wear out eventually, but at a fraction of the speed they would if they were in actual use. But they are not; they are just sitting there idle.



  • @scottalanmiller said in RAID 10, 20 Disks, How Many Hot Spares:

    @MattSpeller said in RAID 10, 20 Disks, How Many Hot Spares:

    Zero with a couple of cold spares is what I'd suggest

    If you have cold spares and you have available drive bays, you want them to be hot spares.

    Why wouldn't you want them to be part of the array? WTF is the point of hot spares in an array that does not need to resilver?

    aka 20 drive array, 4 free slots -> populate free slots, redo array to 24 drives



  • @MattSpeller said in RAID 10, 20 Disks, How Many Hot Spares:

    @scottalanmiller said in RAID 10, 20 Disks, How Many Hot Spares:

    @MattSpeller said in RAID 10, 20 Disks, How Many Hot Spares:

    Zero with a couple of cold spares is what I'd suggest

    If you have cold spares and you have available drive bays, you want them to be hot spares.

    Why wouldn't you want them to be part of the array? WTF is the point of hot spares in an array that does not need to resilver?

    aka 20 drive array, 4 free slots -> populate free slots, redo array to 24 drives

    Because the larger the array, the higher the risk. So adding them to the array works against the purpose of the spares. Having them be cold doesn't provide as much value as being hot. Having them be hot, once you've paid for the protection, provides the best value in terms of safety.
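One way to see why a hot spare is worth more than a cold one is to compare how long a mirrored pair stays degraded in each case. The sketch below is only an illustration under stated assumptions: a constant failure rate derived from an annual failure rate, and hypothetical values for `AFR`, `REBUILD_HOURS`, and `COLD_SWAP_DELAY_HOURS`. None of these numbers come from the thread.

```python
import math

HOURS_PER_YEAR = 24 * 365


def partner_failure_probability(afr: float, window_hours: float) -> float:
    """Chance the surviving mirror partner fails during the degraded window.

    Assumes a constant hourly failure rate derived from the annual rate.
    """
    rate_per_hour = -math.log(1.0 - afr) / HOURS_PER_YEAR
    return 1.0 - math.exp(-rate_per_hour * window_hours)


AFR = 0.03                    # hypothetical annual failure rate per drive
REBUILD_HOURS = 8.0           # hypothetical mirror rebuild time
COLD_SWAP_DELAY_HOURS = 48.0  # hypothetical wait until someone swaps the drive

# A hot spare starts rebuilding immediately; a cold spare waits for a person.
hot = partner_failure_probability(AFR, REBUILD_HOURS)
cold = partner_failure_probability(AFR, COLD_SWAP_DELAY_HOURS + REBUILD_HOURS)

print(f"hot spare exposure:  {hot:.6%}")
print(f"cold spare exposure: {cold:.6%}")
print(f"cold / hot ratio:    {cold / hot:.1f}x")
```

Both probabilities are small for a single pair, which fits the point that mirrored arrays rarely need spares at all. The difference is only in how long the exposure lasts.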



  • @scottalanmiller said in RAID 10, 20 Disks, How Many Hot Spares:

    @MattSpeller said in RAID 10, 20 Disks, How Many Hot Spares:

    @scottalanmiller said in RAID 10, 20 Disks, How Many Hot Spares:

    @MattSpeller said in RAID 10, 20 Disks, How Many Hot Spares:

    Zero with a couple of cold spares is what I'd suggest

    If you have cold spares and you have available drive bays, you want them to be hot spares.

    Why wouldn't you want them to be part of the array? WTF is the point of hot spares in an array that does not need to resilver?

    aka 20 drive array, 4 free slots -> populate free slots, redo array to 24 drives

    Because the larger the array, the higher the risk. So adding them to the array works against the purpose of the spares. Having them be cold doesn't provide as much value as being hot. Having them be hot, once you've paid for the protection, provides the best value in terms of safety.

    That's a slippery slope argument against having more IOPS/space. I disagree and would populate the array to as many drives as I could. Hot spares have little or no value in SMB space where it's on premises and you have easy access.



  • @MattSpeller said in RAID 10, 20 Disks, How Many Hot Spares:

    That's a slippery slope argument against having more IOPS/space.

    Not in the least. The purpose of the spares was protection against failure. To buy something for one purpose and then use it for a counter-purpose is crazy. That's like saying "well, we bought these bullets to protect ourselves against invaders, but since we have them, let's shoot ourselves in the foot with them so that we get use out of them." The goal is one thing, and you wouldn't just be using the drives for a different purpose, but for one that goes directly against the goal.



  • @scottalanmiller Then we disagree on how much protection vs space/IOPS is warranted on a theoretical array. I see no value in wasting slots you paid a F(@*# ton of money for.



  • @MattSpeller said in RAID 10, 20 Disks, How Many Hot Spares:

    I disagree and would populate the array to as many drives as I could. Hot spares have little or no value in SMB space where it's on premises and you have easy access.

    Hot spares and the SMB have no relationship. SMBs should be on premises the least, not the most. Combining mistakes doesn't make sense. You are making several assumptions that are wrong...

    • That more IOPS are important enough to increase risk past the point at which the array was already spec'd out.
    • That additional capacity has benefits beyond the specification point.
    • That hot spares have no value. They always have some value in a mirrored array.
    • That SMBs will be on premises.

    All of those are wrong or potentially wrong. What we know in the OP's case is that the array was spec'd, now they are looking to invest in additional protection. Your recommendation is not just to invest, but to invest against protection and re-spec the array based on no data of the needs of the array at all. What if it is already way more IOPS and capacity than needed and any additional is just waste, but their risk aversion is high and the server is hosted on an island with no easy access?



  • @MattSpeller said in RAID 10, 20 Disks, How Many Hot Spares:

    @scottalanmiller Then we disagree on how much protection vs space/IOPS is warranted on a theoretical array.

    No, you are having a discussion about protection vs space/IOPS and I am not. It's that simple. You are making a point that doesn't relate to the question. The question is about investing in protection. You believe that "more capacity" is always better, even if there is no use for it?



  • @scottalanmiller Argue a use case then, don't dance around making me chase you.

    My use case is on prem easy access. Define yours and maybe we can agree on something.



  • @MattSpeller said in RAID 10, 20 Disks, How Many Hot Spares:

    I see no value in wasting slots you paid a F(@*# ton of money for.

    No one said to waste them. They are talking about investing in additional protection.

    Putting drives into the array when the IOPS and capacity are not needed is 100% wasted. So you just defeated your own point, there.



  • @MattSpeller said in RAID 10, 20 Disks, How Many Hot Spares:

    @scottalanmiller Argue a use case then, don't dance around making me chase you.

    Don't need a use case, risk aversion is the key. IOPS and capacity are spec'd properly, no more needed. Risk of the array is a concern. Hot spares would lower the risk, enlarging the array would increase the risk. This isn't complex. There is a goal: reducing risk. Your proposal is to undermine the goal for what reason? What makes you believe that risk protection is always bad and that higher risk is always good? Where would you stop with that logic? Always buy the biggest, fastest drives in the biggest possible arrays?



  • @MattSpeller said in RAID 10, 20 Disks, How Many Hot Spares:

    My use case is on prem easy access. Define yours and maybe we can agree on something.

    1. No one even suggested that on prem was going on, that's a totally false assumption. So you can't make up a use case and then use it to make the "it's always this way."
    2. Just because on prem is easy doesn't make off hours easy.
    3. Just because on prem is easy doesn't mean that wasting money on cold spares makes sense when hot spares are more reliable and less effort.
    4. Just because on prem is easy doesn't mean that we should increase risk for no known reason when the goal was to reduce risk.


  • @scottalanmiller said in RAID 10, 20 Disks, How Many Hot Spares:

    @MattSpeller said in RAID 10, 20 Disks, How Many Hot Spares:

    @scottalanmiller Argue a use case then, don't dance around making me chase you.

    Don't need a use case, risk aversion is the key.

    Bullshit, this is how you gish gallop all over anyone who disagrees with you - moving the goal posts. It's irritating as fuck tbh.

    IOPS and capacity are spec'd properly, no more needed.

    Again, this is crap - I know you can do better.

    Give a real world example.



  • @MattSpeller said in RAID 10, 20 Disks, How Many Hot Spares:

    @scottalanmiller said in RAID 10, 20 Disks, How Many Hot Spares:

    @MattSpeller said in RAID 10, 20 Disks, How Many Hot Spares:

    @scottalanmiller Argue a use case then, don't dance around making me chase you.

    Don't need a use case, risk aversion is the key.

    Bullshit, this is how you gish gallop all over anyone who disagrees with you - moving the goal posts. It's irritating as fuck tbh.

    Sorry, but that's exactly what didn't happen. The goal never moved, at all. The goal was to reduce risk. You have a personal agenda that risk should never be reduced, only increased, and you are saying anything, including now making a personal attack, to support it. But you are not at all looking at the needs of the OP, just interjecting some personal goal that doesn't align.

    No moving goal posts, none. You made up a new goal that didn't exist.



  • @MattSpeller said in RAID 10, 20 Disks, How Many Hot Spares:

    IOPS and capacity are spec'd properly, no more needed.

    Again, this is crap - I know you can do better.

    Give a real world example.

    Not crap at all, it's how we do IT. You believe that "more is always better", no matter what. But only in IOPS and capacity, not in protection? By that logic, RAID 0 is always the best choice, right?



  • @scottalanmiller said in RAID 10, 20 Disks, How Many Hot Spares:

    @MattSpeller said in RAID 10, 20 Disks, How Many Hot Spares:

    My use case is on prem easy access. Define yours and maybe we can agree on something.

    1. No one even suggested that on prem was going on, that's a totally false assumption. So you can't make up a use case and then use it to make the "it's always this way."

    No one said it wasn't

    2. Just because on prem is easy doesn't make off hours easy.

    It does in my case

    3. Just because on prem is easy doesn't mean that wasting money on cold spares makes sense when hot spares are more reliable and less effort.

    Sure it does, in some circumstances - this is why you should define a use case so we can have a real discussion

    4. Just because on prem is easy doesn't mean that we should increase risk for no known reason when the goal was to reduce risk.

    Sure it does. This is not a black and white case, there are shades of grey.



  • Why do you need a scenario? We are past scenarios, we know the goal within the context which is to reduce risk. It's that simple. You are trying to make it complex so that you can take an arbitrary scenario and hope to shoot it down when we don't know the exact scenario, only the goal of risk reduction. Why are you so opposed to someone having a properly designed array for speed and capacity and considering lowering their risks? What's actually going on?



  • @MattSpeller said in RAID 10, 20 Disks, How Many Hot Spares:

    @scottalanmiller said in RAID 10, 20 Disks, How Many Hot Spares:

    @MattSpeller said in RAID 10, 20 Disks, How Many Hot Spares:

    My use case is on prem easy access. Define yours and maybe we can agree on something.

    1. No one even suggested that on prem was going on, that's a totally false assumption. So you can't make up a use case and then use it to make the "it's always this way."

    No one said it wasn't

    So because you inject your own details and no one specifically disputes them, they become true?



  • @MattSpeller said in RAID 10, 20 Disks, How Many Hot Spares:

    2. Just because on prem is easy doesn't make off hours easy.

    It does in my case

    And your case is not in question, so this is a red herring.



  • @scottalanmiller said in RAID 10, 20 Disks, How Many Hot Spares:

    @MattSpeller said in RAID 10, 20 Disks, How Many Hot Spares:

    @scottalanmiller said in RAID 10, 20 Disks, How Many Hot Spares:

    @MattSpeller said in RAID 10, 20 Disks, How Many Hot Spares:

    My use case is on prem easy access. Define yours and maybe we can agree on something.

    1. No one even suggested that on prem was going on, that's a totally false assumption. So you can't make up a use case and then use it to make the "it's always this way."

    No one said it wasn't

    So because you inject your own details and no one specifically disputes them, they become true?

    That seems to be what you do 😛



  • @MattSpeller said in RAID 10, 20 Disks, How Many Hot Spares:

    3. Just because on prem is easy doesn't mean that wasting money on cold spares makes sense when hot spares are more reliable and less effort.

    Sure it does, in some circumstances - this is why you should define a use case so we can have a real discussion

    Nope, cold spares don't work that way. If you have that magic use case, you can provide it. I know of no case where cold spares are better than hot ones except when the array is full for other reasons (not the case here - so we have your example case right now) or where you need to share them between many arrays (no reason to inject that odd assumption here.)

    There is zero need for a use case, we know the factors already. That you CAN come up with a use case where these things are not true based on changing the fundamental goals is totally non-applicable to the situation.



  • @MattSpeller said in RAID 10, 20 Disks, How Many Hot Spares:

    @scottalanmiller said in RAID 10, 20 Disks, How Many Hot Spares:

    @MattSpeller said in RAID 10, 20 Disks, How Many Hot Spares:

    @scottalanmiller said in RAID 10, 20 Disks, How Many Hot Spares:

    @MattSpeller said in RAID 10, 20 Disks, How Many Hot Spares:

    My use case is on prem easy access. Define yours and maybe we can agree on something.

    1. No one even suggested that on prem was going on, that's a totally false assumption. So you can't make up a use case and then use it to make the "it's always this way."

    No one said it wasn't

    So because you inject your own details and no one specifically disputes them, they become true?

    That seems to be what you do 😛

    Okay, what detail did I interject? I'm working from the OP and nothing else. What have I added?



  • @MattSpeller said in RAID 10, 20 Disks, How Many Hot Spares:

    4. Just because on prem is easy doesn't mean that we should increase risk for no known reason when the goal was to reduce risk.

    Sure it does. This is not a black and white case, there are shades of grey.

    Whoa, you just said "sure it does", meaning it's black and white and is always one thing. Then you say that there are shades of grey. Which is it? It can't be both. I made the case that it wasn't black and white; you disagreed and then said I was right.



  • The OP is asking about one thing... how many hot spares to add for data protection in an array of this size. That's it. There are zero questions about needing more capacity or performance. None, zero. There is no info on where the array is hosted, none. The question is about one thing... risk. Risk and only risk. How much risk reduction is generally recommended.

    Obviously the OP didn't provide enough info for anything but general cases and general guidelines. But what we know from the asking of the question is that their concern is "how much do they need to lower their risk." That's the only thing that they are asking. They aren't asking how to "best use additional drives", if they needed more drives we can assume that they would have a larger array than they do and would be asking about how many hot spares on a larger array.

    We don't know if hot spares make sense, we don't have enough details. We only know that they rarely make sense in a 20 disk RAID 10. We do know that hot spares are always better than cold spares if the slots are empty otherwise, unless the cold spares need to be shared to other chassis to save money. But that's it. And since the question is about a single array, not a group of arrays, we have to ignore the use case where cold spares are a consideration. We also know that there are at least two open slots or else the question could not be asked at all.

    So given what we know about the question, we know that the possible answers are no hot spares, one or more hot spares, and that is all. If we start suggesting things like "buy drives but instead of using them as hot spares, make your array bigger" we change everything. Not only do we make wild, unfounded assumptions about their risk profile which we are not in a position to make whatsoever, but we also go a massive step farther and start to make assumptions about their best use case of money.

    So now, not only do we suggest that they increase risk rather than lower it like they were trying to do (based on what, I keep asking; we know nothing to give us this leniency), but we then also take the money that they might have invested in risk protection and suggest not that they use it "where the business can most use it", but that the only possible use for that money is to invest it in disks? We know nothing about the cost of those disks, the utility of those disks, the finances of the company, where that money could be spent, or the valuation of different investment strategies.

    In no way could we make that recommendation without knowing a lot more. What we can tell the OP, and indeed the only thing that we can tell them, is how hot spares behave, what their investment percentage is, how often or rarely they are applicable in this type of array, and what factors may make them more or less valuable.
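As a small worked example of the "investment percentage" idea, the sketch below reads it as the extra drives bought as spares relative to the drives already in the array. That reading is an assumption, and the function name is hypothetical.

```python
def spare_investment_percent(array_drives: int, hot_spares: int) -> float:
    """Extra drive purchase for spares, as a percentage of the array's drives."""
    return 100.0 * hot_spares / array_drives


# The 20-disk RAID 10 from the question, with zero to two hot spares.
for spares in (0, 1, 2):
    pct = spare_investment_percent(20, spares)
    print(f"{spares} hot spare(s) on 20 drives: {pct:.1f}% extra drive cost")
```

Whether that extra spend is justified still depends on the risk profile, which is the detail the question leaves out.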