Revisiting ZFS and FreeNAS in 2019

scottalanmiller

Even FreeNAS doesn't say things like this. FreeNAS states very clearly that ZFS' RAIDZ is parity:

"ZFS is designed for data integrity from top to bottom. RAID-Z, the software RAID that is part of ZFS, offers single parity protection like RAID 5, but without the “write hole” vulnerability thanks to the copy-on-write architecture of ZFS. The additional levels RAID-Z2 and RAID-Z3 offer double and triple parity protection, respectively."

https://www.freenas.org/about/features/

scottalanmiller

https://www.ixsystems.com/blog/zfs-pools-in-freenas/

Screenshot from 2019-05-19 19-47-58.png

scottalanmiller

@xrobau said in Changes at Sangoma:

Just because people REFER to it as parity does not mean it is such. That is the 'simplification confusion'.

So your hypothesis is that people, who all know how RAID and mirroring (copying) works, and who all know what parity is, and all know their RAID levels and know ZFS.... "simplify" by getting the term backwards? And that that makes it "simpler", why?

I think your desire to justify the belief that there is no parity is a well known parity system is leading you to concoct some truly outrageous beliefs. Simplification is the farthest thing from an explanation for the entire industry stating something complex and backwards, when what you claim to be true would be the simplest, easiest thing ever.

This just doesn't make any sense, you must see how crazy this sounds.

scottalanmiller

Nexenta is a pretty big player in the ZFS space, they document it as parity as well...

Youtube Video

scottalanmiller

Here is SUN's original 2007 documentation stating that it did mirroring and parity. SUN used RAID to refer solely to parity. But notice that SUN themselves call it RAID. Statements that parity and RAID don't mean parity and RAID go a bit far. Every document that we can find anywhere all agree. And instead of providing any documentation for your position, you are just resorting to saying that all of the source documents are wrong and that none of the words, terms, or functionality means what is stated and that all of it means something other than what is said?

http://web.archive.org/web/20071015014209/http://www.sun.com/2004-0914/feature/

Also, you stated that the self healing was made possible by the copies, but even in 2007 SUN stated that the parity made self healing possible as well. So even your position that the copies are necessary for that feature are incorrect (which we knew, it's just that I now provided a specific reference for that.)

I think you should read Predictably Irrational, Revised and Expanded Edition: The Hidden Forces That Shape Our Decisions , it's a great volume for understanding how humans tend to get misinformation or a random belief and then in an attempt to justify something that we believed without foundation, will begin to formulate false reasons for it after the fact to make it seem like what we believed makes sense. If are you familiar with this process, we can see that happening in this thread.

You started with some mildly incorrect beliefs based on common marketing. Then to defend those beliefs you started adding in technical details that were incorrect, but could be chalked up to misreading some docs or overhearing something and misunderstanding. But then to defend those you just get getting crazier and crazier. As we go down through the thread, we can see the conversation devolving from simple misunderstandings to bizarre claims. By the end, you are basically calling every IT whose ever worked with ZFS a liar, like there is some secret conspiracy to make everyone confused about a really simple technology that we use every day.

scottalanmiller

https://blogs.oracle.com/ahl/double-parity-raid-z

"Double-Parity RAID-Z

As of build 42 of OpenSolaris, RAID-Z comes in a double-parity version to complement the existing single-parity version -- and it only took about 400 additional lines of code. "

xrobau

Sigh.

@scottalanmiller said in Changes at Sangoma:

You started with some mildly incorrect beliefs based on common marketing.

I'm a sysadmin. I've done courses. This is not marketing, this is training. Now, maybe I've been trained wrong, or maybe I've forgotten my training - I was actually looking through the ZoL source code to see if I was wrong (and, it looks like I may be!), but I'm just not interested any more.

You're so hung up about HOW BLOCKS ARE STORED ON THE DISK that you've ignored everything else I've said.

So. Whatever. I'm REALLY done this time.

Oh, and here's the link to the code where they ARE doing math, which is when I was about to come in and go 'Whoops, looks like I was wrong', but turns out you're just interested in being an arsehole, rather than actually engaging in discussion.

https://github.com/zfsonlinux/zfs/blob/a8577bdb32e091645df901d8501e44ef50748389/module/zfs/vdev_raidz_math_impl.h#L525

scottalanmiller

@xrobau said in Changes at Sangoma:

You're so hung up about HOW BLOCKS ARE STORED ON THE DISK that you've ignored everything else I've said.

But everything you've said was good, based on your beliefs of how the data was stored on the disk. That was your foundation for refuting things that we had researched thoroughly in the past.

You are acting like I'm ignoring you, that's the absolute last thing that I've done. I've paid so much attention, from the very first post.

scottalanmiller

@xrobau said in Changes at Sangoma:

Oh, and here's the link to the code where they ARE doing math, which is when I was about to come in and go 'Whoops, looks like I was wrong', but turns out you're just interested in being an arsehole, rather than actually engaging in discussion.

How exactly am I being an ass here? I have engaged well since the beginning and worked hard, even to the point of documenting this years ago, to help educate and train. You put me in the position of having to defend the truth. You didn't believe me, or anyone, in the industry. No amount of me providing documentation helped. Then you call me an ass?

I am not the one who ignored what was written. And I'm not the one making wild claims. All I did was defend the truth and try to educate you. You had misinformation and instead of providing references, attempted to brow beat me into accepting them no matter how much they were not based on fact. You are still trying to attack me personally because I wasn't wrong and didn't let you promote a product based on a misunderstanding of it.

scottalanmiller

@xrobau said in Changes at Sangoma:

I'm a sysadmin. I've done courses. This is not marketing, this is training. Now, maybe I've been trained wrong, or maybe I've forgotten my training

In a situation like this, we all will forget things that we don't use regularly. Even an "every day full time Solaris admin on ZFS" will potentially forget details if they aren't making RAID decisions all the time (you can implement a pre-chosen standard all the time and not have to think about exactly what it is doing or what the alternatives are.) Forgetting isn't a big deal. You gotta use very specific stuff often to have any live memory of it, normally. And the number of people implementing RAID levels or deciding between them is very small, and of that number, the number that do it often is much smaller still.

scottalanmiller

@xrobau said in Revisiting ZFS and FreeNAS in 2019:

ZFS Is an Alternative to RAID - Yes. It's a DIFFERENT TYPE of what normal people think of as 'RAID' - or specifically, RAID5/RAID6. They use Parity, and when a disk is broken/missing, it does calculations to figure out the missing data. ZFS uses copies of the data. Striping and Mirroring is obviously the same.

"ZFS Raid Levers aren't standard" - I have no idea what that means.

Scrubbing is special - Yes it is! If you have a disk that is faulty, scrubbing will pick it up and repair the damage. That can only happen with ZFS, because it has COPIES of the data, and every copy has a checksum. If one of the copies doesn't match the checksum, it'll repair it by writing over it with one of the good copies.

So now that we got to this point, you can see why the myth's document has those items and why I was defending them. ZFS is not an alternative, but just standard RAID levels, and yes I went back and corrected the typo. And the scrubbing isn't special because it's the same as anywhere else.

My point has never been that ZFS is bad, it's amazing. It was the first filesystem of its type, it broke new ground, it remains the most stable and advanced implementation of its family (which includes BtrFS and ReFS), but that there are so many myths around it that people tend to feel that it is absolutely astounding how safe and special it is, but mostly the things that it does are just normal features of filesystems, LVMs, and RAID. That it is stuff lumped into one makes it easy to promote as doing something special because people assign the features to the filesystem layer and then say "can your filesystem do that" when they should be saying "can your storage layer do that?"

The danger is ZFS is getting overwhelmed and feeling that it does more than it does, or believing that other systems do less than they do.

I compare it to Texas. Hold on, this actually makes sense. In Texas they make a huge deal of teaching students that "Texas is a republic" and that that makes it special compared to the other states. But in the other 49 states we are taught how US law requires all individual states to be republics before joining the union as a state. So while the statement that Texas is a republic is true, it is stated in the misleading way to imply that the other 49 states are not. So what people feel makes Texas special, actually just makes them run of the mill.

I've been fortunate enough to get to work with the ZFS team themselves from nearly the very beginning as they stepped in assist with the SAM-SD project in 2007. ZFS was young and mostly unknown then. Great people doing brilliant work. I got to see Thumper before it was released on the market and it was the inspiration for my SAM-SD work in many ways (but with different goals, we couldn't use ZFS for SAM-SD for performance reasons.) I was one of the original cheerleaders trying to get the word out about ZFS and software RAID and open storage practices back when almost everyone was sure that software RAID was a bad thing and that ZFS was crazy.

But around 2012 things flipped and suddenly the Cult of ZFS stuff started, seemingly out of the FreeNAS community, and tons of weird misinformation started coming out that took the hesitance to deploy ZFS and started acting like it was crazy to use anything else and basically stating that ZFS was magic and doing all kinds of things that it wasn't doing, wasn't meant to do, etc. So for the same reasons that I was a big ZFS cheerleader for most of a decade, I found myself having to constantly temper peoples' emotions about ZFS and downplay it rather than promote it because we went from people emotionally avoiding it because they tied it to software RAID mentally (which was only an option) to people believing that it wasn't RAID and was doing all kinds of magic and never asking what that magic was.

Soon we found people who would never have implemented RAID 5 on spinning drives because they knew the risks well, would implement it under the RAIDZ brand name because they were convinced that RAIDZ wasn't RAID 5, and didn't carry the risks that they knew about, and never asked how or why it could do that. That's why people like John Nicholson and I did all of our risk analysis for RAID 5 using RAIDZ to make sure that people understood that all the publications about parity RAID risks and problems were based on ZFS and if people felt that ZFS was better than any other RAID option (and it basically is) then all others would be equal or worse, never better.

scottalanmiller

@xrobau said in Revisiting ZFS and FreeNAS in 2019:

"XXX is special" - Don't know why you think someone would say that. So let's just skip over that.

I know, right? But if you go on places like the Spiceworks community or the FreeNAS forums, people would state that stuff constantly about caching, snapshots, etc. Basic stuff that we had had for a long time. The biggest thing seemed to be anything that was held in the LVM layer was unknown to many people and when ZFS offered it, they felt like they were discovering something new. So there were countless weird conversations of people acting like they had just discovered the snapshot, lol.

Here is an example from as late as 2016: "The ZFS snapshots and ZFS send/receive features make this a very enterprise and mature file system. I find it hard not to choose ZFS when given an option."

Basically he thought that because it could do snaps, that that alone made it mature and he was picking it based on that. The send/receive is unique, and neat, but ultimately not a big deal. Handy, but that's about it. If he knew that other filesystems were being snapped many years earlier (making them more "mature") would he then have ruled out ZFS based on the same logic?

scottalanmiller

Here is another one, this time back to 2011: "I am testing ZFS is FreeNAS. ...My main reason for wanting ZFS is the awesome snapshot capablity."

Already at that point you can tell from the responses that having things like snaps listed as key differentiation features were a big thing that had already been discussed a bit. It's an odd thing, but people were really pushing it.

scottalanmiller

@xrobau said in Revisiting ZFS and FreeNAS in 2019:

If you have a HDD with bad cache ram on it, ZFS (and, to a lesser extent, btrfs) is the only thing that's going to warn you that something is wrong with that device (and it's not like this is uncommon - I've discovered two drives in the last year that were writing corrupt data). This happens.

This is increasingly uncommon today as this is mostly a risk with Winchester (spinning) drives. As the world moves to more reliable SSD, this has dropped off. It doesn't go away, but most corruption is believed to come from the network not the storage media, which nothing protects against (yet). But for the media level issues, newer firmware and tech, better components, and the biggest thing - moving from magnetic media to solid state all reduce this risk a bit. It remains, of course, but most companies go years or decades without experiencing it and probably the average person will never see it in a life time. It happens, but so often to things of no consequence (like empty areas of a disk) that it doesn't matter. When it does strike, typically it is a file so trivial to replace that it's not considered consequential. A real risk and certain scenarios definitely require watching for it carefully, but by and large between how rarely it impacts people and that most shops have moved to safer tech it's all but vanished as a talking point. In the 2000s, it was a bit concern.

scottalanmiller

@xrobau said in Revisiting ZFS and FreeNAS in 2019:

The 'expansion' problem comes about because if you add another spindle to that ZRAID2, suddenly block 1 would be on spindle 1 at sector 10, spindle 3 at sector 500 and spindle 6 at sector 1000 (or whatever). So the location of the data would be wrong, and everything falls apart.

ZFS doesn't have that problem actually. You can grow a ZFS RAIDZ pool: https://www.tech-recipes.com/rx/1416/zfs-grow-or-add-more-disk-space-to-pool-or-filesystem/

It is the zpool add command. You can do it live, while the system is running. Most RAID implementations support this in some form.

scottalanmiller

@xrobau said in Revisiting ZFS and FreeNAS in 2019:

I'm not going to get into SSDs, because I've had terrible results with a couple that wildly skew the stats - but they've burned me so badly....

I would definitely go back and reevaluate the foundation of this. SSDs are, in general, much safer than Winchesters. There are exceptions, but they are very rare and getting rarer every day. Modern SSDs are insanely safe, and have been for a decade. Winchesters are safe, but not really any safer than they were fifteen years ago. SSDs have lept forward, while Winchesters basically have been stagnant. Chances are you had something else happen that caused the SSDs to seem to be the thing that failed or were just super unlucky with your SSDs. But the math and real world usage puts SSDs leaps and bounds ahead of HDDs, to the point that for high end systems people are often skipping RAID (not that I recommend that) because the reliability will often meet or exceed HDDs in a RAID setup.

scottalanmiller

@xrobau said in Revisiting ZFS and FreeNAS in 2019:

... Hardware RAID is useless - Yep, these days it's a terrible idea to use hardware RAID, unless you're prepared to keep identical hardware cards with identical firmware versions in stock, right next to every machine with hardware RAID. If you haven't been bitten by a hardware RAID card being unable to import a RAID when the previous card died, you're a lucky lucky person.

Back to the basics. This is not true at all. Hardware RAID remains almost totally ubiquitous in commodity hardware, which has mostly overtaken everything else today, because Windows still has no enterprise alternative (and that drags Hyper-V along with it), and VMware has no alternative at all. Basically all enterprise servers ship with hardware RAID included, so doing without it generally means disabling or removing it, and it brings real benefits.

We've covered this a lot in the community. Few people promote software RAID as much as I do, but the benefits of hardware RAID are real and in no way is hardware RAID dead, dying, or useless. It remains the only choice for a large percentage of the world (those that need Windows, Hyper-V, VMware, etc.), and the standard choice for many more, and the only common way to get blind swap, which most shops need.

Hardware RAID might not be as fast as software RAID, but it is so fast that no one cares, both are ridiculously fast to the point that it is rarely relevant. It's like "which is faster, Ferrari or Porsche" and the real answer is "who cares, you can't go that fast between here and the grocery store." Hardware RAID generally offers blind swap, cache offload, CPU offload, and OS agnostic portability which are all real benefits. Speed always sounds important, but ZFS is already not a super fast file system, so in cases where we are talking hardware RAID vs. RAIDZ it's probably a wash anyway.

I've done hundreds or thousands of hardware card swaps (I've done the world's largest published study on RAID reliability with 160,000 array years, and 80,000 controller years) and the number of failed imports was zero. Of course this assumes enterprise hardware, proper firmware management, and so forth. But that's stuff any IT shop should be able to do with ease. Imports can fail, but are so rare in a properly maintained environment to be statistically able to be ignored.

scottalanmiller

@xrobau said in Revisiting ZFS and FreeNAS in 2019:

You're so hung up about HOW BLOCKS ARE STORED ON THE DISK that you've ignored everything else I've said.

So just to recap where we are at this point, because this line was intended to make us forget....

The point was that several of us had warned that there are many myths around ZFS and FreeNAS and that essentially all use of them is based on misinformation and misunderstanding. You picked up the challenge and decided to take the list of concerns and attempt to refute many or most of them to show why you felt ZFS (and FreeNAS) were good choices.

In doing so, you essentially repeating the myths either as they were or with slight variations, and based most of your "why we were wrong" on the misunderstanding of what ZFS fundamentally was and how it fundamentally worked. Being caught up on how the blocks were stored on disk is what we were addressing because you felt that how they were stored on the disk created all kinds of unique, special situations that could not exist anywhere else which would, obviously, make justifying ZFS easy.

Digging into ZFS and understanding how it works therefore was the opposite of ignoring what you said, and was what listening to it was - because essentially every point you made was based on ZFS storing blocks on disk in a special way that was not RAID, not parity, not standard, not available to other systems for the most part. This belief also led to the underlying misunderstanding of why hardware RAID works better than you had realized, that other systems can do useful scrubbing, and so forth.

Now that we are where we are and have established that the basis for your belief that ZFS had special use cases for you, and similarly we've looking into other myths like the hardware RAID and scrubbing, do you understand that we did this because we were paying attention to your holistic decision around ZFS because by showing that ZFS wasn't the myth that you thought that it was, that it shows that every one of the "standard myths and concerns" is valid?

I would suggest starting over. As the basis for the discussion no longer exists, you should look at your decision to use ZFS again. Your understanding of ZFS was off in ways that led you not only to feel ZFS was better than it really is, but also to believe that RAID was not as good as it really is. So you were led astray in multiple ways. You had stated that RAID couldn't do many things that it can and does do, that hardware RAID couldn't have standard features that it generally does have. With that new information, that most of the things you liked about ZFS are either equally available without ZFS or don't exist anywhere, I would recommend starting the whole decision process fresh. The underlying foundations of your entire approach to local storage have shifted, not just one small piece. That piece was the basis for seemingly everything that you were thinking about how storage worked.

So please, don't see this and think anything you've said has been ignored. Everything you've said has been paid very close attention to. All of it. Anything that you feel I've not addressed, I've done so because the amount of proof and course correction that you needed felt overwhelming any more than was absolutely necessary. But I've carefully read everything and ignored nothing, and believe if you read the thread and source materials again, you will see that all of your reasons for feeling FreeNAS and ZFS made sense here were thoroughly addressed.

xrobau

@scottalanmiller Dude, dumping a huge amount of stuff in 15 different posts is TOTALLY UNCOOL and is really unfriendly. Please don't do that. I now have to quote multiple things, scroll backwards and forwards, and generally waste even more of my time.

@scottalanmiller said in Revisiting ZFS and FreeNAS in 2019:

As the world moves to more reliable SSD, this has dropped off. It doesn't go away, but most corruption is believed to come from the network not the storage media, which nothing protects against (yet).

ZFS does. And, my experience is that I've had 2 SSDs silently fail and return corrupt data (of out 30 or so) and 2 spinning disks fail (out of several hundred). That's why I said it's a statistical anomaly.

@scottalanmiller said in Revisiting ZFS and FreeNAS in 2019:

and probably the average person will never see it in a life time.

They will probably never NOTICE it, unless they're running btrfs or ZFS, which has inherent checksum validation.

@scottalanmiller said in Revisiting ZFS and FreeNAS in 2019:

ZFS doesn't have that problem actually. You can grow a ZFS RAIDZ pool

No, you can't. In fact, the announcement of the POTENTIAL of the ability to do drew excitement from all the storage nerds. What you linked to is appending another zdev to a zpool. You can't expand a raidz.

https://www.reddit.com/r/homelab/comments/83wo88/any_news_on_zfs_raidz_expansion/

This is what frustrates me here - I know this stuff IN DEPTH (yes, I was wrong about parity vs copies - I dug up some of my old course notes and it said copies there - THAT was the source of my error), and you're trying to claim that you know this better than me, when you obviously don't. It's massively frustrating.

Re 'WTF are you doing with Hardware RAID, it's dead':
@scottalanmiller said in Revisiting ZFS and FreeNAS in 2019:

This is not true at all. Hardware RAID remains almost totally ubiquitous in commodity hardware

Funnily enough, almost all 'hardware RAID' cards are actually software RAID with a wrapper around them. And if they're not, they're going to be slower than your CPU anyway, so, back to my original point - why ADD slowness and INCREASE failure types? Pretty much the only Hardware RAID cards these days are the PERC-esque cards. I'm not going to go into EXPLICIT details, but if your RAID card doesn't have a heatsink on it, it's almost certainly a software raid implementation.

@scottalanmiller said in Revisiting ZFS and FreeNAS in 2019:

Few people promote software RAID as much as I do, but the benefits of hardware RAID are real and in no way is hardware RAID dead, dying, or useless.

Hardware RAID is slower, and more finnicky, and provides less visibility of individual platters than software RAID. For example, can a standard hardware RAID card provide access to SMART data of each drive? (No).

@scottalanmiller said in Revisiting ZFS and FreeNAS in 2019:

I've done hundreds or thousands of hardware card swaps and the number of failed imports was zero.

As I said originally - the only way that is true is if you had identical cards with identical firmware versions on standby. That's perfectly fine for an EMC sized company, but it's not fine for anyone with only 200 or 300 spindles. I've had multiple P410i's refuse to import a RAID that was generated with a different version of firmware. This is not something uncommon, this is something that happens ALL THE TIME.

@scottalanmiller said in Revisiting ZFS and FreeNAS in 2019:

. You picked up the challenge and decided to take the list of concerns and attempt to refute many or most of them to show why you felt ZFS (and FreeNAS) were good choices.

FreeNAS is just a wrapper for ZFS, with all the tools everyone needs built in.

ZFS is, unfortunately for those that are trying to make a living in the HARDWARE RAID space, a significant nail in their coffin. I brought up a whole bunch of things where your statements were wrong, or misleading, or in some cases totally irrelevant.

In retrospect, from your comments, it seems that that you make a living from hardware RAID, so it's somewhat unsurprising that you're trying to spread a pile of FUD on ZFS. Comments like 'people say it's magic' are just casting dispersion on it, purely to disparage it without that meaning anything.

And ZFS is so portable that I can literally pull the drives from a FreeNAS box, plug them into an Ubuntu machine, run 'zfs import' and all my data is there. Can you do that when you move your HDDs from a HP to a Dell to an IBM?

There. See how you can reply in ONE comment, rather than 30? It makes it much more constructive.

xrobau

@scottalanmiller Well, in this case because I'm trying to get across a REALLY COMPLEX thing, that you're having difficulty with, please respond IN ONE MESSAGE. That'll be easier to keep track of, OK? Otherwise there's no coherency.