Open Storage Solutions at SpiceWorld 2010

  • Youtube Video

Another one from Viddler that needed to be rescued. This is one of my favourites: my presentation on understanding storage fundamentals at SpiceWorld 2010. This is the famous "cardboard boxes and duct tape" talk that got a lot of attention.

  • Say, that's a really nice unavailable video you have there.

  • I cannot view 😞

  • Video loads for me. I'm still waiting for my session to be posted. Might reach out to see if I can get a copy.

  • @art_of_shred said:

    Say, that's a really nice unavailable video you have there.

    It's there. Google was still processing.

  • @ajstringham said:

    Video loads for me. I'm still waiting for my session to be posted. Might reach out to see if I can get a copy.

    They lost most of them. None of NTGs made it last I knew. They were all posted long ago.

  • Transcript:

    So, my name is Scott Alan Miller. That's all I'll tell you about me, because honestly no one actually cares, and if I talk about myself I'm boring and narcissistic, and one of the rules of presentations is: don't be boring, don't be narcissistic. This talk is on open storage; I think it's called "Open Storage Landscapes," and I think no one's quite sure what I'm going to be talking about. What I want to do here is not look at specific technology implementations or real specifics. My goal today is to have you come away with a different vision, a different appreciation for storage and the storage landscape, exactly what your options are, and how to approach thinking about storage in general, because it's an area that is a major problem for all companies today. Whether you're a tiny shop or a giant enterprise, storage is expensive and storage is critical. In reality, nothing is more important to your business than your storage. If you lose everything from a processing perspective but save your storage, you're recoverable; but if you lose all your storage and keep your processing, you're screwed.
    So storage is very, very important, but we tend to look at it in very strange ways, kind of forgetting how storage actually works under the hood. I want to start off by going through some real basics of storage and try to present them in a little bit of an interesting way, and then I want to talk about the concepts of open storage, how that applies to your businesses, some reference implementations of how that might be something you would want to do, and when it might make sense. My format is very, very loose. I don't work with slides, because they're distracting, and I can scale the speed very easily, so I'm going to be a lot more open to people asking questions as we go. Try to keep it related to where I am, but if you want to jump in, we'll tackle it as we go, because I could very easily miss something that's valuable for everybody.
    All right. So, in the beginning, when computers first began, the very first thing we needed was, well, we have this computer, so we need to store stuff. So the very first thing we got was the hard drive. (I realize this is not a real hard drive.) The hard drive was simple, right? It has one spindle, it stores stuff, everybody understood it. Very, very easy; there's nothing complex. It's a physical device, and we attach it to a computer. So this is our server. We refer to servers as boxes, so we can demonstrate them as such.

    What we did with our very first server was take this hard drive, the first hard drive ever, and literally run a cable, one that actually looked like that, inside the box: one drive, directly attached inside the server. The connection technology was very, very simple. We know it today as SCSI; yes, there were things before it, but basically SCSI was the beginning of real storage. It's a generally simplistic technology, but it does have the ability to address more than one drive. So we started putting more drives in servers.

    Once we had more than one drive in a server, we said, well, this is a problem. We took that drive and said: this is a point of failure. If we lose this drive, we have nothing. Obviously you can have backups (hopefully you're following Daniel's advice and have made good backups), and you can get a new hard drive and keep going, but we're also concerned not just with recovery, which is where backup comes into play, but with continuity of business, which is where redundancy comes into play. So in order to get redundancy we said, well, what if we took the two drives we already had, built an abstraction layer, and put them together? (Half the entertainment is just watching whether the duct tape works.)
    So we take these hard drives and we literally put them together in a way that, to an outside observer, it's one drive. It's bigger, but it's the same thing. We call this RAID. In this case it's RAID 1, so the drives in my example are mirrored to each other, but that's not the important part. The key element is that when we use RAID, we are taking multiple independent devices and attaching them together in such a way that we can then attach them to the computer as one. (There's a little bit more than duct tape involved here; imagine this duct tape includes a RAID controller.) That RAID controller provides the abstraction layer, so that when your computer sees this block of storage, it's still connected using the same technology the single drive was connected with, and connected in the same way. When the operating system looks at your RAID array, it is looking at one drive. It has no idea how many physical devices are in play; it believes there is a single logical drive. This was the first step toward enterprise storage, and it was a pretty good step: once we got to this point, we could really do business with servers.
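(To sketch the mirroring abstraction in code: this is a toy illustration, not a real RAID implementation. The `Mirror` class and its dict-backed "drives" are invented for this example. Every write is duplicated to all members, and a read can be served by any surviving member, which is why the OS above it only ever sees one logical drive.)

```python
class Mirror:
    """Toy RAID 1: presents several drives as one logical block device."""

    def __init__(self, *drives):
        self.drives = list(drives)  # each "drive" is a dict: block -> data

    def write(self, block, data):
        # A write is duplicated to every member drive.
        for drive in self.drives:
            drive[block] = data

    def read(self, block):
        # A read is satisfied by the first member that still holds the block.
        for drive in self.drives:
            if block in drive:
                return drive[block]
        raise IOError("all mirrors lost")


drive_a, drive_b = {}, {}
array = Mirror(drive_a, drive_b)
array.write(0, b"important data")

drive_a.clear()          # simulate losing one physical drive
print(array.read(0))     # the array still answers: b'important data'
```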
    What happened from here was, we started getting a lot of these. (Oh, that one fell out. Don't buy from that manufacturer.) And we said, well, we don't have a lot of space in these boxes; they get hot, they fill up with other things, and we want to have a lot more than two drives. I only have two drives in my example, but what if we had eight drives? What if we wanted to do that? Well, why not put these same drives outside the box?

    So we came up with the ability to take SCSI, the same technology, nothing's changed (we changed the cable so it's a little more resilient to interference), and attach drives outside of the box. This is where the term "direct attached storage" came from, to denote what we're doing: as far as the server is concerned, these are still drives inside the computer. Nothing has changed from the server's perspective; they're just literally outside the case rather than inside it. You can do this with a running server: pop off the top, yank out a hard drive, and as far as the cable will reach you can just dangle your drive. That's basically what's happening, just in a physically managed way so that you're not catching fire or shorting something out. This gave us the next step of storage. We were able to do more advanced things, but nothing had changed as far as the server was concerned: it has one drive, that's it, and everything operates the same.
    Then at some point we said, well, this SCSI cable works out really well. And this could be anything, right? Early on it was SCSI, but now there are lots of options. Today we've got SAS, we've got SATA, we've got USB, we have FireWire, we have Fibre Channel. Lots of options, but they all do almost exactly the same thing.

    With the more advanced of these technologies, we started saying: instead of attaching the drives to the server in a way that's really, truly physically direct, which isn't necessarily useful, we might want to have our drives farther away. Well, if we're going to do that, we're going to need slightly more advanced communication. Traditional SCSI is (I love this example) basically like Ethernet, but not the switched Ethernet we use today: one long cable, vampire taps, everything communicating at the same time and interrupting each other. So SCSI was kind of inefficient and really wasn't going to scale, and we moved to different technologies for larger setups.
    The first thing that really came into play was Fibre Channel. What makes Fibre Channel different from SCSI is that SCSI really has no layer-2 networking; it doesn't have what we think of as Ethernet-style, machine-level addressing. With Fibre Channel we do have machine-level addressing, so it's a full layer-2 network protocol (USB and FireWire just aren't on the same scale). Once we had that, we said: oh, now we can switch these connections. I don't have all the cabling here that would be necessary to demonstrate this, but what's interesting is that suddenly we could take all these drives, just like before, sit them in a box, put a bunch of servers over here, have a switch in the middle, connect the drives, connect the servers, and use the switch to decide which is connected to which. What we have is a rudimentary SAN.

    The most simple such devices you can get hold of, like the Netgear SC101, are literally two drives in a little plastic container with the smallest chip you've ever seen, and the only thing that chip does is act as a very, very simplistic drive controller and put the storage protocol onto the wire. That's it: no logic, no nothing. It is basically a network card. And you can buy small network cards that attach directly to a drive (I'm sure they exist for SAS, but I've seen them for old-school parallel ATA), plug them in, and you literally have a network connection on the drive. Nothing else: no RAID, no management, no anything. The computer still sees the same block device that it saw when the drive was plugged in internally, plugged in externally, or plugged in through a Fibre Channel switch. Nothing has changed as far as the computer is concerned.
    So what we started doing then (this was kind of an experimental stage; it didn't last very long with all these bare drives) is we said: what if we took all these drives and put them into another box, another server? Instead of being rudimentary like the SC101, we've got a big box. I specifically got a larger box so that it could hold lots of drives. This box might end up being several racks in size; it can hold hundreds of drives, thousands of drives, in a single box. And now all the abstractions we need already exist. We have the RAID abstraction, which allows this machine (remember, it's a server) to see those drives as a single drive, and we have networking technologies like Fibre Channel that allow it to share those drives out as block devices to any device that wants to read a block device. These act the same as any other block device you're going to see. We think of storage as its own thing, but storage, drives, networking: they're all block devices to the system, all acting exactly the same. It's just the way we look at them.

    So (I don't have the switch here, just imagine there's a switch) we have our Fibre Channel connection, and we're going to attach it from this server, which has all the drives, over to this server, which is where we want the data.

    Yes. Yes, correct: this is a server that we have purposed to storage, and this is a server that's doing, you know, who knows what, the normal stuff, probably with no drive of its own, or it could have its own drives plus these; you can mix and match. So now we have dedicated devices, but as far as this server is concerned, it still sees a SCSI connection, a Fibre Channel connection, whatever. We're still dealing with the same technology. This is what's interesting: we start to think weird things are happening, but as far as the server is concerned, nothing has happened. It's still just that drive, connected directly.
    Then, after this stage, we said: okay, we're going to introduce another abstraction layer to the network, and we're going to run this over TCP/IP. So we introduced iSCSI, which is to Fibre Channel a very close sibling; basically the same kind of protocol. iSCSI is simply taking the original SCSI protocol and encapsulating it in TCP/IP: SCSI leaves this box, goes into an encapsulation layer that allows SCSI to be transported over TCP/IP, and then gets unwrapped on this box, and it's back to SCSI. At the device layer, the server is still seeing a single SCSI drive connected. Nothing has changed.
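(To sketch the encapsulation idea in code: this is a toy illustration with invented helper names, not real iSCSI. Actual iSCSI PDUs carry far more, including logins, tags, and sequence numbers. The point is only that the SCSI payload comes out the other side untouched, so the OS never knows the network was involved.)

```python
import struct


def encapsulate(scsi_command: bytes) -> bytes:
    # Toy "iSCSI-style" framing: prepend a 4-byte length header so the
    # raw SCSI payload can travel over a TCP byte stream.
    return struct.pack(">I", len(scsi_command)) + scsi_command


def decapsulate(packet: bytes) -> bytes:
    # The far end strips the header and hands plain SCSI back to the OS.
    (length,) = struct.unpack(">I", packet[:4])
    return packet[4:4 + length]


cmd = b"READ LBA 2048"
assert decapsulate(encapsulate(cmd)) == cmd  # round trip: still plain SCSI
```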
    That really brings us to modern storage, as far as block-level storage is concerned (I'm trying to avoid the obvious words that everyone knows I'm talking about). So this is a server, a normal server, which is very, very important: it's a normal storage server, a normal computing device, and everything is viewed as block devices. If we take this server and apply an interface to it that makes it very easy to manage, that makes it no longer look like a normal server but like a dedicated storage server, take away the options of processing on it, add in some ease-of-management features, we then call it appliance-ized, and we refer to it as a SAN. But there's no difference between a SAN and a storage server, except that it has an interface that makes it work unlike a regular server.

    So this is the really interesting point: we have all this processing power here. When this server just shares out the block-level devices without touching them, the machine is effectively dumb. It doesn't know what's on the disks; it doesn't know how to use them, it doesn't know how to read them. This could be a Windows server, and the drives in it could be loaded with Linux file systems or Solaris file systems: it can't read them, it can't do anything with them, but it can share them out, because it's purely a block-level device. If we then add more power to this box, more power from a software perspective, and make this box able to read these disks and understand them, then we can start getting even more power, because we can start doing things here without sending them over there first.
    So we can add new protocols onto this box that give us not block-level sharing (we can still have that; it's all still going on), but now this box can read these drives. This is a new layer. We add a protocol that allows it to share at the file system layer instead of the block layer, and to do this, obviously, the client machine has to understand it too; we need to be able to put the file system onto the network. So specific file system abstractions were made for this, and we know them today as NFS, CIFS, AFS; these are the popular protocols for it. What makes them different from block level is that at the file system layer, this device can determine whether there have been changes to a file and send over only the changes. It can do all kinds of things, including (this is very, very important) locking a single file. That means if this server contacts this box and wants to write to a file, this server can lock that file and say no one else can write to it, which means that for the first time we have a means of having this connection go to more than one server. You can't safely do that with block-level devices.
    Back in the early days, when we literally had hard drives inside servers, people would actually take a SCSI cable, hook one end into one server, the other end into another server, and dangle a hard drive off the middle. That would, obviously, cause disaster. You have two drive controllers; it's like having two steering wheels in a car and two people driving without being able to see each other or talk. One person wants to go this way, one wants to go that way; one person's hitting the gas, the other's hitting the brake; a deer runs out in the road and each one thinks a different direction is the way to go. You're going to have a disaster, and that's what servers do if two servers talk to a single hard drive without being aware of each other. There were specific file systems designed to handle that, but each server had to play nice; there was no gatekeeper. Any server that decided to mess with the data was going to, and it could make changes the other one didn't know about; it could delete files the other one tried to protect; it could read files the other one said it shouldn't be allowed to read. There was no way to control those things, because there was no gatekeeper. When we're dealing with file-system-level sharing, we have that gatekeeper. We have the security at this level, where we control it; we don't have an open connection somewhere that anyone with access can do anything to.

    At this point most people know that a device doing this is called a file server. If we then, in the same manner as taking the storage server, adding an abstraction layer so it looks like a non-standard server, and calling it a SAN, do the same thing with file-level storage, we call it a NAS. At no point is this not a traditional file server. It is simply appliance-ized so that it looks like a different device, and it takes away some of the options for doing general things.
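(To sketch the gatekeeper role in code: a toy lock manager with invented names, nothing like the real locking machinery in NFSv4 or SMB. The file server is the single authority deciding who may write, which is exactly what two servers sharing a raw SCSI cable never had.)

```python
class FileServer:
    """Toy gatekeeper: grants at most one writer per file."""

    def __init__(self):
        self.locks = {}  # path -> client currently holding the write lock

    def lock(self, client, path):
        # Only the gatekeeper grants locks, so two clients can never both
        # believe they own the file, unlike shared raw block storage.
        if self.locks.get(path) not in (None, client):
            return False
        self.locks[path] = client
        return True

    def unlock(self, client, path):
        if self.locks.get(path) == client:
            del self.locks[path]


server = FileServer()
assert server.lock("server-a", "/data/report.doc") is True
assert server.lock("server-b", "/data/report.doc") is False  # refused
server.unlock("server-a", "/data/report.doc")
assert server.lock("server-b", "/data/report.doc") is True   # now allowed
```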
    The reason I wanted to run through all of that is that quite often, when dealing with SAN and NAS, we think of the world in those terms. We say, well, we have storage: should I have a SAN or a NAS? In reality, those aren't really the questions we should be asking. What we should be asking is: do we need block-level storage, or do we need file-system-level storage? That's the big question. If you're running a database that needs to interface directly with the devices because it does something really complex, which is basically having its own file system and ignoring the OS's, you need block. If you're running IBM DB2, DB2 talks to raw devices, raw disks, because it has its own file system that exists only for the database and has no other purpose; it has to have block-level access so it can do anything it wants with the drive heads. But if you're dealing with normal file sharing (Word documents, Excel documents, all the stuff that users have piled up everywhere), yes, you can do that at the block level attached to a single device, and yes, you can go get really expensive commercial file systems that allow you to share that out, like OCFS from Oracle and GFS2 from Red Hat, but then you're getting into running big UNIX boxes to do those things, so it's not really effective for that. Whereas if you're running CIFS or NFS, you can connect all kinds of desktops to it; you can do all the things you already know how to do. So choosing block storage or file-system-level storage is really the question, and at the end of the day you have a file server one way or another that's doing that work.

    At that point I'm just going to let people ask questions. I'm not sure if everyone is falling asleep or has questions.
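(To sketch one of the file-layer advantages mentioned earlier, that the server can detect changes to a file and send only those: a toy chunk-hashing illustration, far simpler than real mechanisms like rsync's rolling checksum. A dumb block target can't make this decision, because it doesn't understand what's on the disks.)

```python
import hashlib

CHUNK = 4096


def chunk_hashes(data: bytes):
    return [hashlib.sha256(data[i:i + CHUNK]).digest()
            for i in range(0, len(data), CHUNK)]


def changed_chunks(old: bytes, new: bytes):
    # Compare per-chunk fingerprints; only differing chunks need to move
    # over the wire.
    old_h, new_h = chunk_hashes(old), chunk_hashes(new)
    return [i for i, h in enumerate(new_h)
            if i >= len(old_h) or h != old_h[i]]


old = b"a" * CHUNK + b"b" * CHUNK + b"c" * CHUNK
new = b"a" * CHUNK + b"X" * CHUNK + b"c" * CHUNK
print(changed_chunks(old, new))   # only chunk 1 differs -> [1]
```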
    Yes? Okay. Actually, what's funny is that typically a NAS costs about 150 to 200 dollars, and SAN devices start at under a hundred; some started close to 30. It's not what people think: they think of the SAN as the big one, but the SAN is actually the little one. The names are actually awful: network attached storage and storage area network. I hate these terms, partially because they're confusing and partially because they're wrong. "Storage area network" is really just a term for block-level storage, and "network attached storage" is really just a name for file-system-level storage. When you're doing block level, it has to attach as a device; when you're doing file-system level, you have to have a special virtual file system driver, like CIFS or NFS, that allows you to put it onto the network and share it over the normal network.
    The idea of "SAN," in theory (the reason the word exists), is that when they first did it with Fibre Channel, Fibre Channel was its own thing: you didn't communicate over Fibre Channel for other tasks, so it was a storage area network. But then later, very quickly (and actually at the time as well), people would take NAS devices, file servers (because we didn't call them NAS back then), and put them on a dedicated network with its own switch, connected to the servers on dedicated NICs just for storage. Well, obviously that's a storage area network using NAS. So the terms can overlap even when you're doing dedicated networks, and that's why I hate them. But when we say SAN, and when we sometimes say SAN protocols, we mean Fibre Channel, iSCSI, AoE (ATA over Ethernet, sorry), those things; and we use NAS today to refer to CIFS, NFS, AFS. Okay, cool. Anything else before I move on? In the back?
    Yes, a SAN is like a LAN in that it is a network, yes. Yep, and I worked at a company that actually had what we called the "DBAN": a database area network. It was a dedicated network, in the same way as a storage area network, except it was dedicated to database communications. All the databases were on their own Ethernet switches and Ethernet segments, so it basically did the same thing.

    There was a question up here. iSCSI is not, as far as I'm aware, any more noisy. I'm assuming they were thinking it broadcasts; I'm not aware of there being any significant broadcast traffic, unless they weren't running switched Ethernet, which would be the bigger problem. No, it's TCP, and it's non-broadcast, so it's point-to-point, the same as any other protocol of that nature.
    Actually, that's a really good point. For a customer like that, regardless of the noisiness of the protocol: with a traditional SAN, Fibre Channel (we'll call that traditional), your network is obviously completely dedicated to it. But what's highly recommended is that if you're going to do iSCSI, or any other SAN that leverages commodity networking (as we like to say, so, Ethernet), you still want dedicated hardware and a dedicated network for it. It's not necessarily noisy, but it is a really high volume of traffic, traditionally; you wouldn't bother with it otherwise. So you want switches that aren't doing anything but that, in the same way you would have done with Fibre Channel. Just because you've switched to iSCSI from Fibre Channel doesn't mean you should leverage your existing traditional LAN to do it, because you still have the same volume concern. You want to put it on a dedicated switch and treat it in the same manner.

    Which is nice, because when we move to iSCSI versus Fibre Channel, you can afford to go faster, you can afford to have more redundancy, you can get better switches, cheaper. And quite often (it's really, really popular: the more important something is, and your storage area network is the most important thing in your business), it's really common to jump straight to "we need layer-3 switches, we need lots and lots of management, we need the biggest, most expensive things we can find." The reality is, you probably don't care that much about monitoring your storage network. You might, but most of the time you're actually going to be better served getting a cheaper, unmanaged switch, because what you care about is latency and throughput, and the less management there is, the fewer layers there are, the less that's going on. You don't want to be VLANing on the switch that's handling your iSCSI; you want it to be its own network, not a virtual LAN. If you need another LAN, get another switch and keep it physically segmented, because you don't want that overhead on your storage. Your storage latency affects everything in your business, so you want it to be as fast as possible, which is actually cheap. That's the wonderful thing about speed on Ethernet: pretty much, the cheaper you get (not consumer devices, but within a given product range), generally, the cheaper they are, the faster they are.

    Anyone else before I go on?
    You're talking about where I said there are file systems that allow you to do that, commercial versions of them? Okay. I believe they're generally referred to as shared-access file systems (maybe someone actually knows the generic term for that family), but I do know that Red Hat and Oracle are the key players with that, and I believe Veritas, with VxFS, does that as well, though I'm not a hundred percent sure; I'm definitely not an expert on VxFS. GFS2 is Red Hat's product, so go to Red Hat and look at GFS2; I believe it's actually available for free from them. But these are dedicated file systems, so anything that attaches to that SAN has to have that file system driver, and you have to look for a vendor that's going to support whatever devices you're using. But yep, that's the big one. All right, okay, so.
    A lot of people in the community are familiar with the SAM SD; we've talked about it. The SAM SD, which I did not name, is not an actual product, but it is what I like to call a reference implementation of open storage. The reason we came up with it is that in a lot of conversations, with companies and with people in the community, people say, "Well, you want to put in a SAN, right?" So they go to a vendor (everybody's got to have a SAN these days), they say, "I need block-level storage," and the vendors come out with really, really expensive storage products. If you're a giant Fortune 500, that probably makes sense. When I worked at the bank, our storage was in the petabytes: they have entire data centers dedicated to storage, and an OC-192 running to other facilities to carry the Fibre Channel over it, so we could lose an entire data center and our storage would still be there, replicated to another data center over the OC-192. Unbelievable amounts of storage; there's no way you're going to build that at home yourself. That's where players like EMC Clariion and Hitachi come in and build entire rooms, and that makes sense. But when you're looking at more reasonable amounts of storage, you start getting into the space where you can use completely traditional technologies, including the chassis. That's really what matters here.
    So I'm going to give a little story; this is kind of the backstory of how the SAM SD came into being. The company I'm a consultant for is a major Fortune 10 (it's hard to be a minor Fortune 10), and they were doing a global commodity grid. You can read about it online; we're well known. It was over 10,000 nodes, and we pushed an unbelievable amount of computing through it; lots of different lines of business use it. A lot of people like to call it cloud. It is not; it's high-performance grid computing. It's very related to cloud, but not the same thing: it's application-layer virtualization, not operating-system-layer or hardware-layer virtualization. That's kind of where those differ.

    We ran several dozen, maybe a few score, applications on this ten-thousand-node grid. To back that grid, we don't have any storage on the nodes except for the operating system (it just makes it easy; they boot locally), but all their data comes in from somewhere else and then gets saved somewhere else; we only cache locally. We were working with (we won't name names) a very, very major storage appliance vendor. We had the second-largest product that they made; it cost really close to three hundred thousand dollars per unit. We worked with them, we brought up a new part of our grid, and the load demand on the grid turned out to be higher than this device could supply: not necessarily from a throughput standpoint, but actually from an IOPS standpoint. It just couldn't handle it with the spindles it had.

    So we approached some vendors, and at the time another vendor in the server space (I guess I'll name it: Sun) had brought out what they called Thumper, which is a 48-drive, 4U server: two processors, 48 drives, 4U chassis.
    it's a traditional chassis you go to
    your data center it looks like a regular
    for you server nothing weird it just has
    a lot of drive bays and they were
    pushing this as a sort of think of this
    in retro term let's go back to old
    storage stop thinking that they actually
    this is where the term open storage came
    from when they really suffer son said it
    is time to rethink storage storage
    devices that everyone's been buying
    Sanon ass are just servers that have
    lots and lots of drives attached to them
    well why not just buy a normal server
    and use that because when we make normal
    servers we can make lots of them faster
    than your price goes way way
Now, when you buy SAN- and NAS-labeled devices, you tend to get products that are not sold in the same quantities as commodity servers, and sometimes they use proprietary software to do some of the cool features, and this drives the price through the roof. They're also considered non-commodity, so their margins are much higher. The margins on a traditional server, and look at the major players, you know, HP, Dell, whatever: Dell does not make a thousand dollars off every server they sell. They make twenty dollars, right? By the time you buy it and get all the discounts done, their margins are low. So they're not ripping you off on a server; it costs them a lot to deliver that to you. You want to be buying those devices if you can help it, because that's where your value is coming from. When you go to appliance-ized products, you generally have to pay a lot just for the name. So what Sun did
was actually come in and work with us, and they knew they weren't getting this account, but they worked with us anyway because they hated the other vendor we were competing against. And we said to them: we really feel that this device we have is very, very expensive and doesn't seem to be doing as much as we could do with just a regular file server. And Sun said: absolutely. A regular file server gives you all these advantages. There's the commodity pricing, you can tune the operating system, you can pick the operating system, you can pick every component, and you can do it much cheaper. And they actually flew in the guy who invented ZFS to talk to us about it; it was awesome. So we went to the client and said we would like to do an engineering study, and we want the storage vendor involved. They said OK. They ordered the largest product that the vendor made, the largest NAS device on the market (this was a couple of years ago, so that's a bigger figure now). It was a half million dollars, and it was installed and tuned by the storage vendor's own engineers. They put a lot of money into it, because we weren't buying one; we were looking to buy like a dozen.
So they brought in a lot of resources to make sure this was going to beat anything we could do. We took two people from the engineering team. We took a commodity server that we had; now, it was a large server at the time, a four-way Opteron box, but it would be considered a small server today. It's probably about a third of the processing capacity of what you would get for around $5,000 today. So still a decent server, and at the time pretty impressive, but nothing exotic. We loaded Red Hat Linux on it, no tuning, with normal storage, nothing special, and we set it up with NFS, which is exactly how they were connecting to the other box. And before we ran it, we projected what was going to happen. We knew there were threading issues on the processing side of the storage vendor's product, because it was not an open box: they could not update their operating system to the latest kernel, which they needed to do, because they weren't making their own operating system (they were getting it from another vendor) and they didn't have time to rework it. We had the ability to run the latest Red Hat, which had great threading support, which was needed to be able to push the IOPS. And when we ran it, not only did our at-the-time $20,000 solution (which you could literally put together for about two to three thousand dollars today, I expect) outperform a half-million-dollar device tuned by their engineers, but instead of flatlining, we have all the performance curves: we have no idea what the capacity of the open, scratch-built box was, because the grid could not generate enough IOPS pressure. The half-million-dollar device not only plateaued, but when it sat on that plateau for very long, it actually shut down.
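The kind of IOPS pressure test described here can be sketched in a few lines. This is a hypothetical illustration, not the grid's actual load generator; a serious test would use O_DIRECT or a dedicated tool such as fio to keep the page cache from flattering the numbers.

```python
import os
import random
import time

def measure_read_iops(path, duration=2.0, block=4096):
    """Hammer `path` with 4 KiB random reads and report reads/second.

    Note: without O_DIRECT the page cache absorbs repeated reads,
    so this overstates what the spindles themselves can deliver.
    """
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    ops = 0
    deadline = time.monotonic() + duration
    try:
        while time.monotonic() < deadline:
            # Pick a random aligned-ish offset and issue one small read.
            offset = random.randrange(0, max(size - block, 1))
            os.pread(fd, block, offset)
            ops += 1
    finally:
        os.close(fd)
    return ops / duration
```

Running several copies of this in parallel against an NFS mount is roughly how you find a device's plateau: keep adding clients until the aggregate number stops climbing.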
So the potential savings here were not just millions of dollars in purchasing; it's that this product met the need while that product did not. It was easier for us to manage, because we didn't have to have dedicated storage people. We used the skill set we already had: we already had the engineers for this, we would just manage it along with all the other servers, and it would look exactly like all the other servers. And for those who wonder: no, they didn't go with it. They went with the expensive solution anyway. Welcome to the Fortune 10. So
what that prompted was, later, when NTG started looking at storage options and we started having a lot of conversations in the community (how do you do storage, how do you make it cost-effective, what do you do when you have all these needs, and need flexibility, and can't afford these other products), we looked at the product market and we said: wow. You go to any major server vendor, ones that are here, ones that aren't, anyone who's a top-tier vendor, and they have these chassis that are very affordable and have a lot of space for disks. Some have more than others, some have different price points, but they're all relatively affordable, powerful, stable, and manageable, and they fit into your infrastructure just like everything else. You can go get third-party disks for them; some support that a little better than others, but most have completely open support for any disk you want to put in. You can put lots of disks in them; you control their speed, you control their configuration, you control the models. If there's a specific drive vendor that you're very, very comfortable with, you can pick them. That gets you all of that, and you're building systems for a few thousand dollars that not only might outperform a $30,000 or $40,000 or $100,000 commercial appliance-ized SAN or NAS device, but you also have more control over them. And this is the most important thing with any computing device: remember, it's just a server. There's no magic, right? Everybody thinks, well, I'm going to get a SAN, and I can let everything else fail because the SAN won't fail. But the SAN is just a server like everything else, right? There are better ones and cheaper ones, but it's just a server. It's always subject to the forklift risk: someone's going to drive the forklift into that one box, and that absolutely happens; that's from a real example. And so when you cut the cost dramatically, when $30,000 meant one was barely a consideration but now you can do the same thing for $5,000, don't cut your budget by $25,000. Cut your budget by $20,000 and get two of them, and use them the same way you would anything redundant. And that doesn't have to be on a scale of two; it could be on a scale of 50. If you were going to buy 25 commercial SANs, now buy 50 of these and build things that way. That's an option.
Now, when you get really, really big, it starts to maybe not make sense. Really large SANs have capacity for lots more drives, and they're much more manageable on a really massive scale. So there are price points and there are feature points where traditional SANs start to make a lot of sense. But they almost never do when you're in a capacity range that fits within a single traditional commodity server chassis; basically, if a normal server that you can buy off the shelf from your preferred vendor covers you. And if you're working with some white-box builder now, stop and go get an enterprise vendor. If you're dealing with an enterprise vendor, go to them and get their price for the chassis that makes sense; it's almost always a 2U. I know Dell is here, and they've got a box that holds 24 2.5-inch drives in a 2U, right? Pretty unbelievable. If 24 2.5-inch drives meets your needs, you've got all that storage, and it's potentially really fast.
Well, before I answer that exact question: this actually came up last night, almost exactly the same thing. When I talk about storage I often talk about Red Hat, because "that's how we do storage," which is not actually true. We do a little bit of Red Hat, but most of our storage is actually Solaris, because of its IO throughput. In either of those cases, though, you're dealing with an operating system that, chances are (whether that's 51 percent of you or 80 percent, I don't know), most people in this community are not Unix-proficient. It's not part of your infrastructure; it's not something you manage on a daily basis. If it is, it's definitely a consideration. But if it's not, it doesn't matter, because Windows is an absolutely wonderful storage platform in the same way that Unix is; it's just that in this example we ran Unix, because that's what we were doing administration on. Windows makes some really powerful storage stuff: they do iSCSI, they do CIFS, they do NFS. It's all free, it's all included, you're not buying extra products. And their CIFS, if you're doing Active Directory integration, is by far the easiest to deal with; it works the best and is the most reliable.
But if you don't want to go with Windows as your storage and you want to go with someone like Red Hat, as an example, you have lots of options even if you don't have the in-house expertise. There are lots of MSPs who will do that; you can pretty much always find an MSP to do it for you if you know where to look. But really, your storage devices are something that needs relatively little monitoring. They need monitoring, but you're probably not concerned with, you know, capacity planning other than the amount of storage, and you can watch that; Spiceworks will monitor it and tell you how much is being used. So that's the kind of thing you're watching. In a normal small-business situation you're probably not dealing with CPU capacity or memory capacity concerns; you've got more than enough in the smallest box.
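The capacity watching described here amounts to almost a one-liner; a minimal sketch (monitoring tools like Spiceworks do this for you; the mount point and the `send_warning` helper are just illustrative assumptions):

```python
import shutil

def usage_percent(mount="/"):
    """Return the percentage of the filesystem at `mount` that is in use."""
    total, used, _free = shutil.disk_usage(mount)
    return 100.0 * used / total

# Typical use: alert when a volume crosses a threshold, e.g.
#   if usage_percent("/srv/storage") > 85:
#       send_warning()   # hypothetical notifier, not a real API
```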
Then, companies like Red Hat: if you actually get Red Hat commercially, you will get support from Red Hat themselves, right? Or SUSE from Novell. They don't sell in the same way that Windows does. Windows is based on a license, and the commercial Linux players are based on support contracts, so the support is the product. And I know a lot of people in here are thinking, "I'm going to run the exact same thing for free." Play with it, and when you're ready to go live, you can contact Canonical and get commercial support directly from them as a primary vendor, or from any number of MSPs who would provide that support. And of course you can get training and anything else. Does that answer your question? OK.
I do have opinions on them; I have run across them. I don't like their pricing structure. I feel they're hurting themselves that way; I think they should have a free version that is more scalable. But as a product based on OpenSolaris, if it fits into your price range (and it's not that expensive if you're looking at this range of stuff), I think it's a really good product. I have not used it in a commercial capacity, so there may be some gotchas I'm not familiar with, but the underlying technology, OpenSolaris, is awesome: lots and lots of power, lots and lots of flexibility, lots of options, and very easy to manage. And that's something I should mention: Nexenta is a NAS appliance operating system, right? I can't believe I forgot that segment. So we have traditional servers doing file services, just, you know, Windows or Red Hat or Solaris or whatever, and you're doing everything yourself. And then we have the full appliances: you can go to Netgear, you can go to Buffalo, you can go to Drobo, you can go to EMC and EqualLogic and HP; everybody has these full appliances. But there's also a middle ground, where you use commodity hardware from anybody and then apply an operating system that is an appliance operating system. Nexenta is a great example of that; it's one that's built on OpenSolaris. FreeNAS is the same type of thing, completely free, built on FreeBSD. And Openfiler is the same thing, built on Conary-based rPath Linux, which unfortunately is a very unpopular version of Linux; it's not monitored by anything, and the volume management stuff is funky, so that's unfortunate.
And there is a fourth player, and I can't remember their name; they're definitely the small tier in that space. LeftHand, from HP, used to be one of those players, but when they got bought by HP they kind of moved to combined hardware; they moved over to that side instead of staying in the software space. But for people who want the power of Linux but don't know Linux, or want to look at BSD but don't know BSD, those solutions give you those operating systems, with those operating systems' advantages and disadvantages, without having to know those operating systems.
And one actual caveat to mention: if you're going to work with Openfiler, it's very powerful, and its replication is the best of any product, right up there with all the big commercial ones. The replication is phenomenal, but there's no interface for it; you will need a senior Linux admin to set that up, with DRBD.

OK, we're officially in the Q&A; I think we have five minutes. [Question about ReadyNAS]
Well, full disclosure: the company I work for is a partner with Netgear, so we have to say we love ReadyNAS. But we do love ReadyNAS, definitely. My personal preference, if you're going to be working in the space where you want an appliance NAS, the "I just want to buy it, I don't want to build it" space: ReadyNAS is a really, really great product. It's based on Linux. It does not have DRBD replication; we are pushing them for that. That doesn't mean they'll do it, but we have pushed for other things that they are doing. So there are some caveats with ReadyNAS that I'm not allowed to tell you about; I'm not going to mention what the caveats are, but I can tell you, since I didn't tell you what they are, that they're going away in December. So ReadyNAS is a great product, and we've priced it versus building a SAM-SD and it's within like 10 percent on cost. And there is someone on my team who runs one. Was it... yes, sorry, Don. I don't have experience with it myself, so I can't really compare it to anything; unfortunately, I can't answer that very well.
If you're getting the 24-bay 2.5-inch unit from Dell, chances are you're buying it because you want 15K drives; chances are that's just why you selected that chassis, that's like why that chassis exists. You don't have to, though. When you're choosing your RAID levels (and everybody knows, or mostly you probably know, that I absolutely hate RAID 5), the reality is that if you're in an archival situation, and it's not a live system, and it's backed up, and all you want is for it to be online most of the time, and you're willing to take a little bit higher risk, RAID 5 can save you a lot of money. I would not use it for online systems, but I would use it for nearline, though a lot of small businesses don't do nearline storage. But when it comes to actually selecting your spindles, it's really a question of price versus IOPS. So if you're going to go with SATA, you just have to have more of them, but they cost less, so that can be very beneficial, and typically you're going to get more storage while you do it. So you might say: here's the option for SAS at 15K, here's the option for SATA at 7.2K, and at the price point where it gives you the performance you need, the SATA option is likely going to give you two or ten times the actual storage capacity. That might be a winner. But it also might be so many drives that it doesn't fit in the chassis you want to get, so there's a trade-off. And as you have more drives, they are more likely to fail; twenty drives are more likely to fail than two. So there are risks there. But just doing a calculation of performance is really the only factor; there's no guaranteed answer, and a lot of commercial SANs and NASes are SATA-only, because they just add more of them.
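That price-versus-IOPS trade can be put into rough numbers. A back-of-the-envelope sketch: the per-drive IOPS figures are common rules of thumb for those spindle speeds, not vendor specs, and the failure math assumes independent, identical annual failure rates.

```python
# Rough rule-of-thumb random IOPS per spindle (assumed values, not specs).
IOPS_PER_DRIVE = {"7.2k_sata": 80, "15k_sas": 180}

def spindles_needed(target_iops, drive_type):
    """Minimum drive count to reach `target_iops` of random IO."""
    per_drive = IOPS_PER_DRIVE[drive_type]
    return -(-target_iops // per_drive)  # ceiling division

def chance_of_any_failure(drives, annual_failure_rate=0.03):
    """Probability that at least one of `drives` fails within a year."""
    return 1 - (1 - annual_failure_rate) ** drives

# To reach 2000 IOPS:
#   spindles_needed(2000, "15k_sas")   -> 12 drives
#   spindles_needed(2000, "7.2k_sata") -> 25 drives
# and 25 drives really are riskier than 12:
#   chance_of_any_failure(25) -> ~0.53
#   chance_of_any_failure(12) -> ~0.31
```

The SATA build needs roughly twice the spindles but usually costs less and holds far more data, which is exactly the "two or ten times the capacity" trade described above.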
Well, so with RAID 5 (and it's not just RAID 5, by the way; it's the whole parity RAID family, which is RAID 2, 3, 4, 5, and 6), they use what's known as an XOR calculation. There's obviously a stripe across the disks, and you get great capacity out of it; that's why they spent the effort to invent it. The way that works is that the RAID controller, whether it's software or hardware, has to do a lot of work to make that stripe work, and because of that, the RAID controller becomes a significant point of failure compared to RAID 1, which doesn't have an XOR calculation. So the risk you get is beyond performance issues. The XOR calculation causes performance issues as well, but performance is an arguable point, right? Do you care about performance or don't you? Losing your data, though, everyone cares about. And I have had it happen firsthand, which will really convince you, and I also know other companies who have had a RAID controller failure on a parity RAID array: no drives failed, everything lost, because the RAID controller freaked out. Parity RAID has destructive operations, where it can destroy all the disks. RAID 1 and RAID 10 do not have a destructive operation to perform on the disks: when they do a rebuild, it is a mirror, and if they mirror a good disk, it builds a new healthy disk. But if RAID 5 attempts to rebuild an unhealthy system, it will destroy a healthy one.
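The XOR calculation being referred to is easy to see in miniature. A toy sketch of RAID 5-style parity (real controllers work on stripes of disk blocks, not five-byte strings):

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together; this is the parity operation."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

# Three "data disks" and their parity block:
data = [b"disk0", b"disk1", b"disk2"]
parity = xor_blocks(data)

# Lose any one disk and it rebuilds from the survivors plus parity:
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == b"disk1"

# But feed that same rebuild stale or corrupt blocks and it will just
# as happily write garbage over a healthy disk; a mirror copy has no
# equivalent whole-array calculation to get wrong.
```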
And so when parity RAID fails, its action in failing is to scrap everything. I have definitely seen that firsthand, caused by chassis shudder in a data center. It was a server that had been in use for years; drives came in and out of contact, we're not sure whether over a period of minutes or a period of hours, and it kicked off multiple rebuild operations, and one of them just hosed the entire array. So when we found it, the drives had all reseated themselves, and we had six healthy disks, a healthy RAID controller, and no data. I think we're out of time, and we're done.