Installing Gluster on CentOS 7


  • Service Provider

    Gluster, formerly known as GlusterFS, is the venerable scale-out storage system of the Linux world. Red Hat bought the GlusterFS project in 2011 and has developed and managed it since then. With Red Hat as the project sponsor, RHEL 7 or CentOS 7 is the obvious place to deploy it. Gluster is the best known scale-out storage system in the open source world and quite popular.

    The first thing that we need is multiple VMs! That's right, Gluster doesn't do anything with only a single node. If you are on a platform with imaging, as I am (I'm on a Scale HC3 HC2100), you can template and clone your systems to make this faster and easier. I will point out where we can pause to do that.

    I am just building small demo nodes here. My standard layout is a 16GB base build with the storage added as an extra device, a 100GB device in this example. In production you would likely use something many times larger.

    clone centos for gluster on scale hc3

    add 100GB block virtio device

    Now to log in and get started:

    # Add the EPEL and Gluster repositories, then install the server
    yum -y install wget epel-release
    wget -P /etc/yum.repos.d http://download.gluster.org/pub/gluster/glusterfs/LATEST/RHEL/glusterfs-epel.repo
    yum -y install glusterfs-server
    # Put LVM on the extra 100GB device and format it as XFS
    pvcreate /dev/vdb
    vgcreate vol_gluster /dev/vdb
    lvcreate -l 100%FREE -n lv_gluster vol_gluster
    mkfs.xfs /dev/mapper/vol_gluster-lv_gluster
    # Mount the brick filesystem now and on every boot
    mkdir -p /export/glusterdata
    mount /dev/mapper/vol_gluster-lv_gluster /export/glusterdata
    mkdir -p /export/glusterdata/brick
    echo "/dev/mapper/vol_gluster-lv_gluster /export/glusterdata xfs defaults 0 0" >> /etc/fstab
    # Start the Gluster daemon and enable it at boot
    systemctl start glusterd
    systemctl enable glusterd
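
    One thing the steps above do not cover: a stock CentOS 7 install typically runs firewalld, and the nodes cannot probe each other until the Gluster ports are open. Here is a minimal sketch, not part of the original build. The function name and the FIREWALL_CMD variable are made up for illustration, and the brick port range is an assumption (bricks are allocated upward from 49152, and I have simply allowed room for 100 of them).

```shell
# Sketch: open the ports Gluster needs on a firewalld host.
# Assumptions: 24007-24008/tcp for the management daemon, and
# 49152-49251/tcp as an arbitrary allowance of 100 brick ports.
# Set FIREWALL_CMD=echo to print the arguments instead of applying them.
FIREWALL_CMD=${FIREWALL_CMD:-firewall-cmd}

open_gluster_ports() {
    "$FIREWALL_CMD" --permanent --add-port=24007-24008/tcp &&
    "$FIREWALL_CMD" --permanent --add-port=49152-49251/tcp &&
    "$FIREWALL_CMD" --reload
}

# Usage, as root on every node (or once in the template):
#   open_gluster_ports
```

    Doing this before templating means every clone comes up with the ports already open.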
    

    At this point we have built the basics and could create a template from which to clone new Gluster nodes. If this were going to be for production, I would stop here and save this as an unused base template, since you may want to add nodes, replace nodes, or recover nodes rather often. Keep a clean template ready to go.

    In our example here I am only making two nodes, so I will keep using the original to build gluster1. But first I am going to clone it, change the hostname on the clone (vi /etc/hostname), and update its IP address (nmtui). That gets the gluster2 node ready with minimal effort. If you don't have the ability to clone (maybe you are not building on a cluster), then you will need to repeat the above steps on each node.
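
    The peer probe in the next step uses hostnames, so each node has to be able to resolve the other. If you are not using DNS for that, a sketch like the following adds /etc/hosts entries without duplicating them on a re-run. The append_once helper is hypothetical, and the mapping of 192.168.1.80 and 192.168.1.81 to these hostnames is an assumption for this lab.

```shell
# Sketch: append a line to a file only if that exact line is not already
# there, so re-running it (or re-cloning a template) adds no duplicates.
append_once() {
    file=$1
    line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage on each node (IP-to-name mapping assumed for this lab):
#   hostnamectl set-hostname lab-lnx-gluster2     # on the clone only
#   append_once /etc/hosts "192.168.1.80 lab-lnx-gluster1"
#   append_once /etc/hosts "192.168.1.81 lab-lnx-gluster2"
```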

    Now once the second node is ready, back to the first node again:

    gluster peer probe lab-lnx-gluster2
    gluster volume create gv0 replica 2 lab-lnx-gluster1:/export/glusterdata/brick/ lab-lnx-gluster2:/export/glusterdata/brick/
    mkdir /data
    mount -t glusterfs lab-lnx-gluster1:/gv0 /data
    

    Gluster is up and running! But before we start doing anything, over to the second node:

    mkdir /data
    mount -t glusterfs lab-lnx-gluster2:/gv0 /data
    

    That's it, your Gluster storage cluster is up and running. Let's test it:

    touch /data/test-file
    

    Now go to each box and see if it is there!


  • Service Provider

    In my example here, I put LVM on top of the block device. In production you would likely skip LVM, as there are already several abstraction layers in play and the goal is a lean storage cluster. But LVM provides some flexibility should we want to grow this in the future.


  • Service Provider

    Using this with a system like a Scale HC3 or another form of cluster, you would want to be absolutely sure that you "pin" or set node affinity so that each Gluster node runs on its own independent piece of underlying hardware.


  • Service Provider

    You probably want a way to see what is going on with your Gluster storage. The info command will tell us the status, like in this example:

    # gluster volume info
     
    Volume Name: gv0
    Type: Replicate
    Volume ID: fc3d20d9-d65e-47ab-93b3-3598e1c9b751
    Status: Started
    Number of Bricks: 1 x 2 = 2
    Transport-type: tcp
    Bricks:
    Brick1: 192.168.1.80:/export/glusterdata/brick
    Brick2: 192.168.1.81:/export/glusterdata/brick
    Options Reconfigured:
    performance.readdir-ahead: on
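
    If you want to check that status from a script, for instance before mounting, a small sketch like this pulls the Status field out of the info output. The volume_status function name is made up here, and it assumes the "Status: ..." line format shown above.

```shell
# Sketch: read `gluster volume info` text on stdin and print the value of
# the first "Status:" line.
volume_status() {
    awk -F': ' '$1 == "Status" { print $2; exit }'
}

# Usage on a live node:
#   gluster volume info gv0 | volume_status
#   [ "$(gluster volume info gv0 | volume_status)" = "Started" ] &&
#       mount -t glusterfs lab-lnx-gluster1:/gv0 /data
```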
    


  • Aside from the size of the drives, what would you change if you were putting this into production?

    Ideally, you would have a way to prevent split-brain type problems.



  • This will be helpful. We have a few servers at work whose RAID cards have failed. We are planning to set up software RAID and test some things out. One option was either Ceph or Gluster. This will help a lot.


  • Service Provider

    @dafyre said:

    Aside from the size of the drives, what would you change if you were putting this into production?

    Ideally, you would have a way to prevent split-brain type problems.

    For production I would have at least three nodes, and pretty typically would not have this on a shared infrastructure but on dedicated hardware. Because this is a full cluster on its own, I would expect to have resources for nothing but this, custom built for the purpose.

    If Raspberry Pi 3 had SATA connections, I would totally build a cluster that way for fun. That would be neat. You need very low CPU power for Gluster.

    I would likely remove LVM in production as well. Just use the raw disk and all of it.
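
    As a rough sketch of the three node layout, the helper below prints (it does not run) a volume create command with one replica per host, so three hosts give "replica 3" and a majority can still agree when one node drops out. The function name is made up, and lab-lnx-gluster3 is a hypothetical third node following the naming used in this thread.

```shell
# Sketch: print a `gluster volume create` command that places one replica
# on each host given. Nothing is executed; review the output, then run it.
print_create_cmd() {
    volume=$1
    brick=$2
    shift 2
    cmd="gluster volume create $volume replica $#"
    for host in "$@"; do
        cmd="$cmd $host:$brick"
    done
    printf '%s\n' "$cmd"
}

# Usage (the third hostname is hypothetical):
#   print_create_cmd gv0 /export/glusterdata/brick \
#       lab-lnx-gluster1 lab-lnx-gluster2 lab-lnx-gluster3
```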



  • I'm firing up a couple VMs on my KVM box to test it.

    Does Ceph have any advantages? I don't think I can count object storage as an advantage based on what we would be using it for.


  • Service Provider

    @johnhooks said:

    I'm firing up a couple VMs on my KVM box to test it.

    Does Ceph have any advantages? I don't think I can count object storage as an advantage based on what we would be using it for.

    Not a lot.

    http://www.networkcomputing.com/storage/gluster-vs-ceph-open-source-storage-goes-head-head/8824853

    Now that Ceph and Gluster are both inside the RH fold, if you don't want the object flexibility of Ceph, Gluster might be for you.



  • @scottalanmiller said:

    Now that Ceph and Gluster are both inside the RH fold, if you don't want the object flexibility of Ceph, Gluster might be for you.

    Ya we would be using it pretty much as a giant NAS. That's what we are experimenting with is older 24 drive servers that were NAS boxes.



  • @scottalanmiller said:

    http://www.networkcomputing.com/storage/gluster-vs-ceph-open-source-storage-goes-head-head/8824853

    Now that Ceph and Gluster are both inside the RH fold, if you don't want the object flexibility of Ceph, Gluster might be for you.

    Ha I just read that article like 10 minutes ago.



  • So the next question would be... which IP address do you use for connecting to the Gluster system? the IP address of Brick 1 or Brick 2... or Brick N... ?

    Or do you set up some kind of master IP address with Pacemaker / Heartbeat, et al?


  • Service Provider

    @dafyre said:

    So the next question would be... which IP address do you use for connecting to the Gluster system? the IP address of Brick 1 or Brick 2... or Brick N... ?

    Great question. The Gluster client actually handles this. If you mount from Server1 and that server fails, the client automatically attaches to Server2. It's not 100% transparent, since there is some noticeable delay during the failover, but it takes care of itself. It's self healing.

    At mount time that doesn't work: if Server1 is down and that's what is in your mount command, the client can't find the second server. So either you accept that limitation, or you put backup servers into the mount command itself and then it is handled at boot time as well.
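
    To sketch the backup server approach: the helper below builds an /etc/fstab line for the Gluster client with extra servers listed as fallbacks. The function name is made up. The option name backup-volfile-servers is what I understand current mount.glusterfs to accept (older releases used backupvolfile-server with a single host), so check the man page for your version before relying on it.

```shell
# Sketch: print an fstab line for a Gluster client mount, listing any extra
# hosts as fallback volfile servers (colon separated).
gluster_fstab_line() {
    primary=$1
    volume=$2
    mountpoint=$3
    shift 3
    opts="defaults,_netdev"
    if [ "$#" -gt 0 ]; then
        backups=$(IFS=:; printf '%s' "$*")
        opts="$opts,backup-volfile-servers=$backups"
    fi
    printf '%s:/%s %s glusterfs %s 0 0\n' "$primary" "$volume" "$mountpoint" "$opts"
}

# Usage:
#   gluster_fstab_line lab-lnx-gluster1 gv0 /data lab-lnx-gluster2 >> /etc/fstab
```

    The same option string can be passed to a manual mount with -o.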


  • Service Provider

    Basically, when mounting, the client appears to query the first node, ask it where the other nodes are, and then is ready to reach out to them as needed. The system remains able to read and write without any intervention even if an individual node fails.


  • Service Provider

    @dafyre said:

    So the next question would be... which IP address do you use for connecting to the Gluster system?

    Any or all.



  • You forgot

    gluster volume start gv0
    

    before you mount the volume to /data


