Installing Gluster on CentOS 7

scottalanmiller

Gluster, formerly known as GlusterFS, is the venerable scale-out storage system of the Linux world. Red Hat bought the GlusterFS project in 2011 and has developed and managed it since then. Since Red Hat is the project sponsor, RHEL 7 or CentOS 7 is the obvious place to deploy Gluster. Gluster is the best known scale-out storage system in the open source world and is quite popular.

The first thing that we need is multiple VMs! That's right, Gluster doesn't do anything useful with only a single node. If you are on a platform like mine (I'm on a Scale HC3 HC2100) where you can use imaging to template and clone your nodes, this goes faster and easier, and I will point out where we can pause to do that.

I am just building small demo nodes here. My standard layout is a 16GB base build with the storage added as an extra device, a 100GB device in this example. In production you would likely use something many times larger.

clone centos for gluster on scale hc3

add 100GB block virtio device

Now to log in and get started:

yum -y install wget epel-release
wget -P /etc/yum.repos.d http://download.gluster.org/pub/gluster/glusterfs/LATEST/RHEL/glusterfs-epel.repo
yum install glusterfs-server
pvcreate /dev/vdb
vgcreate vol_gluster /dev/vdb
lvcreate -l 100%FREE -n lv_gluster vol_gluster
mkfs.xfs /dev/mapper/vol_gluster-lv_gluster
mkdir -p /export/glusterdata
mount /dev/mapper/vol_gluster-lv_gluster /export/glusterdata
mkdir -p /export/glusterdata/brick
echo "/dev/mapper/vol_gluster-lv_gluster /export/glusterdata xfs defaults 0 0"  >> /etc/fstab
systemctl start glusterd
systemctl enable glusterd
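One caveat with the echo line above: running it a second time leaves a duplicate entry in /etc/fstab. Below is a minimal sketch of a guarded version, assuming bash. The function name add_fstab_entry and its optional file argument are made up for illustration and are not part of any Gluster tooling.

```shell
# Append the brick mount to an fstab file only if the exact line is not
# already there. The file defaults to /etc/fstab; pass another path to
# try this out safely. (Function name is hypothetical.)
add_fstab_entry() {
    local fstab="${1:-/etc/fstab}"
    local entry="/dev/mapper/vol_gluster-lv_gluster /export/glusterdata xfs defaults 0 0"
    # -x matches the whole line, -F treats the entry as a fixed string
    grep -qxF "$entry" "$fstab" 2>/dev/null || echo "$entry" >> "$fstab"
}
```

Calling add_fstab_entry with no argument does the same job as the echo line, but can be repeated without harm.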

At this point we have built the basics and could create a template from which to clone new gluster nodes. If this was going to be for production, I would stop here and create this as an unused base template as you may want to add nodes, replace nodes, recover nodes or whatever rather often. Keep a clean template ready to go.

In our example here I am only making two nodes, so I will keep using the original to build gluster1. First, though, I am going to clone it, then change the hostname (vi /etc/hostname) and update the IP address (nmtui) on the clone. That gets the gluster2 node ready with minimal effort. If you don't have the ability to clone (maybe you are not building on a cluster), then you will need to repeat the above steps on each node.
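The peer probe and volume commands that follow use hostnames, so each node has to be able to resolve the other. If DNS does not cover that, /etc/hosts entries are one option. This is only a sketch: the addresses are the ones that appear in the volume info output further down the thread, but which address belongs to which hostname is an assumption, and the function name is hypothetical.

```shell
# Print /etc/hosts lines for the two lab nodes.
# ASSUMPTION: the IP-to-name mapping below; substitute your own addresses.
gluster_hosts_entries() {
    # printf repeats the format for each address/name pair
    printf '%s\t%s\n' \
        192.168.1.80 lab-lnx-gluster1 \
        192.168.1.81 lab-lnx-gluster2
}
```

On each node, "gluster_hosts_entries >> /etc/hosts" would add both lines. On the clone, "hostnamectl set-hostname lab-lnx-gluster2" is an alternative to editing /etc/hostname by hand.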

Now once the second node is ready, back to the first node again:

gluster peer probe lab-lnx-gluster2
gluster volume create gv0 replica 2 lab-lnx-gluster1:/export/glusterdata/brick/ lab-lnx-gluster2:/export/glusterdata/brick/
mkdir /data
mount -t glusterfs lab-lnx-gluster1:/gv0 /data

Gluster is up and running! But before we start doing anything, over to the second node:

mkdir /data
mount -t glusterfs lab-lnx-gluster2:/gv0 /data

That's it, your Gluster storage cluster is up and running. Let's test it:

touch /data/test-file

Now go to each box and see if it is there!

scottalanmiller

In my example here, I use LVM as the block device. This is a case where, in production, you would likely not use LVM as there are already several abstraction layers going on and the goal is a lean storage cluster. But LVM provides some flexibility should we want to grow this in the future.

scottalanmiller

Using this with a system like a Scale HC3 or another form of cluster, you would want to be absolutely sure that you "pin" or set node affinity to ensure that individual nodes run only on independent pieces of underlying hardware.

scottalanmiller

You probably want a way to see what is going on with your Gluster storage. The info command will tell us the status, like in this example:

# gluster volume info
 
Volume Name: gv0
Type: Replicate
Volume ID: fc3d20d9-d65e-47ab-93b3-3598e1c9b751
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 192.168.1.80:/export/glusterdata/brick
Brick2: 192.168.1.81:/export/glusterdata/brick
Options Reconfigured:
performance.readdir-ahead: on
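If you want to check that output from a script rather than by eye, the Status line is the one to look at. Here is a rough sketch, assuming bash and awk and assuming the output format shown above. The function name volume_is_started is made up for this example.

```shell
# Read `gluster volume info` output on stdin and succeed only if the
# volume reports "Status: Started". With several volumes in the input,
# the last Status line wins. (Function name is hypothetical.)
volume_is_started() {
    awk -F': ' '$1 == "Status" { found = ($2 == "Started") } END { exit !found }'
}
```

For example: gluster volume info gv0 | volume_is_started && echo "gv0 is started"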

dafyre

Aside from the size of the drives, what would you change if you were putting this into production?

Ideally, you would have a way to prevent split-brain type problems.

stacksofplates

This will be helpful. We have a few servers at work whose RAID cards have failed. We are planning to put software RAID on them and test some things out. One option was either Ceph or Gluster. This will help a lot.

scottalanmiller

@dafyre said:

Aside from the size of the drives, what would you change if you were putting this into production?

Ideally, you would have a way to prevent split-brain type problems.

For production I would have at least three nodes, and I typically would not put this on shared infrastructure but on dedicated hardware. Because this is a full cluster on its own, I would expect to have resources dedicated to nothing but this, custom built for the purpose.

If Raspberry Pi 3 had SATA connections, I would totally build a cluster that way for fun. That would be neat. You need very low CPU power for Gluster.

I would likely remove LVM in production as well. Just use the raw disk and all of it.
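For the three-node case mentioned above, the create command takes three bricks and "replica 3". With three copies a majority can be established, which is the usual way to reduce split-brain risk. The sketch below only prints the command so it can be reviewed first. The third hostname used in the test and usage (lab-lnx-gluster3) and the function name are hypothetical.

```shell
# Print (do not run) a create command for a three-way replicated volume.
# Arguments are the node hostnames; the brick path matches the one used
# in the first post. (Function name is hypothetical.)
replica3_create_cmd() {
    local brick=/export/glusterdata/brick
    local cmd="gluster volume create gv0 replica 3"
    local host
    for host in "$@"; do
        cmd="$cmd $host:$brick"
    done
    echo "$cmd"
}
```

Running "replica3_create_cmd lab-lnx-gluster1 lab-lnx-gluster2 lab-lnx-gluster3" prints the full command line. Once a replicated volume is in use, "gluster volume heal gv0 info split-brain" lists any files that are in split-brain.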

stacksofplates

I'm firing up a couple VMs on my KVM box to test it.

Does Ceph have any advantages? I don't think I can count object storage as an advantage based on what we would be using it for.

scottalanmiller

@johnhooks said:

I'm firing up a couple VMs on my KVM box to test it.

Does Ceph have any advantages? I don't think I can count object storage as an advantage based on what we would be using it for.

Not a lot.

http://www.networkcomputing.com/storage/gluster-vs-ceph-open-source-storage-goes-head-head/8824853

Now that CEPH and Gluster are both inside the RH fold, if you don't want the object flexibility of CEPH, Gluster might be for you.

stacksofplates

@scottalanmiller said:

@johnhooks said:

I'm firing up a couple VMs on my KVM box to test it.

Does Ceph have any advantages? I don't think I can count object storage as an advantage based on what we would be using it for.

Not a lot.

http://www.networkcomputing.com/storage/gluster-vs-ceph-open-source-storage-goes-head-head/8824853

Now that CEPH and Gluster are both inside the RH fold, if you don't want the object flexibility of CEPH, Gluster might be for you.

Ya, we would be using it pretty much as a giant NAS. What we are experimenting with is older 24-drive servers that were NAS boxes.

stacksofplates

@scottalanmiller said:

@johnhooks said:

I'm firing up a couple VMs on my KVM box to test it.

Does Ceph have any advantages? I don't think I can count object storage as an advantage based on what we would be using it for.

Not a lot.

http://www.networkcomputing.com/storage/gluster-vs-ceph-open-source-storage-goes-head-head/8824853

Now that CEPH and Gluster are both inside the RH fold, if you don't want the object flexibility of CEPH, Gluster might be for you.

Ha I just read that article like 10 minutes ago.

dafyre

So the next question would be... which IP address do you use for connecting to the Gluster system? the IP address of Brick 1 or Brick 2... or Brick N... ?

Or do you set up some kind of master IP address with Pacemaker / Heartbeat, et al?

scottalanmiller

@dafyre said:

So the next question would be... which IP address do you use for connecting to the Gluster system? the IP address of Brick 1 or Brick 2... or Brick N... ?

Great question. The Gluster client actually handles this. If you mount from Server1 and that server fails, the client automatically attaches to Server2. It's not 100% transparent; there is some noticeable delay during the failover, but it takes care of itself. It's self-healing.

At mount time it can't do that: if Server1 is down and that is the server named in your mount command, the client has no way to find the second server. So either you accept that limitation, or you put backup servers into the mount command itself and then it is handled at boot time as well.
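As a sketch of the "backup servers in the mount command" idea: the Gluster FUSE mount helper accepts a backup-volfile-servers option in the releases I have seen (older ones spelled it backupvolfile-server, so check "man mount.glusterfs" on your version). The function below just builds an fstab line. Its name and argument order are made up for this example, and _netdev is my addition so the mount waits for the network at boot.

```shell
# Build an /etc/fstab line for a Gluster client mount with a fallback
# volfile server. Arguments: primary host, backup host, volume, mount point.
# ASSUMPTION: the option name backup-volfile-servers; verify it against
# your installed version. (Function name is hypothetical.)
gluster_fstab_line() {
    local primary="$1" backup="$2" volume="$3" mountpoint="$4"
    echo "$primary:/$volume $mountpoint glusterfs defaults,_netdev,backup-volfile-servers=$backup 0 0"
}
```

The one-off equivalent would be: mount -t glusterfs -o backup-volfile-servers=lab-lnx-gluster2 lab-lnx-gluster1:/gv0 /data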

scottalanmiller

Basically, when mounting, the client appears to query the first node, ask it where the other nodes are, and then is ready to reach out to them as needed. The system remains able to read and write without any intervention even if an individual node fails.

scottalanmiller

@dafyre said:

So the next question would be... which IP address do you use for connecting to the Gluster system?

Any or all.

stacksofplates

You forgot

gluster volume start gv0

before you mount the volume to /data
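Put together, the order on the first node would be create, start, then mount. Here is a small sketch, assuming bash, with a dry-run switch so the sequence can be reviewed before anything runs. The run and bring_up_gv0 helpers and the DRY_RUN variable are made up for this example. The volume and host names are the ones from the first post.

```shell
# Echo each command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Create, start, and only then mount the volume. Stops at the first failure.
bring_up_gv0() {
    run gluster volume create gv0 replica 2 \
        lab-lnx-gluster1:/export/glusterdata/brick \
        lab-lnx-gluster2:/export/glusterdata/brick &&
    run gluster volume start gv0 &&
    run mkdir -p /data &&
    run mount -t glusterfs lab-lnx-gluster1:/gv0 /data
}
```

"DRY_RUN=1 bring_up_gv0" prints the four commands in order. Without DRY_RUN it runs them for real.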

Emad R

@scottalanmiller
No package glusterfs-server available ???

I tried other articles as well
I can install = centos-release-gluster
but not glusterfs-server = not available

Oh nvm they changed the url of their repo

Connecting to download.gluster.org (download.gluster.org)|23.253.208.221|:443... connected.
HTTP request sent, awaiting response... 404 Not Found

This worked for me:

yum search centos-release-gluster #check LTS version number (centos-release-gluster310)
yum -y install centos-release-gluster310
sed -i -e "s/enabled=1/enabled=0/g" /etc/yum.repos.d/CentOS-Gluster-3.10.repo
yum --enablerepo=centos-gluster310,epel -y install glusterfs-server
systemctl start glusterd
systemctl enable glusterd

stacksofplates

@emad-r said in Installing Gluster on CentOS 7:

@scottalanmiller
No package glusterfs-server available ???

I tried other articles as well
I can install = centos-release-gluster
but not glusterfs-server = not available

Oh nvm they changed the url of their repo

Connecting to download.gluster.org (download.gluster.org)|23.253.208.221|:443... connected.
HTTP request sent, awaiting response... 404 Not Found

This worked for me:

yum search centos-release-gluster #check LTS version number (centos-release-gluster310)
yum -y install centos-release-gluster310
sed -i -e "s/enabled=1/enabled=0/g" /etc/yum.repos.d/CentOS-Gluster-3.10.repo
yum --enablerepo=centos-gluster310,epel -y install glusterfs-server
systemctl start glusterd
systemctl enable glusterd

It's in the storage SIG too. So if you use a mirror local to you, you should be able to find it under storage.

PenguinWrangler

I was thinking about using Gluster storage for my three KVM hosts and keeping my KVM VMs there. If I made a virtual machine for Gluster on each host that used all of that host's storage, and then mounted the Gluster store on each KVM host, would there be any disadvantage to that?

travisdh1

@penguinwrangler said in Installing Gluster on CentOS 7:

I was thinking about doing Gluster Storage for my three KVM Hosts and keep my KVM VMs there. So if I made a virtual machine for the Gluster that used all the storage on each machine and then mounted the Gluster store in each KVM host for storage, would there be any disadvantage to that?

Yes, good plan.

That's essentially how many commercial offerings operate today, they just hide the complexity from you.