Setting up an ELK Logging Server
-
Okay, at some point everything broke, and even what was working before is not working now. It's pretty crazy: the instructions from DigitalOcean are wrong, the documentation from ELK is wrong, and on GitHub there is no consensus on a fix. I'm rapidly losing faith in ELK, since it depends on components that are not working and that even the ELK developers don't seem to know how to make work.
-
Rebuilding from DigitalOcean's new ELK image (updated in the last 72 hours) and starting fresh.
-
Update: since I started this thread, DigitalOcean has released a new, fully up-to-date build of the ELK image, and you need it for things to work. If you are experiencing the issues I listed above, stop and start over with the latest build. Things "just work" again. I already have CentOS running on CloudatCost sending logs over to ELK on DigitalOcean.
-
If you have a central jump server like we do, it is super easy to push out the certificate. Once the certificate is in place on the jump server, you can run this to copy it to each client machine (very easy to script):
scp /etc/pki/tls/certs/logstash-forwarder.crt root@dny-lnx-pbx1:/etc/pki/tls/certs/
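Since the copy is easy to script, here is a minimal Bash sketch that loops over a list of clients from the jump server. It assumes root SSH access to every client and the same /etc/pki/tls/certs directory everywhere; `push_cert` is a hypothetical helper name, and any host names besides dny-lnx-pbx1 are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the logstash-forwarder CA certificate from the
# jump server to each client host given as an argument.
# Assumes root SSH access and the same /etc/pki/tls/certs layout on every host.

LF_CERT=/etc/pki/tls/certs/logstash-forwarder.crt
LF_CERT_DIR=/etc/pki/tls/certs/

push_cert() {
  local host failed=0
  for host in "$@"; do
    # Keep going if one client is unreachable; report it and move on.
    if scp "$LF_CERT" "root@${host}:${LF_CERT_DIR}"; then
      echo "ok: ${host}"
    else
      echo "FAILED: ${host}" >&2
      failed=1
    fi
  done
  # Non-zero status if any copy failed, so a calling script or cron job notices.
  return "$failed"
}

# Example (host names other than dny-lnx-pbx1 are placeholders):
#   push_cert dny-lnx-pbx1 client2 client3
```

The loop deliberately does not stop at the first failure, so one host that is down does not block the rollout to the rest; the return status still tells you something went wrong.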
-
Just got a third server feeding into the ELK system. This is working perfectly after the latest update.
-
Here is my working /etc/logstash-forwarder configuration file (x.x.x.x = my IP address, of course):
{
  "network": {
    "servers": [ "x.x.x.x:5000" ],
    "timeout": 15,
    "ssl ca": "/etc/pki/tls/certs/logstash-forwarder.crt"
  },
  "files": [
    {
      "paths": [ "/var/log/messages", "/var/log/secure" ],
      "fields": { "type": "syslog" }
    }
  ]
}
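If you are rolling this configuration out to several clients, it can help to render it on the jump server with the Logstash address as a parameter instead of hand-editing each copy. A minimal Bash sketch, assuming the same port, certificate path, and log files as the configuration above; `render_lf_config` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a logstash-forwarder config matching the one
# above, with the Logstash server passed in as HOST:PORT.

render_lf_config() {
  local server=${1:?usage: render_lf_config HOST:PORT}
  # Unquoted EOF so ${server} is expanded inside the heredoc.
  cat <<EOF
{
  "network": {
    "servers": [ "${server}" ],
    "timeout": 15,
    "ssl ca": "/etc/pki/tls/certs/logstash-forwarder.crt"
  },
  "files": [
    {
      "paths": [ "/var/log/messages", "/var/log/secure" ],
      "fields": { "type": "syslog" }
    }
  ]
}
EOF
}
```

For example, run render_lf_config x.x.x.x:5000 > /tmp/logstash-forwarder on the jump server, then scp the file into place on each client. If Python is installed, python3 -m json.tool /tmp/logstash-forwarder is a quick syntax check before shipping it.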
-
Next step is to see if the Elasticsearch YUM repos work for this, because that would be far better than the one-off RPM install that DigitalOcean has us doing in their docs. So let's see.
Here are the docs from ELK:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-repositories.html
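For reference, the repository route comes down to importing Elastic's GPG key, dropping a .repo file into /etc/yum.repos.d, and installing with yum. Below is a sketch of the .repo step only. The baseurl and gpgkey values follow the 1.x-era packages.elasticsearch.org layout as I recall it, and the 1.4 default is an assumption; Elastic has since moved its package hosting, so copy the exact values from the page linked above. `write_es_repo` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a yum repo definition for an Elasticsearch 1.x
# release series into DIR (default /etc/yum.repos.d).
# The URLs below are assumptions based on the 1.x-era repository layout;
# verify them against Elastic's setup-repositories page before use.

write_es_repo() {
  local dir=${1:-/etc/yum.repos.d} ver=${2:-1.4}
  cat > "${dir}/elasticsearch.repo" <<EOF
[elasticsearch-${ver}]
name=Elasticsearch repository for ${ver}.x packages
baseurl=http://packages.elasticsearch.org/elasticsearch/${ver}/centos
gpgcheck=1
gpgkey=http://packages.elasticsearch.org/GPG-KEY-elasticsearch
enabled=1
EOF
}
```

Used as root, the sequence would be: rpm --import with the GPG key URL, then write_es_repo /etc/yum.repos.d 1.4, then yum install elasticsearch. The payoff over the one-off RPM is that later releases in the series arrive through a normal yum update.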
-
Here is what some heavy log ingest looks like on the CPU...
-
This is working great now: several servers are feeding in and the reports look fantastic.
-
Here is a view of the log reading portion of the interface.
-
Here is digging into the details of a single log entry:
-
Here is the SAR report for the server. Remember that we are running with half the cores and half the memory that is recommended, mostly as an experiment to see how much is really needed for things to stay responsive. So far, ingesting logs from five servers, it is working just fine. We will be adding more servers and keeping an eye on performance, and we will grow the server if we need to. We are trying to learn from this so that we have better capacity information. But for a smaller company, it looks like a very small server will work just fine. No question the server is busy, but now that it is up and running and no longer handling the initial setup, it's nowhere near fully loaded: in the intervals below, %idle stays between about 62% and 93%.
02:25:01 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
02:35:01 PM     all     12.91     19.61      4.53      0.37      0.00     62.59
02:45:01 PM     all      2.68      6.86      2.34      0.20      0.00     87.91
02:55:01 PM     all      2.73      6.42      2.25      0.21      0.00     88.40
03:05:01 PM     all      2.26      9.77      2.07      0.19      0.00     85.71
03:15:01 PM     all      3.56      6.49      2.57      0.30      0.00     87.07
03:25:01 PM     all      3.52     12.39      2.90      0.26      0.00     80.93
03:35:01 PM     all      2.97      6.45      2.37      0.27      0.00     87.95
03:45:01 PM     all      2.54     11.15      2.17      0.17      0.00     83.97
03:55:01 PM     all      1.44      5.42      1.69      0.10      0.00     91.35
04:05:02 PM     all      0.98      4.86      1.52      0.06      0.00     92.58
04:15:01 PM     all      1.54      5.07      1.75      0.09      0.00     91.54
04:25:01 PM     all      1.52     10.37      1.91      0.11      0.00     86.10
04:35:01 PM     all      3.74      6.99      2.65      0.23      0.00     86.38
04:45:01 PM     all      3.11     10.70      2.42      0.24      0.00     83.53
04:55:01 PM     all      1.02      5.07      1.59      0.05      0.00     92.26
05:05:01 PM     all      1.76      5.64      1.89      0.15      0.00     90.57
05:15:01 PM     all      0.93      9.27      1.64      0.05      0.00     88.11
05:25:01 PM     all      1.71      5.45      1.86      0.13      0.00     90.85
05:35:01 PM     all      2.58      5.40      2.24      0.14      0.00     89.64
05:45:01 PM     all      4.18     11.75      2.92      0.25      0.00     80.90
05:55:02 PM     all      3.16      5.85      2.13      0.26      0.00     88.60
06:05:01 PM     all      3.54      6.36      2.32      0.20      0.00     87.58
06:15:01 PM     all      3.14     10.63      2.14      0.16      0.00     83.92
06:25:01 PM     all      4.87     11.22      3.27      0.24      0.00     80.40
Average:        all      9.22     10.60      3.03      0.41      0.00     76.74
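To turn a report like this into a quick headroom number, a small awk filter can compute busy time (100 minus %idle) for each interval. This is a sketch, assuming `sar -u` text output with either 12-hour (AM/PM) or 24-hour timestamps; `sar_busy` is a hypothetical helper name. It skips the header and the Average line and reports the mean and peak busy percentage across the interval rows it is given.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise `sar -u` output read on stdin.
# Busy time per interval is taken as 100 - %idle (the last column).
# Header rows and the "Average:" line are skipped.

sar_busy() {
  awk '
    BEGIN { max = -1 }
    # Interval rows start with HH:MM:SS; the CPU column ("all") is field 2
    # with 24-hour timestamps or field 3 with AM/PM timestamps.
    $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && ($2 == "all" || $3 == "all") {
      busy = 100 - $NF
      n++
      sum += busy
      if (busy > max) {
        max = busy
        at = ($2 == "all") ? $1 : ($1 " " $2)
      }
    }
    END {
      if (n == 0) { print "no interval rows found"; exit 1 }
      printf "intervals=%d avg_busy=%.2f%% peak_busy=%.2f%% at %s\n", n, sum / n, max, at
    }'
}
```

For example, sar -u | sar_busy summarises today's data, and sar -u -f /var/log/sa/sa15 | sar_busy reads an earlier day's file (the file name is illustrative). Fed the rows above, the peak is 37.41% busy at 02:35:01 PM (100 minus 62.59).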