Solved Issue with Elasticsearch
-
It used to crash around 30 minutes after the service started. I'm not sure whether changing the heap size made any difference, but the last failure came more than an hour later. That may just be because usage was lower; not sure.
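For reference, on the packaged installs of that era the heap was usually set through the ES_HEAP_SIZE environment variable. A sketch, with the file path and the 4g value as assumptions to adjust for your box:

```
# /etc/sysconfig/elasticsearch (RPM) or /etc/default/elasticsearch (deb)
# Common rule of thumb: about half of physical RAM for the heap,
# and no more than ~31GB so the JVM keeps compressed object pointers.
ES_HEAP_SIZE=4g
```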
-
I mean, where in the SAR time listing would I see what memory looked like at crash time?
-
That would be around 9:
09:00:01 PM 5273976 10990756 67.57 423200 9036896 1627500 8.86 3952744 6244044 80
09:10:01 PM 5279188 10985544 67.54 423332 9036824 1606860 8.75 3949060 6244132 228
09:20:01 PM 5257448 11007284 67.68 423464 9036884 1637644 8.92 3968956 6244136 140
09:30:01 PM 5546232 10718500 65.90 423528 9037552 1211984 6.60 3680828 6244452 168
09:40:01 PM 5543888 10720844 65.91 423580 9037992 1202388 6.55 3684388 6244412 36
-
Odd, I don't see memory freeing up after the crash.
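For what it's worth, assuming those are standard sysstat `sar -r` lines, the columns are kbmemfree, kbmemused, %memused, kbbuffers, kbcached, kbcommit, %commit, kbactive, kbinact and kbdirty, so column 3 is free memory and column 5 is percent used. A quick awk pass makes the trend easier to scan (two sample rows are inlined here; on the real box you would pipe the live `sar -r` output into the same command):

```shell
# Pull timestamp, %memused and kbmemfree out of sar -r style lines.
mem_report=$(awk '{ printf "%s %s used=%s%% free=%skB\n", $1, $2, $5, $3 }' <<'EOF'
09:20:01 PM 5257448 11007284 67.68 423464 9036884 1637644 8.92 3968956 6244136 140
09:30:01 PM 5546232 10718500 65.90 423528 9037552 1211984 6.60 3680828 6244452 168
EOF
)
echo "$mem_report"
```

On a live system, `sar -r -s 21:00:00 -e 21:45:00 | awk '...'` with the same awk body would narrow it to the crash window.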
-
This doesn't fix the problem, but you could use something like monit or supervisord to start it up again after it quits.
-
@johnhooks monit looks promising. Is there any specific option by which I can monitor Elasticsearch for failures?
-
@Ambarishrh said:
@johnhooks monit looks promising. Is there any specific option by which I can monitor Elasticsearch for failures?
It's been a while since I used it, but I think it uses /var/log/monit.log and will track errors. If not, you can set the log in the config file under the set logfile line.
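A minimal monit stanza for this, assuming the stock init script and pidfile locations (both are guesses; adjust for your distro):

```
# /etc/monit.d/elasticsearch
check process elasticsearch with pidfile /var/run/elasticsearch/elasticsearch.pid
  start program = "/etc/init.d/elasticsearch start"
  stop program  = "/etc/init.d/elasticsearch stop"
  # restart if the HTTP endpoint stops answering
  if failed host 127.0.0.1 port 9200 protocol http then restart
  # give up (and alert) if it keeps dying
  if 3 restarts within 5 cycles then timeout
```

The `if failed host ... protocol http` check is handy here because it catches a hung JVM as well as a dead process.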
-
Seems like it's Java that's causing the issue. I have CloudLinux on the server, and that might also be part of the problem, since it jails the user's resource usage. I just removed the elasticsearch user from CageFS to see if that solves it. Then I need to figure out the right resources to allocate and bring it back into the caged setup. Waiting to see if it crashes again.
-
After several tests, I thought of setting up Elasticsearch on a separate server to make sure its resources aren't shared with anything else. I set up a new server and installed Java and Elasticsearch. Now I need to give the ActiveCollab server access to Elasticsearch, but even with port 9200 open (and even with the firewall disabled), I am not able to reach the server on port 9200. One place I read that Elasticsearch isn't meant to be available over the internet. Is that so, or am I missing a config setting by which I can let ES grant access to the other server?
-
Use telnet from the remote machine to see if it is open properly.
Also verify that it is listening with
netstat -tulpn
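To script that check from the ActiveCollab box, a small sketch (assuming bash is available; it uses the `/dev/tcp` pseudo-device, so no telnet install is needed, and 10.0.0.5 is a placeholder for the Elasticsearch server's IP):

```shell
# Return success if host:port accepts a TCP connection.
check_port() {
  local host=$1 port=$2
  # timeout guards against a firewall that silently drops packets
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# 10.0.0.5 is a hypothetical ES server address; substitute your own.
if check_port 10.0.0.5 9200; then
  echo "port 9200 reachable"
else
  echo "port 9200 closed or filtered"
fi
```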
-
Figured that out. In the elasticsearch.yml file, I needed to change network.host from localhost to an IP reachable from the other servers (public or private).
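For anyone finding this later, the change is a one-liner; the IP here is a placeholder for your server's private interface:

```yaml
# /etc/elasticsearch/elasticsearch.yml
# Bind to the private interface so the app server can reach it.
network.host: 10.0.0.5
```

Worth noting that Elasticsearch of this vintage has no authentication built in, so prefer a private IP and firewall port 9200 down to just the ActiveCollab server rather than exposing it publicly.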
-
@scottalanmiller said:
Use telnet from the remote machine to see if it is open properly.
Also verify that it is listening with
netstat -tulpn
Sorry, didn't see that message. It was not the firewall; it was a config setting in Elasticsearch, and that's solved now. I need to watch it for a day or two to make sure it doesn't fail again.
-
So after 24+ hours of monitoring, Elasticsearch works fine and didn't fail! Concluding that for Elasticsearch to function correctly, use a server with a minimum of 16GB RAM and keep it dedicated to ES only.
Hardware recommendation from ES site:
A machine with 64 GB of RAM is the ideal sweet spot, but 32 GB and 16 GB machines are also common. Less than 8 GB tends to be counterproductive.
https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html
Closing this thread and marking it as solved. Thanks guys
-
@Ambarishrh said:
So after 24+ hours of monitoring, Elasticsearch works fine and didn't fail! Concluding that for Elasticsearch to function correctly, use a server with a minimum of 16GB RAM and keep it dedicated to ES only.
Hardware recommendation from ES site:
A machine with 64 GB of RAM is the ideal sweet spot, but 32 GB and 16 GB machines are also common. Less than 8 GB tends to be counterproductive.
https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html
Closing this thread and marking it as solved. Thanks guys
IMO, that is an insane amount of RAM to require.
-
It is a lot, but large-scale in-memory databases often do similar. We had similar numbers with things like Cassandra.
-
Ha, my ELK server has 3GB, but it's only monitoring a small number of VMs.
-
We have run ELK on 2GB pretty well. But I think that our new one is going to be more like 8GB.
-
@johnhooks said:
Ha my ELK server has 3, but it's a small number of VMs.
An ELK server is the reason I am concerned about this value. I don't have 16GB of RAM to just throw at a VM without a damned good reason.
I really want to get an ELK server set up at a couple of clients, but none of their servers have that kind of RAM unallocated.
-
@JaredBusch said:
I really want to get an ELK server set up at a couple of clients, but none of their servers have that kind of RAM unallocated.
How many machines will they monitor? We've done ~20 normal servers to a 2GB ELK server, and it worked fine. It might have been more responsive with more, but it was just fine.
The 64GB recommendation is when using Elastic as a clustered NoSQL database for other purposes where you are dealing with datasets larger than 64GB. No need for numbers like that on a normal SMB ELK install at all. You might want to look for more than 2GB, but you can do pretty well without much.
If you get to the point where the log set you are reporting on can't fit in memory, you'll feel the lag in the interface for sure. But most SMBs aren't looking at ten-year-old logs in real time, either.
-
I think, unless you have some crazy log traffic, that if you can get 4GB for ELK in an SMB, you are nearly always good. I'd expect hundreds of servers to be able to log to that, as long as you have fast disks (it still has to get to disk fast enough no matter how much memory there is.)
We've had massive Splunk databases with 32GB - 64GB, but those are taking data from thousands and thousands of servers and doing so as a high availability failover cluster, so they have to ingest, index and replicate in real time.