How to monitor 100 cloud VM's
-
Another thought would be three different views: one for CPU, one for RAM, one for Network. Each view would have a 10 x 10 grid of green or red boxes, red meaning that machine is over some threshold. Then you click the box and go directly to the machine in question.
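A minimal Python sketch of the grid idea, assuming the current CPU/RAM/network values per VM have already been collected somewhere. The threshold numbers, the `THRESHOLDS` table and the `status_grid` function are hypothetical, chosen only for illustration. Each cell keeps the VM name so a UI could link the box straight to that machine.

```python
from typing import Dict, List, Tuple

# Hypothetical per-metric thresholds in percent; tune these per environment.
THRESHOLDS = {"cpu": 85.0, "ram": 90.0, "network": 80.0}

Cell = Tuple[str, str]  # (VM name, "green" or "red")


def status_grid(metrics: Dict[str, Dict[str, float]], metric: str,
                width: int = 10) -> List[List[Cell]]:
    """Lay out one metric for every VM as rows of `width` cells.

    `metrics` maps VM name -> {metric name -> current value}. VMs are sorted
    by name so each machine keeps the same cell between refreshes.
    """
    limit = THRESHOLDS[metric]
    cells: List[Cell] = []
    for vm in sorted(metrics):
        value = metrics[vm].get(metric)
        # A VM that reports nothing is shown red rather than silently green.
        color = "red" if value is None or value > limit else "green"
        cells.append((vm, color))
    # Chunk the flat list into rows: 100 VMs -> a 10 x 10 grid.
    return [cells[i:i + width] for i in range(0, len(cells), width)]


if __name__ == "__main__":
    vms = {f"vm{i:03d}": {"cpu": 20.0, "ram": 40.0, "network": 10.0}
           for i in range(100)}
    vms["vm042"]["cpu"] = 97.0
    # Print the CPU view: "X" for red, "." for green.
    for row in status_grid(vms, "cpu"):
        print(" ".join("X" if color == "red" else "." for _, color in row))
```

Calling `status_grid` once per metric gives the three separate views; treating missing data as red is a design choice so a VM that stops reporting stands out.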
-
@fuznutz04 said in How to monitor 100 cloud VM's:
@scottalanmiller What type of resources are we talking about for SS for the agents?
Relatively little. It uses Salt Minion alone as the local agent right now. It's not zero, but it is low and doesn't need to be a priority.
-
@dashrender said in How to monitor 100 cloud VM's:
Another thought would be three different views: one for CPU, one for RAM, one for Network. Each view would have a 10 x 10 grid of green or red boxes, red meaning that machine is over some threshold. Then you click the box and go directly to the machine in question.
Yup, a CPU Overview kind of screen would be nice.
-
@krisleslie said in How to monitor 100 cloud VM's:
I like the direction you're going; it would be totally cool to see 25 at a time. It's digestible.
Um, what do these 1000 Windows 2012 servers do???
And when they break, why?
Also, why?
-
I'm doing these checks with PRTG locally, and even at 75 servers the main screen is crazy to look at.
-
The main screen is actually not that bad. This is a decent view: green means everything is within the agreed limits. Each item has five or so checks underneath (Disk, RAM, Ping, Event Log, Uptime, etc.), and you can drill down to them by clicking on them.
@krisleslie, would that do what you need? I'd guess that screen would still be usable at 100 VMs, since any item with an issue would change from green to red, flagging it for you.
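The "one green tile per item, several checks underneath" idea can be sketched as a worst-state rollup. This is not how PRTG computes its status internally; the `State` levels and the `rollup`/`problem_vms` helpers are hypothetical, assuming each check already reports one of three states.

```python
from enum import IntEnum
from typing import Dict, List


class State(IntEnum):
    # Ordered by severity so max() picks the worst state.
    UP = 0
    WARNING = 1
    DOWN = 2


def rollup(checks: Dict[str, State]) -> State:
    """A VM's tile shows the worst state among its checks.

    A VM with no checks configured is flagged as WARNING rather than shown
    green, so an unmonitored machine doesn't look healthy.
    """
    return max(checks.values(), default=State.WARNING)


def problem_vms(inventory: Dict[str, Dict[str, State]]) -> List[str]:
    """Names of VMs whose tile would not be green, worst first, then by name."""
    bad = [vm for vm, checks in inventory.items() if rollup(checks) != State.UP]
    return sorted(bad, key=lambda vm: (-rollup(inventory[vm]), vm))
```

With 100 VMs the main screen only needs the rollup per tile; the per-check states are what you see after drilling down.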
-
@scottalanmiller said in How to monitor 100 cloud VM's:
Yup, a CPU Overview kind of screen would be nice.
I have a load screen in Grafana that shows just the load of every system. It's really handy.
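One caveat with a raw load screen: a load average of 4 is idle on an 8 vCPU VM but saturated on a 2 vCPU one. A minimal sketch of normalizing load per core before ranking hosts, assuming the 1-minute load (for example node_exporter's `node_load1`) and the vCPU count per host have already been fetched. The `load_per_core` and `busiest` helpers are hypothetical.

```python
from typing import Dict, List, Tuple


def load_per_core(load1: Dict[str, float],
                  vcpus: Dict[str, int]) -> Dict[str, float]:
    """Divide each host's 1-minute load average by its vCPU count.

    Hosts with an unknown or zero vCPU count are skipped rather than
    raising ZeroDivisionError.
    """
    return {host: load / vcpus[host]
            for host, load in load1.items() if vcpus.get(host)}


def busiest(load1: Dict[str, float], vcpus: Dict[str, int],
            top: int = 5) -> List[Tuple[str, float]]:
    """The `top` hosts by load per core, highest first."""
    ranked = sorted(load_per_core(load1, vcpus).items(),
                    key=lambda item: item[1], reverse=True)
    return ranked[:top]
```

A value above 1.0 roughly means more runnable work than cores, which makes hosts of different sizes comparable on one screen.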
-
@stacksofplates said in How to monitor 100 cloud VM's:
I have a load screen in Grafana that shows just the load of every system. It's really handy.
And then there is another dashboard that does full system details with everything from Prometheus.
-
@stacksofplates said in How to monitor 100 cloud VM's:
Prometheus
I think any tool that can handle it would be of use.
If it's graphical and can do the job, so be it. If it's a table and can do the job, so be it.
I've tried suggesting and using Comodo ONE for this use case, and I don't think it's up to the task. It can monitor and notify, sure, but I'm not 100% sure about its visualization.
The same could be said about Spiceworks.
-
@krisleslie said in How to monitor 100 cloud VM's:
Same could be said about Spiceworks.
SW would be insanely heavy for monitoring that many machines, and it requires a dedicated Windows server to run.
-
@krisleslie said in How to monitor 100 cloud VM's:
I think any tool that can handle it would be of use.
I would recommend Zabbix. Other options:
https://www.opennms.org/en/docs
https://my-netdata.io/
https://sensuapp.org/downloads
-
If these were my servers, I would want to see bandwidth usage too, especially if the cloud provider is charging me for it. The PRTG approach looks like a really good option.
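Bandwidth is usually derived from cumulative interface byte counters rather than read directly. A minimal sketch, assuming you already sample `(timestamp, total_bytes)` pairs per VM; the `transferred_bytes` and `average_bps` helpers are hypothetical, and the reset handling is a simplifying assumption (it does not model 32-bit counter wraparound).

```python
from typing import List, Tuple

Sample = Tuple[float, int]  # (unix timestamp in seconds, cumulative bytes)


def transferred_bytes(samples: List[Sample]) -> int:
    """Total bytes moved across consecutive counter samples.

    Interface byte counters only go up, so a drop is treated as a reset
    (e.g. the VM rebooted) and the new value is counted from zero.
    """
    total = 0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def average_bps(samples: List[Sample]) -> float:
    """Average throughput in bits per second over the sampled window."""
    if len(samples) < 2:
        return 0.0
    elapsed = samples[-1][0] - samples[0][0]
    return transferred_bytes(samples) * 8 / elapsed if elapsed > 0 else 0.0
```

Summing `transferred_bytes` over a billing period gives the figure to compare against what the cloud provider charges for.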
I've heard good things about Sensu as well.
But no matter what you use, you need to know what normal operation (performance, capacity, utilization) looks like across the servers, so you can tell whether the behavior you see is an outlier or expected. For example, a SQL VM may spike in CPU and memory usage at the end of the day because a ton of queries are running for order inserts.
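One simple way to encode "what normal looks like" is a per-hour baseline, so an end-of-day spike is compared with previous end-of-day readings rather than with 3 a.m. A minimal sketch using a z-score, assuming you keep `(hour_of_day, value)` history per server and metric; the `build_baseline`/`is_outlier` helpers and the 3.0 cutoff are hypothetical choices, not a standard.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Baseline = Dict[int, Tuple[float, float]]  # hour -> (mean, std dev)


def build_baseline(history: List[Tuple[int, float]]) -> Baseline:
    """Group samples from normal operation by hour of day."""
    by_hour: Dict[int, List[float]] = {}
    for hour, value in history:
        by_hour.setdefault(hour, []).append(value)
    return {h: (mean(v), pstdev(v)) for h, v in by_hour.items()}


def is_outlier(baseline: Baseline, hour: int, value: float,
               z: float = 3.0) -> bool:
    """True if `value` is more than `z` standard deviations from that hour's mean."""
    if hour not in baseline:
        return True  # no history for this hour yet: flag it for review
    mu, sigma = baseline[hour]
    if sigma == 0:
        return value != mu  # perfectly flat history: any change stands out
    return abs(value - mu) / sigma > z
```

With this, 88% CPU at 17:00 on a VM that normally runs 80 to 90% then is expected, while the same reading at 03:00 gets flagged.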