Scheduling Simple Local Linux Reboots

scottalanmiller

A very common Linux administration task is to schedule reboots. There are, of course, many ways to skin this particular cat. Some people want reboots kicked off from a central source. But more commonly, more simply and more appropriate for the cloud computing world is the simple cron based reboot. Just let the internal system scheduler handle the reboots.

Just open root's cron table (crontab) using crontab -e as root or sudo -i crontab -e as a wheel and then enter a line like this one:

4 17 * * 6 /sbin/shutdown -r now

This command will make the server reboot at 1704 (5:04PM) on each Saturday. You can obviously adjust the time as is appropriate for your environment. Early Sunday morning is a popular time for reboots, or late Friday evening.

The advantage to a local cron reboot is that cron is highly reliable, very well known, an expected place for reboots to be scheduled and will work even when nearly all other things have failed. Even when Internet access is down, remote access is disabled, NICs have failed, user authentication is broken, services have failed, etc. it will still have an extremely high likelihood of success and when things are broken, getting a reboot is often the best way to start fixing things so this can be very advantageous.

aaron-closed account

This post is deleted!

JaredBusch

@aaron said in Scheduling Simple Local Linux Reboots:

Why would you schedule unattended reboots?

Especially Friday evening‽

I'd userdel someone doing that

I hate unrebooted servers. How do you know they will reboot correctly if they have been running for 300 days?

scottalanmiller

@aaron said in Scheduling Simple Local Linux Reboots:

Why would you schedule unattended reboots?

Because you want your systems rebooting at the optimum time, reliably, every time. You don't want to pay people to do a task that is better left to automation. That's why we automate systems.

scottalanmiller

@aaron said in Scheduling Simple Local Linux Reboots:

Especially Friday evening‽

Because the most common green zone for businesses (not all, just the most common) is from Friday at 5PM when everyone goes home or to happy hour and you have until Monday morning before people are back in the office. It's the best window at which to trigger a reboot so that you have over 48 hours to resolve critical issues before your maintenance window turns into "downtime." Sure if you are 24x7 you have to find your slow time and make the best of it. But most businesses are Mon-Fri 9-5 or similar. So you take intentional risk when you have a maintenance window and reduce the risk during production hours.

scottalanmiller

@aaron said in Scheduling Simple Local Linux Reboots:

I'd userdel someone doing that

They have automated systems that contact your manager if you disable this on Wall St. When uptime really, really matters, you don't leave your reboots to chance. The Monday morning "why didn't your server reboot and what managing director signed off on not rebooting" calls are annoying to have to answer to.

aaron-closed account

This post is deleted!

scottalanmiller

@aaron said in Scheduling Simple Local Linux Reboots:

But isn't that exactly what you're doing? Leaving it to chance?

That's why you monitor. By that logic, all automation is leaving it to chance.

scottalanmiller

@aaron said in Scheduling Simple Local Linux Reboots:

Not the execution, but that the machine comes back up correctly? I want eyes watching every reboot, not the guy on call, but the person doing the reboot.

Of course, but you don't want the system admins "watching" systems, ideally you don't even want local logins, you want application teams and end users performing checkouts of the final application. You want monitoring systems watching the system itself. Not people staying at consoles.

scottalanmiller

@aaron said in Scheduling Simple Local Linux Reboots:

If an organization decides to reboot systems at a regular maintenance window, I'd still want it to be executed by a person and not a cron job.

That's both extremely expensive, and complicated. For example, I had 600 servers to myself at one job, 3,000 at another. I couldn't watch those reboots even if I did it every minute of every day. And there is no way that I could see anything useful. What do you expect people to watch for during a reboot? Unless they are scouring logs, which they would not be doing by watching the server anyway, what would they see?

aaron-closed account

This post is deleted!

scottalanmiller

@aaron said in Scheduling Simple Local Linux Reboots:

@scottalanmiller I think we're talking about totally different team sizes hahah

Whether you are a team of one or one hundred, the factors remain the same. I had a team of 80, but there were tens of thousands of servers. So the per-admin time needed to watch consoles was still very prohibitive.

aaron-closed account

This post is deleted!

scottalanmiller

At least with VMs, the amount to watch and the time to watch is far, far less. Once upon a time I had to do this with physicals and the fifteen minute memory POST check was seriously painful.

scottalanmiller

@aaron said in Scheduling Simple Local Linux Reboots:

@scottalanmiller said in Scheduling Simple Local Linux Reboots:

what would they see

A config file borking a service, a hardware problem that manifested, etc.

You would not see that just watching a reboot, that's my point. A config problem would need to be caught either from logs or from the application checkout. It would be caught without an admin watching the reboot.

Hardware would not be found in a VM, and a hypervisor failing to reboot would get caught by monitoring really quickly. So I don't see the benefit there unless you have a rush window of say under ten minutes, in which case you do indeed have all of your resources standing by including the remote hands in the datacenter, not just the Linux and hypervisor admins. But that wouldn't be Linux (unless you are on KVM.)

dafyre

@aaron said in Scheduling Simple Local Linux Reboots:

@scottalanmiller said in Scheduling Simple Local Linux Reboots:

what would they see

A config file borking a service, a hardware problem that manifested, etc.

Again, this is why you would use a monitoring and alerting service. Friday afternoon at 5:00PM : 300 systems reboot... By 6:00PM those systems should be back up... If System-247 fails... then start alerting folks.

Edit: If Reboots are done regularly, there's much less risk of something breaking, IMO.

scottalanmiller

@dafyre said in Scheduling Simple Local Linux Reboots:

@aaron said in Scheduling Simple Local Linux Reboots:

@scottalanmiller said in Scheduling Simple Local Linux Reboots:

what would they see

A config file borking a service, a hardware problem that manifested, etc.

Again, this is why you would use a monitoring and alerting service. Friday afternoon at 5:00PM : 300 systems reboot... By 6:00PM those systems should be back up... If System-247 fails... then start alerting folks.

With automation, you can tighten that window, too. You can make it a five minute or ten minute alert suppression. You tweak, of course. If you don't automate, you can't have that tight monitoring and are actually, I feel, more likely to miss an error than if you didn't have the human involved.

aaron-closed account

This post is deleted!

JaredBusch

@aaron said in Scheduling Simple Local Linux Reboots:

I'm not against automation. I think y'all got the wrong idea. I against screwing the in call team on a Friday evening because servers are rebooting and nobody is actively watching the monitoring. I am not talking about watching them POST for goodness sakes.

Why would no one be watching the monitoring? That is the point of monitoring. It sends alerts. and someone should always be around to handle it.

aaron-closed account

This post is deleted!