NGINX Just Stop Working

NashBrydges

I have NGINX running on an Ubuntu 20.04 instance. It runs nothing but NGINX and is hosted on a Hyper-V server. Everything is running with all updates applied.

The issue is that NGINX randomly stops routing requests. Websites and services go offline, and neither the NGINX logs (/var/log/nginx) nor the syslog show any errors, but when I check whether the NGINX service is running, it shows as stopped. Rebooting the Ubuntu server brings everything back (restarting the NGINX service doesn't always fix the issue, but a server reboot works every time). No other change is required, just a reboot.

It's proxying only a dozen sites and services, and traffic is not that high. Resource utilization doesn't indicate any problems there.

I'm already running auto reboots every night but these random stops continue to happen (before someone asks, no, the issues are not correlated with the reboot schedule). Before I enable debug logging, I thought I'd reach out here to see if anyone else had experienced this before and how you might have fixed it. Should I be looking elsewhere for details on what might be causing this?
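Since the service shows as stopped, systemd itself may have recorded how it stopped, which the NGINX logs would not show. A minimal sketch, assuming the Ubuntu package's unit name nginx.service; summarize_unit is a hypothetical helper that reads the key=value lines printed by systemctl show and condenses them into one line:

```shell
#!/bin/sh
# summarize_unit (hypothetical helper) reads the key=value output of
#   systemctl show nginx -p ActiveState -p SubState -p Result -p ExecMainStatus
# on stdin and prints a one-line summary of the unit's state.
summarize_unit() {
  active='' sub='' result='' status=''
  while IFS='=' read -r key value; do
    case "$key" in
      ActiveState) active=$value ;;
      SubState) sub=$value ;;
      Result) result=$value ;;
      ExecMainStatus) status=$value ;;
    esac
  done
  # Result=success on an inactive unit typically means a clean stop
  # (a stop request or a zero exit status); values such as exit-code,
  # signal, core-dump or timeout point at the process failing.
  echo "state=$active/$sub result=$result exit_status=$status"
}
```

For example, run systemctl show nginx -p ActiveState -p SubState -p Result -p ExecMainStatus | summarize_unit during an outage, before rebooting, since the reboot resets the unit's state. journalctl -u nginx -b -1 shows the unit's messages from the previous boot, but only if the journal is stored persistently.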

scottalanmiller

What do the logs say leading up to it stopping?

Obsolesce

What does error.log say?

Are you running certbot?

NashBrydges

Sorry, running out the door for a client. I'll grab the logs and post the contents this weekend.

I am running certbot for Let's Encrypt.

travisdh1

@NashBrydges Since nginx is running, this should return ok, but you might want to try nginx -t.

Obsolesce

@NashBrydges said in NGINX Just Stop Working:

I am running certbot

Is it up to date?

NashBrydges

@Obsolesce Yes, all packages are up to date.

NashBrydges

Here is the only entry in the NGINX error log for the last time NGINX stopped.

2021/01/08 22:34:03 [error] 847#847: *195 access forbidden by rule, client: 195.154.63.222, server: plextrack.jpslconsulting.ca, request: "GET / HTTP/1.1", host: "plextrack.jpslconsulting.ca"

The Let's Encrypt log shows no activity immediately before the outage.

Syslog also shows no errors. On the day of the last incident it has entries from 3AM to 3:155AM and from 9:59PM to 10:02PM, but the outage occurred between 7:06PM and 10:00PM, so the only related entries in this log are from when the outage was discovered and Ubuntu was restarted.

NashBrydges

I also ran the NGINX test and all looks good.

black3dynamite

certbot.timer failing?
https://stackoverflow.com/a/52967898

NashBrydges

@black3dynamite I'm not seeing any evidence of this failing in letsencrypt.log, the syslog, or the nginx logs (both access and error). Would those logs be elsewhere? Obviously I don't want to have to renew certs manually.

black3dynamite

@NashBrydges letsencrypt.log is the only one I'm aware of. Actually, are you using systemd or a cron job to renew your certs?

NashBrydges

@black3dynamite systemd...
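With systemd renewals, the timer fires on its own schedule, and if the certificates were obtained with certbot's nginx plugin, renewal reloads nginx, so renewal times are worth correlating with the outage times. A sketch under assumptions: the timer is named certbot.timer (the apt package's name; a snap install uses a differently named timer), renewal configs live in /etc/letsencrypt/renewal, and nginx_managed_renewals is a hypothetical helper that lists configs using the nginx plugin:

```shell
#!/bin/sh
# nginx_managed_renewals (hypothetical helper) prints the certbot renewal
# config files in directory $1 whose authenticator or installer is nginx,
# i.e. renewals that will touch the running nginx.
nginx_managed_renewals() {
  grep -lE '^(authenticator|installer)[[:space:]]*=[[:space:]]*nginx[[:space:]]*$' "$1"/*.conf
}
```

For example, nginx_managed_renewals /etc/letsencrypt/renewal lists the affected certificates, systemctl list-timers certbot.timer shows when the timer last ran and will run next, and journalctl -u certbot.service shows what each run did.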

scottalanmiller

@NashBrydges said in NGINX Just Stop Working:

I also ran the NGINX test and all looks good.

If the config weren't valid, it wouldn't even start up.

scottalanmiller

@NashBrydges said in NGINX Just Stop Working:

Here is the only entry in the NGINX error log for the last time NGINX stopped.

The error log is where it records HTTP errors, not Nginx software errors.

NashBrydges

@scottalanmiller Well at this point I'm looking at any log that has "error" in the name. Lol

scottalanmiller

@NashBrydges said in NGINX Just Stop Working:

@scottalanmiller Well at this point I'm looking at any log that has "error" in the name. Lol

This should show you what there is for Nginx itself:

grep nginx /var/log/messages

NashBrydges

@scottalanmiller said in NGINX Just Stop Working:

grep nginx /var/log/messages

/var/log/messages does not exist.

scottalanmiller

@NashBrydges said in NGINX Just Stop Working:

@scottalanmiller said in NGINX Just Stop Working:

grep nginx /var/log/messages
/var/log/messages does not exist.

Oh, sorry, use Ubuntu's log (/var/log/syslog). /var/log/messages is RHEL's.
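To run the same search on either distribution family, a small wrapper can pick whichever system log exists. A minimal sketch; first_existing and nginx_syslog_entries are hypothetical helper names, and the paths are the RHEL default (/var/log/messages) and Ubuntu's rsyslog default (/var/log/syslog):

```shell
#!/bin/sh
# first_existing (hypothetical helper) prints the first argument that is an
# existing regular file and returns non-zero if none of them exist.
first_existing() {
  for f in "$@"; do
    if [ -f "$f" ]; then
      printf '%s\n' "$f"
      return 0
    fi
  done
  return 1
}

# nginx_syslog_entries (hypothetical helper): search the RHEL-style log if
# present, otherwise Ubuntu's syslog, for lines mentioning nginx.
nginx_syslog_entries() {
  log=$(first_existing /var/log/messages /var/log/syslog) || return 1
  grep -i nginx "$log"
}
```

On Ubuntu, journalctl -u nginx reads the unit's messages from the systemd journal directly. Older entries may already be in rotated files such as /var/log/syslog.1, which this sketch does not search.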

Obsolesce

@scottalanmiller said in NGINX Just Stop Working:

@NashBrydges said in NGINX Just Stop Working:
Here is the only entry in the NGINX error log for the last time NGINX stopped.
The error log is where it records HTTP errors, not Nginx software errors.

Which was useful in a case I've seen where the service had been started by other means, and the log showed all addresses were already in use.
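In that situation nginx logs a line along the lines of bind() to 0.0.0.0:80 failed (98: Address already in use), and the next step is finding out what already holds the port. A minimal sketch, assuming ss from iproute2 (standard on Ubuntu 20.04); listeners_on_port is a hypothetical helper that filters ss -ltnp output:

```shell
#!/bin/sh
# listeners_on_port (hypothetical helper) reads "ss -ltnp" output on stdin
# and prints the listening sockets whose local address ends in :PORT.
# Line 1 is the header; column 4 is "Local Address:Port".
listeners_on_port() {
  awk -v suffix=":$1" 'NR > 1 {
    start = length($4) - length(suffix) + 1
    if (start > 0 && substr($4, start) == suffix) print
  }'
}
```

For example, sudo ss -ltnp | listeners_on_port 80 shows which process (if any) holds port 80; root is needed for ss -p to show processes owned by other users. Matching the full :PORT suffix keeps port 8080 from matching a search for 80.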