NGINX Just Stop Working
-
I have NGINX running on an Ubuntu 20.04 instance. It runs nothing but NGINX and is hosted on a Hyper-V server. Everything is running with all updates applied.
The issue is that NGINX randomly stops routing requests. Websites and services go offline. The NGINX logs (/var/log/nginx) and the syslog don't show any errors, but when I check whether the NGINX service is running, it shows as stopped. Restarting the NGINX service doesn't always fix the issue, but rebooting the Ubuntu server works every time. No other change is required, just a reboot.
It's proxying for only a dozen sites and services and traffic is not that high. Looking at resource utilization doesn't indicate there are problems there.
I'm already running auto reboots every night but these random stops continue to happen (before someone asks, no, the issues are not correlated with the reboot schedule). Before I enable debug logging, I thought I'd reach out here to see if anyone else had experienced this before and how you might have fixed it. Should I be looking elsewhere for details on what might be causing this?
-
What do the logs say leading up to it stopping?
-
What does error.log say?
Are you running certbot?
-
Sorry, running out the door for a client. I'll grab the logs and post the contents this weekend.
I am running certbot for Let's Encrypt.
-
@NashBrydges Since nginx is running, this should return ok, but you might want to try a
nginx -t
-
@Obsolesce Yes, all packages are up to date.
-
Here is the only entry in the NGINX error log for the last time NGINX stopped.
2021/01/08 22:34:03 [error] 847#847: *195 access forbidden by rule, client: 195.154.63.222, server: plextrack.jpslconsulting.ca, request: "GET / HTTP/1.1", host: "plextrack.jpslconsulting.ca"
The Let's Encrypt log shows no activity immediately before the outage.
Syslog also shows no errors. It has entries from 3AM to 3:155AM and from 9:59PM to 10:02PM on the day of the last incident. However, the outage occurred between 7:06PM and 10:00PM, so the only related entries in this log are from the time the outage was discovered and Ubuntu was restarted.
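To double-check that the log really is silent during the outage window, the syslog can be filtered by time of day. This is only a minimal sketch: `log_window` is a made-up helper name, and it assumes the traditional "Mon DD HH:MM:SS host ..." timestamp layout at the start of each line. If your syslog uses a different timestamp format, the field numbers will need adjusting.

```shell
# Hypothetical helper: print the lines of a syslog-style file that fall on a
# given day between two times (inclusive).
# Assumption: each line starts with "Mon DD HH:MM:SS", e.g. "Jan  8 19:06:00".
log_window() {
    log_file="$1"   # path to the log file
    month="$2"      # three-letter month, e.g. Jan
    day="$3"        # day of month, e.g. 8
    start="$4"      # window start as HH:MM:SS (24-hour)
    end="$5"        # window end as HH:MM:SS (24-hour)
    # Zero-padded HH:MM:SS strings sort correctly as plain strings in awk.
    awk -v m="$month" -v d="$day" -v s="$start" -v e="$end" \
        '$1 == m && $2 + 0 == d + 0 && $3 >= s && $3 <= e' "$log_file"
}
```

For a 7:06PM to 10:00PM window, a call would look like `log_window /var/log/syslog Jan 8 19:06:00 22:00:00`, where the month and day are example values to replace with the actual incident date. No output means nothing was logged in that window.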
-
I also ran the NGINX test and all looks good.
-
certbot.timer failing?
https://stackoverflow.com/a/52967898
-
@black3dynamite I'm not seeing any evidence of this failing in the letsencrypt.log file, the syslog, or the nginx logs (both access and error). Would those logs be elsewhere? Obviously I don't want to have to manually renew certs.
-
@NashBrydges letsencrypt.log is the only one I'm aware of. Actually, are you using systemd or a cron job to renew your certs?
-
@black3dynamite systemd...
-
@NashBrydges said in NGINX Just Stop Working:
I also ran the NGINX test and all looks good.
If it weren't, NGINX wouldn't even start up.
-
@NashBrydges said in NGINX Just Stop Working:
Here is the only entry in the NGINX error log for the last time NGINX stopped.
The error log is where it records HTTP errors, not Nginx software errors.
-
@scottalanmiller Well at this point I'm looking at any log that has "error" in the name. Lol
-
@NashBrydges said in NGINX Just Stop Working:
@scottalanmiller Well at this point I'm looking at any log that has "error" in the name. Lol
This should show you what there is for Nginx itself....
grep nginx /var/log/messages
-
@scottalanmiller said in NGINX Just Stop Working:
grep nginx /var/log/messages
/var/log/messages
Does not exist.
-
@NashBrydges said in NGINX Just Stop Working:
@scottalanmiller said in NGINX Just Stop Working:
grep nginx /var/log/messages
/var/log/messages
Does not exist.
Oh sorry, use Ubuntu's log (/var/log/syslog). That path is RHEL's.
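For reference, the Ubuntu version of that search could be wrapped up like this. It is a rough sketch, not the only way to do it: `nginx_log_lines` is a made-up function name, and the default path assumes the stock /var/log/syslog location.

```shell
# Hypothetical helper: print every line that mentions nginx (case-insensitive)
# from a log file. Defaults to Ubuntu's /var/log/syslog; pass another file
# as the first argument to search that instead.
nginx_log_lines() {
    log_file="${1:-/var/log/syslog}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    # grep exits non-zero when nothing matches, which is itself informative.
    grep -i 'nginx' "$log_file"
}
```

Call it as `nginx_log_lines` for the current syslog, or `nginx_log_lines /var/log/syslog.1` for the previous rotation. Since the service is managed by systemd, `journalctl -u nginx` is also worth a look for the unit's start and stop messages.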
-
@scottalanmiller said in NGINX Just Stop Working:
@NashBrydges said in NGINX Just Stop Working:
Here is the only entry in the NGINX error log for the last time NGINX stopped.
The error log is where it records HTTP errors, not Nginx software errors.
Which is useful in a case I've seen where the service had been started by other means, and the log showed that all addresses were already in use.
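If you suspect that "address already in use" situation, one way to see what is actually holding the web ports is to filter the listening sockets. A minimal sketch, assuming the `ss` tool from iproute2 is installed and that only ports 80 and 443 matter here; `web_listeners` is a made-up name.

```shell
# Hypothetical helper: read `ss -ltnp` output on stdin and keep only the
# listeners bound to port 80 or 443. Matching on ":80" or ":443" followed by
# whitespace avoids false hits on ports such as 8080.
web_listeners() {
    grep -E ':(80|443)[[:space:]]'
}
```

Run it as `sudo ss -ltnp | web_listeners` (the `-p` process column needs root to show other users' processes). If the lines show an nginx PID while systemd reports the service as stopped, something other than the unit started it.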