Posts made by NetworkNerd

NetworkNerd

I know the Linux expertise in ML is strong, and I could use some guidance on a Debian WordPress VM issue. Here's some context on the VM, which has been having performance issues (almost daily now) during the same time window:

Bitnami WordPress instance with NGINX deployed into an e2-micro instance in Google Compute Engine. It uses MariaDB as its database and runs Debian (kernel 5.10.205-2). The VM was deployed in the last month or two to move the site to a more current OS version. We experienced all of the problems described below on the former instance of the site before it was migrated. The only migration work was a backup and restore of the images and the database to a new VM built from the latest Bitnami WordPress image, and the problem has persisted even after that.
Almost every morning, around 5 AM CST, there is a spike in disk IO that causes CPU IO wait to spike and sends swap usage through the roof for at least 1-2 hours. Disk queue length goes way up too. If you visit the site during this time, it eventually throws an NGINX 504 Gateway Timeout error. The only way we have found to resolve the problem is to log in to the VM via SSH and either reboot it or run the Bitnami service restart command (sudo /opt/bitnami/ctlscript.sh restart). Either of those returns the VM to a working, responsive state until the same time window hits the next day. There seems to be no real difference between the reboot and the service restart in terms of keeping the problem at bay, other than that a reboot sometimes prevents it on the following day (but not always). All Bitnami WordPress services seem to be running during the problem window (nothing appears to be failing).
If you run iotop during the problem window, you'll see the mariadb service as the top culprit, which suggests something is hitting the database really hard during this time. Outside of the problem window you don't see IO spikes: the mariadb service may jump up to 30% IO for a second and then drop off the top of the list. During the problem window you can count on mariadb and the php-fpm services (several instances of them) being at the top.
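Because iotop only gives a live view, a small script that samples the kernel's per-process IO counters could record which processes were doing disk IO during the window. This is a minimal Python sketch, assuming a Linux VM with /proc mounted and that it runs as root (otherwise other users' /proc/<pid>/io files are unreadable and are skipped). It reads read_bytes and write_bytes from /proc/<pid>/io twice, INTERVAL_SECONDS apart, and prints the top consumers. The script and all of its names are hypothetical, not part of Bitnami or iotop.

```python
import os
import time
from typing import Dict, List, Tuple

INTERVAL_SECONDS = 5  # sampling window (an assumption; adjust as needed)


def parse_proc_io(text: str) -> Dict[str, int]:
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of counters."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            counters[key.strip()] = int(value)
    return counters


def snapshot(proc_root: str = "/proc") -> Dict[int, Tuple[str, int]]:
    """Map pid -> (command name, read_bytes + write_bytes) for every readable process."""
    result: Dict[int, Tuple[str, int]] = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "io")) as f:
                io = parse_proc_io(f.read())
            with open(os.path.join(proc_root, entry, "comm")) as f:
                name = f.read().strip()
        except OSError:
            # The process exited, or we lack permission (run as root to see everything).
            continue
        result[int(entry)] = (name, io.get("read_bytes", 0) + io.get("write_bytes", 0))
    return result


def top_io(before: Dict[int, Tuple[str, int]],
           after: Dict[int, Tuple[str, int]],
           n: int = 5) -> List[Tuple[int, int, str]]:
    """Return (bytes_delta, pid, name) for the n processes with the most disk IO between snapshots."""
    deltas = [
        (total - before[pid][1], pid, name)
        for pid, (name, total) in after.items()
        if pid in before
    ]
    deltas.sort(reverse=True)
    return deltas[:n]


if __name__ == "__main__":
    first = snapshot()
    time.sleep(INTERVAL_SECONDS)
    second = snapshot()
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    for delta, pid, name in top_io(first, second):
        print(f"{stamp} {name:<16} pid={pid:<7} {delta / 1024:.0f} KiB")
```

Run from cron every few minutes between, say, 4:30 and 7:30 AM CST with its output appended to a file, something like this would leave a record of whether mariadb, php-fpm, or something else was doing the IO.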
I've also noticed that during the problem window, systemctl shows systemd-journal-flush.service as loaded and failed. That seems to make sense during a period of high IO, heavy swapping, and high CPU IO wait, but I would love insight from others.
This is the website for a podcast, and it hosts the show's main feed. The WordPress database itself is tiny (about 10 MB) but has several hundred posts in it. The only other data stored in WordPress is small PNG files used for featured images. We have at least 3 GB free on the VM's disk at this point.
From a plugin standpoint, we have everything in WordPress disabled except Akismet, Jetpack, Blubrry PowerPress, and Updraft Plus. We thought it might be Updraft, but every backup completes in 10-11 seconds with no errors, and backups run at about 6:30 PM CST. We confirmed that by looking at the timestamps of files inside the VM's OS, and we even tried deactivating Updraft to see if the problem went away (it did not).
Outside of the problem window the VM has plenty of free memory when you run free -m and is using little or no swap. The site works great outside of the specific time window.
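Because free -m only shows the moment you run it, a small logger could capture memory and swap through the whole problem window. This is a minimal Python sketch, assuming Linux's /proc/meminfo (the same file free reads). The script, its INTERVAL_SECONDS value, and the memlog.py file name used in the usage note are hypothetical.

```python
import sys
import time
from typing import Dict

INTERVAL_SECONDS = 60  # assumed sampling interval


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo lines like 'SwapFree:  123456 kB' into {name: value in kB}."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[key.strip()] = int(fields[0])
    return values


def summarize(values: Dict[str, int]) -> str:
    """Return a one-line summary of available memory and swap in use, in MiB."""
    available = values.get("MemAvailable", 0) // 1024
    swap_used = (values.get("SwapTotal", 0) - values.get("SwapFree", 0)) // 1024
    return f"available={available} MiB swap_used={swap_used} MiB"


def log_samples(samples: int) -> None:
    """Print a timestamped summary `samples` times, sleeping between samples."""
    for i in range(samples):
        with open("/proc/meminfo") as f:
            line = summarize(parse_meminfo(f.read()))
        print(time.strftime("%Y-%m-%d %H:%M:%S"), line, flush=True)
        if i < samples - 1:
            time.sleep(INTERVAL_SECONDS)


if __name__ == "__main__":
    # Optional first argument: number of samples to take (defaults to 1).
    count = int(sys.argv[1]) if len(sys.argv) > 1 and sys.argv[1].isdigit() else 1
    log_samples(count)
```

For example, running python3 memlog.py 180 >> memlog.txt starting around 4:30 AM CST would log roughly three hours of one-minute samples, which could then be lined up against the Compute Engine dashboards.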
We looked at cron jobs, and nothing seems out of the ordinary. It feels like some kind of scheduled task for the database is causing the problem, but I do not know how to pinpoint it or see what queries are being run against the database. I tried installing sar to get more detail, but my Linux chops (which were minimal to begin with) have apparently rusted since the days of building and administering Elastix PBXs. Looking at wp-config.php, there do not seem to be any scheduled scripts there either.
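One thing worth ruling out when reviewing cron jobs is the timezone: cron uses the VM's local clock, and if that clock is UTC (check with timedatectl; this is an assumption, not something confirmed in the post), 5 AM CST is 11:00 UTC, so a job scheduled for 11:00 would not look like a "5 AM" job. This minimal Python sketch converts the problem window to UTC using a fixed UTC-6 offset (it ignores daylight saving time) and flags crontab lines whose minute and hour fields fall inside it. It only checks the minute and hour fields, does not handle @daily-style shortcuts, comments, or variable lines, and the job paths in the example are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-6 offset for CST; deliberately ignores daylight saving time (CDT is UTC-5).
CST = timezone(timedelta(hours=-6), "CST")


def field_matches(field: str, value: int, lo: int, hi: int) -> bool:
    """Check one cron field such as '*', '11', '0,30', '1-5', or '*/15' against a value."""
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = lo, hi
        elif "-" in base:
            first, last = base.split("-", 1)
            start, end = int(first), int(last)
        else:
            start = int(base)
            end = hi if step_text else start
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def fires_in_window(cron_line: str, start: datetime, end: datetime) -> bool:
    """True if the minute/hour fields of a crontab line match any minute in [start, end).

    Day-of-month, month, and day-of-week are ignored; times are compared in the
    timezone of `start`, which should match the VM's clock.
    """
    minute, hour = cron_line.split()[:2]
    t = start
    while t < end:
        if field_matches(minute, t.minute, 0, 59) and field_matches(hour, t.hour, 0, 23):
            return True
        t += timedelta(minutes=1)
    return False


if __name__ == "__main__":
    # 5:00-7:00 AM CST on an arbitrary winter date, expressed in UTC.
    window_start = datetime(2024, 1, 15, 5, 0, tzinfo=CST).astimezone(timezone.utc)
    window_end = window_start + timedelta(hours=2)
    print("Problem window in UTC:", window_start.strftime("%H:%M"), "-", window_end.strftime("%H:%M"))
    for line in ["0 11 * * * /path/to/job-a", "30 0 * * * /path/to/job-b", "*/15 * * * * /path/to/job-c"]:
        print(fires_in_window(line, window_start, window_end), line)
```

Here the window prints as 11:00 - 13:00 UTC; the hypothetical 11:00 job and the every-15-minutes job are flagged, while the 00:30 job (6:30 PM CST, around the Updraft backup time) is not. Schedule lines from sudo crontab -l or /etc/crontab could be fed in the same way.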

Has anyone here seen an issue like this? If so, does it make sense that the database would be hit with so much IO during a specific time period like this? By the way, this site doesn't get a crazy amount of traffic. According to Jetpack stats, it gets anywhere from 5 to 20 or 30 visits a day. Any guidance is greatly appreciated.

I'll also add that the dashboards in Google Compute Engine confirm the problem time window. The database process appears as the top consumer of CPU and memory during that window.

NetworkNerd

Sorry for the last-minute post, but if you'd like to join the virtual SpiceCorps meeting we're having this evening at 6:30 PM CST, it's open to any and all. Full details are below.

This meeting will be virtual, so be sure to register here to receive the Zoom link! We ran out of runway to meet in person this quarter, but we will try to get together in person later in the year.

The recent cybersecurity landscape has been challenging for many of us. More publicized attacks and hacks make for more of a business focus on risk mitigation. What are you doing to protect the crown jewels of the business, and how has your job become more security focused in the last 6-18 months? Come join us for an open forum discussion on how technologists are helping to protect their companies in the current state of the world. This is a time to share ideas, ask questions of peers, and learn together.

Even if you've never joined us before, we'd love to have you attend and share your knowledge with us!

NetworkNerd

For anyone who missed the live event, you can find the recording here: https://www.youtube.com/watch?v=RJjVInQuIbg. It's 2 hours of really good information.

Maybe we will see you next time, @scottalanmiller. I hope all is well!

NetworkNerd

We're kicking off in less than an hour. There's still time to join us if anyone out there is up for it and didn't register previously.

NetworkNerd

This is open to anyone who might like to attend, regardless of location.

NetworkNerd

It's that time again! Let's get together and nerd out on something technology-related. Last time we discussed careers, and this time we will dive deep into the realm of databases with David Klee, owner of Heraflux Technologies and SQLibrium.

This meeting will be virtual, so be sure to register here to receive the Zoom link!

David will be presenting the following session for us with open Q & A. If it has been a while since you joined in the fun or you are a first timer, you don't want to miss the knowledge that will be shared during this meeting!

How to Maintain and Improve SQL Server Performance for the IT Generalist

For the accidental DBA, SQL Server usually ‘just works.’ However, simply staying online and running as well as it can for your business are two completely different things. In this session, we will describe all the things that you, the IT generalist, can do to make sure your SQL Servers are running efficiently, being maintained properly, and alerting you if something bad is brewing. We will cover topics such as routine maintenance and backup management, performance validation, license optimization, and performance tips such as missing index creation. This real-world session draws on over 25 years of SQL Server experience, and you’re sure to take away ways to make your environment run smoother!

NetworkNerd

We're still a go for this event. Join us if you can for stories of job changes during the pandemic, and bring your questions.

NetworkNerd

Any possibility of doing it 100% virtual?

NetworkNerd

Date: 2/22/2021
Start Time: 6:30 PM Central
Format: Zoom

This will be our first virtual meeting of the year (open to all who would like to attend). Maybe you, like many others, want to start the year by setting a goal to get a new job or change your career in some way. Come join us and get the scoop from people who have been there!

In this meeting, we will focus on the career progression of two technologists who changed jobs during the global pandemic. We'll get the scoop on why they made the move, how the interview and onboarding process changed as a result of the pandemic, what it was like on the other side of the job change, how each adapted to new managers and teams, and how they have grown as professionals in the process.

Each of our guests will be sharing their experience and have time to answer your questions.
-Paul Mai (a member of our SpiceCorps) - Systems Administrator at Allied Electronics
-Jeff Eberhard - Oracle Cloud VMware Solutions Leader at Oracle

I'll open the bridge early for anyone who wants to hop on a few minutes before we start. Remember to register here to get the Zoom meeting details. You can also RSVP in the community.

NetworkNerd

@EddieJennings said in Free / Cheap Unattended Remote Access Utility for Windows PCs:

@NetworkNerd

My MeshCentral VM is in Vultr.

Oh nice - it looks like they have 3 different options at $5 / month or less.

NetworkNerd

Thanks for the recommendations. It looks like the best practice here is to run your own server for either Guacamole or MeshCentral. I imagine no one here would risk using the public MeshCentral server (I don't think I would).

Maybe this is my chance to tinker with ESXi on Arm with some Raspberry Pi 4s. Either that, or I can provision something in AWS / GCP / Azure for pretty cheap.

NetworkNerd

It's been a while since I have needed something like this, but what's the current recommendation for some free tools for unattended access to non-domain Windows PCs? I can spend a little if needed but would prefer free if there are some options to support friends, family, etc. I was thinking TeamViewer but seem to remember the unattended access option was a paid feature.

NetworkNerd

After way too much time spent on this, we found the problem is twofold. Hopefully this will help someone else in a similar situation.

On the Lumens, to get it to recognize a new Facebook stream key, we had to enter some random characters in the stream key field, save the changes, paste in the new stream key, and save the changes again. That would generate the preview and allow streaming to Facebook with no problem.
We also found a bigger underlying issue. The internet service at the building where the encoder sits was supposed to be 100 / 7 (Mbps down / up), but they were not getting anything close to that. Once we got Facebook working in addition to YouTube, we found that streaming to both at once made the streams extremely choppy.
After having an ISP technician out to the building last week, they said the wiring in the box outside the building was awful, we were at the end of the line, and that he was surprised we ever got internet signal to the building. They are working on clearing up the pipe now so the signal is strong and clean, and we're upgrading to 200 / 10 pretty soon. We also changed the video resolution we're sending to YouTube and Facebook so we're using a lot less bandwidth to ensure we don't use the entire upload pipe.
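A rough bandwidth budget shows why two simultaneous streams can saturate a 7 Mbps upload. This sketch assumes the encoder pushes a separate copy of the stream to each destination and uses example bitrates (4,500 kbps video for 1080p, 2,500 kbps for 720p, 128 kbps audio) that are assumptions, not the LC200's actual settings; the 80% headroom threshold is also an assumed rule of thumb.

```python
def required_upload_mbps(video_kbps: int, audio_kbps: int, destinations: int) -> float:
    """Upload bandwidth (Mbps) needed to push the same encode to several RTMP destinations."""
    return (video_kbps + audio_kbps) * destinations / 1000


def fits(required_mbps: float, upload_mbps: float, headroom: float = 0.8) -> bool:
    """True if the streams use no more than `headroom` of the upload link."""
    return required_mbps <= upload_mbps * headroom


if __name__ == "__main__":
    # Example bitrates only (assumptions, not the Lumens LC200's actual settings).
    profiles = {"1080p": (4500, 128), "720p": (2500, 128)}
    for upload in (7, 10):  # 100/7 today, 200/10 after the upgrade
        for name, (video, audio) in profiles.items():
            need = required_upload_mbps(video, audio, destinations=2)
            print(f"{name} to YouTube + Facebook: {need:.2f} Mbps of {upload} Mbps upload, "
                  f"fits={fits(need, upload)}")
```

With these assumed numbers, two 1080p streams need about 9.26 Mbps, more than the entire 7 Mbps link, while two 720p streams need about 5.26 Mbps and stay under 80% of it. Since the building was getting well under its advertised upload speed, the real margin was smaller still.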

NetworkNerd

@dbeato We don't have web filtering capabilities at this location. It's just a Linksys gateway router (not sure there are filtering capabilities on it). Access to this device is limited to 1 person, and no configs changed.

NetworkNerd

I've been trying to help launch a live stream for a church lately. They have a Lumens LC200 appliance and one camera (also a Lumens, cannot remember the model number). They are using this with a Presonus sound board / mixer.

They got the LC200 configured to pull in the camera feed, set all camera presets, etc. with no issues. The streaming setup specifically isn't that hard: you enter the RTMP / RTMPS server address and stream key from Facebook / YouTube, and you're on your way. At least you would think so, right?

We tested Facebook Live the other day twice successfully (once at noon and once at 5 PM). The first was a private post, and the other was public. The church has a page on Facebook, so as long as you are an admin of the Facebook page you can Go Live on the church page.

The third time we tried to stream to Facebook Live (later the same day as the two successful tests), the video stream never made it to Facebook for preview (so there was no way to go live). The Lumens showed a stream error inside the Director software that someone uses to activate the stream. We double- and triple-checked stream keys, tried different ones, rebooted the Lumens, and blew away configs and added them back. I thought maybe the church was having an internet issue, but there were no problems getting to the internet from any PC there (we could always get to Facebook to attempt to go live but never got a video preview).

Yesterday we went back when no one was there to test again. I thought maybe there was an outside chance it would work. But nope...the same stream error happened. We updated firmware on the LC200 and the camera and tried Facebook again. No dice.

But then we tried streaming to YouTube (basically same config parameters). That worked like a champ every time we tried (video preview comes in as expected in seconds once you start the stream from the Lumens).

I have a ticket open with Lumens on this, but I don't see how this could have worked successfully twice and then suddenly stopped working. This video shows the simplicity of the setup on the Lumens side.

I should also add that anyone can still Go Live from a mobile device with no issues, as they did before the camera and Lumens device came into play.

Has anyone seen this? I'd love to hear ideas from folks here on what we might be missing. I would rather have had streaming to Facebook Live never work than have it work twice and then fail after that, especially since YouTube works. I checked permissions on the church page, and the account used to log in to Facebook is still an admin of the page. We even tried logging in as a different user who is an admin of the page, with no success.

NetworkNerd

I see an empty conference center where VMworld once sat.

If you’ve never attended VMworld in person, it’s a completely overwhelming experience for the first time attendee. Now that it’s virtual and free to attend, it’s more accessible to us all. But it can still be overwhelming. I wanted to take a few minutes to educate the community on some hidden gems that will allow you to make the most of your experience.

Get the full story here.

NetworkNerd

There's still time to join in the fun before 9/22.

NetworkNerd

Date: 9/22/2020
Start Time: 11 AM Central
Format: Zoom

It's that time again! This will be another virtual SpiceCorps meeting using Zoom, and we're going to work it in over lunch (bring your own food, of course). Please use this link to register so the Zoom details will be e-mailed to you. The registration is only to prevent trolls (nothing more).

As the number of devices used to connect to corporate networks continues to grow, planning to ensure devices can perform optimally when running applications is a challenge. As administrators, troubleshooting the network performance of devices can be even more challenging, often requiring us to use many different tools. Join us for a lunch and learn session with Nyansa (now part of VMware) as we share a technical deep dive and demo of their Voyance Platform with plenty of time for interactive questions and answers. You’ll learn how this platform can enhance the user experience of any device on the corporate network, how to leverage performance benchmarking and proactive recommendations for remediation, and how to gain additional insight into the behavior of IoT devices on your network.

Bring your questions, and we look forward to seeing you there! I'll plan to open the bridge a few minutes before 11 AM.

SpiceCorps meeting link is here.

NetworkNerd

Today is the day! Be sure to register to get the meeting link via e-mail if you'd like to join.

NetworkNerd

We're only 2 days away. Keep in mind this is open to anyone who would like to join...regardless of location.