60k IOPS Spike



  • Let me start by saying that I am rerunning the test with a longer duration to see whether my initial results are an anomaly. In the meantime, I would like some speculation. I ran Dell Live Optics twice a while back. The first test was only 10 minutes long and the second was 12 hours. Apparently I had only been looking at the results of the 10-minute test, and I didn't pay attention to one specific part of the 12-hour test.

    On the 12-hour test, there was an initial spike of IOPS from one of our datastores, an 8x250 RAID 10 array of SSDs. The spike lasted ~30 minutes and peaked at just over 60k read IOPS. It started right after the test began. What could cause something like this? Can the test itself produce results this high?

    It may be a coincidence, but while this test was running, our main database VM, which sits on this array, corrupted itself to the point that I had to restore from a backup taken before the test. Could that somehow be responsible for the spike, or could the test have caused the corruption? I hit start right before leaving for the night, and when I came in the next morning, the VM said it had no OS disk.

    I would upload a screenshot of the Live Optics results, but ML is giving me an upload error.



  • If you upload to Imgur or a similar service and get a link, you can post the link to the hosted image here until the plugin gets fixed.



  • @Donahue said in 60k IOPS Spike:

    It may be a coincidence, but while this test was running, our main database VM, which sits on this array, corrupted itself to the point that I had to restore from a backup taken before the test. Could that somehow be responsible for the spike, or could the test have caused the corruption? I hit start right before leaving for the night, and when I came in the next morning, the VM said it had no OS disk.

    A newly loaded database can definitely cause some incredible spikes. It might have been re-indexing or loading into RAM during that time. Those are really intensive operations.

    The rule of thumb for a Live Optics collection is two weeks.



  • There is also a hard page fault spike that corresponds to the same time period; nothing else is on that graph. Maybe I will run a two-week one after this 24-hour one I started today finishes.

    Looking into Imgur now.



  • @Donahue said in 60k IOPS Spike:

    There is also a hard page fault spike that corresponds to the same time period; nothing else is on that graph. Maybe I will run a two-week one after this 24-hour one I started today finishes.

    Looking into Imgur now.

    Page faults must trigger IOPS; the two are linked.



  • Isn't the test itself passive? It just monitors the IOPS in use rather than trying to make the system generate high IOPS, right?



  • @Dashrender said in 60k IOPS Spike:

    Isn't the test itself passive? It just monitors the IOPS in use rather than trying to make the system generate high IOPS, right?

    The theory is that the impact of the test is minimal.



  • Edit: The button for URL pictures doesn't seem to work either.
    https://imgur.com/GvyQjFR
    https://imgur.com/5As19Pa
    https://imgur.com/nKZDM5h



  • @Donahue said in 60k IOPS Spike:

    Edit: The button for URL pictures doesn't seem to work either.
    https://imgur.com/GvyQjFR
    https://imgur.com/5As19Pa
    https://imgur.com/nKZDM5h

    Those aren't pictures; those are web pages. You have to link the image itself. When doing so, you don't need the image button; just the image link will do the trick.



  • [Live Optics screenshots: GvyQjFR.png, 5As19Pa.png, nKZDM5h.png]



  • @scottalanmiller said in 60k IOPS Spike:

    @Dashrender said in 60k IOPS Spike:

    Isn't the test itself passive? It just monitors the IOPS in use rather than trying to make the system generate high IOPS, right?

    The theory is that the impact of the test is minimal.

    Is it even really a test, or simply monitoring? The term 'test' implies to me that Live Optics itself is testing something. Is it? I seriously don't know.



  • @Dashrender said in 60k IOPS Spike:

    @scottalanmiller said in 60k IOPS Spike:

    @Dashrender said in 60k IOPS Spike:

    Isn't the test itself passive? It just monitors the IOPS in use rather than trying to make the system generate high IOPS, right?

    The theory is that the impact of the test is minimal.

    Is it even really a test, or simply monitoring? The term 'test' implies to me that Live Optics itself is testing something. Is it? I seriously don't know.

    Just monitoring.



  • I kind of think that the spike has thrown off the average, so even the 95th percentile is wrong. But what would cause it to last so long?
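A spike can inflate the average without moving the 95th percentile, as long as it covers less than 5% of the samples. A 30-minute spike in a 12-hour window is about 4.2% of the time. The sketch below assumes a hypothetical one-sample-per-minute trace with a flat 2,000 IOPS baseline and a flat 60,000 IOPS spike. These are illustrative numbers, not the actual Live Optics data, and Live Optics' own sampling interval and percentile method may differ from the nearest-rank method used here.

```python
import math


def nearest_rank_percentile(samples, pct):
    """Return the pct-th percentile (integer pct) using the nearest-rank method."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[max(rank, 1) - 1]


def synthetic_trace(window_minutes, spike_minutes, baseline_iops, spike_iops):
    """Hypothetical trace, one sample per minute: a single spike at the start
    of the collection followed by a flat baseline."""
    return [spike_iops] * spike_minutes + [baseline_iops] * (window_minutes - spike_minutes)


if __name__ == "__main__":
    window = 12 * 60  # 12-hour collection, in minutes
    for spike in (30, 60):
        trace = synthetic_trace(window, spike, baseline_iops=2_000, spike_iops=60_000)
        mean = sum(trace) / len(trace)
        p95 = nearest_rank_percentile(trace, 95)
        print(f"spike={spike} min: mean={mean:.0f} IOPS, p95={p95} IOPS")
```

With these assumed numbers, the 30-minute spike roughly doubles the mean (about 4,417 vs. 2,000 IOPS) but leaves the 95th percentile at the 2,000 IOPS baseline. A 60-minute spike (about 8.3% of the window) pushes the 95th percentile up to 60,000 IOPS.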



  • @Donahue said in 60k IOPS Spike:

    I kind of think that the spike has thrown off the average, so even the 95th percentile is wrong. But what would cause it to last so long?

    "So long" is only 20 minutes. That's nothing. I have spikes longer than that just for a patch cycle.



  • As far as I can tell, there was nothing going on during that time frame other than the test.



  • Well, nothing like that showed up in the 24-hour test. I have started another one for 7 days, which is the longest option in the current version.
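Longer collections are one reason the two-week rule of thumb helps: the same one-off event makes up a shrinking share of the samples. This is a rough sketch, assuming evenly spaced samples and a single ~30-minute spike. The window lengths are the ones mentioned in this thread.

```python
SPIKE_MINUTES = 30  # approximate duration of the spike seen in the 12-hour run

# Collection windows mentioned in the thread, in minutes.
WINDOWS = {
    "12 hours": 12 * 60,
    "24 hours": 24 * 60,
    "7 days": 7 * 24 * 60,
    "14 days": 14 * 24 * 60,
}


def spike_share(spike_minutes, window_minutes):
    """Percentage of the collection window occupied by a single spike."""
    return 100 * min(spike_minutes, window_minutes) / window_minutes


for label, minutes in WINDOWS.items():
    print(f"{label:>8}: {spike_share(SPIKE_MINUTES, minutes):.2f}% of samples")
```

Under these assumptions, the share drops from about 4.17% (12 hours) to 2.08% (24 hours), 0.30% (7 days), and 0.15% (14 days). Once it is below 5%, a single spike of that length cannot set the 95th percentile on its own.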

