File Corruption on Copy Issue
-
I've got an HP DL360 G8 hooked up to 3 D2600s / 30TB Raid 10 / P822 / 30 mil files / Windows Server 2012.
Last week someone copied TIF images: the source & destination had the same file sizes / counts, but the newly made copy was corrupt / not viewable. (Verified this with a binary editor: the actual files didn't match, but the file properties matched.) When I tried the exact same copy, it reproduced the exact same situation (corrupt copy / file properties matched). 2 people also claimed they had copied directories and ended up with the contents of different directories. I have only seen a situation like this once & it had to do with IT staff expanding an array that didn't support expanding live.
I ran a chkdsk that didn't find any issues on the array, monitored files early in the AM to make sure we weren't being ransomwared, and dug through log files to see if I could find anything. Everything appeared to be normal. After the Windows Update -> reboot I could no longer reproduce the problems. The day people complained to me, 4 people had issues; after the Windows update / reboot, no one has complained since.
I'm pretty alarmed & have paused our 10-day rolling backups, and have additional storage arriving tomorrow to create a snapshot backup to restore from before resuming. Have any of you ever seen something like this where both the array controller said the array was fine & chkdsk didn't find anything?
-
Things that spring to mind - bad RAM, bad NIC, bad switch. bad cables.
Sounds like the CRC check isn't happening somewhere.
-
@jim9500 said in File Corruption on Copy Issue:
Verified this with a binary editor actual files didn't match
Yes, use a hash in the future. MD5 or SHA1 are built for this.
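A minimal sketch of that kind of check, assuming Python is available on the machine doing the comparison. The function names `file_digest` and `copies_match` are made up for illustration. MD5 and SHA1 are weak against deliberate tampering but are fine for spotting accidental corruption like this.

```python
import hashlib

def file_digest(path, algorithm="sha1", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks so large TIFs fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def copies_match(source, copy, algorithm="sha1"):
    """True only if both files hash to the same value (size and dates are not considered)."""
    return file_digest(source, algorithm) == file_digest(copy, algorithm)
```

Two files with identical size and timestamps but different content will return `False` from `copies_match`, which is the case the file properties could not reveal.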
-
@jim9500 said in File Corruption on Copy Issue:
I have only seen a situation like this once & it had to do with IT staff expanding an array that didn't support expanding live.
Can't be the same situation. That would potentially corrupt the file on the disk, but not in flight over and over again.
-
@jim9500 said in File Corruption on Copy Issue:
Have any of you ever seen something like this where both the array controller said the array was fine & chkdsk didn't find anything?
The array and filesystem ARE fine, you proved that. The file that is stored on the disk isn't bad.
What is bad is the copy that arrives at the other location. The two are different, you verified with the binary editor.
So you know that chkdsk had nothing to find and that nothing is wrong with the array. That's established.
The question is purely... why is the file getting changed during the transfer.
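If it ever comes back, knowing where the two files differ can hint at the cause (one flipped byte versus whole blocks replaced, for example). A rough sketch in Python, assuming both files are readable from the same machine; `differing_ranges` is a hypothetical helper name, not an existing tool.

```python
def differing_ranges(path_a, path_b, chunk_size=64 * 1024, limit=20):
    """Return up to `limit` (start, end) byte ranges where the files differ.

    `end` is exclusive. If one file is shorter, the missing tail counts as different.
    """
    ranges = []
    start = None   # start offset of the difference run currently open, if any
    offset = 0     # absolute offset of the current chunk
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while len(ranges) < limit:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break
            if a == b:
                # Identical chunk: close any run left open by the previous chunk.
                if start is not None:
                    ranges.append((start, offset))
                    start = None
                offset += len(a)
                continue
            n = max(len(a), len(b))
            for i in range(n):
                same = i < len(a) and i < len(b) and a[i] == b[i]
                if not same and start is None:
                    start = offset + i
                elif same and start is not None:
                    ranges.append((start, offset + i))
                    start = None
            offset += n
    if start is not None:
        ranges.append((start, offset))
    return ranges[:limit]
```

Something like `differing_ranges(original, copy)` gives a short list of offsets to look at in the binary editor instead of scrolling through the whole file.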
-
@jim9500 said in File Corruption on Copy Issue:
After the Windows Update -> reboot I could no longer reproduce the problems. The day people complained to me, 4 people had issues; after the Windows update / reboot, no one has complained since.
Sounds like you had a bug that wasn't patched, and you patched it. Or it is coincidence that something was loaded incorrectly into RAM and the patching process reloaded it. Either way, take note that it happened but don't waste time trying to track down something that you have no reason to suspect will happen again.
-
@jim9500 said in File Corruption on Copy Issue:
I've got an HP DL360 G8 hooked up to 3 D2600s / 30TB Raid 10 / P822 / 30 mil files / Windows Server 2012.
So, obvious concerns... Windows itself isn't the best for storage, but it's okay. But Windows 2012 is a decade old. Like, seriously. Nothing wrong with using Windows when it is the right tool for the job. But there's no case where it is the right tool for the job and you are also willing to run an ancient, unsupported version. If you are going to commit to Windows, commit to it and keep it up to date. If you are even considering the possibility of letting it stay on an old version, stop: you've proven that Windows is wrong for you, and you should move to something that you can maintain better.
Which also brings up the question... how old were the missing patches on this old machine?
-
These were the updates installed - after reboot issue went away & hasn't come back
2021-07 Security Monthly Quality Rollup for Windows Server 2012 for x64-based Systems KB5004956
Windows Malicious Software Removal Tool X64 - v5.91 KB890830
2021-07 Security and Quality Rollup for .NET Framework 3.5, 4.5.2, 4.6, 4.6.2, 4.7, 4.7.1, 4.8 for Windows Server 2012 for x64 KB5004230
2021-04 Servicing Stack Update for Windows Server 2012 for x64-based Systems KB5001401
Update for Windows Server 2012 KB3102429
Intel - LAN (Server), Other hardware - Intel(R) 10 Gigabit CX4 Dual Port Server Adapter
2021-01 Security Update for Windows Server 2012 for X64-based Systems KB4535680
-
@jim9500 said in File Corruption on Copy Issue:
These were the updates installed - after reboot issue went away & hasn't come back
2021-07 Security Monthly Quality Rollup for Windows Server 2012 for x64-based Systems KB5004956
Windows Malicious Software Removal Tool X64 - v5.91 KB890830
2021-07 Security and Quality Rollup for .NET Framework 3.5, 4.5.2, 4.6, 4.6.2, 4.7, 4.7.1, 4.8 for Windows Server 2012 for x64 KB5004230
2021-04 Servicing Stack Update for Windows Server 2012 for x64-based Systems KB5001401
Update for Windows Server 2012 KB3102429
Intel - LAN (Server), Other hardware - Intel(R) 10 Gigabit CX4 Dual Port Server Adapter
2021-01 Security Update for Windows Server 2012 for X64-based Systems KB4535680
None of those would point specifically to something that might be a problem. But several of them obviously could be, including the Intel driver one.
-
@scottalanmiller said in File Corruption on Copy Issue:
@jim9500 said in File Corruption on Copy Issue:
These were the updates installed - after reboot issue went away & hasn't come back
2021-07 Security Monthly Quality Rollup for Windows Server 2012 for x64-based Systems KB5004956
Windows Malicious Software Removal Tool X64 - v5.91 KB890830
2021-07 Security and Quality Rollup for .NET Framework 3.5, 4.5.2, 4.6, 4.6.2, 4.7, 4.7.1, 4.8 for Windows Server 2012 for x64 KB5004230
2021-04 Servicing Stack Update for Windows Server 2012 for x64-based Systems KB5001401
Update for Windows Server 2012 KB3102429
Intel - LAN (Server), Other hardware - Intel(R) 10 Gigabit CX4 Dual Port Server Adapter
2021-01 Security Update for Windows Server 2012 for X64-based Systems KB4535680
None of those would point specifically to something that might be a problem. But several of them obviously could be, including the Intel driver one.
Right, the network driver was updated.
I did not follow: did he apply the updates and reboot? Or did the updates get applied with the reboot pending, which he then did?
-
@jaredbusch said in File Corruption on Copy Issue:
updates get applied and the reboot was pending,
Updates were applied and reboot was pending on the network driver. Starting to think this was the issue, even though I don't understand how or why it would have caused it.
-
@jim9500 said in File Corruption on Copy Issue:
Updates were applied and reboot was pending on the network driver. Starting to think this was the issue, even though I don't understand how or why it would have caused it.
Network driver... copying files across a network.. Nope cannot see it..
Or did I misunderstand the copy process? Was it all local?
-
@jaredbusch From one folder on the share to another folder on the share, from a client computer. It was reproducible on specific directories before the restart. Copying the same files down to a networked computer's local disk, then back up to the share, resulted in a legitimate copy.
-
@jim9500 said in File Corruption on Copy Issue:
@jaredbusch said in File Corruption on Copy Issue:
updates get applied and the reboot was pending,
Updates were applied and reboot was pending on the network driver. Starting to think this was the issue, even though I don't understand how or why it would have caused it.
That's definitely a likely situation. Drivers getting updated but not reloaded means that what is on the disk and what is in RAM do not match, and there might be an attempt to load something from disk later and get a conflict.
-
@jim9500 said in File Corruption on Copy Issue:
@jaredbusch From one folder on the share to another folder on the share from a client computer. Was reproducible on specific directories before restart. Copying same files down to a networked computer locally - then back up to the share resulted in legitimate copy.
Yup, that's network. So the network driver was definitely in use during the process. So while we are just guessing, it's a very rational place to start placing blame.
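Since the bad copies showed up on specific directories, a whole-folder check after a copy would show quickly whether it is happening again. A sketch in Python, assuming both folders are reachable as paths from the machine running it; `sha1_of` and `mismatched_files` are illustrative names only.

```python
import hashlib
import os

def sha1_of(path, chunk_size=1024 * 1024):
    """SHA1 hex digest of one file, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def mismatched_files(source_root, copy_root):
    """Return sorted relative paths that are missing from the copy or hash differently."""
    bad = []
    for folder, _dirs, files in os.walk(source_root):
        for name in files:
            src = os.path.join(folder, name)
            rel = os.path.relpath(src, source_root)
            dst = os.path.join(copy_root, rel)
            if not os.path.isfile(dst) or sha1_of(src) != sha1_of(dst):
                bad.append(rel)
    return sorted(bad)
```

An empty list means every source file has a matching copy. Extra files that exist only in the copy are not reported by this sketch.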
-
@scottalanmiller Really appreciate the input. Didn't know if it was possible for chkdsk to not pick up corruption issues or if it was something mid-flight. The Intel driver seems to make the most sense.
-
@jim9500 said in File Corruption on Copy Issue:
Didn't know if it was possible for chkdsk to not pick up corruption issues
Chkdsk can pick up certain types of file corruption, which you did not have. You had a network issue, which is completely different. At no time was any file corrupted. The original file stayed pristine. The new file was written flawlessly. Neither file was corrupt.
-
@jim9500 said in File Corruption on Copy Issue:
it was something mid flight. Intel drivers seems to make the most sense.
It WAS mid-flight. That's known. What is a guess is that the driver is at fault. But I'd say that it is pretty likely.