Does block level sync exist?
-
We use Barracuda backup, and use offsite replication for each of our sites.
I found a discrepancy between the backup appliance and the replicated storage. Some folders/files had not replicated from the appliance even though the Barracuda console says replication has been successful.
I've been working with Barracuda support for several weeks on this, and they have not given me an answer as to what happened or why the data was not the same. They're running all these tests to check the integrity of the data, such as checking the metadata/replicated metadata/local binary data/replicated binary data.
This check has been running for several weeks, and I asked if there is technology available that could synchronize the data between the appliance and offsite storage. Here is their response " Block level synchronization would not be possible here as the data is compressed on the receiver but not the sender and the appliances are receiving data from different senders; the blocks would not look the same."
A few questions on this.
-
Is this statement true?
-
If it's not true, what is the solution here?
-
If it's true, how do you check the integrity of the data on the backup appliance that is being replicated to offsite, such as another site, or cloud?
My concern is that if this were a major disaster recovery situation, such as the building catching on fire, we would be dead in the water. Especially if those files/folders were sensitive data.
Maybe there's a better way to do this. I'm all ears.
-
-
Why do you want backups at the block (dumb) level rather than at a data-aware level? I'm not clear what you are looking for here, but I think you might be looking for something uncommon because it isn't something you would likely want.
-
@Fredtx said in Does block level sync exist?:
My concern is that if this were a major disaster recovery situation, such as the building catching on fire, we would be dead in the water. Especially if those files/folders were sensitive data.
If data integrity matters, block level is definitely not where you should be. You need application level awareness. That's a fundamental of backups (see my book on the subject, lolololol, no but seriously, I cover this in the book.)
-
@Fredtx said in Does block level sync exist?:
Block level synchronization would not be possible here as the data is compressed on the receiver but not the sender and the appliances are receiving data from different senders; the blocks would not look the same."
That would cause some amount of problem, yes. If they were compressing in the right place, though, it would not. This is a design flaw, not a data integrity flaw.
-
@scottalanmiller said in Does block level sync exist?:
@Fredtx said in Does block level sync exist?:
Block level synchronization would not be possible here as the data is compressed on the receiver but not the sender and the appliances are receiving data from different senders; the blocks would not look the same."
That would cause some amount of problem, yes. If they were compressing in the right place, though, it would not. This is a design flaw, not a data integrity flaw.
Here's an article on how their replication works. How offsite replication works barracuda backup
It says:
"When data is transferred offsite, it is further deduplicated by checking to see if the file part already exists on the offsite replication destination. If the part does not exist, the part is compressed, encrypted using 256-bit AES encryption, and transferred to the replication destination. If the part already exists on the offsite replication destination, the part is simply dropped and not transferred offsite."
From what I see, the backup (VM and agent level) of the servers to the Barracuda appliance (a local Linux server box) is application aware, but the offsite replication is not.
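The dedup-then-compress flow that article describes can be sketched roughly like this. This is only an illustration under assumptions, not Barracuda's actual implementation: the part store is a hypothetical in-memory dict, and the 256-bit AES encryption step is omitted for brevity.

```python
import hashlib
import zlib

# Hypothetical "offsite" part store, keyed by content hash.
# A real appliance would keep this index on the replication destination.
offsite_parts = {}

def replicate_part(part: bytes) -> str:
    """Send one file part offsite, dedup style: hash it, and only
    compress and transfer it if the destination doesn't already have it.
    (A real system would also encrypt the part before sending.)"""
    digest = hashlib.sha256(part).hexdigest()
    if digest not in offsite_parts:            # part is new to the destination
        offsite_parts[digest] = zlib.compress(part)
    return digest                              # part "dropped" if already present

def restore_part(digest: str) -> bytes:
    """Fetch and decompress a part from the offsite store."""
    return zlib.decompress(offsite_parts[digest])
```

The key consequence for replication integrity: sending the same part twice transfers nothing new, so the offsite copy is a reconstruction from a dedup index rather than a byte-for-byte mirror of the sender.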
-
This design is "supposed" to adhere to the 3-2-1 backup rule, but I guess technically it does not, since the Barracuda appliance (where local backup copies of servers reside) is replicating to another appliance at the block level. It's only hoping all the data is copied to the offsite storage.
-
@Fredtx said in Does block level sync exist?:
@scottalanmiller said in Does block level sync exist?:
@Fredtx said in Does block level sync exist?:
Block level synchronization would not be possible here as the data is compressed on the receiver but not the sender and the appliances are receiving data from different senders; the blocks would not look the same."
That would cause some amount of problem, yes. If they were compressing in the right place, though, it would not. This is a design flaw, not a data integrity flaw.
Here's an article on how their replication works. How offsite replication works barracuda backup
It says:
"When data is transferred offsite, it is further deduplicated by checking to see if the file part already exists on the offsite replication destination. If the part does not exist, the part is compressed, encrypted using 256-bit AES encryption, and transferred to the replication destination. If the part already exists on the offsite replication destination, the part is simply dropped and not transferred offsite."
From what I see, the backup (VM and agent level) of the servers to the Barracuda appliance (a local Linux server box) is application aware, but the offsite replication is not.
Dedupe, unlike compression, would make it pretty simple to test whether files are the same. Sounds like they are just lazy.
FYI: I'd never use Barracuda stuff. These are the guys who had a fully open back door in their FIREWALLS, exposed to the public. They are definitely a vendor I would never consider for anything critical. Email filtering, I suppose. But why do you have them for backups?
-
@Fredtx said in Does block level sync exist?:
This design is "supposed" to adhere to the 3-2-1 backup rule, but I guess technically it does not, since the Barracuda appliance (where local backup copies of servers reside) is replicating to another appliance at the block level. It's only hoping all the data is copied to the offsite storage.
If it was replicating for real, it could be verified with checksums. Everyone else that does this can do that.
-
@Fredtx said in Does block level sync exist?:
From what I see. The backup (vm and agent level) of the servers to the Barracuda appliance (local server linux box) is application aware, but the offsite replication is not
Typically no backup software is application aware unless your only applications are things like general MS SQL Server, Active Directory, etc. Essentially nothing is application aware in reality. To be application aware, it has to have API hooks into the specific application.
DataVault, for example, is aware of AviMark. Only one backup software in the world is AviMark aware, and AviMark is the only application it is aware of.
So unless you have like no apps at all, Barracuda can't be application aware at a meaningful level. And if they are block based, there's just no way at all. You don't do block level when you want application awareness. Block level essentially exists solely for shops that have given up on knowing what their apps are and just hoping for the best.
-
@scottalanmiller said in Does block level sync exist?:
But why do you have them for backups?
This is the vendor our parent company has been using for some time, and they seem to think it's a good product since they've used it for so many years. I mentioned the flaw to upper management during our monthly meeting last week. The Director of IT said to let him know if he needs to escalate it. I'm trying to get our division's backup process in order: looking at how everything is done, why it's being done the way it is, and whether it's even working like it's supposed to.
-
@scottalanmiller said in Does block level sync exist?:
@Fredtx said in Does block level sync exist?:
This design is "supposed" to adhere to the Backup 3-2-1 rule, but I guess technically it does not since the barracuda appliance (where local backup copies of servers resides) is replicating to another appliance at the block level. It's only hoping all the data is copied to the offsite storage.
If it was replicating for real, it could be verified with checksums. Everyone else that does this can do that.
The way I've been told they're checking this is by "checking the metadata/replicated metadata/local binary data/replicated binary data" between the local appliance and the offsite appliance it's replicating to. So far, we are on day 21, and the verification is not complete. I'm probably past the point where this needs to be escalated at a much higher level, since this could have been data loss.
-
@Fredtx said in Does block level sync exist?:
@scottalanmiller said in Does block level sync exist?:
@Fredtx said in Does block level sync exist?:
This design is "supposed" to adhere to the Backup 3-2-1 rule, but I guess technically it does not since the barracuda appliance (where local backup copies of servers resides) is replicating to another appliance at the block level. It's only hoping all the data is copied to the offsite storage.
If it was replicating for real, it could be verified with checksums. Everyone else that does this can do that.
The way I've been told they're checking this is by "checking the metadata/replicated metadata/local binary data/replicated binary data" between the local appliance and the offsite appliance it's replicating to. So far, we are on day 21, and the verification is not complete. I'm probably past the point where this needs to be escalated at a much higher level, since this could have been data loss.
Block level replication is difficult because it can't tell if things are good until EVERYTHING is good. It's kind of all or nothing, so the amount of work is staggering. If you want to know that one critical database file made it and is replicated the same everywhere, that's trivial with normal replication. But if you do blocks at filesystem-level scale, it can be slow to the point of useless.
Example... take an MS SQL backup of a 1GB database (I did this today, in fact.) Then do an MD5 checksum on it. Then send it off somewhere for offsite storage. Then do an MD5 on it there, too. Voila, you know that it is the same in both places, guaranteed.
But use block level from a filesystem container and that 1GB is potentially now 500GB; you have to deduplicate with a lot of overhead to keep storage costs in line, and that simple MD5 is now nearly impossible.
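The checksum comparison described above is trivial to script. A minimal sketch (the file paths in the comment are hypothetical):

```python
import hashlib

def md5sum(path: str) -> str:
    """Stream a file through MD5 in 1 MB chunks so a large backup
    file never has to fit in RAM."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare the local backup with the offsite copy: equal digests mean
# the file is byte-for-byte identical in both places, e.g.
# assert md5sum("/backups/db.bak") == md5sum("/offsite/db.bak")
```

That is the whole verification for file-level replication; block-level dedup stores make this kind of end-to-end comparison far harder, which is Scott's point.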
-
@Fredtx said in Does block level sync exist?:
"Checking the metadata/replicated metadata/local binary data/replicated binary data" between the local appliance and the offsite appliance that it's replicating to
That's really just repeating your request back to you, it didn't actually tell you a single thing. That's literally what you asked for them to do.
-
@Fredtx said in Does block level sync exist?:
This is the vendor our parent company has been using for some time, and they seem to think it's a good product since they've used it for so many years.
Wow, who uses that AND thinks it's a good product? What the heck. At best, it's "poor but doesn't fail that often."
But, like I said in a meeting recently, all backup software is for the cases where we failed to backup properly. It's all a fallback for failures in data design. In enterprise systems, you don't need backup software. Almost everyone has it as a second option, but you should never need it or rely on it. Or if you use it, it's as an interface to other systems only.
-
@Fredtx said in Does block level sync exist?:
I'm trying to get our division's backup process in order: looking at how everything is done, why it's being done the way it is, and whether it's even working like it's supposed to.
Honestly, if your PARENT company doesn't understand backups, just get them to sign off that doing what they do is good enough and don't make this your problem. If you want to get into backup theory and how to truly protect data, it will...
- Never, ever be something that people doing this crap will understand.
- Make them look like idiots for taking something so critical and ignoring the obvious problems with it (not even talking about the one you found.)
- Put you in the line of fire for making managers look bad for not knowing the basics.
There's no upside to you.
-
@scottalanmiller said in Does block level sync exist?:
But, like I said in a meeting recently, all backup software is for the cases where we failed to backup properly. It's all a fallback for failures in data design. In enterprise systems, you don't need backup software. Almost everyone has it as a second option, but you should never need it or rely on it. Or if you use it, it's as an interface to other systems only.
Yea, and most backup vendors will not take the fault if there was some kind of data loss. At least that's what I've seen over my past years in IT. They'll say something like "It was corrupted" or give another reason why there was not a copy of the files/folders that were lost.
-
@Fredtx said in Does block level sync exist?:
@scottalanmiller said in Does block level sync exist?:
But, like I said in a meeting recently, all backup software is for the cases where we failed to backup properly. It's all a fallback for failures in data design. In enterprise systems, you don't need backup software. Almost everyone has it as a second option, but you should never need it or rely on it. Or if you use it, it's as an interface to other systems only.
Yea, and most backup vendors will not take the fault if there was some kind of data loss. At least that's what I've seen over my past years in IT. They'll say something like "It was corrupted" or give another reason why there was not a copy of the files/folders that were lost.
Right. Well, to their credit, it's really IT's responsibility to understand that only IT can do reliable backups, and what we are asking of backup vendors is literally impossible for them to do. So occasional corruption is a guarantee. While they have to be careful not to promise the impossible, it's also our responsibility not to demand it or act like we could get it if we tried hard enough.
Backup software, by definition, is best effort, because it has no way to guarantee that the data is healthy when it's backed up. The only way to do that is application awareness, which is out of the backup vendors' hands by and large, and even when they have it, it still means "trusting the application".
-
I do backups for financial systems, for example. And we always explain "well, we can quiesce the database and ensure that database is not corrupt, but we can never know if the database has been given quiesced application data because only the developers can tell us that".... and 99% of the time, the devs don't even know themselves and never accounted for needing to make the application safe to back up at all!
When I have my application developer hat on, we make our applications to have their own backup tools, because it's literally the only safe way to know you are getting good backups of a live system. The only. Full stop. If our customers were to buy backup software, it would be so goofy... because it would be extra effort to be less safe.
-
@scottalanmiller said in Does block level sync exist?:
I do backups for financial systems, for example. And we always explain "well, we can quiesce the database and ensure that database is not corrupt, but we can never know if the database has been given quiesced application data because only the developers can tell us that".... and 99% of the time, the devs don't even know themselves and never accounted for needing to make the application safe to back up at all!
I agree. If the application isn't designed for backups in a specific manner then the only safe bet is to shut it down, snapshot the data for backup and power it up again.
The operations needed to shut down are a superset of the operations needed to put the database and application data in a safe, known state. And most applications are designed to shut down and start up safely.
It may be clumsy but with VMs the service interruption will usually be short. Maybe 30 seconds or so.
-
@Pete-S said in Does block level sync exist?:
@scottalanmiller said in Does block level sync exist?:
I do backups for financial systems, for example. And we always explain "well, we can quiesce the database and ensure that database is not corrupt, but we can never know if the database has been given quiesced application data because only the developers can tell us that".... and 99% of the time, the devs don't even know themselves and never accounted for needing to make the application safe to back up at all!
I agree. If the application isn't designed for backups in a specific manner then the only safe bet is to shut it down, snapshot the data for backup and power it up again.
The operations needed to shut down are a superset of the operations needed to put the database and application data in a safe, known state. And most applications are designed to shut down and start up safely.
It may be clumsy but with VMs the service interruption will usually be short. Maybe 30 seconds or so.
Yeah, it's amazing how many people in IT think that they can buy their way out of this problem. You gotta either be application aware, or stop the backup being of an application (and make it just the storage) by taking the application down. It's super weird, because on a desktop we'd likely understand the mechanics, but once it is a server, people tend to think it has become a magic black box and they forget the basics that they know from their home PC use.