Deduplication on CSV storage

Jimmy9008

Hi folks,

We have a three node Windows Failover Cluster, with several CSVs, each provided by Starwind using local storage.

One interesting idea has been brought up internally and i'd like to know if it sounds sensible or possible.

Is it possible to enable deduplication of CSV storage within the cluster? My initial thoughts are that as VM files are open in vmms.exe, the deduplication process will not be able to work on those open files. Is that wrong though? I would expect iso files to dedup a they are offline/not in use, but have a feeling live running VM VHD(X) files on the host will be unable to run through the process...

I could try this, but don't want to risk causing issues to the storage. So, thought best to put the idea out first.

Best,
Jim

scottalanmiller

Never researched this, but I see no reason why a live file would be unable to be deduplicated as at the block level, where dedupe happens, the concept of open or closed doesn't exist.

scottalanmiller

Dedupe is by volume, not file system. So I'm pretty sure CSV with mounted VHD will be just the same as anything else.

Also, Hyper-V is an optimization option for Dedupe, so I'd really expect it to work.

dbeato

No, do not enable deduplication on the CSV Storage if the the Server OS are Server 2012
https://docs.microsoft.com/en-us/windows-server/failover-clustering/failover-cluster-csvs

https://support.microsoft.com/en-us/help/2906888/known-issues-after-you-enable-data-deduplication-on-csv

scottalanmiller

@dbeato said in Deduplication on CSV storage:

if the the Server OS are Server 2012

This is really just part of the more general advice of "never use Server 2012".

Obsolesce

@Jimmy9008 said in Deduplication on CSV storage:

Hi folks,

We have a three node Windows Failover Cluster, with several CSVs, each provided by Starwind using local storage.

One interesting idea has been brought up internally and i'd like to know if it sounds sensible or possible.

Is it possible to enable deduplication of CSV storage within the cluster? My initial thoughts are that as VM files are open in vmms.exe, the deduplication process will not be able to work on those open files. Is that wrong though? I would expect iso files to dedup a they are offline/not in use, but have a feeling live running VM VHD(X) files on the host will be unable to run through the process...

I could try this, but don't want to risk causing issues to the storage. So, thought best to put the idea out first.

Best,
Jim

No, as others pointed out. If you want dedupe, do it inside of the VMs on the data volumes there for that kind of stuff, not on the CSV.

scottalanmiller

@Obsolesce said in Deduplication on CSV storage:

No, as others pointed out. If you want dedupe, do it inside of the VMs on the data volumes there for that kind of stuff, not on the CSV.

Why not on the CSV?

Obsolesce

@scottalanmiller said in Deduplication on CSV storage:

@Obsolesce said in Deduplication on CSV storage:

No, as others pointed out. If you want dedupe, do it inside of the VMs on the data volumes there for that kind of stuff, not on the CSV.

Why not on the CSV?

That does it across the board then for everything in it. If that's what he wants, go for it. Just know what you need to do to do it properly. It also depends on a few variables.

Just look over the docs well and know it first.

Jimmy9008

@Obsolesce said in Deduplication on CSV storage:

@Jimmy9008 said in Deduplication on CSV storage:

Hi folks,

We have a three node Windows Failover Cluster, with several CSVs, each provided by Starwind using local storage.

One interesting idea has been brought up internally and i'd like to know if it sounds sensible or possible.

Is it possible to enable deduplication of CSV storage within the cluster? My initial thoughts are that as VM files are open in vmms.exe, the deduplication process will not be able to work on those open files. Is that wrong though? I would expect iso files to dedup a they are offline/not in use, but have a feeling live running VM VHD(X) files on the host will be unable to run through the process...

I could try this, but don't want to risk causing issues to the storage. So, thought best to put the idea out first.

Best,
Jim

No, as others pointed out. If you want dedupe, do it inside of the VMs on the data volumes there for that kind of stuff, not on the CSV.

Such as? I mean, I thought it shouldn't be done. A few people have said no in this thread, some have said yes...

Obsolesce

@Jimmy9008 said in Deduplication on CSV storage:

Such as?

I don't understand what you're asking?

Jimmy9008

@Obsolesce said in Deduplication on CSV storage:

@scottalanmiller said in Deduplication on CSV storage:

@Obsolesce said in Deduplication on CSV storage:

No, as others pointed out. If you want dedupe, do it inside of the VMs on the data volumes there for that kind of stuff, not on the CSV.

Why not on the CSV?

That does it across the board then for everything in it. If that's what he wants, go for it. Just know what you need to do to do it properly. It also depends on a few variables.

Just look over the docs well and know it first.

Sorry, was supposed to quote that. I mean, such as what variables? I've looked at various resources and can't see definitive information about doing this on a csv with VM files and drives...

Not sure if it's sensible. Won't it have an overhead too?

scottalanmiller

@Jimmy9008 said in Deduplication on CSV storage:

Not sure if it's sensible. Won't it have an overhead too?

Yes, but normal tuning makes it only use that overhead when the system is idle. So unless you have some really specific workload that will cause problems that we don't know about, the system docs are pretty clear that the overhead won't impact you.

scottalanmiller

@Jimmy9008 said in Deduplication on CSV storage:

I've looked at various resources and can't see definitive information about doing this on a csv with VM files and drives...

That's because you are looking for "negative" documentation. It's like "can I store my video games on X hard drive." It's a hard drive, of course you can store a video game on it. You can't look for "Civilization VI on WD 4TB Red Drive" because it's not a proper question. The drive is SATA and supports any file system, Civ 6 will go on any file system, ergo, it works. You can't document every use case imaginable like that.

That's what is going on here. The dedupe is block level. Ergo, talking about the file types on it isn't relevant. Hence you are feeling like you aren't getting answers. but knowing that it is block level answers all of that automatically.

dbeato

For example SQL Servers or Exchange cannot have deduplication on them so maybe you don’t have that. I have attempted to enable deduplication on Hyperv Servers and it really break things but I would open a ticket with Starwinds Support and they should be able to advise you on that. Their support is great.

scottalanmiller

@dbeato said in Deduplication on CSV storage:

For example SQL Servers or Exchange cannot have deduplication on them so maybe you don’t have that.

I'm pretty sure that they can. Maybe not inside the VM, but certainly from outside of it. How would they know or be affected? They can't be. If dedupe works, it works for every workload. If it doesn't, it never does. It's an all of nothing system, the workload on top of it can't determine if you can use it or not. It would only be affected by performance.

scottalanmiller

@dbeato said in Deduplication on CSV storage:

For example SQL Servers or Exchange cannot have deduplication on them

Microsoft lists SQL Server as their prime example of "might be good for it, but you need to evaluate your use case" ...

https://docs.microsoft.com/en-us/windows-server/storage/data-deduplication/install-enable

Screenshot from 2020-06-25 09-15-51.png

dbeato

@scottalanmiller said in Deduplication on CSV storage:

@dbeato said in Deduplication on CSV storage:

For example SQL Servers or Exchange cannot have deduplication on them

Microsoft lists SQL Server as their prime example of "might be good for it, but you need to evaluate your use case" ...

https://docs.microsoft.com/en-us/windows-server/storage/data-deduplication/install-enable

Well, I would say this is the part with Hyper-V that is so ambiguous

scottalanmiller

@dbeato said in Deduplication on CSV storage:

@scottalanmiller said in Deduplication on CSV storage:

@dbeato said in Deduplication on CSV storage:

For example SQL Servers or Exchange cannot have deduplication on them

Microsoft lists SQL Server as their prime example of "might be good for it, but you need to evaluate your use case" ...

https://docs.microsoft.com/en-us/windows-server/storage/data-deduplication/install-enable

Well, I would say this is the part with Hyper-V that is so ambiguous

It's not, it can't be. VDI has massive overlap and can use more aggressive deduplication, but that's all. Dedupe by definition is either always safe, or never safe. There cannot be an inbetween. Not when it runs below the filesystem.

Hyper-V support and safety is never in question. Only Hyper-V for VDI has a specialized tuning option.