NetApp: Deduplication ASIS and VMware
Man, where did this come from? I used to rule out NetApp deduplication because of the license costs, but now it is free (if you have one of their NAS units). Note that the maximum volume size NetApp can dedupe is 1 terabyte for the model we are on. A coworker of mine recently enabled deduplication on a VMware iSCSI volume. Turns out this option could be great for small to mid-sized companies that cannot afford to have both an EMC and a NetApp. It is also a cost-effective way to save on storage by offloading lower-priority servers or VMs to ASIS volumes.
Below is information pulled from the NetApp showing two volumes that are being deduplicated:
NAS> df -sh
Filesystem used saved %saved
/vol/users/ 246GB 68GB 22%
/vol/vmware/ 96GB 99GB 51%
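The %saved column appears to be the space saved divided by what the data would occupy without deduplication (used + saved). Here is a minimal Python sketch that assumes this formula. It is inferred from the numbers above, not taken from NetApp documentation, and the function name is made up for illustration:

```python
def percent_saved(used_gb, saved_gb):
    """Share of the logical data removed by deduplication, in percent."""
    logical_gb = used_gb + saved_gb  # size the data would take without dedupe
    if logical_gb == 0:
        return 0.0
    return 100.0 * saved_gb / logical_gb

# Figures from the df -sh output above
for volume, used, saved in [("/vol/users/", 246, 68), ("/vol/vmware/", 96, 99)]:
    print(f"{volume} {percent_saved(used, saved):.0f}%")
```

This reproduces the 22% and 51% shown by the filer, so the assumed formula at least agrees with this output.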
Let's see the status of the service on the VMware volume:
NAS> sis status
Path State Status Progress
/vol/vmware Enabled Active 35 GB Scanned
Current VMware volume utilization:
NAS> df -h vmware
Filesystem total used avail capacity Mounted on
/vol/vmware/ 950GB 147GB 802GB 16% /vol/vmware/
Here is the sis configuration and status of the VMware datastore:
NAS> sis status -l
Path: /vol/vmware
State: Enabled
Status: Active
Progress: 38231884 KB Scanned
Type: Regular
Schedule: sun-sat@0
Last Operation Begin: Thu Oct 23 00:00:07 EDT 2008
Last Operation End: Thu Oct 23 00:03:59 EDT 2008
Last Operation Size: 2405 MB
Last Operation Error: -
As we can see, the last run took just under 4 minutes.
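The run time can be checked from the Last Operation Begin and End lines. Here is a small Python sketch. The helper name is made up, and it assumes both timestamps are in the same time zone, so the zone token is simply dropped:

```python
from datetime import datetime

def parse_sis_time(stamp):
    """Parse a timestamp like 'Thu Oct 23 00:00:07 EDT 2008'."""
    parts = stamp.split()
    # Drop the time zone token (index 4); both stamps share it, and
    # strptime's %Z does not reliably recognise names such as EDT.
    del parts[4]
    return datetime.strptime(" ".join(parts), "%a %b %d %H:%M:%S %Y")

begin = parse_sis_time("Thu Oct 23 00:00:07 EDT 2008")
end = parse_sis_time("Thu Oct 23 00:03:59 EDT 2008")
elapsed = end - begin
print(elapsed)  # 0:03:52

# Rough rate, using the reported Last Operation Size of 2405 MB
print(f"{2405 / elapsed.total_seconds():.1f} MB/s")
```

So the nightly pass over 2405 MB finished in 3 minutes 52 seconds.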
The first volume is a CIFS (Windows) user share containing home directories and whatever data users want to store there. Deduplication is low here because most of the data is not the same thing stored many times. The VMware volume has about 9 Windows and Linux virtual machines running on it. It gets a much better deduplication rate because most of the data is the same system files that the operating systems need to run, and each of those blocks only needs to be stored once.
We will be offloading more of the non-critical and development virtual machines to NetApp ASIS (deduplicated) volumes once it is on Fibre Channel. The savings should continue to increase as the number of VMs on the volume increases. From the looks of it, we should easily be able to get 80% or more deduplication rates! In the long run, using this feature will save a lot of money on the number of drive trays needed.
Note: One downside is that ASIS is not real-time deduplication. It runs on a schedule (sun-sat@0 above, which is every day at midnight) instead of as the data is coming in, unlike a Data Domain.
Just goes to show that you shouldn't believe all of the vendors putting down their competitors' products or capabilities. I do not think that ASIS is the answer to all problems, but it can definitely help smaller companies. It is also a lot easier than adding on an appliance like EMC.
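To illustrate why the savings grow with the number of similar VMs, here is a toy Python model of block-level deduplication. It is only a sketch and not how ASIS is implemented: the 4 KB block size, the SHA-256 fingerprints, and the made-up images (10 shared "OS" blocks plus 2 private blocks per VM) are all assumptions chosen for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this toy model

def make_block(label):
    """Build one fixed-size block whose content is determined by its label."""
    return label.encode().ljust(BLOCK_SIZE, b"\0")

def make_vm_image(vm_id, os_blocks=10, unique_blocks=2):
    """A fake disk image: OS blocks identical across VMs, plus per-VM data."""
    shared = [make_block(f"os-{i}") for i in range(os_blocks)]
    private = [make_block(f"vm{vm_id}-data-{i}") for i in range(unique_blocks)]
    return b"".join(shared + private)

def dedup_savings(images):
    """Fraction of blocks that need not be stored when duplicates are shared."""
    total = 0
    fingerprints = set()
    for image in images:
        for offset in range(0, len(image), BLOCK_SIZE):
            block = image[offset:offset + BLOCK_SIZE]
            fingerprints.add(hashlib.sha256(block).digest())
            total += 1
    return 1 - len(fingerprints) / total if total else 0.0

for vm_count in (1, 2, 5, 10, 20):
    images = [make_vm_image(i) for i in range(vm_count)]
    print(f"{vm_count:2d} VMs: {dedup_savings(images):.0%} saved")
```

In this model a single VM saves nothing, and the savings climb toward the shared fraction of each image as more VMs are added. Real results depend on how much of each VM really is identical at the block level.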
10/27/2008
Getting better:
NAS> df -sh /vol/vmware/
Filesystem used saved %saved
/vol/vmware/ 215GB 306GB 59%
11/15/2008
NAS> df -sh /vol/testVol
Filesystem used saved %saved
/vol/testVol/ 519GB 754GB 59%

Any performance impact on the VMs while deduplication was running?
oscar said this on November 4, 2008 at 12:50 pm
Nothing really noticeable, but it is currently running only about 13 VMs that are non-critical. I don’t think there would be a big hit unless we were deduping full terabyte volumes. Also, the volume is spread over multiple smaller 134 15krpm spindles, which also increases read/write performance.
kcollo said this on November 4, 2008 at 1:40 pm
Nice write up.
colinmcnamara said this on December 30, 2008 at 1:27 pm
Which one are you using, iSCSI or CIFS, for the storage in VMware to store the VMs?
Steve said this on January 19, 2009 at 9:57 am
At the time of posting, we were doing both iSCSI and Fibre Channel. As of now, all of the VM storage has been moved to Fibre Channel. This has not affected the deduplication rates at all. We also do deduplication on our CIFS shares for Windows users' directories. NFS would be a lot better than CIFS for storing VMs if you had to choose one of the NAS (network) protocols rather than block level (iSCSI, Fibre Channel).
kcollo said this on January 19, 2009 at 10:38 am