Data Domain
I have been managing Data Domain units for about a year now, and there are currently 3 running on the network. Some early issues made me a little “wary” of them. First, one unit disabled its file system on a few occasions. The biggest problem came when it hit a “less than 1 percent chance” error: a problem slipped past the hash function the RAID software uses to isolate corruption. That left the unit unusable for about 3 days while Data Domain debugged it. Since then, there have been no major problems with any of the 3 units, and the compression we are getting out of them is amazing. The information below comes from a report I wrote when we were only running Data Domain 510s. Our backups had started to run past the designated backup window, and this report made the case for an upgrade.
DD510 specifications:
- 3.75 TB raw
- 1.95 TB usable
- 290 GB/hr throughput
Problems
- We are maxed out in capacity (number of disks) on the DD510
- Currently using ~70% of the available storage
- Cannot meet an 8-hour backup window
- We are maxing out the processor during the backup window
Backup window calculations
Using the last 24-hour pre-compression write rate, we are pushing 2810.8 GB of data to the Data Domain. Per DD benchmarking and documentation, the maximum throughput for the DD510 is 290 GB an hour. Using an 8-hour window, the calculation is as follows:
2810.8 GB / 8 hr = 351.35 GB/hr
This shows that we are already trying to push more data than the DD510 can handle in the window we are trying to achieve (351.35 GB/hr needed vs. 290 GB/hr available). Below is a comparison of the Data Domain models, taken from Data Domain's documentation.
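The same check can be scripted so it is easy to rerun as the nightly load changes. This is a minimal Python sketch, assuming the rated 290 GB/hr is sustained for the whole window (no allowance for compression or other overhead); the function names are hypothetical and the figures are the ones from the report.

```python
# Backup-window check using the report's figures:
# 2810.8 GB pre-compression per 24 h, an 8-hour window,
# and the DD510's rated 290 GB/hr (assumed constant for the whole run).

DD510_MAX_GB_PER_HR = 290.0  # per DD benchmarking/documentation


def required_rate(data_gb: float, window_hr: float) -> float:
    """Throughput (GB/hr) needed to land data_gb inside window_hr."""
    return data_gb / window_hr


def hours_needed(data_gb: float, rate_gb_per_hr: float) -> float:
    """How long the backup takes at a fixed throughput."""
    return data_gb / rate_gb_per_hr


nightly_gb = 2810.8
window_hr = 8.0

need = required_rate(nightly_gb, window_hr)
actual_hr = hours_needed(nightly_gb, DD510_MAX_GB_PER_HR)

print(f"required: {need:.2f} GB/hr, DD510 max: {DD510_MAX_GB_PER_HR:.0f} GB/hr")
print(f"at {DD510_MAX_GB_PER_HR:.0f} GB/hr the job takes {actual_hr:.2f} hr")
```

At the rated speed the nightly job needs roughly 9.7 hours, which is consistent with backups running past the 8-hour window.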
Future expansion
Plans are in the works for the IT department to implement NetBackup for Exchange and to move Active Directory backups to the Data Domain. The space required for Exchange backups is forecast at ~1000 GB; the Exchange data store itself is currently around 350 GB. Adding that 350 GB to the current 2810.8 GB gives:
3160.8 GB / 8 hr = 395.1 GB/hr
That would put us over the limit of the DD530 appliance and into DD565 territory on throughput alone. The DD565 can also scale up to 3 shelves of disks, whereas the DD510 and DD530 cannot expand beyond one enclosure (shelf).
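A sketch of the projection, assuming (as the calculation above does) that the ~350 GB Exchange data store is simply added to the current nightly load. The DD530 and DD565 rated throughputs are not listed in this post, so the sketch only reports the shortfall against the DD510's 290 GB/hr; `projected_rate` is a hypothetical helper name.

```python
# Future-expansion projection: fold the Exchange data store into the
# current nightly load and compute the rate needed for an 8-hour window.
# ASSUMPTION: the added data is backed up nightly at full size.

DD510_MAX_GB_PER_HR = 290.0


def projected_rate(current_gb: float, added_gb: float, window_hr: float) -> float:
    """GB/hr needed once added_gb is folded into the nightly backup."""
    return (current_gb + added_gb) / window_hr


current_gb = 2810.8
exchange_gb = 350.0  # current Exchange data store size
window_hr = 8.0

rate = projected_rate(current_gb, exchange_gb, window_hr)
print(f"projected: {current_gb + exchange_gb:.1f} GB -> {rate:.1f} GB/hr")
print(f"shortfall vs DD510: {rate - DD510_MAX_GB_PER_HR:.1f} GB/hr")
```

To compare against the larger models, replace `DD510_MAX_GB_PER_HR` with the rated figure from Data Domain's model comparison.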
Conclusion
The best option for achieving a backup window of ~8 hours or less would be to purchase a DD565 for the primary site. It would provide over double the hourly throughput and up to 6 times the raw storage capacity. Given its cost and compression capabilities, a Data Domain system seems to be the most financially reasonable route compared with an EMC or NetApp solution. I do hear that EMC Avamar is coming down in price! NetApp's free deduplication license also delivers reasonable compression, but with NetApp you have to own a NetApp system first, and that would cost a lot more than the Data Domain.
Since I first wrote this, a new Data Domain 565 has gone into deployment. Once it is up and running, I will update the information here with the backup window results.
Note: This is destination-based, agentless deduplication, not source-based. All the data is pushed from the servers across the network to the Data Domain, which then dedupes it. EMC Avamar is agent-based, so each server pushes only the changed blocks or data to the Avamar deduplication appliance. That definitely takes load off the network; the catch is that you have to install agents on all servers and keep that software updated. You had better have at least a true gigabit network when pushing upwards of 5 terabytes a night to the Data Domain. All in all, I am very happy with Data Domain and recommend them.
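To make the destination-vs-source distinction concrete, here is a toy Python model. It is not how Data Domain or Avamar actually segment data: it assumes fixed 4 KiB chunks identified by SHA-256, and all function names are hypothetical. It counts the bytes each approach sends over the network, and estimates how long a given byte count takes on a link at full line rate.

```python
import hashlib

CHUNK = 4096  # ASSUMPTION: fixed 4 KiB chunks; real appliances segment differently


def chunks(data: bytes, size: int = CHUNK):
    """Split data into fixed-size chunks (the last one may be shorter)."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


def destination_based(data: bytes, store: dict) -> int:
    """Client ships every byte; the appliance keeps only unseen chunks.
    Returns the number of bytes sent over the network."""
    for c in chunks(data):
        store.setdefault(hashlib.sha256(c).hexdigest(), c)
    return len(data)


def source_based(data: bytes, store: dict) -> int:
    """Agent hashes chunks locally and ships only those the store lacks.
    Returns bytes sent (the small hash lookups are ignored)."""
    sent = 0
    for c in chunks(data):
        digest = hashlib.sha256(c).hexdigest()
        if digest not in store:
            store[digest] = c
            sent += len(c)
    return sent


def wire_hours(n_bytes: float, link_gbps: float = 1.0) -> float:
    """Hours to move n_bytes at full line rate, ignoring protocol overhead."""
    return n_bytes * 8 / (link_gbps * 1e9) / 3600


# Two "nights" of a 16 KiB file: four distinct chunks, then the last one changes.
monday = b"".join(bytes([i]) * CHUNK for i in range(4))
tuesday = monday[:-CHUNK] + b"\xff" * CHUNK

dest_store, src_store = {}, {}
dest_sent = [destination_based(d, dest_store) for d in (monday, tuesday)]
src_sent = [source_based(d, src_store) for d in (monday, tuesday)]

print("destination-based bytes sent:", dest_sent, "unique chunks:", len(dest_store))
print("source-based bytes sent:     ", src_sent, "unique chunks:", len(src_store))
print(f"5 TB over 1 Gb/s: {wire_hours(5e12):.1f} hr")
```

Both stores end up holding the same 5 unique chunks; the difference is that the destination-based client resends all 16 KiB on the second night, while the source-based agent sends only the one changed chunk. The `wire_hours` figure of about 11 hours for 5 TB (decimal) at a full 1 Gb/s, before any protocol overhead, illustrates why a true gigabit network is the minimum for this kind of nightly load.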
Update: In a conversation today, the Data Domain sales team told me their lab results show restore speed (reading from the Data Domain) at ~85% of the write speed to the device.
So if a backup takes 1 hour, retrieving that data should take approximately 1 hour and 11 minutes (60 / 0.85 = 70.588 minutes). This does not take into account the overhead of the application doing the restore. Hope I did my math correctly.
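The restore estimate is a single division: moving the same data at 85% of the write speed takes 1/0.85 times as long. (Adding 15% to the backup time instead would give 69 minutes, which undercounts.) A minimal sketch, assuming steady read and write rates and ignoring application overhead; `restore_minutes` is a hypothetical helper.

```python
# Restore-time estimate from the quoted lab figure: read speed ~85% of write speed.
# ASSUMPTION: both rates are steady; the restoring application's overhead is ignored.


def restore_minutes(backup_minutes: float, read_to_write_ratio: float = 0.85) -> float:
    """Same data at a fraction of the speed takes 1/fraction as long."""
    return backup_minutes / read_to_write_ratio


m = restore_minutes(60)
print(f"{m:.3f} min = {int(m // 60)} h {m % 60:.0f} min")
```

This prints `70.588 min = 1 h 11 min`, matching the figure above.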



