The following recommendations help achieve the maximum deduplication ratios with the VTL data deduplication feature:

  • Disable software data compression by the backup application
  • Enable hardware compression for the tape drive to maximize space savings
  • Disable multiplexing of backup streams
  • Disable encryption of backup streams
  • Disable data deduplication by the backup application itself

Tape Block Size

The block size used to write to tape plays an important part in the deduplication ratios achieved. As a simple rule, the larger the block size, the higher the deduplication ratio. The following sections list the default block size used by different backup applications. NOTE: For certain backup applications, a new block size will only be picked up after the virtual tape is relabeled.

Symantec NetBackup

The default block size used for tape is 64K. This is not optimal for data deduplication and can be increased to 256K by setting SIZE_DATA_BUFFERS to 256K (262144 bytes).
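On UNIX and Linux media servers, SIZE_DATA_BUFFERS is typically a plain file in the NetBackup configuration directory that holds the buffer size in bytes. The following is a minimal sketch, assuming the common default path /usr/openv/netbackup/db/config (verify this for your installation). The function name set_size_data_buffers is hypothetical and used only for illustration.

```shell
# Write a 256K tape buffer size (in bytes) to the SIZE_DATA_BUFFERS file.
# Assumption: the config directory defaults to the usual UNIX/Linux media
# server location; pass a different directory as the first argument if needed.
set_size_data_buffers() {
    config_dir="${1:-/usr/openv/netbackup/db/config}"
    # 256K expressed in bytes: 256 * 1024 = 262144
    echo $((256 * 1024)) > "$config_dir/SIZE_DATA_BUFFERS"
}

# Usage on a media server (run as a user that can write to the directory):
#   set_size_data_buffers
```

Backups started after the change should use the new value. As noted above, some applications only pick up a new block size after the virtual tape is relabeled.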

IBM Tivoli Storage Manager

The default block size used is 256K, which is optimal for data deduplication.

IBM iSeries BRMS

The deduplication engine is content-aware and tries to identify file data, database blocks, etc. within a backup stream. However, the deduplication engine is not aware of the backup format for iSeries backups. Experience shows that splitting the backup stream into 4K blocks for iSeries backups achieves a decent deduplication ratio.

To set the dedupe block size for the VTL:

/quadstorvtl/bin/vtconfig -m -v <vtl name> -b <dedupe block size> 

Dedupe block size can be one of 0, 4096, 8192, 16384, 32768, 65536, 131072 or 262144. Setting the block size to 0 will revert to the original behavior of trying to scan the data for dedupe points.

For example:

/quadstorvtl/bin/vtconfig -m -v VTL1 -b 4096 

A block size of 4096 gives the best possible deduplication of iSeries backups. On the downside, if the VTL is unable to find duplicate blocks or the deduplication ratio is low, this block size will lead to a big drop in read/write performance.
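To get a feel for the trade-off, it can help to count how many blocks each setting produces for the same amount of data. The sketch below is plain arithmetic and makes no claim about VTL internals. The function name blocks_per_gib is hypothetical, and the idea that more blocks means more per-block work is an assumption.

```shell
# Print how many dedupe blocks of the given size (in bytes) fit in 1 GiB.
blocks_per_gib() {
    block_size=$1
    echo $((1024 * 1024 * 1024 / block_size))
}

blocks_per_gib 4096      # 4K blocks per GiB
blocks_per_gib 262144    # 256K blocks per GiB
```

A 4096-byte block size yields 64 times as many blocks as 262144 for the same data, so any per-block cost is paid 64 times as often.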

Another point to consider is that the compression ratios of iSeries backups are usually high. It might be worthwhile to back up to vcartridges in non-dedupe storage pools, which gives consistent read/write performance along with very high space savings due to compression.
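Because vtconfig accepts only a fixed set of dedupe block sizes, a small wrapper can catch typos before the command is run. The following is a sketch only: the function names are hypothetical, and the vtconfig invocation is the one shown earlier in this section.

```shell
# Return 0 if $1 is one of the dedupe block sizes listed in this section.
is_valid_dedupe_block_size() {
    case "$1" in
        0|4096|8192|16384|32768|65536|131072|262144) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: validate the size, then call vtconfig as documented.
set_dedupe_block_size() {
    vtl_name=$1
    block_size=$2
    if ! is_valid_dedupe_block_size "$block_size"; then
        echo "unsupported dedupe block size: $block_size" >&2
        return 1
    fi
    /quadstorvtl/bin/vtconfig -m -v "$vtl_name" -b "$block_size"
}

# Usage:
#   set_dedupe_block_size VTL1 4096
```

An unsupported value such as 5000 is rejected before vtconfig is invoked.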

Computer Associates ARCServe

The default block size used for tape is 64K and is not optimal for data deduplication. This can be changed to 256K as described in https://support.ca.com/cadocs/0/CA%20ARCserve%20%20Backup%20r16-ENU/Bookshelf_Files/HTML/tapelibr/index.htm?toc.htm?ag_tl_cfg_lib_2_func_as_vtl.htm

CommVault Simpana

The default block size used for tape is 64K and is not optimal for data deduplication. This can be increased to 256K in the "Data Path Properties" of a library. For example, see the "Increasing Block Size" section in http://documentation.commvault.com/hds/v10/article?p=features/performance_tuning/tunable_parameters.htm

Symantec Backup Exec

The default block size used for tape is 64K and is not optimal for data deduplication. This can be changed for each drive by selecting its properties and changing the "Block Size" to 256K.

HP Data Protector

The default block size used is 256K, which is optimal for data deduplication.

EMC NetWorker

The default block size used is 128K, which is good but may be increased to 256K for optimal data deduplication.

Dell/Quest Netvault

The default block size used is 32K, which is not optimal for data deduplication. This can be changed for each drive by selecting its properties and changing the "Block Size" to 256K.