W. Curtis Preston | Sept. 3, 2010
The technologies to vastly improve backup and recovery performance and reliability have arrived

In addition, customers more familiar with a tape interface were presented with a very easy transition to backing up to disk. Another approach to creating a shareable disk target is the intelligent disk target, or IDT. Vendors of IDT systems felt the best approach was to use the NFS or CIFS protocol to present the disk system to the backup system. These protocols also allowed for easy sharing among multiple backup servers.

But both VTL and IDT vendors had a fundamental problem: The cost of disk made their systems cost-effective as staging devices only. Customers stored a single night's backups on disk and then quickly streamed them off to tape. They wanted to store more backups on disk, but they couldn't afford it. Enter data deduplication.

The magic of data deduplication

Typical backups create duplicate data in two ways: repeated full backups and repeated incrementals of the same file when it changes multiple times. A deduplication system identifies both situations and eliminates redundant files, reducing the amount of disk needed to store your backups by a ratio of anywhere from 10:1 to 50:1 and beyond, depending on the level of redundancy in your data.

Deduplication systems also work their magic at the subfile level. To do so, they identify segments of data (a segment is typically smaller than a file but bigger than one byte) that are redundant with other segments and eliminate them. The most obvious use for this technology is to allow users to switch from disk staging strategies (where they're storing only one night's worth of backups) to disk backup strategies (where they're storing all onsite backups on disk).
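To make the subfile idea concrete, here is a minimal Python sketch of segment-level deduplication. It is an illustration under stated assumptions, not how any particular product works: the fixed 4 KB segment size, the use of SHA-256 to identify segments, and the DedupeStore class are all hypothetical choices made for this example (many real systems use variable-size segments).

```python
import hashlib

SEGMENT_SIZE = 4096  # assumed fixed segment size, chosen for illustration


class DedupeStore:
    """Hypothetical store that keeps each unique segment only once."""

    def __init__(self):
        self.segments = {}      # digest -> segment bytes, stored once
        self.backups = {}       # backup name -> ordered list of digests
        self.logical_bytes = 0  # bytes the backup application wrote

    def write_backup(self, name, data):
        recipe = []
        for i in range(0, len(data), SEGMENT_SIZE):
            seg = data[i:i + SEGMENT_SIZE]
            digest = hashlib.sha256(seg).hexdigest()
            # A segment already on disk is not stored a second time.
            self.segments.setdefault(digest, seg)
            recipe.append(digest)
        self.backups[name] = recipe
        self.logical_bytes += len(data)

    def read_backup(self, name):
        # Rebuild the backup by following its list of segment digests.
        return b"".join(self.segments[d] for d in self.backups[name])

    def stored_bytes(self):
        return sum(len(s) for s in self.segments.values())

    def ratio(self):
        return self.logical_bytes / self.stored_bytes()


# Two "full backups" that differ in only one of three segments.
monday = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
tuesday = b"A" * 4096 + b"X" * 4096 + b"C" * 4096

store = DedupeStore()
store.write_backup("monday", monday)
store.write_backup("tuesday", tuesday)

print(store.stored_bytes())  # 16384 bytes on disk for 24576 bytes written
print(store.ratio())         # 1.5
```

With only two backups the ratio is a modest 1.5:1; the ratio grows as more largely unchanged full backups are written to the same store.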

There are two main types of deduplication:

          o Target dedupe systems allow customers to send traditional backups to a storage system that will then dedupe them; they are typically used in medium to large data centers and perform at high speed.

          o Source dedupe systems use their own backup software, rather than traditional backup software, to eliminate the redundant data from the very beginning of the process; they are typically used to back up remote offices and mobile users.
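The practical difference between the two types is where the redundant data is dropped, and therefore how much data crosses the network. The Python sketch below contrasts them under simplifying assumptions: BackupServer, target_backup, and source_backup are hypothetical names, segments are a fixed 4 KB, and the small cost of exchanging digests is ignored. The digest exchange shown for the source case is one common way to do it, not something the systems described here are confirmed to use.

```python
import hashlib

SEGMENT_SIZE = 4096  # assumed fixed segment size, chosen for illustration


def digest_of(seg):
    return hashlib.sha256(seg).hexdigest()


def split(data):
    return [data[i:i + SEGMENT_SIZE] for i in range(0, len(data), SEGMENT_SIZE)]


class BackupServer:
    """Hypothetical dedupe storage system on the far side of the network."""

    def __init__(self):
        self.segments = {}       # digest -> segment bytes
        self.bytes_received = 0  # segment data that crossed the network

    def missing(self, digests):
        # Tell a client which of its segments the server does not hold yet.
        return {d for d in digests if d not in self.segments}

    def receive(self, seg):
        self.bytes_received += len(seg)
        self.segments.setdefault(digest_of(seg), seg)


def target_backup(server, data):
    # Target dedupe: send the whole backup; the server drops duplicates.
    for seg in split(data):
        server.receive(seg)


def source_backup(server, data):
    # Source dedupe: hash locally, then send only segments the server lacks.
    by_digest = {digest_of(seg): seg for seg in split(data)}
    for d in server.missing(by_digest):
        server.receive(by_digest[d])


full_1 = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
full_2 = b"A" * 4096 + b"X" * 4096 + b"C" * 4096

target, source = BackupServer(), BackupServer()
for backup in (full_1, full_2):
    target_backup(target, backup)
    source_backup(source, backup)

print(target.bytes_received)  # 24576: every byte of both backups was sent
print(source.bytes_received)  # 16384: only the four unique segments were sent
```

Both servers end up storing the same four segments; only the source approach avoids sending the duplicates, which is why it suits slow links to remote offices and mobile users. A real system would also record which segments make up each backup so it can be restored; that bookkeeping is left out here.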

Backing up as you go

CDP (continuous data protection) is another increasingly popular disk-based backup technology. Think of it as replication with an Undo button. Every time a block of data changes on the system being backed up, it is transferred to the CDP system. However, unlike replication, CDP stores changes in a log, so you can undo those changes at a very granular level. In fact, you can recover the system to literally any point in time at which data was stored within the CDP system.
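The "replication with an Undo button" idea can be sketched as an append-only log of block writes that is replayed up to whatever moment you choose. The following Python is a toy model under assumptions of my own: the CdpJournal class, integer timestamps, and whole-block writes are hypothetical simplifications, not a description of any vendor's design.

```python
class CdpJournal:
    """Hypothetical CDP log: every block write is kept, in time order."""

    def __init__(self):
        self.log = []  # (timestamp, block_number, data), oldest first

    def record_write(self, timestamp, block_number, data):
        if self.log and timestamp < self.log[-1][0]:
            raise ValueError("writes must be recorded in time order")
        self.log.append((timestamp, block_number, data))

    def restore(self, point_in_time):
        # Replay the log, stopping at the first write after the chosen moment.
        volume = {}
        for timestamp, block_number, data in self.log:
            if timestamp > point_in_time:
                break
            volume[block_number] = data
        return volume


journal = CdpJournal()
journal.record_write(100, 0, b"good data")
journal.record_write(105, 1, b"more data")
journal.record_write(110, 0, b"corrupted")  # the change we want to undo

before = journal.restore(109)  # the instant before the bad write
after = journal.restore(110)

print(before)  # {0: b'good data', 1: b'more data'}
print(after)   # {0: b'corrupted', 1: b'more data'}
```

Plain replication would hold only the latest state, including the corrupted block. Because the log keeps every write, any timestamp is a valid recovery point.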

A near-CDP system works in similar fashion except that it has discrete points in time to which it can recover. To put it another way, near-CDP combines snapshots with replication. Typically, a snapshot is taken on the system being backed up, whereupon that snapshot is replicated to another system that holds the backup. Why take the snapshot on the source before replication? Because only at the source can you typically quiesce the application writing to the storage so that the snapshot will be a meaningful one.
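By contrast, a near-CDP system can be modeled as a series of replicated snapshots, so a restore lands on the most recent snapshot at or before the requested time. This Python sketch rests on assumptions made for illustration: the NearCdp class is hypothetical, timestamps are integers that arrive in ascending order, and quiescing the application is represented only by a comment.

```python
import bisect


class NearCdp:
    """Hypothetical backup system that holds replicated snapshots."""

    def __init__(self):
        self.snapshot_times = []  # ascending timestamps
        self.snapshots = []       # volume contents at each timestamp

    def snapshot_and_replicate(self, timestamp, source_volume):
        # Assumption: the caller has quiesced the application first, so the
        # source volume is in a consistent state when it is copied.
        self.snapshot_times.append(timestamp)
        self.snapshots.append(dict(source_volume))  # replicated copy

    def restore(self, point_in_time):
        # Find the latest snapshot taken at or before the requested time.
        i = bisect.bisect_right(self.snapshot_times, point_in_time)
        if i == 0:
            raise LookupError("no snapshot at or before the requested time")
        return self.snapshot_times[i - 1], dict(self.snapshots[i - 1])


volume = {0: b"good data"}
backup = NearCdp()

backup.snapshot_and_replicate(100, volume)
volume[1] = b"more data"   # written at time 105
backup.snapshot_and_replicate(200, volume)
volume[0] = b"corrupted"   # written at time 210, after the last snapshot

early = backup.restore(150)
latest = backup.restore(250)

print(early)   # (100, {0: b'good data'})
print(latest)  # (200, {0: b'good data', 1: b'more data'})
```

A request for time 150 comes back as the snapshot from time 100, so the write made at time 105 is not recoverable at that point. That gap between snapshots is the trade-off near-CDP makes compared with true CDP.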


