Thursday, November 20, 2014

The Fundamentals of Data Deduplication

If you work on a computer, at some point you've had a moment of panic. Did I save that last file before I logged off? Most of us have discovered to our horror, at least once, that we lost an hour’s worth of work because we forgot to save changes to a document, or worse, spent an entire morning searching for a file that had disappeared completely. Thankfully, most computer programs now back up your work at timed intervals, so data loss anxiety is far less common than it used to be.

But that security comes at a price. Your computer or network is backing up and storing vast amounts of data multiple times each day, yet most of the data in any given backup is identical to what was saved in the prior backup and the one before it. Imagine writing a lengthy report and going to the copier to duplicate it every time you make the slightest change. You’d soon be overwhelmed by a tidal wave of paper. This buildup of redundant information is happening in your office every day, eating into your storage and slowing down data transfer across your network. But don’t worry. There’s a solution for this dilemma as well, one you’re probably already using. It’s called data deduplication.

It works like this. Data deduplication programs analyze and compare data, and when they come across a chunk that has already been stored, they do not copy it again. Instead, they keep a single copy and replace every other occurrence of the chunk with a reference to it, essentially indexing each file so that, when the file is retrieved, the references pull the data from that single saved copy. The resulting reduction in stored size allows you to store data more efficiently. It also makes for speedier file transfers across your network, because less data has to travel and less bandwidth is required. This matters particularly during disaster recovery, when you want to restore backed-up files quickly and return to normal operations as soon as possible.
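To make the idea concrete, here is a minimal sketch in Python of the chunk-and-reference approach described above. It is an illustration only, not how any particular product works: the class name `DedupStore`, the fixed chunk size, and the use of SHA-256 digests as references are all assumptions made for the example, and real systems often use variable-size chunks and store data on disk rather than in memory.

```python
import hashlib


class DedupStore:
    """Toy in-memory store (hypothetical): each unique chunk is kept once."""

    def __init__(self, chunk_size=4096):
        # chunk_size is an assumed fixed chunk length in bytes
        self.chunk_size = chunk_size
        self.chunks = {}  # digest -> chunk bytes (the single saved copy)
        self.files = {}   # file name -> ordered list of digests (the references)

    def put(self, name, data):
        """Split data into chunks and store only the chunks not seen before."""
        refs = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # setdefault keeps the existing copy if this chunk is a duplicate
            self.chunks.setdefault(digest, chunk)
            refs.append(digest)
        self.files[name] = refs

    def get(self, name):
        """Rebuild a file by following its references to the saved chunks."""
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self):
        """Total size of the unique chunks actually kept."""
        return sum(len(c) for c in self.chunks.values())


# Two backups that differ only in their last 4 bytes
store = DedupStore(chunk_size=4)
monday = b"AAAABBBBCCCC"
tuesday = b"AAAABBBBDDDD"
store.put("monday.bak", monday)
store.put("tuesday.bak", tuesday)

print(store.get("tuesday.bak") == tuesday)  # True: the file is restored intact
print(store.stored_bytes())                 # 16 bytes kept instead of 24
```

In this toy run the two backups share two of their three chunks, so only four unique chunks are kept. The savings in practice depend on how much of each backup repeats earlier data and on how the chunks are cut.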

There are three options for employing a data deduplication system for your office or network: hardware, software, or a combination of both. The hardware option, whether a virtual tape library or NAS (network-attached storage), is an appliance usually manufactured for the specific purpose of data storage. Its advantages include generally better compression than software deduplication and a certain amount of user-friendliness. Once installed, the hardware you purchase will handle backup functions automatically.

The main advantage of a software solution for data deduplication is the cost. Deduplication hardware for a small business or a sole proprietor is likely to be cost-prohibitive.

It’s also possible to use a hybrid of both methods. For example, if you have a central office and several branches, it might be worthwhile to consider a hardware solution for the overall backup needs of your company and software at each branch to deduplicate that branch's data before it is sent across a WAN (wide-area network) to the central storage facility.

How do you choose which solution is right for your company? As with most things, it’ll depend on your specific data needs, your budget, and whether or not the solution requires you to modify your existing storage hardware. Of course, data deduplication may not be right for everyone. A small-business owner may be able to back up data on a single flash drive. And if the IT department of your company consists of a single PC, most computers are now sold with a built-in system restore capability to help you get back up and running in the event of a crash or other data disaster.

Peter Jonathan Wilcheck

Image is property of Microsoft Corporation.
