Want to protect research data? Go multi-tiered

Network attached storage (NAS) provides campus researchers with a cost-effective way to store, share, and back up their work securely, but it should be part of a multi-tier storage and recovery strategy.

How many researchers have mislaid a USB hard drive and had that sinking feeling that months, or even years, of work had simply evaporated? Yet despite the risks, USB drives remain a primary storage mechanism for many campus researchers, who often want to keep physical possession of their work. Fortunately, the days of such a laissez-faire approach may be numbered, given the enormous improvements in the reliability, ease of use, and security of network attached storage.

NAS comes in two flavors: desktop appliances and rack-mounted systems. Both are scalable and capable of holding close to 100 TB of data. The smaller systems are often intended for personal use, but the range of available NAS is staggering: they can meet the storage needs of departments and entire campuses, and can even serve as backup for a whole university’s storage area network (SAN). The smaller desktop systems are generally easy to install and manage without expert IT support, and they plug into an existing network, wired or wireless, with an Ethernet cable. Computers and other devices on that network can then be set to back up their contents to the NAS automatically.
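To make the automatic-backup idea concrete, here is a minimal sketch that mirrors a local research folder onto a NAS share. It assumes the share is already mounted on the researcher’s computer (for example over SMB or NFS); the function name and paths are hypothetical, and commercial NAS vendors ship their own backup clients that do considerably more, such as versioning and scheduling.

```python
import shutil
from pathlib import Path


def back_up_to_nas(source: Path, nas_mount: Path) -> Path:
    """Copy `source` into a folder of the same name on a mounted NAS share.

    Hypothetical helper: assumes `nas_mount` is an already-mounted network
    share. Files deleted from `source` are NOT removed from the backup.
    """
    destination = nas_mount / source.name
    # dirs_exist_ok=True (Python 3.8+) lets repeated runs overwrite an
    # existing backup folder instead of raising FileExistsError.
    shutil.copytree(source, destination, dirs_exist_ok=True)
    return destination


if __name__ == "__main__":
    # Hypothetical paths; in practice this would run on a schedule (cron, Task Scheduler).
    back_up_to_nas(Path.home() / "experiment01", Path("/mnt/lab-nas"))
```

A scheduler calling this nightly approximates what the built-in clients do, though without the version history that makes a NAS useful for recovering an older copy of a file.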

Because they sit on a desktop or in a rack, NAS units don’t go walkabout the way USB drives sometimes do. Further, a NAS can be configured as a RAID array to provide built-in redundancy. “In contrast, the reliability of an external USB drive is relatively low, especially if you move it around a lot,” said Brian Chee, an IT specialist in the dean’s office of the School of Ocean and Earth Science and Technology at the University of Hawaii at Manoa, which has deployed NAS units from Taiwan-based Synology.
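The redundancy in parity-based RAID levels such as RAID 5 comes from storing an XOR parity block alongside the data blocks, so that any one lost block can be recomputed from the others. The sketch below illustrates only that principle; real arrays stripe data and rotate parity across disks, and this is not how Synology or any particular vendor implements RAID. The function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_block(data_blocks: list[bytes]) -> bytes:
    # The parity block is the XOR of every data block in the stripe.
    return xor_blocks(data_blocks)


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    # XOR-ing the surviving data blocks with the parity block cancels
    # them out and leaves the missing block.
    return xor_blocks(surviving + [parity])


stripe = [b"ABCD", b"EFGH", b"IJKL"]   # one block per (simulated) disk
p = parity_block(stripe)
recovered = rebuild_missing([stripe[0], stripe[2]], p)   # "disk" 1 failed
print(recovered)  # b'EFGH'
```

The same arithmetic shows the limit of RAID: it recovers from a failed drive inside the box, but if the whole box is lost, the parity goes with it.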

Like Dropbox for Storage

But the advantages of a NAS over a USB drive extend well beyond reliability. With a NAS, users can establish both private and public storage areas, in much the same way that Dropbox allows files and folders to be shared or restricted. While a computer with an external hard drive can also be configured to allow sharing, performance tends to take a significant hit, especially when the amount of data is large.

This was the problem facing the Department of Materials at Oxford University, for example, where a single experiment often generates 100 GB of data. As a solution, the department installed a rack-mounted Synology DS3612xs NAS connected to four HPC workstations.

“Now, at any given time, two to three researchers can be working on different parts of the data, conducting their own analyses in parallel,” said Mahmoud Mostafavi, James Martin Fellow at Oxford University. “Shared access increases the speed with which the whole process is completed.”

A cloud setup can offer a similar collaborative environment, but, like a shared external drive, it cannot match the speed of a NAS. “Bringing data from the cloud into a fast storage array for a supercomputer is not very quick,” said Chee. “Having a NAS provides you much quicker ways of moving data around within your local area network.”

Cost is also a factor from a networking standpoint, especially if schools want to implement a virtualized environment. “Because the NAS is able to work over existing 1 Gbps or 10 Gbps Ethernet networks, a lot of switching infrastructure doesn’t have to change—you don’t need to purchase a dedicated fiber optic network just to run the storage,” said Brian Kirsch, network program chair at the Milwaukee Area Technical College, which uses a Synology NAS to teach networking with storage to its students. “As a result, you are able to buy a unit that can run a virtual environment without some of the high initial capital costs. It allows customers to get into the market to see the benefits of virtualization and shared storage.”
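To put the 1 Gbps and 10 Gbps figures in perspective, the back-of-the-envelope calculation below estimates how long a single 100 GB experiment, the size cited in the Oxford example, would take to cross each link. It assumes decimal units (1 GB = 10^9 bytes) and an idealized link running at full line rate; the `efficiency` parameter is a hypothetical knob for the overhead, disk speed, and contention that make real transfers slower.

```python
def transfer_seconds(size_gb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Idealized time to move `size_gb` gigabytes over a `link_gbps` link.

    `efficiency` is an assumed fraction of line rate actually achieved;
    the default of 1.0 ignores all protocol and hardware overhead.
    """
    bits = size_gb * 1e9 * 8                      # gigabytes -> bits
    return bits / (link_gbps * 1e9 * efficiency)  # bits / (bits per second)


for gbps in (1, 10):
    minutes = transfer_seconds(100, gbps) / 60
    print(f"100 GB over {gbps} Gbps: {minutes:.1f} min")
# 100 GB over 1 Gbps: 13.3 min
# 100 GB over 10 Gbps: 1.3 min
```

These are lower bounds; a transfer that achieves, say, 70 percent of line rate takes proportionally longer, but the tenfold gap between the two link speeds remains.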

The Importance of Offsite Backup

Despite these advantages, Chee is adamant that NAS should be only one element in a multi-tier strategy to keep data secure. He knows firsthand the danger of keeping data in only one location. In October 2004, a huge wall of water swept through the campus, destroying 30 years’ worth of research at the school of medicine as well as a national archive in the basement of Hamilton Library. “A large amount of irreplaceable research was lost,” he recalled. “I was one of the schmucks who was ankle deep in mud in the basement of Hamilton poking around with broomsticks trying to find servers.”

As Chee pointed out, a NAS is just as susceptible to such a disaster as any other storage kept in a single location. “A lot of researchers have this unfortunate misconception that a RAID 5 or RAID 6 or RAID 10 within a NAS is good enough,” he said. “That’s been the hardest education piece I’ve had to deal with.”

While Chee uses a complex archiving and storage system that involves Linux machines, a Synology NAS, a Primera Bravo Blu-ray disc publisher, and Amazon Glacier cloud storage, he advises schools, at a minimum, to ensure that they have redundant offsite storage. “I’m trying to get people to do a combination of NAS—for backup, higher reliability, and high-access backup so you can get into archives easily—with offsite storage, either in the cloud or an offsite data center,” he said.
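One way to apply Chee’s minimum is to list every copy of a dataset and check that not all of them share a single physical site, and that at least one is offsite. The sketch below is a hypothetical illustration of that check, not a tool Chee or Synology provides; the `StorageCopy` class, its fields, and `gaps_in_plan` are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageCopy:
    name: str      # hypothetical label, e.g. "lab NAS"
    location: str  # physical site where this copy lives
    offsite: bool  # True for cloud storage or a remote data center


def gaps_in_plan(copies: list[StorageCopy]) -> list[str]:
    """Return problems with a backup plan; an empty list means none found."""
    problems = []
    # A RAID array inside one NAS still counts as a single location.
    if len({c.location for c in copies}) < 2:
        problems.append("all copies share one physical location")
    if not any(c.offsite for c in copies):
        problems.append("no offsite copy (cloud or remote data center)")
    return problems


plan = [
    StorageCopy("lab NAS (RAID 6)", "campus building", offsite=False),
    StorageCopy("USB drive", "campus building", offsite=False),
]
print(gaps_in_plan(plan))  # both problems reported

plan.append(StorageCopy("cloud archive", "cloud provider", offsite=True))
print(gaps_in_plan(plan))  # []
```

The check is deliberately simple; a fuller version might also track how recently each copy was updated and whether it has ever been restored successfully.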

With these systems in place, Chee now sees researchers’ insistence on backing up their data to a USB drive as icing on the cake, although it’s not a wise strategy if the data is sensitive or a likely target of theft. Indeed, in light of recent stories about Chinese hacking of university systems, more and more researchers are giving the sophisticated encryption capabilities of some NAS a second look.

Chee, for example, has started using a Synology NAS to encrypt the engineering data associated with the ALOHA Cabled Observatory. “Anything that is considered sensitive—anything that we don’t want to see in the wrong hands—is encrypted automatically,” he explained. “I then back that encrypted file to Amazon Glacier.”

Given the benefits of NAS within a multi-tier storage and archiving strategy, it may seem surprising that NAS use is not more widespread among university research departments. The reason, Chee says, is that NAS is only now recovering its reputation after early teething problems.

“Traditionally, NAS have been great for desktop use, a pain to scale and, if something goes wrong, they’ve been a pain to recover,” he said. “That has changed. The software that has been developed, particularly by Synology, has changed the minds of a lot of researchers. It’s no longer as hard to implement, it’s no longer hard to set up remote access, and it’s no longer hard to expand.”

Andrew Barbour is a contributing editor with eCampus News.