The future of tape
First published in Showreel magazine, December 2007
It’s been a while since I wrote about the creation of an uncompressed hard-disk data recorder, and whole years have gone by since I first looked at the SPX-800E P2 camcorder from Panasonic. These are just two of the ways you can currently use a camcorder to create huge amounts of data. I haven’t even touched XDCAM yet, Grass Valley is still trialling Infinity, and we still haven’t really decided what to do with the truly huge amounts of data they produce. At the moment, this is data which is, in a very real sense, homeless. I covered the current and upcoming answers to this problem an issue ago, but I suspect most people already knew which answer is most practical if you need a device right now – magnetic tape. Really serious large-scale data storage looks set to be the last bastion of tape technology, offering an unbeatable (at the moment, at least) price-to-capacity ratio and known archival longevity.
This stuff has been in use more or less since computers and magnetic tape technology coexisted. Everyone’s seen movies such as Tron, with the computer’s rotating open reels of tape. The first application was in 1951, on the Eckert-Mauchly UNIVAC 1, which achieved a princely 7,200 characters per second on a half-inch phosphor bronze ribbon. We still use half-inch tape in modern data types such as LTO and DLT, and the Beta form factor series of Sony video tape formats, among others.
Until comparatively recently, tape remained slow, expensive and technologically difficult to implement, with the very low cost per megabyte its only real attraction. Since the mid-90s, however, tape has begun to approach and then exceed hard disk speeds, although the drives themselves are never going to be incredibly cheap. Many are comparable in complexity to a VTR mechanism and, while the most suitable current formats for our use don’t use rotating heads to produce a helical scan, the newer approaches still involve mechanical complexity. The most interesting recent development, though, is actually in the way these devices are packaged for use.
Traditionally, tape drives have almost exclusively been designed for the SCSI storage-device interface. There are a lot of variants of SCSI, with varying capabilities, but very few workstation-level computer systems have had support built in. Server-class hardware usually does, but adding it to a desktop computer, or another workstation-scale system such as a hard disk recorder, required expensive add-in cards and internal modifications to the machine. This is fine if you’re in a permanently-installed environment with discrete server and workstation hardware, as indeed is the case in most postproduction facilities, but there have always been significant barriers to simply adding a magtape capability to an arbitrary PC, especially at short notice.
But it gets worse. Even once the hardware is physically connected, there have usually been custom software programs to write and read magnetic tape data. There are two main problems with most of the existing systems. First, they’re usually designed to do timed backups of entire hard disks, and there is often no way to direct the system to take a file and write it immediately. Secondly, and most problematically, there is no widely-accepted standard for the way data is formatted on tape. Because of this, the alarming possibility exists that one piece of tape-control software may write files in a manner which is incompatible with another, even if the other system uses identical drives and media.
The solution to all this that’s being offered by Quantum is a tape drive implemented as an external device connectable via an Ethernet network. I think this is an extremely good idea, for two reasons, one of which I believe is actually a very desirable side-effect of the way it’s been done.
First, Ethernet is ubiquitous; you would work hard to find a computer released in the last five to ten years that doesn’t have an Ethernet port. This alone is an extremely persuasive reason to use it; a reason persuasive enough that it would be worth putting up with other inconveniences to achieve it. However, we really don’t have to put up with too many inconveniences. Normal, standard, 100-megabit Ethernet is capable of transferring about eight or nine megabytes per second, in ideal circumstances. It’s considerably slower than most of the tape formats to which this scheme is being applied, but as a fallback in situations where nothing better is supported, it works, and you will get there in the end. More recent computers – of the last two or three years – may even have a gigabit ethernet connection, or can have one added very cheaply. This is, fairly obviously, ten times as fast, fast enough to feed current tape mechanisms at their full speed. This is really good. Basic functionality is ubiquitous; really good functionality is easy to achieve.
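To put rough numbers on the difference between the two kinds of Ethernet, here is a small Python sketch. It assumes the 300GB native cartridge capacity and the 36MB/s drive rate quoted later in this piece, takes 8.5MB/s as the middle of the "eight or nine" figure above, and simply multiplies that by ten for gigabit. The function names are invented for illustration and none of this is measured data.

```python
def effective_rate(link_mb_s: float, drive_mb_s: float) -> float:
    """The slower of the network link and the tape mechanism sets the pace."""
    return min(link_mb_s, drive_mb_s)


def hours_to_transfer(capacity_gb: float, rate_mb_s: float) -> float:
    """Hours needed to move capacity_gb (decimal gigabytes) at rate_mb_s."""
    seconds = capacity_gb * 1000 / rate_mb_s
    return seconds / 3600


# Assumed figures: drive rate and tape size are quoted later in the article;
# the gigabit rate is just ten times the 100-megabit one, not a measurement.
DRIVE_MB_S = 36.0
LINKS = {"100-megabit": 8.5, "gigabit": 85.0}
TAPE_GB = 300

for name, link_rate in LINKS.items():
    rate = effective_rate(link_rate, DRIVE_MB_S)
    hours = hours_to_transfer(TAPE_GB, rate)
    print(f"{name}: {rate:.1f} MB/s, about {hours:.1f} hours per tape")
```

On those assumptions the slower link needs most of a working day to fill one cartridge, while on gigabit the drive itself becomes the limit and the job takes a little over two hours.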
But it gets better. The way these devices are implemented is very straightforward – actually, it’s a complete solid-state computer in a box, a computer which happens to have a SCSI interface to the tape drive, and an Ethernet port. Nothing technological has really changed; these devices are really just a wrapper layer around preexisting technology. The advantage of this approach is that this little embedded computer runs Linux as an operating system. The way Linux writes files to magnetic tape is about as close as we have to a standard; it is generally straightforward to read any Linux-originated tape archive on any other Linux-based OS. Traditionally, it’s been very hard to write tape archives in this format on any OS other than Linux, which causes serious interoperability problems in a postproduction environment that may include Linux, MacOS and Windows machines as a matter of course. The data wrangling load imposed by the need to unarchive tapes on a Linux server and then copy material around to where it’s needed is onerous.
The thing is, with these Ethernet-connected devices – correctly termed ‘network-attached storage’ – Windows isn’t writing the tape, Linux is. So now, we have a simple-to-use, embedded black box device which allows non-Linux operating systems to write tapes in a very widely readable format. I don’t think this is intended functionality; Quantum supplies a very competent archive application with the drive which is clearly bent towards working with P2 or XDCAM-originated MXF files. There’s a lot of functionality in this regarding timecode-referencing clips, and it seems very capable; I’m sure Quantum will curse the brevity of my coverage on this, but it’s not really my area – I’m interested in storing big chunks of uncompressed HD, not MXF files. Either way, if you want to write a tape on one of these then give it to a Linux-centric colleague with a more traditional SCSI connected drive, he or she will be able to read it. I don’t think it was done on purpose, but it’s certainly a very desirable side-effect, especially when postproduction houses tend to be set up that way by default.
So, the concept is good. What’s the implementation like? Well, there are two versions. The desktop unit which I’ve looked at, the SDLT-600A, takes a single tape in a unit maybe 7in square by 16in long. Refreshingly, there’s a minimum of bells and whistles – a hole to stick the tape in, another one for the Ethernet connection, and one for the mains connector. Add a few indicator lights to show the tape loading and drive status, and you’re done. There is also an autoloader, a robotic tape device which will automatically handle access to lots of tapes at once, but I didn’t get a closer look at the Superloader 3A than on the stand at NAB. Quantum is offering similar devices in both DLT and LTO tape formats.
DLT was originally developed by DEC in the mid-80s for the MicroVAX series of computers, and has been one of the two fastest and biggest data tape formats (with Sony’s AIT) ever since. I believe development of further, higher-capacity and higher-speed versions has stopped since the release of the open LTO spec. LTO (at least, the ubiquitous Ultrium LTO) and DLT cartridges are superficially extremely similar, a similarity driven by the desire to allow existing and very expensive robotic tape libraries to be converted from one format to the other simply by switching the drives. I looked at the DLT simply because they had one available, but LTO will be the format of choice for anyone doing long-term archive work (or wanting the faster transfer rate). The devices are otherwise equivalent; it’s a bit-bucket, and the medium the data ends up on is a decision nicely separated from how you supply it.
It’s worth mentioning here that almost all tape drive manufacturers quote both capacity and transfer rate taking into account the compression applied automatically in the drive’s firmware. Quantum is at least honest enough to state the capacity as “300/600GB” – the tape media has a native capacity of 300GB, enough for just under half an hour of 10-bit 1080p24, ignoring overheads. The compression, a lossless entropy-encoding algorithm, works best with text, figures, or other data of the sort produced by a conventional office whose servers are backed up to tape. That’s probably the lion’s share of Quantum’s business, and even then the oft-quoted 2:1 compression ratio is, let’s say, highly optimistic. The real problem is that the sort of data we’re dealing with is noisy and random, and these tricks either don’t work or can actually increase the size of the output. It is possible to switch compression off, and usually that makes sense for high-entropy data such as video and audio or files that have already been compressed. The practical capacity of the SDLT-600A for media files is 300GB at about 36MB/s, and on a gigabit Ethernet interface it is capable of achieving that, or close to it, in the real world.
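The "just under half an hour" figure can be checked with a few lines of arithmetic. The Python sketch below assumes 1920x1080 frames, three 10-bit components per pixel (4:4:4) packed with no padding, no file headers or audio, and decimal gigabytes; real file formats usually pad the data, so treat the result as an upper bound rather than a specification.

```python
WIDTH, HEIGHT = 1920, 1080
BITS_PER_COMPONENT = 10
COMPONENTS_PER_PIXEL = 3        # assumption: 4:4:4, tightly packed
FRAMES_PER_SECOND = 24

TAPE_BYTES = 300 * 10**9        # native (uncompressed) capacity
DRIVE_BYTES_PER_S = 36 * 10**6  # sustained drive rate quoted above

# Size of one frame, then the data rate of the running footage.
frame_bytes = WIDTH * HEIGHT * BITS_PER_COMPONENT * COMPONENTS_PER_PIXEL // 8
video_bytes_per_s = frame_bytes * FRAMES_PER_SECOND

# How much footage fits on one cartridge.
minutes_per_tape = TAPE_BYTES / video_bytes_per_s / 60

# How many times longer than real time the drive needs to write it.
slowdown = video_bytes_per_s / DRIVE_BYTES_PER_S

print(f"{frame_bytes / 1e6:.2f} MB per frame")
print(f"{video_bytes_per_s / 1e6:.1f} MB/s of footage")
print(f"{minutes_per_tape:.1f} minutes per tape")
print(f"about {slowdown:.1f}x slower than real time at the drive's rate")
```

On these assumptions a cartridge holds a little under 27 minutes, and because the footage runs at several times the drive's sustained rate, writing it to tape is an offline job rather than something that keeps pace with the camera.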
Configuration is relatively simple; it is a standard network device configuration, and there are a lot of people in the world who know how to do that – probably more than those who know how to set up a SCSI device. It’s entirely possible to put the device on a router and let multiple workstations have access to it, but I suspect most users will cable it directly to a workstation. The drive has an IP address and a netmask – figure out which two boxes to put them in, and it works. Even if you haven’t ever set up a network device before, Google will tell you how. It’s something a novice can do from instructions.
The software interface is simply FTP, the File Transfer Protocol, a truly venerable standard which predates the world-wide web as a way of transferring data over computer networks. As I say, Quantum supplies an application which supports file transfer and queries the drive for additional status information, but I wanted to see whether I could simply address the drive using any standard FTP client. Windows comes with two, one built into Internet Explorer and a command-line version, but there are dozens out there. I must report that I had some trouble with the Explorer-based client, but it’s known for being a bit strange at the best of times and I wouldn’t necessarily blame the drive. Command-line FTP worked fine, and reported that file transfer was occurring at or very near the theoretical maximum offered by the drive mechanism.
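Because the drive speaks plain FTP, any scripting language with an FTP library ought to be able to do what the command-line client does. The following is a minimal Python sketch using the standard `ftplib` module. The IP address, the login details and the function name `send_to_tape` are all assumptions made for illustration; they are not taken from Quantum's documentation, and the code has not been tried against the drive.

```python
import os
import time
from ftplib import FTP


def send_to_tape(ftp, local_path, remote_name=None):
    """Upload one file over an open FTP session; return the rate in MB/s."""
    remote_name = remote_name or os.path.basename(local_path)
    size = os.path.getsize(local_path)
    start = time.monotonic()
    with open(local_path, "rb") as source:
        # A large block size keeps per-call overhead down on big media files.
        ftp.storbinary(f"STOR {remote_name}", source, blocksize=1024 * 1024)
    elapsed = time.monotonic() - start
    return size / 1e6 / elapsed if elapsed > 0 else float("inf")


# Example use (hypothetical address and credentials):
#
#   ftp = FTP("192.168.0.50")
#   ftp.login("user", "password")
#   rate = send_to_tape(ftp, "reel_01.tar")
#   print(f"{rate:.1f} MB/s")
#   ftp.quit()
```

The function takes an already-open session rather than a hostname so that several files can be sent over one connection, which seems the kinder way to treat a device that has to keep a tape streaming.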
Ordinarily, FTP will present the remote device to you as if it were a hard disk – just another window full of icons representing files and folders, and it is so here. Soon, other than the frenetic whirring of the tape drive, you forget that you’re using a tape archive. Great. Marvellous – didn’t even have to open the manual. There are a couple of issues with this emulation of a disk, though. You can’t recover space by deleting already-written files which are some way back down the tape, and it can be slow to start recovering a file because, clearly, it has to spool up and down the tape to find what you’re after. Both these problems are inherent to a linear medium; it’s tape, it writes things one after the other on a long monodimensional string, so you can’t really call it a fault; that’s how it works, that’s what it is. Frankly, it’s so much easier to use than the commandline-driven tape archive control often found on Linux systems that it’s a breath of fresh air regardless.
There are a couple of more serious caveats. The first is really rather unfortunate – upcoming versions of LTO are fast enough that they will saturate a gigabit Ethernet link, meaning the tape drive can’t work at full speed. Many modern PCs support teamed links, that is, a pair of network connections working together to share load, and I hope that Quantum will be able to implement either this or some faster alternative networking scheme to alleviate the problem. The issue is that you can go to Fibre Channel, which is twice as fast as gigabit Ethernet, or to ten-gigabit Ethernet, but you’re then back to needing a specific and perhaps expensive hardware add-on to use it. This is still better than the SCSI option; you have the convenience of FTP access to magnetic tape, and you still have the standard format written to tape, but it’s a bit of an uncertain upgrade path.
Caveat two is that in many situations where you’re storing frame sequences, you will put a given number of frames in a TAR, or Tape Archive – that is, a simple, non-compressing wrapper which holds a whole load of files together as one. That’s fine, until you realize that the tape device also puts whatever files you send it into a TAR wrapper of its own, so if you do try recovering the data on a Linux machine you get a TAR wrapped in a TAR. I put this to Quantum at NAB back in April and they’ve promised a software update to fix it. It’s not the end of the world, anyway; just a bit time-consuming to unwrap and it could cause a bit of a data-wrangling problem.
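Until that update arrives, the double wrapping can be undone in a single pass with a short script. Here is a Python sketch using the standard `tarfile` module. It assumes the inner archives can be recognised by a `.tar` extension and that flattening every recovered file into one directory is acceptable; how the drive actually names things inside its outer wrapper is not something I have confirmed, and the function name is invented for illustration.

```python
import os
import shutil
import tarfile


def unwrap_nested_tar(outer_path, dest_dir):
    """Copy out the files held in every TAR found inside the outer TAR."""
    recovered = []
    with tarfile.open(outer_path, "r:") as outer:
        for member in outer.getmembers():
            # Assumption: inner archives are regular files ending in .tar.
            if not (member.isfile() and member.name.endswith(".tar")):
                continue
            inner_stream = outer.extractfile(member)
            with tarfile.open(fileobj=inner_stream, mode="r:") as inner:
                for item in inner.getmembers():
                    if not item.isfile():
                        continue
                    # Keep only the base name, so odd paths stored in the
                    # archive cannot write outside dest_dir. Files with the
                    # same name in different folders will overwrite each other.
                    target = os.path.join(dest_dir, os.path.basename(item.name))
                    with inner.extractfile(item) as src, open(target, "wb") as dst:
                        shutil.copyfileobj(src, dst)
                    recovered.append(target)
    return recovered
```

Reading the inner archive straight out of the outer one avoids writing an intermediate copy of a very large file to disk, which is most of the time-consuming part of unwrapping by hand.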
If I really wanted to whine, I’d point out that the drive certainly does: it’s noisy. You certainly wouldn’t want to have one on set with you archiving your material as you shot, but then, if you want to be able to record data to tape at tens of megabytes per second, that tape has to be moving fairly fast. I’m sure it could be made quieter if you were willing to pay a lot of money for precision-engineered tape cartridges, but that would rather defeat the object of a low cost per MB.
Otherwise, great. I’d like to play with the multi-changer robot. Ordinarily, writing software to reliably control tape robots can be quite complicated, but I have it on Quantum’s authority that in this case each tape appears on the FTP server as a directory numbered with the serial number recovered from each tape’s RFID chip. This, again, is an extremely good idea and means that simple scripted software can easily control and track what data is where. Perhaps we’ll be able to have a closer look at one of these in the future, because in really upscale circumstances, shooting uncompressed HD images, you really need multi-tape archive creation to be an automatic, unattended process.
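As an idea of how simple that scripted tracking could be, here is a Python sketch that builds a catalogue of which files are on which tape. It assumes the autoloader's FTP server answers an ordinary name listing with one top-level directory per tape serial number, as Quantum describes; I have not tested this against a Superloader, and the function names are invented for illustration. Listing a tape's directory may also make the robot load that tape, so a real script would want to save the catalogue rather than rebuild it on every run.

```python
import posixpath


def build_catalog(ftp):
    """Map each tape serial (top-level directory) to the file names on it."""
    catalog = {}
    for serial in ftp.nlst():
        # Some FTP servers return full paths, so keep only the final part.
        names = [posixpath.basename(name) for name in ftp.nlst(serial)]
        catalog[serial] = names
    return catalog


def tapes_holding(catalog, filename):
    """Return the serial numbers of every tape that holds filename."""
    return [serial for serial, names in catalog.items() if filename in names]


# Example use (hypothetical address and credentials):
#
#   from ftplib import FTP
#   ftp = FTP("192.168.0.60")
#   ftp.login("user", "password")
#   catalog = build_catalog(ftp)
#   print(tapes_holding(catalog, "reel_01.tar"))
#   ftp.quit()
```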