Digitising a few old media clips: how difficult can it be?
Broadcasters across the globe are being tasked with digitising vast amounts of their video archive footage, whether to preserve digital heritage or to create new revenue streams. It is quite possible those archives contain hidden gems currently rotting away on the many different tape and archive formats collected over a lifetime.
An interesting blog post (Going Bollywood, Oct 7, 2007 – by Jonathan Schwartz, then CEO of Sun Microsystems) described a business case in which a 50-year-old movie, stored on 35mm film, was pulled from the archive and digitised. After the release of the digitally remastered version, the DVD rose to number 8 on the Amazon best-seller list. The cost of production? Near $0. The effort was almost pure profit.
A typical digitisation project can run to tens of thousands of hours of footage, which is more than a continuous year's worth of valuable output. This is big data.
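To get a feel for the scale, here is a back-of-envelope storage estimate. The bitrates are illustrative assumptions (an IMX50-class high-res master at roughly 50 Mbit/s and an H.264 browse proxy at roughly 1.5 Mbit/s), not figures from any specific project:

```python
# Back-of-envelope storage estimate for an archive digitisation project.
# Bitrates are illustrative assumptions: an IMX50-class master (~50 Mbit/s)
# and an H.264 browse proxy (~1.5 Mbit/s).

def storage_tb(hours, mbit_per_s):
    """Terabytes needed for `hours` of footage at `mbit_per_s`."""
    seconds = hours * 3600
    bits = seconds * mbit_per_s * 1_000_000
    return bits / 8 / 1e12  # bits -> bytes -> terabytes

hours = 10_000  # "tens of thousands of hours"
high_res = storage_tb(hours, 50)   # ~225 TB per copy
proxy = storage_tb(hours, 1.5)     # ~6.75 TB
print(f"high-res: {high_res:.0f} TB per copy, proxy: {proxy:.2f} TB")
```

And remember: every figure doubles (at least) once you keep the minimum of two copies discussed below.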
So, what should you consider when taking on a project of this nature?
- How do I digitise?
- What codec do I utilise to protect the ‘context’ and ‘authenticity’ of my media clips?
- How do I store and protect it?
- How do I ensure it is quality checked (QC)?
- Do I need to re-grade it and add colour correction?
- How do I and others easily find the content once digitised?
- How do I preserve media content for the long term once it has been digitised?
- How much will this cost?
Like many projects, the answer is: it depends. The number one question is: what is the business reason for digitising the archive? The business drivers will ultimately dictate the choice of technology, workflow and codecs, and the ROI will determine the likely available budget.
How to digitise?
An ingest appliance is required, preferably one that can take baseband ingest feeds and create two resolutions of the content to be archived.
A project we worked on recently proposed an ingest platform that digitises from VHS into two separate file formats: one low-resolution proxy (H.264) and one high-resolution file. The high-res files are treated as the 'preserved media', while the proxy provides the search and browse capability used by applications such as a Media Asset Management (MAM) system.
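A dual-resolution ingest step of this kind is often scripted around a transcoder such as ffmpeg. The sketch below builds the two command lines without running them; the filenames, bitrates and scaling are hypothetical, though the ffmpeg flags themselves are standard:

```python
# Sketch of a dual-resolution ingest step: one command produces a
# high-res preservation master, the other an H.264 browse proxy.
# Filenames and bitrates are hypothetical examples.

def ingest_commands(source, stem):
    master = [
        "ffmpeg", "-i", source,
        "-c:v", "mpeg2video", "-b:v", "50M",  # IMX50-class bitrate
        "-c:a", "pcm_s16le",                  # uncompressed audio
        f"{stem}_master.mxf",
    ]
    proxy = [
        "ffmpeg", "-i", source,
        "-c:v", "libx264", "-b:v", "1500k",   # low-res H.264 proxy
        "-vf", "scale=-2:360",                # shrink to 360 lines
        "-c:a", "aac",
        f"{stem}_proxy.mp4",
    ]
    return master, proxy

master, proxy = ingest_commands("vhs_capture.avi", "clip0001")
# Either list could be handed to subprocess.run() on a machine with ffmpeg.
```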
So what about a choice of codec?
Generally speaking, non-proprietary codecs available from multiple sources are recommended, and the same goes for the wrappers that contain them – for example, both MXF and QuickTime are very well documented. Conceivably, in the distant future, anyone could write code to extract the video from long-lost archive files wrapped in one of these formats, starting purely from the documentation. The most future-proof approach to storing video is uncompressed, but this is usually impractical for cost and technology reasons. So, compression of some sort is usually employed, and in terms of codecs, here are some recommendations:
SD – depending upon the quality of the source: DV25 off VHS or similar consumer-quality material, or IMX50 from a broadcast-quality source.
HD – AVC-Intra 100 (AVCi100) is becoming a broadcast standard for HD, offering a decent amount of compression balanced against reasonable file sizes.
Film – depends on the input source and quality. Film scanners will probably produce image sequences, typically DPX. These can be stored unchanged – for example, as a QuickTime-wrapped DPX sequence with associated sound. To save space, feature-quality material can be kept as DPX sequences and transcoded to AVC-Intra 100 (or even SD) copies for actual display and editing, with lower-quality film being stored only in HD. JPEG 2000 is also a viable, open-standards alternative to DPX and is popular in some archiving scenarios.
When people refer to 2K or 4K, they generally mean image sequences – at the time of writing, there aren’t any specific compressed codecs (yet) that are really capable of handling those sorts of images directly at the kind of quality required.
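The recommendations above can be condensed into a simple lookup. The table entries come from the text; the key names and fallback are my own framing:

```python
# The codec recommendations above, condensed into a simple lookup.
# Keys describe the source; values are the codecs suggested in the text.

RECOMMENDED_CODEC = {
    ("SD", "consumer"): "DV25",        # e.g. VHS
    ("SD", "broadcast"): "IMX50",
    ("HD", "broadcast"): "AVC-Intra 100",
    ("film", "scan"): "DPX or JPEG 2000 image sequence",
}

def recommend(resolution, source):
    # 2K/4K and anything else unlisted falls through to image sequences
    # or uncompressed storage, as the text notes.
    return RECOMMENDED_CODEC.get((resolution, source),
                                 "uncompressed or image sequence")

print(recommend("SD", "consumer"))
```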
So what about physical storage?
Again, there are many choices, and the decision is driven by the business reason. Do all the media files need to be online and instantly available?
Alternatively, can restores be done in a timely manner from an LTO tape that resides on a shelf?
Does the archive need to be kept in perpetuity?
Should I use the cloud to store my digital archive, such as Amazon Glacier?
Whatever happens, content must be protected and secured – unless you have a minimum of two copies of every media and metadata file, you do not have it.
With big data, trying to manually manage your content is nigh-on impossible. An automated system is necessary to manage and protect content. Digital preservation and data management are not just about keeping multiple copies; they are also about mitigating future scalability issues and technology obsolescence. If files become corrupt or are lost, recovering them should be easy and preferably automated for high availability.
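Detecting corruption before the last good copy is lost comes down to fixity checking: record a checksum at ingest, then periodically re-verify every copy against it. A minimal sketch, using SHA-256 from Python's standard library (a real system would automate this across all copies and locations):

```python
# Minimal fixity-check sketch: record a checksum at ingest, then verify
# stored copies later so silent corruption is caught early.
import hashlib

def sha256_of(path):
    """Stream the file in 1 MiB chunks so large media files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path, expected):
    """True if the stored file still matches its ingest-time checksum."""
    return sha256_of(path) == expected
```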
The aforementioned proposal uses a commercial off-the-shelf ingest platform to digitise the files, which are in turn written to MatrixStore. From the MatrixStore, a series of QC checks are performed. On completion, and after a week, MatrixStore moves the content to LTO-5 tape. For this business, keeping all the high-resolution material on disk was not felt to be necessary. However, being able to find content (the proverbial needle in the haystack) most definitely was, so the proxies are kept instantly browsable on the MatrixStore. A fully-blown MAM system was not feasible to begin with due to budget constraints, but they had a tactical problem to solve in terms of searching for content, and MatrixStore with DropSpot allows that.
As an object-based storage device, as well as protecting the content with multiple copies and being able to write content to tape, MatrixStore stores metadata with the media clips and makes it searchable. Using the client tool DropSpot, content can be found even if it has been moved off to tape; DropSpot will even supply the barcode number of the tape that contains the requested media clips.
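Conceptually, that kind of lookup is just keyword search over clip metadata that records where each clip lives. The toy sketch below illustrates the idea; the field names and records are invented and this is not the MatrixStore or DropSpot API:

```python
# Toy sketch of searchable clip metadata, including the LTO barcode of
# the tape a clip was moved to. Records and field names are invented.

catalogue = [
    {"clip": "match_1974_final.mxf", "keywords": {"football", "final"},
     "location": "tape", "barcode": "LTO000123"},
    {"clip": "interview_2001.mxf", "keywords": {"interview"},
     "location": "disk", "barcode": None},
]

def find(keyword):
    """Return (clip, location, barcode) for every clip tagged with keyword."""
    return [(c["clip"], c["location"], c["barcode"])
            for c in catalogue if keyword in c["keywords"]]

print(find("final"))
# A hit on tape carries its barcode, telling the operator which LTO to fetch.
```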
The proxies also remain on disk permanently, allowing low-res content to be browsed as if it were on local disk while remaining shareable across the network to many concurrent users.
So what about the cloud? Media clips require lots of bandwidth, and a solid Internet connection is not always available and can be expensive. Can you trust your cloud provider to always be there to serve you your content? What happens if they go out of business? The alternative? Build your own private cloud. Many new MAM systems can be retrofitted to this workflow, giving a browser-type interface and offering a cloud-type service. The difference is that you host it, so if any partner in your workflow should be bought out or disappear, you have mitigated the risk of loss.
In terms of storage-specific data management and digital preservation, including how to mitigate against hardware obsolescence, please refer to an earlier blog I wrote a few years ago, all of which is still very relevant.
As is the true nature of this subject, once you scratch the surface it morphs into something much bigger than most people originally expect. I mean, digitising a few old media files: how difficult can it be?
Digital Preservation Storage
- Keep all proxies browsable
- Move high-res content to tape (LTO)
- High availability (99.999%)
- Automatic backup/recovery
- No hardware lock-in
- Stores files in an open, non-proprietary format
- Data migration facility
- Easy to administer
- Integration with MAM and ingest platforms
If you would like more information on how to implement a management-free nearline storage platform that scales to multiple petabytes, then please contact us.
Author: Mark Andrews, Head of EU Sales, Object Matrix
I would like to thank Steve Sharman of Mediasmith for his valuable input on codecs and wrappers.