Digitise your archive of baseband and file-based content

Digitise Object Matrix

More organisations need to digitise legacy archive content they own but cannot use or share. Read on for best practices in storage, digital preservation and content distribution.

Digitising a mountain of legacy content stored on a myriad of tape and file formats: how difficult can it be?

Organisations that have generated video content have been missing out on opportunities to share, distribute and monetise it, simply because they cannot get access to it. Their archives may well contain many hidden gems, currently rotting away on the many different tape and archive formats collected over a lifetime.

A typical digitisation project can run to tens of thousands of hours of material; 10,000 hours alone is more than a continuous year (8,760 hours) of valuable output. This is big data.
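To put big data into numbers, here is a rough sizing sketch in Python. It assumes the nominal video bitrates of the SD and HD codecs recommended later in this post (DV25, IMX50 and AVCi100) and ignores audio, ancillary data and wrapper overhead, so real figures will be somewhat higher. The 10,000-hour project size is purely illustrative.

```python
# Nominal video bitrates in Mbit/s. Assumption: these ignore audio, ancillary
# data and wrapper overhead, so real MXF/QuickTime files will be larger.
NOMINAL_VIDEO_MBPS = {"DV25": 25, "IMX50": 50, "AVCi100": 100}


def archive_size_tb(hours: float, mbps: float, copies: int = 1) -> float:
    """Decimal terabytes (10**12 bytes) needed to hold `hours` of material
    encoded at `mbps`, multiplied by the number of copies kept."""
    total_bits = mbps * 1e6 * hours * 3600
    return total_bits / 8 / 1e12 * copies


if __name__ == "__main__":
    hours = 10_000  # hypothetical project size
    for codec, mbps in NOMINAL_VIDEO_MBPS.items():
        one = archive_size_tb(hours, mbps)
        two = archive_size_tb(hours, mbps, copies=2)
        print(f"{codec:<8} {one:7.1f} TB per copy, {two:7.1f} TB for two copies")
```

At a nominal 100 Mbit/s every hour of material is about 45 GB, so 10,000 hours comes to roughly 450 TB for a single copy.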

So what should you consider when taking on a project of this nature?

  • How do I digitise?
  • Which codec should I use to protect the ‘context’ and ‘authenticity’ of my media clips?
  • How do I store and protect it?
  • How do I ensure it is quality checked (QC)?
  • Do I do it myself or are there companies that offer this service?
  • How do I and others easily find the content once digitised?
  • How do I preserve media content for the long term once it has been digitised?
  • How much will this cost?

As with many projects, the answer is that it depends. The number one question is: what is the business reason for digitising the archive?

The business drivers will ultimately shape the choice of technology, workflow and codecs, while the expected ROI will determine the likely budget.

An ingest appliance is required, preferably one that can take baseband feeds and create two resolutions of the content to be archived.

A recent project we worked on proposed an ingest platform that digitises from VHS into two separate files: a low-resolution proxy (H.264) and a high-resolution file. The high-res files are treated as the ‘preserved media’, while the proxies are used for searching and browsing, for example within a search application such as a Media Asset Management (MAM) system.
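In the project above, the ingest platform produces both files itself. Where a proxy has to be made afterwards from a high-res master instead, a common approach is to transcode with ffmpeg. The Python sketch below only builds the ffmpeg command line; the file names, the 360-line proxy height, the CRF value and the audio settings are illustrative assumptions, not settings from that project.

```python
import shlex
from pathlib import Path


def build_proxy_command(master: Path, proxy_dir: Path,
                        height: int = 360, crf: int = 23) -> list:
    """Build an ffmpeg argument list that derives a low-res H.264 proxy
    from a high-resolution master file (illustrative settings only)."""
    proxy = proxy_dir / f"{master.stem}_proxy.mp4"
    return [
        "ffmpeg", "-i", str(master),
        # VHS captures are interlaced: deinterlace with yadif, then scale to
        # the target height (-2 keeps the aspect ratio with an even width).
        "-vf", f"yadif,scale=-2:{height}",
        # H.264 video; CRF trades quality against size (lower = better quality).
        "-c:v", "libx264", "-preset", "medium", "-crf", str(crf),
        # 4:2:0 chroma keeps the proxy playable in browsers and most players.
        "-pix_fmt", "yuv420p",
        "-c:a", "aac", "-b:a", "128k",
        # Put the index at the front so playback can start before the download ends.
        "-movflags", "+faststart",
        str(proxy),
    ]


if __name__ == "__main__":
    cmd = build_proxy_command(Path("tape_0001_master.mxf"), Path("proxies"))
    print(shlex.join(cmd))
    # To run it for real (requires an ffmpeg build with libx264):
    # subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it easy to log exactly how each proxy was produced alongside the preserved master.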

Generally speaking, non-proprietary codecs with multiple independent implementations are recommended, and the same goes for the wrappers that contain the codec: MXF and QuickTime, for example, are both very well documented. Conceivably, in the distant future, anyone could write code to extract the video from long-lost archive files wrapped in one of these formats, starting purely from the documentation. The most future-proof approach is to store video uncompressed, but this is usually impractical for cost and technology reasons. Some form of compression is therefore usually employed; here are some codec recommendations:

• SD – depending on the quality of the source, DV25 (for VHS or similar consumer-quality material) or IMX50 (for a “broadcast quality” source)
• HD – AVCi100 (AVC-Intra 100) is becoming a broadcast standard for HD, offering good picture quality at reasonable file sizes
• Film – depends on the input source and quality. Film scanners will usually produce image sequences, such as DPX. These can be stored unchanged, e.g. as a QuickTime DPX sequence with associated sound. To save space, one option is to keep only feature-quality material as DPX sequences, transcoding to AVCi100 or even SD copies for day-to-day display and editing, and to store lower-quality film in HD only. JPEG2000 is also a viable open-standards alternative to DPX and is popular in some archiving scenarios.
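To illustrate why well-documented wrappers matter: the QuickTime File Format specification (on which the MP4 format is based) describes a file as a sequence of atoms, each beginning with a 32-bit big-endian size and a four-character type. The minimal Python sketch below walks the top-level atoms of a .mov or .mp4 file using nothing more than that description. It does not descend into nested atoms or decode any essence, and MXF, which uses a key-length-value (KLV) layout instead, would need its own reader.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple


def iter_top_level_atoms(f: BinaryIO) -> Iterator[Tuple[str, int, int]]:
    """Yield (atom_type, offset, size) for each top-level atom in a
    QuickTime (.mov) or MP4 file opened in binary mode."""
    f.seek(0, io.SEEK_END)
    file_size = f.tell()
    offset = 0
    while offset + 8 <= file_size:
        f.seek(offset)
        # Every atom starts with a 32-bit big-endian size and a 4-character type.
        size, raw_type = struct.unpack(">I4s", f.read(8))
        header_len = 8
        if size == 1:
            # A size of 1 means a 64-bit "extended size" follows the type.
            if offset + 16 > file_size:
                raise ValueError(f"truncated extended-size atom at offset {offset}")
            (size,) = struct.unpack(">Q", f.read(8))
            header_len = 16
        elif size == 0:
            # A size of 0 means the atom extends to the end of the file.
            size = file_size - offset
        if size < header_len or offset + size > file_size:
            raise ValueError(f"corrupt or truncated atom at offset {offset} (size {size})")
        yield raw_type.decode("latin-1"), offset, size
        offset += size


if __name__ == "__main__":
    # A tiny synthetic file: an 'ftyp' atom followed by an 'mdat' that runs to EOF.
    data = (struct.pack(">I4s4sI", 16, b"ftyp", b"qt  ", 0)
            + struct.pack(">I4s", 0, b"mdat") + b"abcd")
    for atom in iter_top_level_atoms(io.BytesIO(data)):
        print(atom)  # prints ('ftyp', 0, 16) then ('mdat', 16, 12)
```

Used on a real file, `with open("clip.mov", "rb") as f:` followed by the same loop lists its structure, which is also a cheap sanity check that an archived file has not been truncated.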

When people refer to 2K or 4K, they generally mean image sequences; at the time of writing, there are no compressed codecs (yet) that can really handle images of that size directly at the quality required.

Again, there are many choices, and the right one is driven by the business reason. The same is true of storage:

Do all the media files need to be online and instantly available?
Alternatively, can restores be done in a timely manner from LTO tapes kept on a shelf?
Does the archive need to be kept in perpetuity?
Should I store my digital archive in the cloud, for example in Amazon Glacier?

Whatever happens, content must be protected and secured – unless you have a minimum of two copies of every media and metadata file, you do not have it.

With big data, managing content manually is nigh on impossible; an automated system is needed to manage and protect it. Digital preservation and data management are not just about keeping multiple copies: they are also about mitigating future scalability issues and technology obsolescence. If files become corrupt or are lost, recovering them should be easy and preferably automated, to maintain high availability.
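The two-copy rule and automated recovery can be illustrated with a simple fixity check: record a SHA-256 checksum when a file is ingested, periodically re-hash every replica, and rebuild any copy that is missing or no longer matches from one that does. This is a generic Python sketch that assumes replicas are plain files at known paths; it is not how MatrixStore implements protection internally.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large media files never sit in memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_repair(replicas: list, expected: str, min_copies: int = 2) -> list:
    """Check every replica against the checksum recorded at ingest and rebuild
    any missing or corrupt copy from a verified one. Returns repaired paths."""
    if len(replicas) < min_copies:
        raise ValueError(f"{len(replicas)} replica(s) configured; policy needs {min_copies}")
    good = [p for p in replicas if p.exists() and sha256_of(p) == expected]
    if not good:
        # Nothing here is intact: recovery must come from another tier, e.g. tape.
        raise RuntimeError("no intact replica available for repair")
    repaired = []
    for p in replicas:
        if p in good:
            continue
        p.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(good[0], p)  # overwrite the damaged (or recreate the missing) copy
        if sha256_of(p) != expected:
            raise RuntimeError(f"repaired copy {p} failed verification")
        repaired.append(p)
    return repaired


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        copy_a = Path(tmp, "node1", "clip.mxf")
        copy_b = Path(tmp, "node2", "clip.mxf")
        for p in (copy_a, copy_b):
            p.parent.mkdir()
        copy_a.write_bytes(b"preserved essence")
        copy_b.write_bytes(b"preserved essenc3")  # simulate silent corruption
        checksum = sha256_of(copy_a)  # in practice, recorded once at ingest
        print("repaired:", verify_and_repair([copy_a, copy_b], checksum))
```

The checksum must be captured at ingest rather than recomputed later; otherwise a copy that was already corrupt would simply be accepted as the reference.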

The aforementioned proposal uses a commercial off-the-shelf ingest platform to digitise the tapes, and the resulting files are written to MatrixStore, where a series of QC checks are run. A week after the checks complete, MatrixStore moves the high-resolution content to LTO-5 tape. For this business, keeping all the high-resolution material on disk was not considered necessary. However, being able to find content (the proverbial needle in the haystack) and having the proxies instantly browsable on MatrixStore were definitely required. Initially, a full-blown MAM system was not feasible due to budget constraints, but they had a tactical need to search their content, and MatrixStore with DropSpot meets it.
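The disk-to-tape rule described above can be expressed as a small policy function. Only the one-week delay comes from the project; the `Asset` fields, the rendition labels and the choice to keep high-res files that failed QC on disk are hypothetical, for illustration only, and are not MatrixStore configuration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# One-week delay from the project described above; everything else is assumed.
TAPE_DELAY = timedelta(days=7)


@dataclass
class Asset:
    name: str
    rendition: str                      # hypothetical labels: "proxy" or "high_res"
    qc_passed: bool = False
    qc_completed: Optional[datetime] = None
    location: str = "disk"              # "disk" or "tape"


def select_for_tape(assets: List[Asset], now: datetime) -> List[Asset]:
    """High-res assets that passed QC at least TAPE_DELAY ago and are still on
    disk. Proxies are never selected, so they stay browsable on disk."""
    return [
        a for a in assets
        if a.rendition == "high_res"
        and a.location == "disk"
        and a.qc_passed
        and a.qc_completed is not None
        and now - a.qc_completed >= TAPE_DELAY
    ]


if __name__ == "__main__":
    now = datetime(2024, 3, 15)
    assets = [
        Asset("reel_001_hi", "high_res", True, datetime(2024, 3, 1)),
        Asset("reel_001_proxy", "proxy", True, datetime(2024, 3, 1)),
        Asset("reel_002_hi", "high_res", True, datetime(2024, 3, 12)),
        Asset("reel_003_hi", "high_res", False, datetime(2024, 3, 1)),
    ]
    for a in select_for_tape(assets, now):
        print("move to LTO:", a.name)
```

Making the policy a pure function of asset state and the current time keeps it easy to test before any content is actually moved off disk.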

As an object-based storage device, MatrixStore not only protects content with multiple copies and can write content to tape, it also stores metadata with the media clips and makes that metadata searchable. Using the DropSpot client tool, content can be found even after it has been moved off to tape; DropSpot will even supply the barcode of the tape that holds the requested media clips.
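A hypothetical sketch of why this works: if metadata and location records stay on disk while the essence moves to tape, a search can still return every matching clip together with where it lives. The `CatalogueEntry` structure and `search` function below are illustrative only and are not the DropSpot or MatrixStore API; the example barcode follows the usual LTO-5 label convention of six characters plus an L5 suffix.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CatalogueEntry:
    object_id: str
    metadata: Dict[str, str] = field(default_factory=dict)
    on_disk: bool = True
    tape_barcode: Optional[str] = None   # recorded when the object is written to LTO


def search(catalogue: List[CatalogueEntry], **criteria: str) -> List[CatalogueEntry]:
    """Return entries whose metadata matches every criterion
    (case-insensitive substring match on the named field)."""
    return [
        e for e in catalogue
        if all(v.lower() in e.metadata.get(k, "").lower() for k, v in criteria.items())
    ]


def location(entry: CatalogueEntry) -> str:
    """Describe where the essence for a catalogue entry currently lives."""
    if entry.on_disk:
        return "online (disk)"
    return f"offline, LTO tape {entry.tape_barcode}"


if __name__ == "__main__":
    catalogue = [
        CatalogueEntry("obj-0001", {"title": "Cup Final 1987", "format": "VHS"},
                       on_disk=False, tape_barcode="000123L5"),
        CatalogueEntry("obj-0002", {"title": "Studio tour", "format": "VHS"}),
    ]
    for hit in search(catalogue, title="cup final"):
        print(hit.object_id, "->", location(hit))
```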

The proxies remain on disk permanently, so the low-res content is always available to browse as if it were on a local disk, while still being shareable across the network with many concurrent users.

So what about the cloud? Media clips require a lot of bandwidth, so you need a solid Internet connection, which is not always available and can be expensive. Can you trust your cloud provider always to be there to serve your content? What happens if they go out of business? The alternative is to build your own private cloud. Many new MAM systems can be retrofitted to this workflow, providing a browser-based interface and a cloud-style service. The difference is that you host it, so if any partner in your workflow is bought out or disappears, you have mitigated the risk of loss.
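The bandwidth concern is easy to quantify. This Python sketch estimates how long an upload would take; the 80% default link efficiency is an assumed figure, and provider-side limits, retrieval times and costs are not modelled.

```python
def transfer_days(terabytes: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Days needed to move `terabytes` (decimal) over a `link_mbps` connection,
    assuming only a fraction `efficiency` of the nominal rate is sustained."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 86_400


if __name__ == "__main__":
    archive_tb = 450  # e.g. 10,000 hours at a nominal 100 Mbit/s
    for mbps in (100, 1_000, 10_000):
        print(f"{mbps:>6} Mbit/s: {transfer_days(archive_tb, mbps):8.1f} days")
```

For example, 450 TB over a fully utilised 1 Gbit/s link (`efficiency=1.0`) takes about 42 days, and every restore of a large portion of the archive faces the same arithmetic in the other direction.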

For storage-specific data management and digital preservation, including how to mitigate hardware obsolescence, please refer to an earlier blog post I wrote here a few years ago, all of which is still very relevant.

As is the nature of this subject, once you scratch the surface it always morphs into something much bigger than most people originally expect. I mean, digitising a few old media files: how difficult can it be?

MatrixStore Benefits

  • Digital Preservation Storage
  • Keep all proxies browsable
  • Move high res content to tape (LTO)
  • Stores files in open, non-proprietary formats
  • Hugely Scalable
  • No hardware lock-in
  • High Availability (99.999%)
  • Automatic backup/recovery
  • Searchable
  • Data Migration facility
  • Easy to administer
  • Integration with MAM and Ingest Platforms

Find out how Object Matrix can help you and your organisation to resolve your storage challenges:

More Information