Top 5 Reasons Why Erasure Coding and Media Storage Don’t Mix Well

Erasure Coding & Media Storage

When using Object Storage, there are two basic choices for how objects are kept: Erasure Coding and Multiple Instances.

Erasure Coding:

An object is split across multiple nodes such that it can be rebuilt and read so long as a sufficient subset of those nodes is available. For example, a “5+2” scheme uses 7 nodes: so long as any 5 nodes are “up”, you can read your object. The storage overhead is 140% (where 100% means storing the object without any protection). There are problems, however. If a node is down, can you still write new objects when only 6 of the 7 nodes are available? And on large systems with hundreds of drives, two disks failing at the same time is relatively likely, while mean time to recovery is slow. As a result, most such systems don’t really work well until you move to something like 10+5 protection – a reasonable 150% overhead, but requiring 15 nodes to work as intended.
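The arithmetic above can be sketched in a few lines of Python (a generic illustration of k+m erasure coding, not any particular vendor’s implementation):

```python
# Sketch: storage overhead and read availability of a generic k+m
# erasure-coding scheme. "Overhead" follows the article's convention:
# 100% = storing one copy of the object with no protection at all.

def ec_overhead(k: int, m: int) -> float:
    """Raw storage used, as a percentage of the object's logical size."""
    return (k + m) * 100 / k

def ec_min_nodes_to_read(k: int) -> int:
    """Any k of the k+m fragments are enough to rebuild the object."""
    return k

# The article's "5+2" scheme: 7 nodes, readable while any 5 are up.
print(ec_overhead(5, 2))        # 140.0 -> 140% overhead
print(ec_min_nodes_to_read(5))  # 5

# The more robust "10+5" scheme: 15 nodes, still only 150% overhead.
print(ec_overhead(10, 5))       # 150.0
```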

Multiple Instances:

An object is kept contiguously on a single node, but more than one node keeps a copy of the data. In the case of Object Matrix’s product MatrixStore, by default we keep data on multiple RAID6-protected nodes. The overhead is typically 220% – obviously not as good as 150%. However, a massive six disks would have to fail before any data is lost, replacement disks can be hot-swapped and rebuilt, and there are other advantages we will look at in this article.
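For comparison with the erasure-coding figures, the multiple-instance overhead can be modelled the same way. The 22-drive RAID6 group below is an assumption chosen to match the article’s typical 220% figure; real group sizes vary by configuration:

```python
# Sketch: total raw storage for a multiple-instance (MI) layout where each
# full copy of an object sits on its own RAID6-protected node.

def mi_overhead(copies: int, raid_data: int, raid_parity: int) -> float:
    """Raw storage used, as a percentage of the object's logical size."""
    per_copy = (raid_data + raid_parity) * 100 / raid_data
    return copies * per_copy

# Two full copies, each on an (assumed) 22-drive RAID6 group
# (20 data drives + 2 parity drives):
print(mi_overhead(2, 20, 2))  # 220.0 -> the article's typical 220%
```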

Every object storage solution is configurable and the points below are generalisations, but we are often asked:

Why use Multiple Instance (MI) setups for media storage?

  • Reason 1: Small Object Storage Systems Must Use MI

    Let’s face it – not everyone wants or needs 15 nodes! MatrixStore runs just fine with 3 (or 5) nodes, and if one fails, objects can still be written and read. But why would someone want a small object storage system? Hundreds of reasons: they want to take advantage of automated search, metadata extraction, HSM, replication, hybrid workflows, data protection policies… Ultimately, you can start small and expand out to hundreds of petabytes without needing to reconfigure anything in the software.

  • Reason 2: Large Object Storage Systems Save Space with MI

    When you’ve got a large object storage solution, chances are you are going to start thinking about dual (or more) locations for disaster recovery and business continuity. At that stage, your protection is in the storage ring – perhaps one copy in London and another in New York. With multiple instances, you simply keep one copy in each location – with Object Matrix that’s a 120% overhead or less (12 drives, RAID6’d) – and the system can automatically restore from the remote location if there is a local issue.
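The per-site figure works out as follows (a simple sketch; the 12-drive RAID6 group comes from the text above):

```python
# Sketch: per-site overhead when each location holds one full copy on a
# 12-drive RAID6 group (10 data drives + 2 parity drives).

drives = 12
parity = 2
per_site_overhead = drives * 100 / (drives - parity)
print(per_site_overhead)  # 120.0 -> one full copy per site at 120% overhead
```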

  • Reason 3: Media Files Need Random Access

    When an object is not kept contiguously and must be reconstructed before it can be randomly accessed, that is a very expensive operation indeed. If you want random access to your media files, we’d say there is no faster way than keeping each object contiguous on a fast RAID array. The same is very true for operations such as partial restore of media files.
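To see why random access is expensive under erasure coding, consider a simple k-way striped layout: a single byte range fans out into reads from several fragment nodes, whereas a contiguous copy needs one seek. This is a generic striping model for illustration, not MatrixStore’s or any other vendor’s actual placement scheme:

```python
# Sketch: which data fragments a byte-range read touches under a simple
# k-way striped erasure-coded layout, where stripe_unit bytes are written
# to each data fragment in turn.

def fragments_touched(offset: int, length: int, k: int, stripe_unit: int) -> set:
    """Indices of the data fragments hit by a read of [offset, offset+length)."""
    first_unit = offset // stripe_unit
    last_unit = (offset + length - 1) // stripe_unit
    return {unit % k for unit in range(first_unit, last_unit + 1)}

# A 4 MiB partial restore at offset 10 MiB from a 5+2 layout with 1 MiB
# stripe units touches four different fragment nodes; a contiguous copy
# would satisfy the same read with a single seek on one node.
print(sorted(fragments_touched(10 * 2**20, 4 * 2**20, k=5, stripe_unit=2**20)))
# [0, 1, 2, 3]
```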

  • Reason 4: Media Libraries need (AI) Analysis!

    So, you’ve taken the major step of extracting your media archive from your LTO solution onto an object storage solution so that you can have faster access to that archive. The next thing that happens is that you get some fancy new AI algorithm to analyse your video for search. Analysing video means reading every single byte in your library, sometimes with random access (see Reason 3). With erasure codes this is painful all round: most systems will require all those objects to be reconstructed, read out to somewhere outside of the object storage, and analysed before the metadata is stored in another database. With Object Matrix, many optimisations can be made. For a start, reading a contiguous file is generally far less CPU-hungry than reconstructing objects; secondly, metadata can be stored in searchable internal databases; and lastly, you have the option of working with Object Matrix to run that analysis inside the object storage itself.

  • Reason 5: Future Proof through Flexibility

    Once you start using erasure coding across a large number of nodes, with CPUs of various speeds and nodes whose performance varies over time (e.g., if you scale up your solution in a few years), simplicity of solution becomes a major benefit. With multiple instances of objects on multiple nodes, you don’t really care that one node is slower or faster than another, you can take advantage of new hardware generations quickly, and you don’t have a proprietary algorithm that you must go through (which in many cases equals vendor lock-in) in order to access your data. KISS (“keep it simple, stupid”) applies. Object storage is a beautiful building block for robust, secure, future-proof data stores, but even a beautiful brick can make a wonky house if used wrongly.

About The Author

Jonathan Morgan is founder and CEO of Object Matrix. Prior to Object Matrix, Jonathan led the largest development team on EMC’s Centera product. At EMC, Jonathan helped to design and implement CPP (Content Protection Parity) – arguably the world’s first object storage erasure-code solution.