S3 Object Storage
At Object Matrix, we provide hybrid storage for media workflows, with the great benefit of making the solution manageable from a “single pane of glass”. Customers can therefore easily manage their data and where it moves, whether that data is on-premises or in the cloud.
A big part of our cloud integration is working with AWS’s (Amazon’s) S3 Object Storage, which has led us to understand at a deeper technical level just what you can and can’t do.
This is, of course, a changing environment. For instance, AWS has recently unified different storage classes under the same umbrella: S3 Object Storage.
In this blog, I will share some of our experiences dealing with S3 Object Storage classes and their APIs.
Data or Object Management
The unification of the different storage classes has brought several advantages to our applications and services. The most important to us are:
- Data moves more easily between tiers
- The new APIs allow data to be written to and read from the different tiers directly
However, one of the things we noticed is that what moves seamlessly between tiers is the data, not the metadata. Apparently, Glacier classes do not support any additional metadata for the archives except for an optional archive description.
This means that the metadata has to be kept either on the other AWS tiers or on the client side.
We would guess that there must be an intrinsic limitation in how Glacier manages metadata, due to the backend technology. Either way, this is an important aspect to consider if you want to manage the full lifecycle of an object: where do you keep the metadata safely? And how do you keep the metadata associated with the content, e.g., even if the content is moved to another technology?
At Object Matrix, we consider objects in all scenarios and fundamentally believe that the metadata should always be kept with an object regardless of where the object is moved to. We’ve seen solutions that, e.g., keep the metadata in a database. That prevents the object from being transported separately from the database and can result in a form of vendor lock-in.
Some AWS S3 storage APIs support user and application metadata. However, there are some limitations around metadata in AWS S3 that are very relevant in media workflows:
- It’s not possible to manipulate metadata entries independently: changing one entry means rewriting the object’s whole metadata set
- User-defined S3 object metadata is limited to roughly 2 KB in size
These limitations may not be important for generic applications, but in media workflows metadata can be large and is at least as important as the data.
We have found, maybe due to those limitations, that many companies use S3 for data-only movements: no metadata is stored with the objects. Basically, they use an object API to access just the data.
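A minimal sketch of how these two limitations might surface in client code. It assumes the limit is 2 KB counted as the sum of the UTF-8 bytes of each key and value, which is our reading of the AWS documentation. The function names are hypothetical. The returned dictionary is shaped like the arguments of a copy-in-place request (for example boto3's `copy_object` with `MetadataDirective="REPLACE"`), but no request is sent here.

```python
from typing import Dict

# Assumed limit for user-defined metadata: 2 KB, counted as the sum of the
# UTF-8 bytes of every key and value.
S3_USER_METADATA_LIMIT_BYTES = 2 * 1024


def user_metadata_size(metadata: Dict[str, str]) -> int:
    """Return the number of bytes counted against the metadata limit."""
    return sum(
        len(key.encode("utf-8")) + len(value.encode("utf-8"))
        for key, value in metadata.items()
    )


def fits_in_s3_metadata(metadata: Dict[str, str]) -> bool:
    """True if the metadata stays within the assumed 2 KB limit."""
    return user_metadata_size(metadata) <= S3_USER_METADATA_LIMIT_BYTES


def build_metadata_update(
    bucket: str, key: str, current: Dict[str, str], changes: Dict[str, str]
) -> dict:
    """Build arguments for a copy-in-place that rewrites ALL metadata.

    A single entry cannot be changed on its own, so the current metadata is
    merged with the changes and the complete set is sent again.
    """
    merged = {**current, **changes}
    if not fits_in_s3_metadata(merged):
        raise ValueError(
            f"metadata is {user_metadata_size(merged)} bytes, "
            f"limit is {S3_USER_METADATA_LIMIT_BYTES}"
        )
    return {
        "Bucket": bucket,
        "Key": key,
        "CopySource": {"Bucket": bucket, "Key": key},
        "Metadata": merged,
        "MetadataDirective": "REPLACE",
    }
```

Even a modest shot log or transcript would exceed the limit in this check, which is why large media metadata rarely fits in the object headers.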
At Object Matrix, we are heavy users of S3 APIs on both the client and the backend side. However, because of the way S3 metadata storage works, we see it as advantageous to keep metadata in a linked sidecar file rather than in the object itself. This has the added benefit that the same sidecar file can be stored on just about any technology: LTO, another cloud provider with its own metadata storage idiosyncrasies, etc. Also, the sidecar file can hold an almost limitless amount of metadata.
Another aspect that came to our attention whilst using AWS S3 APIs between our products and third parties is that having an API is only half of the story when it comes to building rich integrations and media workflows.
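The sidecar idea can be sketched as follows. This is an illustration only, not the Object Matrix implementation: the `.metadata.json` suffix, the JSON layout and the function names are all assumptions. The sidecar is built as a key plus a byte payload, so it could be written to S3, a filesystem, LTO or anything else that stores named blobs.

```python
import json
from typing import Dict, Tuple

# Hypothetical naming convention linking a sidecar to its object.
SIDECAR_SUFFIX = ".metadata.json"


def sidecar_key(object_key: str) -> str:
    """Derive the sidecar's key from the key of the object it describes."""
    return object_key + SIDECAR_SUFFIX


def build_sidecar(object_key: str, metadata: Dict[str, object]) -> Tuple[str, bytes]:
    """Return (sidecar key, sidecar bytes) ready to store next to the object."""
    document = {"object_key": object_key, "metadata": metadata}
    body = json.dumps(document, indent=2, sort_keys=True, ensure_ascii=False)
    return sidecar_key(object_key), body.encode("utf-8")


def read_sidecar(body: bytes) -> Dict[str, object]:
    """Recover the metadata dictionary from stored sidecar bytes."""
    return json.loads(body.decode("utf-8"))["metadata"]
```

Because the payload is an ordinary file, its size is bounded by the storage itself rather than by a header limit, and it travels with the object when both are copied to another platform.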
Basically, it would be really helpful for all of the tools if there existed a media file metadata specification for object storage. The specification could define fields around projects, relative paths, media types, etc. There are specifications within, e.g., IMF that could be drawn upon. But how do we bring those up to a higher level, in a format that helps applications collaborate at an object storage level?
So, in summary, we’ve had to go about this ourselves and have defined a high-level metadata specification, or, as we term it internally, a Metadata API. All our applications use that specification, be it via filesystem access, S3 access, FTP, DropSpot or Vision. It allows our applications to collaborate seamlessly. And where we integrate with third parties, we pull their metadata into our standardised format. We’d love to see a specification body (hint hint DPP / SMPTE) look at whether a wider specification would be useful.
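To make the idea of a shared specification concrete, here is a small hypothetical sketch. It is not the actual Metadata API: the three fields are taken from the examples mentioned above (project, relative path, media type), and the third-party field names in the mapping are invented for illustration.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, Mapping

# Fields every record must carry in this illustrative specification.
REQUIRED_FIELDS = {"project", "relative_path", "media_type"}

# Hypothetical mapping from one third party's field names to the common ones.
THIRD_PARTY_FIELD_MAP = {
    "ProjectName": "project",
    "Path": "relative_path",
    "MimeType": "media_type",
}


@dataclass
class MediaObjectMetadata:
    project: str
    relative_path: str
    media_type: str
    # Anything the specification does not name is preserved, not dropped.
    extra: Dict[str, str] = field(default_factory=dict)


def normalise(
    raw: Mapping[str, str],
    field_map: Mapping[str, str] = THIRD_PARTY_FIELD_MAP,
) -> MediaObjectMetadata:
    """Pull third-party metadata into the common format."""
    known: Dict[str, str] = {}
    extra: Dict[str, str] = {}
    for name, value in raw.items():
        target = field_map.get(name)
        if target is None:
            extra[name] = value
        else:
            known[target] = value
    missing = REQUIRED_FIELDS - set(known)
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    return MediaObjectMetadata(extra=extra, **known)
```

A record produced this way can be turned into a plain dictionary with `asdict` and serialised in whatever format the applications agree on.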
Amazon S3 Object Storage is evolving quickly to make cloud storage simple to use. With our tools, we combine different tiers. But whilst S3 has object metadata commands in its API, we find that keeping metadata in a sidecar file is key for transportability, for avoiding metadata limitations within an API (either in AWS or on any other platform), for non-proprietary access and for avoiding “vendor lock-in”.
And, with object APIs, keeping metadata is only half the story for application interoperability. The other half is keeping the metadata in a format that can be easily understood and shared. We need defined metadata specifications if we want to achieve portable integrations for our customers.
About Object Matrix
Object Matrix is the award-winning software company that pioneered object storage and the modernisation of media archives. Our on-prem and hybrid cloud storage solutions bring operational and financial benefits to our customers by securely managing content at every stage of its lifecycle, from ingest and nearline to archive and distribution. Deployed where you need it, our technology is non-proprietary and integrates into existing workflows, enabling you to work locally and share globally.
Our flexible approach, coupled with our focus on the media industry, means that our customers can trust us to deliver the solution they need. Our domain expertise, solutions and world-class support are tailored to meet the growing demands faced when creating, archiving and sharing media content. Customers include: BBC, Orange, France Televisions, BT, HBO, TV Globo, MSG-N and NBC Universal.