Maximising disaster recovery options from existing storage infrastructure
Sophisticated DR options from MatrixStore advanced media workflow replication
Disasters, natural and otherwise
- An inexperienced broadcast engineer wipes terabytes of file-based assets from production servers.
- A fire in the server room destroys most of the Interplay servers and nearline digital archives are damaged beyond repair.
- A storm knocks out media data centres on the east coast of the United States for days.
- A tsunami wipes out broadcast facilities in an area of Japan.
These are all disasters, though on very different scales. Their likelihood and frequency, and the severity of their consequences, also vary widely.
This is the thorny world of business continuity planning.
In this post we aim to introduce some of the key concepts and begin to apply them to the broadcast and media industry. Much more information on advanced replication strategies for disaster recovery, including how to configure advanced Avid Interplay replication, can be found here.
Business Continuity and Disaster Recovery are terms often used interchangeably. It’s worth putting them in context to avoid confusion.
Business Continuity Plan (BCP)
Business Continuity is the management discipline that ensures a business can continue serving its customers and creating value even while normal operations are disrupted. The BCP is a business-centric document that assesses internal and external risks and determines how an organisation will respond to them. It is the strategy that identifies which risks must be tolerated, which can be transferred and which require mitigation. Business continuity typically rests on the tenets of resilience, recovery and contingency.
Disaster Recovery Plan (DRP)
If the BCP provides the bigger picture of risk management, the Disaster Recovery Plan details the mitigation. Today most businesses are data-centric (none more so than broadcast and media companies), so disaster recovery planning is usually IT-centric. Building resilience usually means adding redundancy and spare capacity.
A disaster recovery plan is likely to specify a maximum acceptable length of time during which data might be lost from applications due to a major incident, called the Recovery Point Objective (RPO). This timescale will be accompanied by a Recovery Time Objective (RTO), the maximum acceptable length of time that your application can be offline.
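To make the two objectives concrete, here is a rough worked check in Python. It is a minimal sketch, not a MatrixStore feature: it assumes periodic (not continuous) replication with a fixed transfer time, and every field name (`replication_interval`, `detection_time` and so on) is hypothetical. Under those assumptions the worst-case data loss is one replication interval plus one transfer, and the worst-case downtime is the time to detect the disaster plus the time to fail over.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class DrPlan:
    """Hypothetical DR plan parameters (illustrative names, not a product API)."""
    rpo: timedelta                   # maximum acceptable data-loss window
    rto: timedelta                   # maximum acceptable downtime
    replication_interval: timedelta  # how often a replication cycle starts
    replication_duration: timedelta  # how long one cycle takes to land off-site
    detection_time: timedelta        # time to notice the outage and declare a disaster
    failover_time: timedelta         # time to bring services up at the DR site

    def worst_case_data_loss(self) -> timedelta:
        # Assumption: data written just after a cycle's cut-off is only safe
        # once the *next* cycle has finished copying it off-site.
        return self.replication_interval + self.replication_duration

    def worst_case_downtime(self) -> timedelta:
        # Assumption: detection and failover happen one after the other.
        return self.detection_time + self.failover_time

    def meets_objectives(self) -> tuple[bool, bool]:
        """Return (meets RPO, meets RTO)."""
        return (self.worst_case_data_loss() <= self.rpo,
                self.worst_case_downtime() <= self.rto)


plan = DrPlan(
    rpo=timedelta(hours=1),
    rto=timedelta(hours=4),
    replication_interval=timedelta(minutes=30),
    replication_duration=timedelta(minutes=20),
    detection_time=timedelta(minutes=30),
    failover_time=timedelta(hours=2),
)
print(plan.meets_objectives())  # (True, True): 50 min <= 1 h, 2.5 h <= 4 h
```

Stretching the replication interval to an hour in this example would push the worst-case data loss to 80 minutes and break the one-hour RPO, even though the RTO would still be met. This is why the two objectives have to be planned separately.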
Maximising system redundancy while minimising cost
Developing a BCP and DRP is quite a challenge for the Chief Technical Officer or the broadcast systems engineer charged with writing them. The Chief Financial Officer is also under pressure. The CFO can't argue against the need for system resilience; the potential loss of ad revenue and/or viewers following a disaster is self-evident.
The CFO also knows that adding redundancy (virtual or real) gets expensive.
Hot, Cold and Warm sites aren’t sophisticated enough for media workflows
Traditional disaster recovery discussions about hot, warm and cold sites can be limiting: they aren't sophisticated enough to protect complex production workflows such as Avid Interplay.
Being able to understand the types of data moving through a production environment, and to direct that data programmatically across existing LAN and WAN infrastructure, has the potential to change the way we think about disaster recovery. Doing this requires a high degree of integration between the storage platform and the production server, coupled with a sophisticated replication capability.
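As an illustration of what "understanding the types of data" could mean in practice, the Python sketch below orders a replication queue by asset type, so that the most critical production data crosses the WAN first. The asset types, their priorities and the `ReplicationQueue` class are all hypothetical; this is not how MatrixStore or Avid Interplay expose replication, just a minimal example of type-aware prioritisation using a heap.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical priorities: lower number = replicate sooner.
PRIORITY_BY_TYPE = {
    "interplay_database": 0,  # production metadata: losing it orphans the media
    "edit_sequence": 1,
    "original_media": 2,
    "proxy_media": 3,         # can be regenerated from the originals
}


@dataclass(order=True)
class ReplicationJob:
    priority: int
    seq: int                  # tie-breaker that preserves submission order
    asset_id: str = field(compare=False)
    asset_type: str = field(compare=False)


class ReplicationQueue:
    def __init__(self) -> None:
        self._heap: list[ReplicationJob] = []
        self._counter = 0

    def submit(self, asset_id: str, asset_type: str) -> None:
        # Unknown asset types go to the back of the queue rather than being dropped.
        priority = PRIORITY_BY_TYPE.get(asset_type, len(PRIORITY_BY_TYPE))
        heapq.heappush(self._heap,
                       ReplicationJob(priority, self._counter, asset_id, asset_type))
        self._counter += 1

    def drain(self) -> list[str]:
        """Pop every job and return asset ids in the order they would be sent."""
        order = []
        while self._heap:
            order.append(heapq.heappop(self._heap).asset_id)
        return order


q = ReplicationQueue()
for asset_id, asset_type in [("proxy-17", "proxy_media"), ("seq-4", "edit_sequence"),
                             ("db-snap", "interplay_database"), ("clip-9", "original_media")]:
    q.submit(asset_id, asset_type)
print(q.drain())  # ['db-snap', 'seq-4', 'clip-9', 'proxy-17']
```

A real system would also weigh asset size, link bandwidth and which site needs the copy, but even this simple ordering shrinks the window during which the hardest-to-recreate data exists at only one site.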
Exploiting untapped redundancy in your existing storage LAN and WAN
In a perfect world, all data would be backed up to geographically diverse sites immediately. Data would be available to any site instantaneously, and production and services would resume at the flick of a switch. This kind of redundancy can be provided in the cloud: virtual redundancy, if you like. Advocates of cloud redundancy are keen to point this out.
However, even though the infrastructure is virtual, it still has to be paid for, and it is unlikely to integrate with your systems or understand how you prioritise data on your seven ISIS servers in Chicago compared with the digital archive clusters in Boston.
Back in the real world, it is a process of compromise: RTOs and RPOs determine the tolerances for system resilience and set the constraints for planning. Quite often there will also be a QoS (quality of service) standard to meet.
Your WAN and LAN options may impose further constraints. In practice, contingency solutions differ for every organisation, and your LAN and WAN topologies will either constrain your DRP or give it flexibility.
Object Matrix has implemented a DR strategy for a major broadcaster that took a pragmatic approach to balancing redundancy and cost. The broadcaster used its existing storage infrastructure (MatrixStore clusters installed in multiple geographically dispersed business hubs) to create a multi-directional approach to redundancy, applying advanced replication to Avid Interplay and archive assets on storage it already owned (i.e. fixed costs that were already sunk).
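One simple way to picture multi-directional replication is a ring. Each hub keeps its own assets and also holds replicas for its neighbours, so no site sits idle as a dedicated DR copy. The Python sketch below is purely illustrative. The ring layout, the hub names and both functions are assumptions made for this example, not a description of the broadcaster's actual configuration or of a MatrixStore API.

```python
def ring_replication_targets(sites: list[str], copies: int = 1) -> dict[str, list[str]]:
    """Map each site to the `copies` sites that hold replicas of its assets.

    Sites are arranged in a ring, so every site is both a source (for its own
    assets) and a target (for its neighbours' assets).
    """
    if not 1 <= copies < len(sites):
        raise ValueError("need at least one replica and more sites than replicas")
    n = len(sites)
    # Site i replicates to the next `copies` sites round the ring, never itself.
    return {site: [sites[(i + k) % n] for k in range(1, copies + 1)]
            for i, site in enumerate(sites)}


def replicas_held(targets: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the map: whose assets does each site store copies of?"""
    held: dict[str, list[str]] = {site: [] for site in targets}
    for source, dests in targets.items():
        for dest in dests:
            held[dest].append(source)
    return held


targets = ring_replication_targets(["hub_a", "hub_b", "hub_c"])
print(targets)                 # {'hub_a': ['hub_b'], 'hub_b': ['hub_c'], 'hub_c': ['hub_a']}
print(replicas_held(targets))  # {'hub_a': ['hub_c'], 'hub_b': ['hub_a'], 'hub_c': ['hub_b']}
```

The design choice is the cost point made above: redundancy comes from spare capacity already sitting in each hub rather than from a new, otherwise idle site. The trade-off is that every hub needs enough headroom, and enough WAN bandwidth, to absorb its neighbours' data.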
The Object Matrix View
These days media organisations keep their assets digitally and, more often than not, in centralised systems. This is an advantage for managing and delivering data, but it also creates the potential for disaster. Losing assets to a natural disaster, a virus, human error or malicious action is a huge risk that should not be ignored.
Building a BC or DR strategy is an essential extension of data management in the post-tape era. Tools exist to make this simple and easy to manage (naturally, we would strongly advocate MatrixStore). Done right, distributed data can also help workflow, for example by allowing teams to work on the same assets in multiple locations.
If you are interested in finding out more, Jonathan Morgan, CEO of Object Matrix, has written a paper on the topic of advanced replication.