
Use data replication and software-defined storage in your DR strategy

by Rob Whiteley on January 14, 2015


400 pages.

That’s how long the disaster recovery (DR) manual is for a midsized law firm that I recently visited in the US. Now, to many of us, that might not seem so bad. Four hundred pages is a typical novel length, so it’s no big deal. Right?

Wrong.

In the world of DR, that is a big deal.

Think about it. Something goes wrong, resulting in an IT outage. It could be a natural disaster, a spilled cup of coffee, or a backhoe that takes out a critical cable. Disasters happen all the time. But the disaster is only the start of the problem; the recovery can be just as painful, if not more so.

Now imagine you’re in charge of recovering from this IT outage and you need to follow 400 pages of manual, step-by-step instructions to bring your business -- and I really do mean business, given IT is the lifeblood of the modern enterprise -- back online. Imagine the power in your home went out and you had to flip through 400 pages of instructions to restore TV, internet, and phone. I don’t know about you, but I would just opt to live offline for a few days. Unfortunately, your business can’t afford that “luxury.”

Given this anecdote, it’s no surprise that one study found that, on average, businesses suffer 2.2 days of IT downtime, costing nearly $400,000. It’s also no surprise that human error is the No. 1 contributor, exacerbating outages caused by natural disasters with self-inflicted downtime.

So what are the common ways companies avoid or minimize disasters? This SearchDisasterRecovery article does a good job of summarizing three best practices:

  1. Prioritize disaster recovery applications and services. Conduct a business impact assessment, determine tier 0 services (services that must be online for apps to work), and then tier applications based on business priority. Automate how these services are recovered.

  2. Don’t overlook RTOs and RPOs. Understand and map the dependencies among applications, as well as between apps and their data. Make sure systems are recovered in an order that lets you hit your recovery time objectives (RTOs) and recovery point objectives (RPOs).

  3. Keep up with data replication. Ensure critical data is replicated across different failure zones, and employ synchronous and asynchronous replication according to best practices 1 and 2 above. Coordinate replication efforts at the infrastructure, database, and application layers to make sure you’re not over- or under-replicating. (A short sketch after this list ties the three practices together.)
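To make these practices concrete, here’s a minimal Python sketch of how a DR plan might encode them. It’s illustrative only: the application names, tiers, and RTO/RPO numbers are hypothetical, and the rule of thumb it uses (a zero RPO implies synchronous replication) is a common convention, not a prescription from the article or any vendor’s API.

```python
from dataclasses import dataclass

@dataclass
class AppDRPolicy:
    name: str
    tier: int          # 0 = must be online before anything else works
    rto_minutes: int   # recovery time objective
    rpo_minutes: int   # recovery point objective (0 = no data loss)

def replication_mode(policy):
    # Common rule of thumb: a zero RPO calls for synchronous replication;
    # anything looser can tolerate asynchronous replication.
    return "synchronous" if policy.rpo_minutes == 0 else "asynchronous"

def recovery_order(policies):
    # Recover tier 0 first, then by tier and tightest RTO within a tier.
    return sorted(policies, key=lambda p: (p.tier, p.rto_minutes))

catalog = [
    AppDRPolicy("intranet-wiki", tier=2, rto_minutes=480, rpo_minutes=240),
    AppDRPolicy("dns-and-auth", tier=0, rto_minutes=5, rpo_minutes=0),
    AppDRPolicy("billing-db", tier=1, rto_minutes=30, rpo_minutes=0),
]

for app in recovery_order(catalog):
    print(f"{app.name}: tier {app.tier}, RTO {app.rto_minutes} min, "
          f"replicate {replication_mode(app)}")
```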


It’s this third practice that I want to discuss further. I came across a great Storage Decisions video from Jon Toigo. He talks about the value of a storage hypervisor (his term for software-defined storage) and data replication. In short: the flexibility of software-defined storage with built-in data replication changes both the capital and operational burden of data protection. It eliminates disparate data protection infrastructure -- and the management overhead that comes with it -- and couples that with the cost savings of running virtualized storage on commodity servers.

But data replication is an evolving capability within the relatively new category of software-defined storage (SDS). As Jon described, modern SDS platforms have replication built in. The good news is that, as a DR planner and IT admin, you’ve never had more choices in how to replicate data. The bad news is . . . well . . . you’ve never had more choices. The key is to rethink how you replicate data to protect your critical apps and services. Gone are the days when data replication was a painful, static, one-way flow of data from site A to site B.

So how do you use SDS and data replication to your advantage in your DR process? Here are three tips:

  • Tip #1: Tune your storage replication factor based on criticality. As discussed above, you should align DR techniques with how critical the applications and data are to your business. Tier 0 and tier 1 apps can and should be protected differently than tier 2 and below. Use a three-way replication factor as a minimum for tiers 0-1, but don’t be afraid to go to a four-way (or higher) replication factor if business requirements dictate it.

  • Tip #2: Use multiple sites to improve availability. Replicating your data to three nodes in a single data center is good. Replicating data to three different nodes in three different data centers is even better. Of course, you need to plan around network connectivity and latency, but a good DR plan will have already factored these in.

  • Tip #3: Add a cloud DR site. For even better protection, don’t just replicate to your own sites. Add a public cloud as a DR storage site. Run an instance of storage software in AWS, Azure, Google, or the cloud provider of your choice. Coupling multiple sites with a public cloud helps mitigate the risk that an issue affecting your on-premises infrastructure, such as a botched firmware upgrade, will take down all sites. The sketch below shows how these three tips might combine into a single per-tier policy.
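Pulling the three tips together, here’s a small, purely hypothetical Python sketch of a per-tier replication policy: higher replication factors for more critical tiers (Tip #1), replicas spread across multiple sites (Tip #2), and a public cloud region in the site list (Tip #3). The site names and policy fields are invented for illustration and are not any vendor’s configuration syntax.

```python
# tier -> (replication factor, sites where replicas may be placed)
REPLICATION_POLICIES = {
    0: (4, ["dc-east", "dc-west", "aws-us-east-1"]),  # most critical
    1: (3, ["dc-east", "dc-west", "aws-us-east-1"]),
    2: (2, ["dc-east", "dc-west"]),                   # less critical
}

def place_replicas(tier):
    factor, sites = REPLICATION_POLICIES[tier]
    # Round-robin replicas across sites so the loss of one data center
    # (or one cloud region) never takes out every copy of the data.
    return [sites[i % len(sites)] for i in range(factor)]

for tier in sorted(REPLICATION_POLICIES):
    print(f"tier {tier}: replicas on {place_replicas(tier)}")
```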



If you’re interested in learning more, don’t hesitate to reach out. You can request a demo or email us for more info and to engage our experts.

Request a Demo

Rob Whiteley

Rob Whiteley is the VP of Marketing at Hedvig. He joins Hedvig from Riverbed and Forrester Research where he held a series of marketing and product leadership roles. Rob graduated from Tufts University with a BS in Computer Engineering.