I wrote a blog post about Polyglot Persistence before joining Hedvig, and I thought I'd revisit the topic. To quickly recap, the term is generally used in the database world to describe using the right database for each application workload.
A polyglot is someone who can speak several languages, so I think the term fits the context well.
For example, think of:
- A document-oriented application (think hotel rooms, concert tickets, insurance quotes, and so on, where each record can be summarised as a simple document in JSON, YAML, XML, etc.) should probably use document storage.
- Highly relational systems (say family history records) might be served well by a graph database.
- Time-based performance metrics would perhaps be best served by a time series database.
All of this is background to the growth of NoSQL (or “not-only SQL” for the purposes of this blog), in which databases are optimised for specific data types. Amazon Dynamo and Apache Cassandra are two very good ones, for obvious reasons!
What has this got to do with storage? As I previously wrote, I think traditional storage systems are similar to traditional SQL databases. They are designed to be general purpose, so they don't necessarily excel at any one thing (well, they excel at being general purpose, I guess). But as IT has moved on, we have started to need more from a storage system than a general-purpose “jack of all trades” SAN.
That’s where Hedvig and software-defined storage come in.
Hedvig picks up where “NoSAN” left off
I think that Hedvig is to storage systems what a NoSQL database is to SQL databases. Let me steal (and hopefully pay homage to) Nutanix’s NoSAN campaign, but again in my own terms: “not-only SAN”. Our friends at Nutanix created a solution that can do the boring traditional SAN stuff like everyone else, where you carve out LUNs, replicate them to a standby site and take snapshots ‘til the cows come home, but that also provides the storage and operations needed to support private clouds, which traditional, monolithic SANs couldn’t support (easily).
Hedvig is similar, but does far more as a software-defined storage platform. We push our feature set up to the allocation layer (what we call a Virtual Disk). This means that, side by side with “SAN storage”, I can now create:
- An object store that supports a cloud or distributed application
- A synchronously mirrored, stretched VMware cluster
- A high performance scratch disk
- Persistent storage for a globally distributed container farm
- Storage to support a Hadoop cluster
- Deduplicated and compressed backup targets
How do we support this diversity in a single platform? The key is policy-based provisioning at the allocation layer. We offer storage-based Polyglot Persistence: each application gets the storage profile that suits it, and we don't compromise other workloads in order to do so.
Amazon, Microsoft, VMware, and Docker are on the Polyglot Persistence path
In the industry we’re moving towards policy-based storage platforms, much like Amazon EBS and Azure Storage Spaces. I don't care about the specifics of the storage system. I have one application that needs “ABC” features and some storage, and another that needs “XYZ” features and also some storage, but I don't really want different storage systems. Actually, I don't even want to care about storage systems.
I simply want a storage platform where I can request storage designed for a particular workload, or even for particular application features, and have the most suitable data space provisioned for it.
I strongly feel that policy-based provisioning is the future for storage. We’re seeing this already with cloud-based services (as mentioned above), as well as things like VMware Virtual Volumes (VVols) and Docker volume plugins. These systems don't ask for specific LUNs, disk tiers, or connection protocols. They simply say “I need some capacity for my application, please provision based on a specific policy.”
To do this, and to best serve the multitude of different application types, we need policy-based provisioning. That is why we have pushed all of our features (including our RAID replacement, replication policies, deduplication, flash acceleration, and many more) up to the Virtual Disk layer, and why we let you, the consumer, optimise the storage for each individual workload requirement.
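To make the idea of per-disk policies a little more concrete, here is a minimal Python sketch. It is purely illustrative: the class names, field names, profile names, and the `provision` function are assumptions made up for this post, not Hedvig's actual API. It only shows how features such as replication, deduplication, and flash acceleration could be attached to each virtual disk rather than to the array as a whole.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualDiskPolicy:
    """Hypothetical per-disk policy; field names are illustrative only."""
    replication_factor: int = 3
    deduplication: bool = False
    compression: bool = False
    flash_acceleration: bool = False


@dataclass(frozen=True)
class VirtualDisk:
    """A provisioned disk that carries its own policy."""
    name: str
    size_gb: int
    policy: VirtualDiskPolicy


# Hypothetical profiles, loosely modelled on the workloads listed earlier.
PROFILES = {
    "backup-target": VirtualDiskPolicy(
        replication_factor=2, deduplication=True, compression=True
    ),
    "scratch": VirtualDiskPolicy(replication_factor=1, flash_acceleration=True),
    "general": VirtualDiskPolicy(),
}


def provision(name: str, size_gb: int, profile: str) -> VirtualDisk:
    """Create a virtual disk whose features come from the named profile."""
    if size_gb <= 0:
        raise ValueError("size_gb must be positive")
    if profile not in PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    return VirtualDisk(name=name, size_gb=size_gb, policy=PROFILES[profile])


if __name__ == "__main__":
    # Two very different workloads living side by side on one platform.
    print(provision("nightly-backups", 500, "backup-target"))
    print(provision("build-scratch", 100, "scratch"))
```

The point of the sketch is that the backup target and the scratch disk differ only in the policy attached to them, so neither workload forces a compromise on the other.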
The rising importance of APIs in storage
I've somehow got this far without talking about APIs! Of course, all of this needs a solid, robust API layer behind it. Remove the admin layer that does the manual provisioning and push that work to the application layer. When an application or service is provisioned, it simply makes API calls and gets the specific policy-based storage provisioned for it.
The request could even be as simple as “give me a storage” or “give me 3 storage” (poor English intentional).
The policy defines that this requester gets storage with “XYZ” features, provisioned in 50 GB chunks. This also pushes the storage management tasks up to the application. I believe this is key to having a software-defined storage platform that scales to petabytes without requiring a team of storage experts that has to scale exponentially.
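As a rough illustration of the “give me 3 storage” idea, here is a small Python sketch. Everything in it is hypothetical: `StorageApi`, `give_me_storage`, the requester name, and the feature names are invented for this example and are not a real Hedvig API. The caller only says who it is and how many chunks it wants, and the policy registered for that requester decides the chunk size and features.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class RequesterPolicy:
    """Hypothetical policy bound to a requester (an application or service)."""
    features: tuple
    chunk_gb: int


@dataclass(frozen=True)
class Volume:
    volume_id: str
    size_gb: int
    features: tuple


# Assumed policy table: this requester gets 50 GB chunks with these features.
REQUESTER_POLICIES = {
    "analytics-service": RequesterPolicy(
        features=("deduplication", "compression"), chunk_gb=50
    ),
}


class StorageApi:
    """In-memory stand-in for a provisioning API, for illustration only."""

    def __init__(self, policies):
        self._policies = policies
        self._ids = count(1)  # simple source of unique volume numbers

    def give_me_storage(self, requester: str, chunks: int = 1):
        """Return `chunks` volumes sized and configured by the requester's policy."""
        if chunks < 1:
            raise ValueError("chunks must be at least 1")
        if requester not in self._policies:
            raise PermissionError(f"no policy registered for {requester!r}")
        policy = self._policies[requester]
        return [
            Volume(
                volume_id=f"{requester}-{next(self._ids)}",
                size_gb=policy.chunk_gb,
                features=policy.features,
            )
            for _ in range(chunks)
        ]


if __name__ == "__main__":
    api = StorageApi(REQUESTER_POLICIES)
    # "Give me 3 storage": no sizes, tiers, or protocols in the request.
    for volume in api.give_me_storage("analytics-service", chunks=3):
        print(volume)
```

Notice that the request carries no sizes, tiers, or protocols. Changing what “a storage” means for a requester is a policy change, not a change to the application making the call.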
If you're interested in learning more about how we do this at a technical level, I encourage you to hop over to our whiteboard video series. See Polyglot Persistence in action!