Data lakes for Hadoop, sensor networks, and operational analytics

Big data is the #2 use case for software-defined storage, tied with server virtualization. Big data applications help organizations make better, faster business decisions by providing real-time insight into their data. Hadoop (with HDFS) and NoSQL platforms can store data on locally attached storage, but doing so creates a separate, redundant storage silo for each distribution. These islands of storage increase both operational burden and cost.

Hedvig provides the ideal storage for big data. Its elastic, distributed architecture matches the elastic nature of Hadoop and NoSQL. Fine-grained control of features including replication, deduplication, and compression lets you efficiently store data for all of your big data applications in a single, unified data lake.

Hedvig benefits big data, IoT, and operational analytics by:

  • Enabling virtualization of Hadoop and NoSQL, eliminating the CAPEX and OPEX associated with islands of locally attached storage

  • Starting small and scaling incrementally to ingest rates of terabytes per day, ensuring data is loaded and analyzed when needed

  • Applying data protection, multi-site DR, and data efficiency capabilities across multiple big data sets

Global, deduplicated storage for Hadoop and NoSQL

NoSQL and Hadoop clusters are not like the applications of yesteryear. They’re a new breed, built with “webscale” or “hyperscale” design as their core DNA. This new approach is needed to gather, manipulate, and analyze trillions of pieces of structured and unstructured data. A new generation of big data apps gives you the ability to predict customer sentiment, drive new business insights, and even develop new business models.

Hedvig provides a single underlying data management platform for these modern apps. This improves operations, streamlines capacity management, and reduces capital costs through global deduplication and compression. Hedvig also lets you tune replication. If your big data application has built-in replication (HDFS, for example, keeps three copies of each block by default), simply turn replication off in Hedvig. Otherwise the copies kept by each layer multiply, causing unnecessary data growth.
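To see why stacking replication layers is costly, here is a minimal Python sketch of the arithmetic. It assumes each layer keeps full, independent copies and that no deduplication or compression is applied. The function name and the example figures are illustrative, not Hedvig settings.

```python
def raw_capacity_tb(data_tb: float, app_replicas: int, storage_replicas: int) -> float:
    """Raw capacity consumed when both the application and the storage layer replicate.

    Assumes every replica is a full, independent copy (no dedup or compression).
    """
    if app_replicas < 1 or storage_replicas < 1:
        raise ValueError("replica counts must be at least 1")
    # Copies multiply: each application-level replica is itself
    # replicated again by the storage layer.
    return data_tb * app_replicas * storage_replicas


# Hypothetical example: 100 TB of data with an HDFS replication factor of 3.
print(raw_capacity_tb(100, 3, 3))  # 900: replication on in both layers
print(raw_capacity_tb(100, 3, 1))  # 300: storage-layer replication turned off
```

Turning off replication in either layer brings raw usage back to the application's own replication factor, which is the saving the paragraph above describes.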

Dedicated or tiered storage for storing IoT sensor data

IoT, like big data, is often a business-driven undertaking. The business has already decided that this IoT data is important – whether it’s sensor data, smartphone data, or other endpoint data – and that’s why it decided to deploy all of the connected devices, sensors and the like.

With an enormous influx of data comes a commensurate increase in storage requirements. You can dedicate an entire Hedvig cluster to your IoT data, or carve out a portion of a cluster and apply specific policies and SLAs to create an IoT tier. Real-time sensor data needs compute and storage close to the sensors for immediate processing. Because Hedvig is a truly distributed system, you can deploy nodes across geographically dispersed sites, keeping data local to the IoT sensors while still presenting a single, virtualized storage pool.
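A rough sizing estimate makes that growth concrete. The Python sketch below multiplies sensor count, sampling rate, and record size to get daily ingest, then multiplies by a retention window. All inputs are hypothetical, and the estimate ignores protocol overhead, replication, and compression.

```python
SECONDS_PER_DAY = 86_400


def daily_ingest_gb(sensors: int, samples_per_sec: float, bytes_per_sample: int) -> float:
    """Daily ingest in GB (10**9 bytes) for a fleet of identical sensors."""
    return sensors * samples_per_sec * bytes_per_sample * SECONDS_PER_DAY / 1e9


def retained_tb(daily_gb: float, retention_days: int) -> float:
    """Capacity in TB (10**12 bytes) needed to keep `retention_days` of ingest."""
    return daily_gb * retention_days / 1_000


# Hypothetical fleet: 10,000 sensors, 10 samples/s each, 100-byte records.
daily = daily_ingest_gb(sensors=10_000, samples_per_sec=10, bytes_per_sample=100)
print(f"{daily:.0f} GB/day, {retained_tb(daily, 30):.1f} TB for 30 days")
```

With these assumed numbers the fleet produces about 864 GB per day, or roughly 26 TB over a 30-day retention window, before any replication is added.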

Operational analytics storage for Splunk and log management platforms

Enterprises are turning to Splunk and other log management platforms to monitor, search, and analyze machine-generated data. Performing operational analytics on this data gives IT and business units insight into the health of various business assets and processes. However, these platforms typically require hot, warm, and cold storage tiers. Traditionally, enterprises have used a different storage platform for each tier, with inefficient staging and de-staging of data between them.

With Hedvig, you can provide a single, software-defined storage platform for the warm and cold tiers. Simply deploy Hedvig on the right commodity hardware and set the right storage policies, such as client-side caching, pin-to-flash, and deduplication. Hedvig scales out by adding nodes and greatly reduces the cost of managing operational analytics data.
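The hot/warm/cold split can be illustrated with a simple age-based rule. This is a hypothetical Python sketch. The thresholds and the tier-to-policy mapping are illustrative assumptions, not Splunk's bucket-rolling configuration or Hedvig's policy syntax. Splunk, for instance, rolls data between tiers according to its own index settings.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical age thresholds, chosen for this sketch only.
HOT_MAX_AGE = timedelta(days=1)
WARM_MAX_AGE = timedelta(days=30)

# Illustrative mapping of tiers to the storage policies named in the text.
# In this sketch, "hot" data stays on the analytics platform's own fast
# local storage, so no shared-storage policy is attached to it.
TIER_POLICIES = {
    "hot": (),
    "warm": ("client_side_caching", "pin_to_flash", "deduplication"),
    "cold": ("deduplication",),
}


def tier_for(newest_event: datetime, now: datetime) -> str:
    """Classify a chunk of log data by the age of its newest event."""
    age = now - newest_event
    if age <= HOT_MAX_AGE:
        return "hot"
    if age <= WARM_MAX_AGE:
        return "warm"
    return "cold"


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
for days in (0, 7, 90):
    tier = tier_for(now - timedelta(days=days), now)
    print(days, tier, TIER_POLICIES[tier])
```

Keeping the warm and cold tiers on one platform means a tier change becomes a policy change on the same storage pool rather than a copy between separate systems.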