In the 1990s, each application server tended to have direct attached storage (DAS). SANs were created to provide shared, pooled storage for greater scale and efficiency. Hadoop has reversed that trend back towards DAS. Each Hadoop cluster has its own — albeit scale-out — direct-attached storage. It helps to Hadoop manage data locality, but it trades off the scale and efficiency of shared storage. If you have multiple instances or distributions of Hadoop, therefore, you’ll have multiple of these scale-out islands of storage.
“The biggest challenge we come across is balancing data locality with scale and efficiency,” said Avinash Lakshman, CEO and Founder, Hedvig.