If you’re on the fence about modernizing your data warehouse, consider this: Data warehouse technology hasn’t changed much in almost 40 years, the equivalent of eons in the fast-paced technology sector. During that time, it’s performed admirably as the centralized repository for enterprise-wide analytics, capable of storing and processing comparatively large and homogeneous data sets. Support for strong typing, enforced constraints, and CRUD (Create, Read, Update, Delete) operations was essential to its wide adoption and ultimate success. But in recent years, the advent of ‘Big Data,’ and a voracious appetite for data in general, has exposed technological gaps that legacy warehouses just aren’t equipped to handle.
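To make those guarantees concrete, here is a minimal sketch of what enforced constraints buy you, using SQLite purely as a stand-in for a warehouse table (the `orders` schema and values are hypothetical): a row that violates a declared rule is rejected before it ever lands in the table.

```python
import sqlite3

# In-memory database standing in for a warehouse table with enforced constraints.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer TEXT    NOT NULL,
        amount   REAL    CHECK (amount >= 0)
    )
""")

# A well-formed row is accepted.
conn.execute("INSERT INTO orders VALUES (1, 'acme', 19.99)")

# A row that violates the CHECK constraint is rejected outright,
# so bad data never reaches downstream analytics.
try:
    conn.execute("INSERT INTO orders VALUES (2, 'acme', -5.00)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```

The point isn’t SQLite itself; it’s that the database, not every downstream consumer, is the one place where validity is decided.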
Massive data volumes require scalability
Organizations are continually looking to new technologies, such as Internet of Things (IoT) devices and log mining, to help answer critical business questions. These sources generate petabytes of data. IDC predicts that the amount of IoT data analyzed and used to change business processes in 2025 will reach nearly 80 zettabytes (1). In the near future, legacy warehouses simply won’t be able to contain the amount of data produced.
Businesses are already anticipating the data deluge: 71 percent of enterprises expect investments in data and analytics to increase in the next three years and beyond (2). In the past, you’d react to this increased demand by building out existing infrastructure. But the velocity of data growth, combined with long implementation times and exorbitant costs, renders that scalability strategy unsustainable. The static nature of the bare-metal resources that house legacy warehouses creates a constant tug-of-war between under- and overutilization, ultimately leading to poor performance or wasted dollars.
The Hadoop ecosystem was developed to solve some of these problems, but it comes with its own challenges. Adoption demands a large investment in human capital: highly skilled engineers and technicians to build, scale, and maintain a complex environment. The hodgepodge assemblage of bolt-on utilities, each with its own level of support and community interest, creates a maintenance nightmare with no simple way to gauge the larger platform’s efficacy.
Some formats compromise data quality
Web-friendly formats, like JSON, enable applications to pass around complex data structures with relative ease. NoSQL databases were built to store these formats natively, unburdening developers from the strictures of predefined schemas, foreign keys, and enforced constraints. This freedom, however, comes with a significant cost to data consistency and cleanliness, muddying even the simple relationships that analytic models depend on for value.
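A short sketch shows how that drift plays out in practice. The event records below are hypothetical, but the pattern is typical of a schemaless store: the same logical field changes name, type, and shape over time, so any query that assumes one shape silently drops or miscounts records.

```python
import json

# Hypothetical event records as a schemaless store might accumulate them:
# the same logical field drifts in name, type, and structure over time.
raw_events = [
    '{"user_id": 101, "amount": 25.0}',
    '{"user_id": "101", "amount": "25.0"}',         # numbers became strings
    '{"user": {"id": 101}, "amount_cents": 2500}',  # field renamed and restructured
]

records = [json.loads(e) for e in raw_events]

# A naive consumer that assumes one canonical shape quietly loses data:
# only the first record passes this "expected schema" check.
clean = [r for r in records if isinstance(r.get("user_id"), int)]
print(f"{len(clean)} of {len(records)} records match the expected schema")
```

With no constraints enforced at write time, every consumer must re-implement this kind of defensive cleanup, and each one may do it slightly differently.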
In other words, each of these technologies only solves a specific problem, and at the cost of functionality that has made the data warehouse so successful. Until now.
(1) "152,000 Smart Devices Every Minute in 2025: IDC Outlines the Future of Smart Things," Forbes, March 3, 2016
(2) "Global State of Enterprise Analytics, 2018," Forbes, August 8, 2018