Data warehousing is not dead, but it is changing under the influence of new technologies, including Hadoop and cloud platforms.
Nostalgia might be a strong driver in music, fashion, and entertainment, but it has never been a captivating force in technology. Mainframes and COBOL are still very much with us, but I don’t run into people polishing up their IBM System/390 as they would an Oldsmobile Toronado. The tech industry is always about the new, so why is one of 2018’s top trends the revival of interest in data warehousing?
The reason is that enterprises need to meet present-day requirements for trusted, curated, and integrated data, and those requirements are only going to grow in the years ahead as organizations strive to become more data-driven in their decision making.
Vendors and Technology Trends
To spot evidence of this data warehousing trend, I needed to look no further than the Strata Data Conference in New York this past September. Strata has long been a bastion of big data, with a particular focus on the Apache Hadoop and Spark ecosystem. However, in meetings with vendors at this fall’s event, the dominant topics were data warehousing, data cataloging, metadata and semantic data integration, and governance.
For example, a key focus at the event for Cloudera, which partners with O’Reilly Media to produce Strata, was the Cloudera Data Warehouse. Cloudera’s solution melds technology advances in big data platforms that have enabled organizations to collect and manage petabytes of data with functionality to support the concurrency demands generated by democratized, self-service BI and analytics. Cloudera and other vendors made it clear that the revival of interest in data warehousing is not about stepping back. The “modernized” data warehouse is about exploiting the maturity of big data technologies and self-service data preparation and integration to expand the scale, scope, and usability of data warehouses.
Cloudera and erstwhile rival Hortonworks made further news in October when they announced plans to merge and bring to market a unified data platform and Hadoop distribution. Although the combined company under Cloudera management will begin work on the technology merger as soon as the deal closes, both distributions will continue to be supported for at least three more years.
The merger demonstrates the maturity of the Hadoop distribution market as well as the pressure being brought to bear by the growth in cloud-based data management and storage. MapR, which offers its Converged Data Platform on premises and in the cloud, provides the primary alternative Hadoop distribution to the combined company. MapR and Cloudera both have data warehouse modernization and optimization solutions that, for example, advocate the use of Hadoop and Spark platforms for ETL workloads.
Cloud computing is unquestionably the biggest change agent in the data management and data warehouse landscape. Vendors that have been prominent in the market for on-premises data warehousing and big data platform solutions are having to adjust fast to the surge of interest in cloud-based services.
Venture capitalists are excited about the market opportunity of cloud as a new platform for BI and data warehouse management; prime evidence of this was the massive $450 million growth funding invested in Snowflake Computing by Sequoia Capital and several other leading VC firms in October. Amazon, with Redshift, is perhaps the most prominent cloud-native competitor for data warehousing given the “data gravity” of its large share of the data storage and Web services market. Google, with BigQuery, has a growing presence, and major platform vendors such as IBM, Microsoft, Oracle, and SAP are also players in the cloud-based data warehousing arena.