Hitting the Enterprise Data Warehouse Optimization target is much easier now than you might think. Thanks to the maturity of the Hadoop stack and related developments, affordability has gone up several notches. CTOs and CIOs should take a close look at their IT portfolios and re-strategize if their earlier attempts at taming their data with the holy yellow elephant, i.e. Hadoop, didn't prove very fruitful.
It is time for a serious rethink. Here is a prescribed roadmap that will get you started fairly quickly without compromising or eliminating any of the scaling options that come standard with the architecture. You'll need those options on the day the system acquires critical mass.
Start Ingesting the Data: Figure out which pieces of the data you currently store are useful. Today you may be keeping only a handful of data items, and only for a limited duration, because of technology limitations and the prohibitive cost of maintaining the data. As a first step, set up a Hadoop stack and start ingesting. Don't worry too much about filtering the data or keeping it short and sleek upfront, as the cost per GB of storage under Hadoop is generally well within your IT budget. It starts at roughly half a dollar per GB and keeps going down, thanks to economies of scale, with every additional GB of data.
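A minimal sketch of what "land everything raw" can look like in practice, assuming a Hive-style date-partitioned raw zone (the directory layout, source names, and file names here are hypothetical, not a prescribed standard):

```python
from datetime import date

def raw_landing_path(source: str, filename: str, ingest_date: date) -> str:
    """Build a partitioned path for raw, unfiltered data.

    Every feed lands untouched; filtering and trimming happen later,
    at read time, not at ingest time.
    """
    return f"/data/raw/source={source}/dt={ingest_date.isoformat()}/{filename}"

path = raw_landing_path("crm_exports", "accounts.csv", date(2015, 3, 1))
print(path)  # /data/raw/source=crm_exports/dt=2015-03-01/accounts.csv
```

Partitioning by source and ingest date up front costs nothing extra and keeps later pruning and retention decisions cheap, even before you know which feeds matter.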
Defer the ETL Puzzle: Don't agonize over setting up ETL jobs to zero in on the useful pieces of data across all the sources you have started ingesting. Since raw data is cheap to keep, transformation can happen later, at query time, once you know which reports actually need which fields.
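The deferred-transformation idea can be sketched in a few lines: keep events exactly as they arrived and project out fields only when a report asks for them (the event shape and field names below are illustrative):

```python
import json

# Raw events kept exactly as they arrived -- no upfront transformation.
raw_events = [
    '{"order_id": 1, "amount": 120.0, "region": "west"}',
    '{"order_id": 2, "amount": 80.5, "region": "east", "coupon": "SPRING"}',
]

def project(raw_lines, fields):
    """Pull out only the fields a report needs, at read time."""
    return [{f: json.loads(line).get(f) for f in fields} for line in raw_lines]

# Today's report wants two fields; tomorrow's can ask for others,
# because nothing was thrown away at ingest time.
print(project(raw_events, ["order_id", "amount"]))
# [{'order_id': 1, 'amount': 120.0}, {'order_id': 2, 'amount': 80.5}]
```

The design choice here is ELT rather than ETL: the expensive decision of what to extract is postponed until a concrete use case forces it.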
Business Intelligence Connectivity and Continuity: This is much easier than it used to be a couple of years ago. All leading BI tools, be it Tableau, OBIEE, or QlikView, now support SQL-like access to Hadoop-based storage via Hive and similar frameworks. Connectivity is fast becoming a non-issue.
Build a Parallel Environment: Now that all the data is flowing into a Hadoop cluster on a regular basis, it is time for a parallel run. Start building dashboards and reports off the Hadoop-based Big Data warehouse. Keep it running for a couple of months, or a couple of logical business cycles, as your needs dictate. If all is well with the new Big Data warehouse, it is time to plan the end of life of your traditional data warehouse. If not, iterate for a few more months or another business cycle until you get it right.
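The "parallel run" above is, at heart, a reconciliation exercise. A hedged sketch of the check you would run before retiring the old warehouse, with illustrative numbers and a hypothetical relative tolerance:

```python
# The same monthly metric computed from the legacy warehouse and from the
# new Hadoop-based warehouse should agree (within tolerance) before cutover.
legacy_totals  = {"2015-01": 10500.0, "2015-02": 9800.0}
bigdata_totals = {"2015-01": 10500.0, "2015-02": 9807.5}

def reconcile(old, new, tolerance=0.01):
    """Return the months where the two warehouses disagree by more than
    `tolerance` (relative difference against the legacy figure)."""
    return [m for m in old
            if abs(old[m] - new.get(m, 0.0)) / old[m] > tolerance]

print(reconcile(legacy_totals, bigdata_totals))  # [] -> safe to cut over
```

An empty list across a couple of business cycles is the signal the section describes; a non-empty one tells you exactly which periods to iterate on.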
The New Use Cases: Once the cluster has been running for a while, start thinking about leveraging the high retention that is inherent in a Hadoop-based data warehouse. Think about building new dashboards and reports that unearth insights depending on longer horizons of data availability: 6 months, 1 year, and beyond. Since the data is still around in fairly original, golden form, consider exposing it to divisions and business units that historically never got a handle on it because of resource crunches and the security vulnerabilities that sharing would have introduced.
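A small sketch of the kind of longer-horizon report that retention makes possible, e.g. a trailing 6-month total that a warehouse purging data every quarter simply could not produce (the monthly figures are fabricated for illustration):

```python
# Illustrative monthly revenue retained for a full year.
monthly = {f"2014-{m:02d}": 100.0 + 10 * m for m in range(1, 13)}

def trailing_sum(series, window=6):
    """Trailing-window totals over a month-keyed series (sorted by key)."""
    keys = sorted(series)
    return {k: sum(series[j] for j in keys[max(0, i - window + 1): i + 1])
            for i, k in enumerate(keys)}

# Six months of history behind every point -- only possible because the
# raw data was never aged out.
print(trailing_sum(monthly)["2014-12"])  # 1170.0
```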
Governance, Security and Compliance: Have a tool in place that helps you govern distribution of this data to the other divisions and business units that may have important use cases for it. New governance strategies and technologies are now available to help with this democratization of data. Trehanz.com is doing pioneering work in this area to make it easier and more secure to share data within the organization. Look out for our upcoming blogs for more guidance on this topic.
The Social and IoT Advantage: Now that your current data is being ingested into a Hadoop cluster, start thinking about juxtaposing it with social and IoT data sources, enriching your dashboards and reports with these additional, orthogonal insights.
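What "juxtaposing" looks like in its simplest form is a join on a shared key. A hedged sketch, with a hypothetical door-sensor (IoT) feed enriching per-store sales; all field names and numbers here are made up for illustration:

```python
# Warehouse-side records and an orthogonal IoT feed, both keyed on store_id.
sales = [{"store_id": "s1", "revenue": 500.0},
         {"store_id": "s2", "revenue": 320.0}]
iot_footfall = {"s1": 1200, "s2": 450}  # door-sensor visit counts per store

# Enrich each sales row with footfall and a derived metric neither
# source could provide alone.
enriched = [
    {**row,
     "footfall": iot_footfall.get(row["store_id"], 0),
     "revenue_per_visit":
         row["revenue"] / max(iot_footfall.get(row["store_id"], 0), 1)}
    for row in sales
]
print(enriched[0]["revenue_per_visit"])  # 500 / 1200
```

The interesting output is the derived column: revenue per visit is an insight that exists only at the intersection of the two feeds.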
Think Outside the Box and Inside the Yellow Elephant, i.e. the Hadoop Cluster: While doing so, keep an eye on the hidden architectural needs of these new use cases. For instance,
(a) Some may need real-time stream processing,
(b) Others may need a low-latency solution,
(c) Another may need asynchronous, high-volume post-processing.
With the wide gamut of frameworks available around the Hadoop ecosystem, such as Storm, Spark, and Kafka, all of these architectural needs are well within reach. Don't forget the schema-on-read nature of Hadoop storage, which offers virtually unlimited schema extensibility over whatever you may currently be storing. Any columns, rows, or tables that are missing today can be added to Hadoop storage going forward.
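The schema-extensibility point can be made concrete: when a new column starts arriving, schema-on-read lets the reader supply a default for the older records instead of rewriting history. A minimal sketch, with hypothetical record shapes:

```python
import json

# Older raw records lack the new "channel" column; newer ones carry it.
rows = [
    '{"order_id": 1, "amount": 120.0}',                      # last year's schema
    '{"order_id": 2, "amount": 80.5, "channel": "mobile"}',  # extended schema
]

def read_with_schema(raw_lines, defaults):
    """Apply today's schema at read time: defaults fill any missing columns,
    while values present in the record always win."""
    return [{**defaults, **json.loads(line)} for line in raw_lines]

for row in read_with_schema(rows, {"channel": "unknown"}):
    print(row)
```

No backfill job touched the old records; the new column simply exists "on a going-forward basis", exactly as the section describes.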
Big Data is fast becoming the technology differentiator in the overall success of any enterprise's IT strategy. With the technology maturity now available, it is about time you took another look at your Big Data strategy as an integral part of IT infrastructure optimization. What is your Big Data strategy? Do you have plans to take advantage of Big Data in 2015 to grow your business? Feel free to contact us for a free assessment at email@example.com, or give us a call at +1.925.400.8475 to discuss possibilities.