This vendor-written piece has been edited by Executive Networks Media to eliminate product promotion, but readers should note it will likely favour the submitter's approach.
This first series of articles describes foundational steps that enable agile data warehouse development, something that has been a challenge in enterprise data management for years.
My articles published thus far describe how to develop a Business Conceptual Model as a starting point, and then how to build a "grass roots" (at a minimum!) Data Governance capability.
The next step in setting yourself up for a best-in-class agile data warehouse environment is to develop a high-level data flow architecture that is inherently flexible and leverages repeatable design patterns.
In the end, every data warehouse has an architecture, composed of technical and data-related components. The architecture is either planned, or it's developed without a plan.
A data warehouse developed without a predefined architecture is severely limited in flexibility, and it ultimately takes more work to enhance and maintain.
Without a planned architecture, subject areas don't fit together, connections lead nowhere, and the whole warehouse is difficult to manage and even more difficult and time-consuming to change.
The negative impact can be even larger under agile development, where the warehouse is expected to change frequently.
The high-level architecture should always be designed with an eye toward updates and expansion. It should be based on the results of the initial interviews that led you to the business conceptual model, and reviewed by Data Governance, as described in my prior articles. As a part of the interview process, you should have gotten a sense of the expected user base and usage.
For example, does your company have data scientists or data analysts who will use analytical tools against raw data? If so, your data architecture will need to take that into account. Will your data warehouse be updated with new records or with modifications to existing records? Will there be new data sources that need to be integrated into the data warehouse frequently? The answers to these questions will have an impact on your architectural design.
The architecture we designed in my last organization included our version of a Data Lake that allowed for a permanent history of raw data with very little modification. The Data Lake allowed us to retain a full version history of every source record to support "as is" and "as was" queries. Our data analysts were able to query against the Data Lake for exploration and predictive purposes.
The Data Lake also had a number of technical advantages, such as supporting many load patterns and enabling very fast loads of new data, so that our data analysts could obtain new source data quickly (agile in action!).
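To make the "as is" and "as was" idea concrete, here is a minimal sketch of an append-only raw history store. It is not the implementation we used; the names RawVersion, RawHistoryStore, load, as_is and as_was are hypothetical, and an in-memory list stands in for whatever storage platform you choose. The point it illustrates is that loads only ever insert new versions, which keeps them fast and preserves the full history of each source record.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RawVersion:
    """One immutable version of a source record (hypothetical structure)."""
    source_key: str       # business or source-system key of the record
    loaded_at: datetime   # when this version arrived in the store
    payload: Dict         # raw attributes, kept with very little modification


class RawHistoryStore:
    """Append-only store: a load never updates or deletes earlier versions."""

    def __init__(self) -> None:
        self._versions: List[RawVersion] = []

    def load(self, source_key: str, payload: Dict, loaded_at: datetime) -> None:
        # Insert-only: new and changed records are both simply appended.
        self._versions.append(RawVersion(source_key, loaded_at, dict(payload)))

    def as_was(self, source_key: str, at: datetime) -> Optional[Dict]:
        # Latest version of the record that had been loaded by the time `at`.
        candidates = [
            v for v in self._versions
            if v.source_key == source_key and v.loaded_at <= at
        ]
        if not candidates:
            return None
        return max(candidates, key=lambda v: v.loaded_at).payload

    def as_is(self, source_key: str) -> Optional[Dict]:
        # The current state is just "as was" at the latest possible time.
        return self.as_was(source_key, datetime.max)


store = RawHistoryStore()
store.load("cust-1", {"city": "Boston"}, datetime(2015, 1, 1))
store.load("cust-1", {"city": "Denver"}, datetime(2015, 6, 1))

print(store.as_is("cust-1"))                         # {'city': 'Denver'}
print(store.as_was("cust-1", datetime(2015, 3, 1)))  # {'city': 'Boston'}
```

Because nothing is overwritten, the same store answers both kinds of query, and a load that contains modifications to existing records is handled the same way as a load of brand-new records.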