A business driven model is a model that describes business concepts and how they relate to each other, irrespective of what the structure of any current or future source data for these concepts looks like.
When constructing a business driven model, the target of the model is the real world. Hence a business driven model contains real-world phenomena that are derived from the purpose of the model, on a detail level that matches the actual need and that uses familiar and established terminology. It should also be robust enough to not break down because of changes in the business.
The reasoning behind this strategy is that the DW serves its purpose optimally only if the data exists in a structure that correctly describes the real world. Understanding the real world then translates directly to understanding the data model, which makes the data far more usable than data stored in source system-like structures. Those structures can be anything from metadata based structures through name/value pairs all the way to actually decent structures.
Pros
- the structure of the data is immediately familiar to people who know the business
- the model can make sense to anybody, is generic, and does not depend on any particular source system
- the model serves additional purposes unknown at design time in a more natural way
Cons
- the data may be hard to transform to the required format when loading it to the DW. As a result, the business model may become compromised when it is changed to accommodate the suboptimal data.
- if the business changes, there may in the worst case be a need to reload the data, if implementation guidelines dictate that the data should always comply with the model (on the other hand, if they do not dictate this, the model will be rendered useless over time, as it no longer describes the data)
Data Driven Modeling as DW Strategy
In data driven modeling the data itself is the target of the implementation. The reasoning is that the data is what it is, and that what has been loaded does not change afterwards: what we have loaded into the DW is what we got from the source at the time we got it.
In this case the focus is on getting the model to accurately describe the data itself. The approach holds that it makes no sense to put effort into something that does not exist (data in a format other than the one it is delivered in, or data that is not delivered at all); instead, one works with what is there.
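As a minimal sketch of "load what you get", the Python snippet below stores source rows unchanged and adds only load metadata. The function `load_raw`, the field names `record_source`, `load_ts` and `payload`, and the sample `crm` rows are all hypothetical and chosen for illustration only; a real raw vault would split the rows into hubs, links and satellites.

```python
from datetime import datetime, timezone


def load_raw(rows, record_source, load_ts=None):
    """Store source rows exactly as delivered, adding only load metadata.

    All names here are illustrative assumptions, not part of any
    Data Vault tool or standard.
    """
    load_ts = load_ts or datetime.now(timezone.utc)
    return [
        {
            "record_source": record_source,  # which system delivered the row
            "load_ts": load_ts,              # when we received it
            "payload": dict(row),            # the row itself, untouched
        }
        for row in rows
    ]


# A hypothetical source that delivers name/value pairs.
crm_rows = [
    {"entity_id": 17, "attr": "surname", "val": "Smith"},
    {"entity_id": 17, "attr": "city", "val": "Oslo"},
]

raw = load_raw(crm_rows, record_source="crm")
```

No business interpretation happens here: the name/value structure of the source survives as-is, which is what makes the load mechanical and also what makes the result hard for business users to read.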
Pros
- straightforward and quick to implement: you load what you get
- almost purely mechanical and therefore highly automatable loading process in all cases
- flexible: any additional layer (business vault) can be built on top of the raw vault without fear of the underlying structure ever changing drastically
Cons
- the structure is not understood by business people, and is by nature hard to use
- the user needs to learn a new structure each time a new source is integrated into the DW
- the raw vault is, for all practical purposes, just a “staging area” for any specific need: there may be a significant additional effort needed to make the data usable for the business
- the model may slowly become overwhelming when more source systems are added
Hybrid Strategy
A hybrid approach combines the best of both worlds. The data should be available in a pure, source-system agnostic, business driven format. At the same time, the raw vault implementation should be flexible and not purely dependent on a business driven model, which may or may not contain the elements that are actually extracted from the source systems and need a home in the DW.
In short: Irrespective of how the raw vault is built, always publish the data through a separate Business driven interface.
This can be accomplished using several approaches. One extreme approach is to build the Raw Vault using a pure data driven approach and worry about everything else when implementing the Business Vault using a purely business driven model. Using this approach, all source systems can be completely separated from each other, gaining the maximum flexibility of the purest data driven approach. However, keeping source systems fully separated (for example, not loading persons around a shared hub) misses the point of the Data Vault, so going quite this extreme is not recommended. At a bare minimum, one should integrate the different source systems on a business key level. And as long as that is happening, one might just as well always try to fit everything to the structure of the Business driven model, and resort to a more data driven approach only when the business model does not directly support the situation at hand and is not easily changed to do so.
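Integration on a business key level can be sketched roughly as follows. Two hypothetical source systems (`crm` and `hr`) deliver persons in different shapes; both are loaded around one shared person hub keyed on a normalized business key, while their descriptive attributes stay in separate, source-shaped satellites. The helper names, the normalization rule (trim and uppercase) and the use of a SHA-256 hash are assumptions made for this example.

```python
import hashlib


def hub_key(business_key):
    """Derive a hub key from a normalized business key (assumed rule)."""
    normalized = str(business_key).strip().upper()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def load_source(hub, satellites, rows, key_field, record_source):
    """Load one source: shared hub entry, source-specific satellite rows."""
    for row in rows:
        hk = hub_key(row[key_field])
        # The hub keeps one entry per business key; the first source wins.
        hub.setdefault(hk, {
            "business_key": str(row[key_field]).strip().upper(),
            "record_source": record_source,
        })
        # Everything except the key stays in the source's own structure.
        attrs = {k: v for k, v in row.items() if k != key_field}
        satellites.setdefault(record_source, []).append({"hub_key": hk, **attrs})


person_hub, person_sats = {}, {}
load_source(person_hub, person_sats,
            [{"person_id": "P-1001", "surname": "Smith"}], "person_id", "crm")
load_source(person_hub, person_sats,
            [{"emp_no": " p-1001 ", "dept": "Sales"}], "emp_no", "hr")
```

Both sources end up pointing at the same hub entry, so a business driven layer can later join their attributes through the hub without either source having been reshaped during loading.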
The Pros and Cons are more or less the ones from each approach, but specifically:
Pros
- the data is always presented to the user through a pure Business driven model
- the Raw Vault may be implemented using any strategy
- if the two models are kept strictly separated, nothing ever needs to be reloaded to the raw vault if the Business model changes
Cons
- depending on the level of integration between the models, and data quality (as always), the extra layer may be a resource hog