In business intelligence, the basic needs have not really changed in 15 years: people still need data for decision making. The technology, luckily, has changed and keeps changing rapidly; cloud-based BI solutions seem to evolve between page reloads. For me, though, the biggest change is that there is finally some reusability in Data Warehousing. Fifteen years ago there was really nothing except personal experience that had any impact on how quickly one could implement a loading mechanism that had already been built several times in other projects. In terms of code reusability, DW projects were notoriously bad: the degree of reuse was close to 0%. They probably weren't even considered "real" software projects at the time. The lack of reusability inevitably showed in the amount of work needed to get them done. Many failed.
What was to become D♯ SmartEngine took its first baby steps in 2006. Back then, many commercially available tools could generate database table structures from UML models, but the cause & effect (“the model looks like this, therefore the tables should look like this”) was more or less hardwired into the tool and the internal mappings were not available to the user.
As a consultant who had worked with Conceptual models, and implementations based on them, since the late 90s, I was not going to abandon them. Several times I had given a model and a set of conversion rules to a programmer, who would produce the correct table structure based on them; provided with a description of the source data, he could also implement the loading mechanisms correctly. So it felt like something could be done to ease the programmer's workload.
Enter SmartEngine Jr.
SmartEngine Jr could import a UML object model and generate a 3NF table structure from it, exactly like many other tools. The result was structurally identical to the UML model: every class became a table, every attribute a column, every association a foreign key reference. So why not just draw the table structure directly? First of all, it was more work: you still had to add extra columns that were a distraction when showing the table diagram to the customer, foreign keys and primary keys had to be defined (eventually), and so on. Secondly, speshöl käräktörs and spaces were best avoided in table and column names. So the Conceptual model functioned at the correct level of abstraction: it was a more natural tool for workshops and, in general, for communicating with the customer. It presented the classes and attributes using their natural names and did not care about technical details. It saved time, as no ER model needed to be separately maintained, and the generated ER model never contained more spelling mistakes than the original Conceptual model.
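To make the conversion concrete, here is a minimal sketch of the class-to-table mapping described above. The model format, naming rules, and column types are illustrative assumptions on my part, not SmartEngine Jr internals:

```python
# Sketch of a 3NF generator: class -> table, attribute -> column,
# association -> foreign key column. Illustrative only; the real tool's
# model format and type mapping are not shown here.

def to_identifier(name: str) -> str:
    """Replace spaces and special characters with underscores."""
    return "".join(c if c.isalnum() else "_" for c in name)

def generate_ddl(model: dict) -> str:
    statements = []
    for cls in model["classes"]:
        table = to_identifier(cls["name"])
        # Surrogate key column added automatically (one of the "extra
        # columns" the customer never needs to see in the model diagram).
        cols = [f"  {table}_id INT PRIMARY KEY"]
        cols += [f"  {to_identifier(a)} VARCHAR(100)" for a in cls["attributes"]]
        for ref in cls.get("associations", []):
            target = to_identifier(ref)
            cols.append(f"  {target}_id INT REFERENCES {target} ({target}_id)")
        statements.append(f"CREATE TABLE {table} (\n" + ",\n".join(cols) + "\n);")
    return "\n\n".join(statements)

model = {
    "classes": [
        {"name": "Customer", "attributes": ["Customer Name"]},
        {"name": "Order", "attributes": ["Order Date"],
         "associations": ["Customer"]},
    ]
}
print(generate_ddl(model))
```

Note how "Customer Name" becomes the column `Customer_Name`: the natural name stays in the model, and the technical identifier is derived mechanically, so spelling mistakes cannot creep in during conversion.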
Other information could be extracted from the model. Documentation could be generated, including the relationships between the model, which the customer was familiar with, and its technical implementation, the Data Warehouse. The table load order could be deduced: any referenced table should be loaded before any table referencing it. Using this information, orchestration packages could be generated, whether procedures calling other procedures or SSIS packages executing other SSIS packages; basically, anything text-based could be generated.
At this time we still wrote the actual loading mechanism by hand, but it was obvious that if there existed a mapping between the source data in the staging area and the elements in the Conceptual model, we could automate the table loading as well, chaining these new mappings to the property-to-column mappings (property here meaning attribute or association end). The first Data Vault automation implementation in late 2008 (which went into production in 2009) showed us that we had the right approach: the Conceptual model that was the basis for the Data Warehouse did not change (why should it?), but the table structure generated from it did. It was no longer a 1:1 conversion. The Conceptual model now more clearly functioned as a "what", and we could generate a "how" from it. As a final step we wanted to eliminate the manual mapping of source to model, so instead we generated what we called a "loading interface" into the Staging Area: a set of empty tables looking more or less exactly like the classes in the Conceptual model. These were automatically generated and mapped to the Conceptual model internally in SmartEngine Jr. From that point on, all the developer had to do was either fill the generated staging area tables with the data he wanted to load into the DW, or replace them with identical-looking views selecting that data from the raw data. It doesn't really get any easier than that.
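The "loading interface" idea can be sketched as follows. All names (the `stg_` prefix, the raw source table, the column mapping) are hypothetical examples of mine, not the actual generated artifacts:

```python
# Sketch of the "loading interface": for each class, either an empty
# staging table is generated, or the developer supplies an
# identical-looking view that selects from the raw source data.

def staging_table_ddl(cls_name: str, columns: list) -> str:
    """The generated empty staging table mirroring one class."""
    cols = ",\n".join(f"  {c} VARCHAR(100)" for c in columns)
    return f"CREATE TABLE stg_{cls_name} (\n{cols}\n);"

def staging_view_ddl(cls_name: str, column_map: dict, source: str) -> str:
    """The developer's alternative: a view with the same shape,
    mapping raw source columns to the model's column names."""
    select = ",\n".join(f"  {src} AS {dst}" for dst, src in column_map.items())
    return f"CREATE VIEW stg_{cls_name} AS\nSELECT\n{select}\nFROM {source};"

# Either shape satisfies the generated loading mechanism, because it only
# ever reads from stg_Customer and never cares what sits behind it.
print(staging_table_ddl("Customer", ["CustomerName", "Country"]))
print(staging_view_ddl("Customer",
                       {"CustomerName": "cust_nm", "Country": "cntry_cd"},
                       "raw.customers"))
```

The point of the design is that the source-to-model mapping moves out of the tool and into plain SQL the developer already knows, while the model-to-DW side stays fully generated.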
This approach went into production in 2009, and several implementations are still running. Zero changes have been made to the logic over the years. It just works.
Come 2018, SmartEngine Jr has grown up: tens of installations, but time to upgrade to Data Vault 2.0. Compared to the "DV light" approach of the previous 1.4 version described above, version 2.0 went much further. Attribute metadata and source systems now define the tables, and just by looking at the Conceptual model, you have absolutely no idea what the table structure that implements it is.
Looking forward, technologies come and go, but modeling will remain. And as long as implementations are generated from models, be it 3NF, Data Vault, or whatever comes next, the models will outlive the technologies that implement them.