The mapping file contains a textual representation of the source ER model as seen by the implementer, with additional metadata. Each source column is mapped to either an attribute or an association end in the model.
Name the files using the format XX_Mappings_SOURCE.csv, where XX is a running number 01…99 denoting the order of the source system being mapped, and SOURCE denotes the source system. Using separate mapping files makes collaboration within the project easier, because developers working on data from different source systems don’t get their mappings mixed up.
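
The naming convention above can be checked mechanically. The following is a minimal sketch, not part of the product: the function names `parse_mapping_filename` and `sort_mapping_files` are hypothetical, and it assumes that SOURCE may be any non-empty text and that files are meant to be handled in the order of their running number.

```python
import re

# Matches XX_Mappings_SOURCE.csv where XX is 01..99 (00 is rejected).
MAPPING_FILE_RE = re.compile(r"^(0[1-9]|[1-9][0-9])_Mappings_(.+)\.csv$")


def parse_mapping_filename(name):
    """Return (running_number, source) for a well-formed name, else None."""
    match = MAPPING_FILE_RE.match(name)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def sort_mapping_files(names):
    """Return the well-formed mapping file names ordered by running number.

    Names that do not follow the convention are left out.
    """
    parsed = [(parse_mapping_filename(name), name) for name in names]
    return [name for key, name in sorted(p for p in parsed if p[0] is not None)]
```

For example, `parse_mapping_filename("01_Mappings_CRM.csv")` yields the running number 1 and the source name CRM, while a name such as `1_Mappings_CRM.csv` is rejected because the running number is not two digits.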
Mapping File Structure
Column (in order) | Content |
---|---|
SourceSystems | The source systems the data is extracted from. Each source corresponds to an existing SourceSystem instance in the model and to a generated schema in the staging area. This column may contain several comma-separated source names. The mappings will be generated separately for all sources. Normally only one source system is declared in this field. |
Table | The name of the table in the staging area’s source-system-specific schema from which the data will be loaded. Use the following naming conventions: RAW_NameOfOriginalSourceTable if the content is unchanged from the original source data, and OUT_NameOfMappedSourceTable if the content has been altered (denormalized, etc.). Typically an OUT_ object is a view that selects a subset of columns from a RAW_ table containing the original data. Always preserve the original column names if possible. |
Column | The column in the source table that contains the data for the target property. |
InPrimaryKey | “x” indicates that the column is part of the technical primary key of the table. |
References | A dot-notation reference to another source column. Use the syntax SourceSystem.Table.ReferencedColumn for referencing any column, Table.ReferencedColumn for referencing a column within the same source system, and only ReferencedColumn to reference a column within the same table. The referenced column is usually the primary key column of another table, or a part of it. |
Skip | Use “x” to indicate that this mapping row should not be processed. |
Class | Name of the target class in the Raw Data Model. |
Property | Name of the target attribute or association end (or referenced class, if no role name has been assigned). If the column is a part of a reference, Property should contain an association end. |
Group | Generates a new satellite. If used, this steers the corresponding data to a different satellite than the one the property’s rate-of-change value would indicate. Note that this is applied at attribute level, so if it is specified for an attribute in any file, all other mappings to the same attribute will use the same satellite structure. Also, the last mapping file processed that defines the group for an attribute is the one that remains valid. In other words, the Group setting is not source specific! |
AlreadyHashed | “x” indicates that the content of the column is already hashed. |
VersionColumnIndex | To be documented alongside an upcoming tutorial. |
InLogicalKey | To be documented alongside an upcoming tutorial. |
Transformation | A transformation operation to be applied to the column before loading it into the Raw Vault. This can be either SQL code or a predefined transformation. Use the keyword %VALUE% to represent the column being transformed (for instance LTRIM(RTRIM(%VALUE%))). |
ColumnSize | For strings, the string length of the column. Only needed when the actual data length exceeds the length of the attribute’s datatype. |
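
The References and Transformation columns both follow simple textual rules, which the sketch below illustrates. It is an illustration only, not the product's implementation: the helper names `resolve_reference` and `apply_transformation` are hypothetical, and the transformation helper covers only the SQL form with %VALUE%, not the predefined transformations.

```python
def resolve_reference(reference, current_system, current_table):
    """Expand a References value into a (system, table, column) triple.

    Shorter forms are completed from the row's own source system and table,
    following the three syntaxes described for the References column.
    """
    parts = reference.split(".")
    if len(parts) == 3:
        return (parts[0], parts[1], parts[2])
    if len(parts) == 2:
        return (current_system, parts[0], parts[1])
    if len(parts) == 1 and parts[0]:
        return (current_system, current_table, parts[0])
    raise ValueError(f"Unsupported reference: {reference!r}")


def apply_transformation(transformation, column):
    """Replace every %VALUE% in a SQL transformation with the column name.

    An empty transformation leaves the column expression unchanged.
    """
    if not transformation:
        return column
    return transformation.replace("%VALUE%", column)
```

With these helpers, a References value of `CustomerId` on a row for system `CRM` and table `RAW_Order` expands to the triple `("CRM", "RAW_Order", "CustomerId")`, and the transformation `LTRIM(RTRIM(%VALUE%))` applied to a column `Name` becomes `LTRIM(RTRIM(Name))`. The system, table, and column names here are made up for the example.
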
Additional Tags
For each mapping group (set of column-to-property mappings from the same table to the same class) you can specify the following additional tags:
Column (in order) | Content |
---|---|
Alias=AliasName | When mapping the same source table to several classes, or when mapping several source tables onto the same class from within the same schema, assign an Alias to the mapping group. The alias is then treated as a virtual table and gets its own separate data flow. The alias name should be unique for each source table, as it is appended to the end of the name of the work table derived from that source table. Write the alias in the first column, on the row before the first mapping row belonging to the alias. |
distinct=false/true | You can force a distinct selection from the source table by setting the distinct keyword to true. The default is false, so you can omit the tag whenever you just want to load all rows. “Fill from the left”: write this tag in the leftmost vacant cell on the row before the first mapping row that the keyword should affect. You can use this with or without the alias or the where-clause. |
where-clause | You can specify any SQL-compatible where clause here, and it will be added as is to the query that extracts the data from the source table and inserts it into the corresponding work table. “Fill from the left”: write it in the leftmost vacant cell on the row before the first mapping row it should affect. You can use this with or without the alias or the distinct keyword. |
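
To make the effect of the distinct and where-clause tags concrete, here is a rough sketch of how they could shape an extraction query. The actual SQL generated by the tool is not shown in this document, so the query shape, the function name `build_extract_query`, and the assumption that the where-clause text already includes the WHERE keyword are all hypothetical.

```python
def build_extract_query(table, columns, distinct=False, where_clause=""):
    """Build a SELECT over a staging table for one mapping group.

    Assumptions (not confirmed by the documentation): the where clause is
    passed in with its WHERE keyword and appended unchanged, and distinct
    simply switches SELECT to SELECT DISTINCT.
    """
    select = "SELECT DISTINCT" if distinct else "SELECT"
    query = f"{select} {', '.join(columns)} FROM {table}"
    if where_clause:
        query += f" {where_clause}"
    return query
```

Called with only a table and its columns, the sketch produces a plain `SELECT ... FROM ...` that loads all rows, which mirrors the default of distinct=false with no where clause.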