DSharp.fi
  • Home
  • Solutions
    • DSharp PathFinder
    • DSharp Studio
    • DSharp Studio Modeler
    • Pricing and Licenses
  • Services
    • Professional services
    • DSharp Training Program and Certifications
    • Developer Support
  • Customers
    • Our Customers
    • Customer Success Stories
      • Pirte’s Data Platform Modernization Boosts Data-Driven Healthcare
      • A City Expanded Its Data Warehouse with Library Data
      • Varha – a wellbeing services county – built a data-driven management system
      • From Double Checks to Smooth Automation – DSharp Studio Simplified Early Childhood Education Data Management
      • Productivity Leap improved Metsähallitus’ data management with DSharp Studio
  • Partners
    • Our Partners
    • Partner Stories
    • Become a Partner
  • Resources
  • Articles
    • Data warehousing
      • Data Warehouse concepts and data models
      • DSharp Studio Release: Extended Data Catalog and New Find Command
    • Reporting automation
      • Automate data pipelines
      • Making Data Management Easier with Automation
      • Metsähallitus enhanced its data management with DSharp’s data platform tool
    • Data modeling and mapping
      • Data Warehouse concepts and data models
      • Data vs Business Driven Modeling
  • About
    • About us
    • Contact us
    • Recruiting
  • Book a Demo
  • Start a Trial
  • Menu Menu
  • Features
    • DSharp Studio Features
    • DSharp Studio Modeler Features
    • Command Reference
    • Release Notes
    • Installation Guide
  • Modeling
    • Design Considerations
      • Data vs Business Driven Modeling
      • The Business Key
    • Basics
      • Model Types
      • Archetypes
      • Datatypes
    • Raw Model
      • Source Mappings
      • Metadata
    • Business Model
      • Metadata
  • Course Material
    • DSharp Studio Professional Course
    • DSharp Studio Expert Course
  • Legacy
    • DSharp Engine Features
      • Mapping Source Data in DSharp Engine
    • DSharp Engine Command Reference
    • DSharp Engine Installation Guide
    • 3rd Party Modeling Tools
      • Visual Paradigm
        • Modeling With Visual Paradigm
        • Configuring Data Vault 2.0 And Servers
        • Implementing Mappings
        • Setting Up
          • Import Stereotypes
          • Configure UI
      • Ellie
        • Modeling With Ellie
        • Ellie Metadata
    • Course Material
      • Intro Course
      • Advanced Course
  • Community
  • FAQ

Tutorial 09 – Handling Legitimate Duplicates

What We Learn

How to handle data sets that contain legitimate duplicates.

Mapping

Let’s continue with the HumMaster expansion. Let’s take a look at DS_DemoData.HumMaster.SALARY_PAYMENT_PART and let’s also end up with this model:

Every time an actual payment is sent out to the employee, these rows get recorded. Payments made with the same date to the same employee represent one payment/transaction. What’s noteworthy is that there are two identical rows, and they are both valid (two bonuses the same size), so it is not possible to define a natural key for this data. Hence, the Salary payment part class simply has no key. To deal with this, we will set its key type to NoKey. This forces a running index number within otherwise identical rows to be added to the hash calculation, making the hash unique even for those identical rows.

The data is also transactional in nature, as the same payment never happens more than once.

Mapping And Modeling

Model the remaining two classes like in the image above (re-use the Cost center class). Set Implement to True for new classes. Additional model details:

Do this… …and this happens
Set Salary component.Code as Business key. Same old, same old…
Set Salary payment part to be a Transaction. Salary payment part will be implemented by a non-historized link.
Set Salary payment part’s key type to NoKey and arrange all hash components into an order of your liking. Or just go with the default order. While hashing, all identical rows will get a running index which will be included in the hash. Hence all hashes will be unique.

Having already implemented Employer, the rest of the mappings could go like this:

Source table… …goes to the Staging Area as… …and maps to the Class
DS_DemoData.HumMaster.SALARY_PAYMENT_PART HumMaster.RAW_SALARY_PAYMENT_PART Salary payment part
DS_DemoData.HumMaster.SALARY_COMPONENT HumMaster.RAW_SALARY_COMPONENT Salary component

Add the necessary mapping rows to the 05_Mappings_HumMaster.csv file and save it.

Save the project and run the model. Open the Human Resources submodel and check its implementation. Your Raw Vault should look like this:

If so, deploy the new classes.

Run Tutorial Scripts

Run the following tutorial script commands fron the Help -> Tutorials -> DSharp Studio Professional Course -> Hash Duplicate Handling menu, and inspect the results.

Script Source data Main points of interest
Step 1: Load Salary Data There are real duplicate rows. Among identical rows, the hashing procedure has generated a running index number to be included in the hash, making the hashes unique.

Next tutorial

  • Developer Content
    • No Access
    • Model Types
    • Archetypes
    • Datatypes
    • Source Mappings
    • Metadata
    • Metadata
    • Data vs Business Driven Modeling
    • The Business Key
    • DSharp Studio Professional Course
      • Tutorial 01: Person Tutorial
      • Tutorial 02: Add Attributes
      • Tutorial 03: Project Tutorial
      • Tutorial 04: Hash Error Handling
      • Tutorial 05: Simple Hierarchy
      • Tutorial 06: Transactions
      • Tutorial 07: Filtering Data
      • Tutorial 08: Loading Multiple Classes From One Source
      • Tutorial 09: Handling Legitimate Duplicates
      • Tutorial 10: Multiple Attribute Values
    • Intro Course
      • Intro Course – Before You Begin
      • Tutorial 01 – Visual Paradigm
      • Tutorial 01 – Ellie
      • Tutorial 01: Person Tutorial
      • Tutorial 02: Add Attributes
      • Tutorial 03: Project Tutorial
      • Tutorial 04: Hash Error Handling
      • Tutorial 05: Simple Hierarchy
      • Tutorial 06: Transactions
      • Tutorial 07: Changing Load Behaviour
      • Tutorial 08: Loading Multiple Classes From One Source
      • Tutorial 09: Handling Legitimate Duplicates
      • Tutorial 10: Multiple Attribute Values
    • Advanced BDP Developer Course
      • Tutorial 01: Key Groups
      • Tutorial 02: Implementing the Participation Design Pattern
      • Tutorial 03: Versioned Source Data
      • Tutorial 04: Advanced State Handling
      • Tutorial 05: Hierarchy Alternatives
      • Tutorial 06: Hierarchy Depth Changes
      • Tutorial 07: Partial Load Deletion Detection
      • Tutorial 08: Merge Hashes Non-Destructively
      • Tutorial 09: Implementing Business Objects
      • Tutorial 10: Custom Value Transformations
    • DSharp Studio Expert Course
      • Tutorial 01: Key Groups
      • Tutorial 02: Implementing the Participation Design Pattern
      • Tutorial 03: Versioned Source Data
      • Tutorial 04: Advanced State Handling
      • Tutorial 05: Hierarchy Alternatives
      • Tutorial 06: Hierarchy Depth Changes
      • Tutorial 07: Partial Load Deletion Detection
      • Tutorial 08: Merge Hashes Non-Destructively
      • Tutorial 09: Derived Classes
    • Mapping Source Data
    • Command Reference
    • Release Notes And Downloads
    • DSharp Studio Command Reference
    • DSharp Studio Command Reference
    • DSharp Studio Features
    • DSharp Studio Modeler Features
    • DSharp Studio Modeler Installation Guide
    • Installation Guide
    • Configuring Data Vault 2.0 And Servers
    • Modeling With Visual Paradigm
    • Implementing Mappings
    • Import Stereotypes
    • Configure Visual Paradigm UI
    • Modeling With Ellie
    • Ellie Metadata
  • Frequently Asked Questions

Finland
Yliopistonkatu 31, 20100, Turku
Bertel Jungin aukio 5, 02600 Espoo

Germany
Podbielskistrasse 333,
5th floor, 30659, Hannover

Solutions

  • DSharp Studio
  • DSharp Studio Modeler
  • Pricing and Licenses

FAQ

  • About us
  • Contact us
  • Recruiting

Legal

  • Privacy policy
  • DSharp Studio License Subscription Agreement

Contact us!

    © Copyright DSharp Oy
    • Link to LinkedIn
    • Link to Mail
    Scroll to top Scroll to top Scroll to top