What We Learn

Handling hash errors so that they do not crash the target table load process.

If the hashing process produces non-unique hash values, the next step (load into hub, load into satellite, load into link) will fail due to an attempt to insert duplicate primary key values into the target table. Letting the load crash may be a valid way to deal with the bad data: the crash and its cause are registered in the log and can then be dealt with. The target table has not been updated at all, leaving it in the same state it was in before the failed load attempt, so it contains no partially loaded batches. This portion of the load will not succeed until the error has been corrected in the source system.
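The failure mode can be illustrated with a minimal sketch (using SQLite, not DSharp's actual loader; the table and column names are made up): a hub table keyed by the hash rejects a batch containing a duplicate, and the transaction rollback leaves the target untouched.

```python
import sqlite3

# Minimal sketch, not DSharp's loader: a hub table with the hash as
# primary key. A batch containing a duplicate hash fails as a whole,
# and the rollback leaves the target table in its previous state.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hub_person (person_hash TEXT PRIMARY KEY, name TEXT)")

batch = [
    ("a1f3", "Jansson"),
    ("b7c2", "Immonen"),
    ("b7c2", "Immonen"),  # duplicate hash -> primary key violation
]

error = None
try:
    with conn:  # one transaction per batch: commit on success, roll back on error
        conn.executemany("INSERT INTO hub_person VALUES (?, ?)", batch)
except sqlite3.IntegrityError as e:
    error = e
    print("load failed:", e)

# No partially loaded batch remains in the target table.
count = conn.execute("SELECT COUNT(*) FROM hub_person").fetchone()[0]
print("rows in hub:", count)  # 0
```

Because the whole batch is one transaction, even the valid Jansson row is rolled back, which is exactly why neither person reaches the DW until the error is handled.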

If you want loads that load valid rows but skip invalid ones, use D# Engine’s automatic duplicate hash handling mechanism. This is particularly useful during development, when you want to load at least some data in order to get the Info Mart development going. In a production environment you should have an agreed-upon process in place for handling bad data.

How It Works

To enable this functionality, in D# Engine select the class (or classes) you want enabled for hash tests and run the command Parameters.Set Hash Duplicate Handling On from the pop-up menu for the selected classes. This sets the corresponding parameter value in the ETLParameters table to 1.

Don’t do this yet; wait until you have completed Step 1 of the tutorial scripts below.

The error handling procedure will check the value of this parameter, and move all rows containing duplicate hashes from the working table to the error handling table. Once all offending rows have been removed from the working table, the next steps (load hub, load satellite, load link) in the loading process will proceed without errors.
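The error-handling step described above amounts to partitioning the working table by hash. Here is an illustrative Python sketch with assumed row structures, not DSharp's actual implementation:

```python
from collections import Counter

# Sketch of the assumed error-handling logic: every row whose hash
# occurs more than once in the working table is moved to the error
# table; the unique remainder then loads without primary key errors.
working = [
    {"hash": "a1f3", "name": "Jansson", "gender": 2},
    {"hash": "b7c2", "name": "Immonen", "gender": 2},
    {"hash": "b7c2", "name": "Immonen", "gender": 1},  # wrong gender code
]

counts = Counter(row["hash"] for row in working)
error_table = [row for row in working if counts[row["hash"]] > 1]
working = [row for row in working if counts[row["hash"]] == 1]

print([r["name"] for r in working])  # ['Jansson'] -> loads without error
print(len(error_table))              # 2 Immonen rows await inspection
```

Note that all offending rows are moved, including the one that may turn out to be correct; deciding which duplicate to keep is deliberately left to a human.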

The rows in the error table can be checked and reported back to the customer. They may also be processed so that rows containing obviously wrong data are dropped from the error table, and the remaining correct (and verified to be so!) rows are moved back to the working table and loaded successfully into the raw vault.

The same duplicate hashes will be generated in each load batch until the error has been corrected in the source system. However, the impact of these errors may be zero if the correct rows have been previously manually cleaned and loaded into the DW.

Run Tutorial Scripts

Run the following tutorial script commands from the Help -> Tutorials -> Intro Course -> Hash Duplicate Handling menu, and inspect the results.

Step 1: Load Persons With Duplicate Data
Source data: There are duplicate rows for Immonen.
Main points of interest: Hashing will produce duplicate hashes for the person data, and the subsequent loads will fail. Neither Immonen nor Jansson will be in the DW, since the entire batch failed. The root cause is that Immonen is present in the data twice; one of the Immonen rows contains the wrong gender code (1 = male). While debugging, the code meanings can be verified from the K_Gender view.

At this point, turn the Hash Duplicate Handling parameter on for the Person class, as instructed above.

Step 2: Reload Persons With Duplicate Data
Source data: There are still duplicate rows for Immonen.
Main points of interest: Hashing will produce duplicate hashes for the person data, but the duplicate rows will be moved to the error table, and the load will proceed without them. Jansson will exist in the DW, but Immonen won’t.

Step 3: Manually Correct Error And Reload
Trim the error table so that only error-free rows from the last batch remain. Do this step by step:

– Drop the bad data rows from the error table. The SQL code for this is found in the script.

– Run the Reload Last Failed Rows command, which re-inserts the remaining rows from the error table into the working table and loads them into the DW.

Note that Immonen is now in the DW.
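The manual correction in Step 3 can be sketched like this (assumed row structures; in practice the drop is done with the SQL provided in the script, and the re-insert by the Reload Last Failed Rows command):

```python
# Sketch of Step 3 with assumed data, not DSharp's actual tables:
# drop the bad row from the error table, then move the verified
# remainder back to the working table for loading into the DW.
error_table = [
    {"hash": "b7c2", "name": "Immonen", "gender": 2},  # verified correct
    {"hash": "b7c2", "name": "Immonen", "gender": 1},  # wrong gender code
]

# "Drop the bad data rows from the error table" (the SQL in the script):
error_table = [row for row in error_table if row["gender"] == 2]

# "Reload Last Failed Rows": remaining rows go back to the working table.
working = list(error_table)
error_table.clear()

print([r["name"] for r in working])  # ['Immonen'] now loads into the DW
```

After this, only one Immonen row remains, so hashing no longer produces duplicates and the load succeeds.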
