Here I am trying to explain the methods to implement SCD types in BO Data Service. The different types of slowly changing dimension types are given below.
1. Type 0
2. Type 1
3. Type 2
4. Type 3
5. Type 4
6. Type 6/Hybrid.
In this document I will explain about first five types of SCD types with examples.
Source Data
Name | IDate | designation |
John | 2002.12.01 | A |
Jay | 2012.12.01 | A |
Jasil | 2012.12.01 | A |
SCD TYPE 0
The SCD Type 0 method is passive. Value remains the same as it were at the time the dimension record was first entered. Type 0 also applies to most date dimension attributes.
SCD TYPE 1
This method does not track any history data .This methodology overwrite old data with new data without keeping the history. This method mainly used for misspelled names.
Let consider below given data is our target data after the first run.
ID | Name | IDate | designation |
1 | John | 2002.12.01 | A |
2 | Jay | 2012.12.01 | A |
3 | Jasil | 2012.12.01 | A |
During the next run, Consider the designation of John is changed to ‘B’ on the date 2003.12.01 then the output will be.
ID | Name | IDate | designation |
1 | John | 2003.12.01 | B |
2 | Jay | 2012.12.01 | A |
3 | Jasil | 2012.12.01 | A |
Here the no history is preserved the designation and Idate are updated with new value.
In BODS it can be implemented by using following transformation.
Source:- Test_Effective_date is our source given above (Section - Source Data)
QR_MAP :- Map the source data to query transform without applying any transformation.
TBL_CPM :- Table comparison used to compare source data and the target table data .
MP_OPR :- This will be used to insert new data and update old data.
KEY_GEN :- This transform used to generate a surrogated key .
SCD TYPE 2
In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own primary key. Here 2 new column columns are inserted called start_date and end_date.
Let consider below given data is our target data after the first run.
ID | Name | IDate | designation | Start_Date | End_Date |
1 | John | 2002.12.01 | A | 2002.12.01 | 9000.12.31 |
2 | Jay | 2002.12.01 | A | 2002.12.01 | 9000.12.31 |
3 | Jasil | 2002.12.01 | A | 2002.12.01 | 9000.12.31 |
During the next run, Consider the designation of John is changed to ‘B’ on the date 2003.12.01 then the output will be.
ID | Name | IDate | Designation | Start_date | End_date |
1 | John | 2002.12.01 | A | 2002.12.01 | 2003.12.01 |
2 | Jay | 2002.12.01 | A | 2002.12.01 | 9000.12.31 |
3 | Jasil | 2002.12.01 | A | 2002.12.01 | 9000.12.31 |
4 | John | 2003.12.01 | B | 2003.12.01 | 9000.12.31 |
Here a new row is inserted and the end_date of the first row will be the start date of the newly updated value. (note:-‘ 9000.12.31’ is the default value)
In BODS it can be implemented by using following transformation.
SOURCE : This is our data as given above(Section - Source Data)
QR_MAP : Map the source data to query transform without applying any transformation.
TBL_CPM :- Table comparison used to compare source data and the target table data .
HIS_PRES:- This transform is used to store the history data and if any updates on source data then new row will be inserted.
KEY_GEN :- This transform used to generate a surrogated key .
SCD TYPE 3
This method tracks changes using separate columns and preserves limited history. The Type II preserves unlimited history as it's limited to the number of columns designated for storing historical data. The original table structure of Type II differs from Type II by Type III adds additional columns. In the following example, an additional column has been added to the table to record the original target- only the previous history is stored.
Let consider below given data is our target data after the first run.
ID | Name | IDate | Curr_Designation | Effective_date | Pre_Designation |
1 | John | 2002.12.01 | A | 2002.12.01 | A |
2 | Jay | 2002.12.01 | A | 2002.12.01 | A |
3 | Jasil | 2002.12.01 | A | 2002.12.01 | A |
During the next run, Consider the designation of John is changed to ‘B’ on the date 2003.12.01 then the output will be.
In the target output an extra column will be added to keep the previous value of that particular column.
ID | Name | IDate | Curr_Designation | Effective_date | Pre_designaion |
1 | John | 2003.12.01 | B | 2003.12.01 | A |
2 | Jay | 2002.12.01 | A | 2002.12.01 | A |
3 | Jasil | 2002.12.01 | A | 2002.12.01 | A |
(Note:- on the first run the Pre_designaion and Current Designation are same)
In BODS it can be implemented by using following transformation.
SOURCE : This is our data as given above(Section - Source Data).
QR_JOIN : This query transform is used to join the Source with target.
This will be a left outer join and the Source will be the outer source and the target is the inner source. This join will based on the name column on the both side.(Source.name = SCD3.name)
QR_INSERT :- This query transform used to filter the data which is new that means SCD3.name is null .
QR_UPDATE : This query transform used to filter the data which is already existed in target table but the designation is updated that means SCD3.name is not null and the designation from the source and the previous designation from the SCD3 table are not same.
MP_UPDATE : This transform used to update the target table by setting the map operation as ‘normal to update’
KEY_GEN :- Key generation is used to generate an surrogated key for newly inserted row.
During the first runthe source file have the data as given above and the target table will not have any data so that all set of data will moved to QR_INSERT flow and loaded into target table.
SCD TYPE 4
SCD Type 4 design technique is used when SCD Type 2 dimension grows rapidly due to the frequently changing dimension attributes. In SCD Type 4, frequently changing attributes will be removed from the main Table and added in to a History Table.
The Type 4 method is usually referred to as using "history tables", where one table keeps the current data and an additional table is used to keep a record of some or all changes.
Let consider below given data is our target data after the first run.
ID | Name | IDate | designation |
1 | John | 2002.12.01 | A |
2 | Jay | 2002.12.01 | A |
3 | Jasil | 2002.12.01 | A |
And the History table will not have any data.
During the next run, Consider the designation of John is changed to ‘B’ on the date 2003.12.01 then the output will be. There will be two table one will keep the current data and other will be the changed data (history).
SCD4_CUR
ID | Name | IDate | designation |
1 | John | 2003.12.01 | B |
2 | Jay | 2002.12.01 | A |
3 | Jasil | 2002.12.01 | A |
SCD4_HIST
The history table will have an addition column compared to the source that will be created date.
ID | Name | IDate | designation | Created_date |
1 | John | 2002.12.01 | A |
|
(Note: created date will be is the date when the designation is changed)
In BODS it can be implemented by using following transformation.
SOURCE : This is our data as given above(Section - Source Data).
QR_JOIN : This query transform is used to join the Source with target.
This will be a left outer join and the Source will be the outer source and the target is the inner source. This join will based on the name column on the both side.(Source.name = SCD4_Curr.name)
QR_INSERT :- This query transform used to filter the data which is new that means SCD3.name is null .
QR_UPDATE : This query transform used to filter the data which is already existed in target table but the designation is updated that means SCD4_Curr.name is not null and the designation from the source and the previous designation from the SCD4_curr table are not same.
QR_CURR: This Query transform is used to updated the target table with new designation .
QR_HIST : This Query transform is used to updated the target table with Previous designation .
MP_UPDATE : This transform used to update the target table by setting the map operation as ‘normal to update’
KEY_GEN :- Key generation is used to generate an surrogated key for newly inserted row.
During the first runthe source file have the data as given above and the target table will not have any data so that all set of data will moved to QR_INSERT flow and loaded into target table.
There are different methods to implement SCD types in BODS and above explained is one method to implement the same.
Regards
Asgar