Databases like SQL Server,Oracle etc has CDC feature to enable tracking changed/inserted/deleted records. However, there are situation where we might need to implement the same using a flat file source.
For the purpose of this blog, I am going to consider an example of product master data that comes as flat file and loaded through data services.
Outline:
1) Load the initial data normally as you would do and dump it in a physical table.
2) Create separate data flow to handle the delta.
3) Use table comparison transform to identify the inserted/updated/deleted records with the target table.
4) Use map operation transform to filter out the records based out of operation type flag that is coming out of table comparison transform.
Below is the sample flow:
Configuration for Table Comparison transform
Configuration for Map Operation - Filter out only records that needs to be inserted
Similarly, filter for records that needs to be updated
Mapping fields in Map Operation transform
Attached atl file and sample data used for loading.
Below video demonstrates working example of flat file based CDC using table comparison and map operation transforms.