Channel: SCN : All Content - Data Services and Data Quality

SAP Data Services Initial vs. Delta Load


Initial vs. Delta Load

As described in the SAP Data Services Sections blog, the fact table component/section consists of a first workflow named WF_xxxxDims, which contains all the Components for the dimension tables of this star schema, and a Conditional that performs either the initial or the delta load of the fact table. But what is the actual difference between those two? Very often the initial load can be built just like the delta load; the only difference is the range of days to be loaded. In other words, $G_SDATE will be set to a date a year or so back for the initial load, whereas for a delta load it will be, for example, yesterday. So the embedded dataflow extracting the data can be the same for both. The difference shows up at the DataFlow level: the initial load truncates the table, while the delta load either uses the AutoCorrect load option or places a Table Comparison Transform in front of the table loader.

As a result, it pays to use Embedded DataFlows at least here: changes have to be applied only once, which reduces both the maintenance overhead and the chance of inconsistencies between the two loads.
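The idea of one shared extraction with two different load modes can be illustrated outside of Data Services. The following is only a rough Python sketch using sqlite3, not Data Services code: the table fact_sales, its columns, and the function names are hypothetical, start_date plays the role of $G_SDATE, and SQLite's INSERT OR REPLACE merely stands in for the effect of the AutoCorrect load option.

```python
import sqlite3
from datetime import date, timedelta

def start_date(mode, today):
    """Lower bound of the load window (plays the role of $G_SDATE)."""
    if mode == "initial":
        return today - timedelta(days=365)  # "a year or so back"
    return today - timedelta(days=1)        # delta load: yesterday

def extract(source_rows, sdate):
    """Shared extraction logic, the analogue of the embedded dataflow.
    Rows are (order_id, order_date as ISO string, amount)."""
    return [r for r in source_rows if r[1] >= sdate.isoformat()]

def load_fact(conn, source_rows, mode, today):
    """Load the hypothetical fact_sales table in initial or delta mode."""
    rows = extract(source_rows, start_date(mode, today))
    if mode == "initial":
        # Stand-in for truncating the target before the load.
        conn.execute("DELETE FROM fact_sales")
        conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)
    else:
        # Upsert on the primary key, similar in effect to an auto-correcting load.
        conn.executemany("INSERT OR REPLACE INTO fact_sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Both modes call the same extract function, so a change to the extraction logic is made in exactly one place.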

 

Restartability for Initial Loads

If the Initial Load failed or has to be run a second time, the end user should simply be able to restart it. Imagine a new fact attribute has to be added and therefore everything has to be reloaded. It is better to think about this when building the flows than to end up with a long list of manual tasks that must be followed, or else the load fails or, worse, loads some data twice.

Tables already loaded get truncated automatically via either a script or using the "delete data before load" option.

The PreLoad Stored Procedure takes care of all the indexes and other database objects that might already have been created and should be disabled first.
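As an illustration of what "simply restart it" requires, here is a small Python/sqlite3 sketch of an initial load that cleans up after itself before loading. It is an assumption-laden stand-in, not the actual PreLoad Stored Procedure: the names make_target, pre_load, post_load, fact_sales and ix_fact_sales_date are all hypothetical, and the index is dropped rather than disabled because SQLite has no notion of disabling an index.

```python
import sqlite3

def make_target():
    """Create a hypothetical target table; autocommit so partial loads persist."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE fact_sales (order_id INTEGER, order_date TEXT, amount REAL)")
    return conn

def pre_load(conn):
    """Stand-in for the PreLoad step: undo whatever an earlier run left behind."""
    conn.execute("DROP INDEX IF EXISTS ix_fact_sales_date")
    conn.execute("DELETE FROM fact_sales")  # "delete data before load"

def post_load(conn):
    """Recreate the index once the data is in place."""
    conn.execute("CREATE INDEX IF NOT EXISTS ix_fact_sales_date ON fact_sales (order_date)")

def initial_load(conn, rows, fail_after=None):
    """Load all rows; fail_after simulates a crash after that many rows."""
    pre_load(conn)
    for i, row in enumerate(rows):
        if fail_after is not None and i == fail_after:
            raise RuntimeError("simulated failure")
        conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?)", row)
    post_load(conn)
```

Because pre_load always runs first, calling initial_load again after a failure (or after a successful run) leaves the same rows in the table, with no manual cleanup in between.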


Restartability for Delta Loads

The same applies to Delta Loads. If a load failed during the night, you do not want to waste time in the morning on tasks that have to be completed before a new load can be performed. You will be in a hurry, so you should be able to just execute the job and have everything taken care of automatically.

For simplicity, you will very often use "delete data before load" for the small to mid-sized dimension tables and Table Comparison when performing a delta load into a table. This way you can guarantee that no data is lost (it may be read twice, but not lost) and that no row ends up in the target twice, because the Table Comparison (sorted input) Transform updates rows that already exist.
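Why reading the same data twice is harmless can be shown with a simplified comparison step. The Python sketch below only mimics the idea of comparing incoming rows with the target by key; it is not how the Table Comparison Transform is implemented, and the function names and the dictionary-based target are assumptions made for illustration.

```python
def table_comparison(target, incoming, key):
    """Classify incoming rows against the target.

    target maps key value -> row dict; incoming is a list of row dicts.
    Returns (inserts, updates); rows identical to the target are dropped.
    """
    inserts, updates = [], []
    # Sorting mirrors the "sorted input" idea; it is not needed for correctness here.
    for row in sorted(incoming, key=lambda r: r[key]):
        existing = target.get(row[key])
        if existing is None:
            inserts.append(row)
        elif existing != row:
            updates.append(row)
    return inserts, updates

def apply_delta(target, incoming, key):
    """Apply the classified rows and report how many were inserted/updated."""
    inserts, updates = table_comparison(target, incoming, key)
    for row in inserts + updates:
        target[row[key]] = row
    return len(inserts), len(updates)
```

Applying the same batch a second time yields no inserts and no updates, which is exactly the property that makes a failed delta load safe to rerun.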


Supports the Recovery Feature

If a Job is executed in recovery mode, a list of all elements that have already been executed is kept in the repository. So if the load failed, you have the option to restart at the flow that caused the error and continue from there, rather than starting all over again only to find out that the last flow still fails because of, for example, insufficient disk space.

But again, this feature restarts an entire object, e.g. a DataFlow; it does not capture the status inside a flow. That would be impossible (how would you guarantee that the same data is read from the database in the same order?) or would at least require a lot of overhead to write the current buffer to disk. So all your objects should be built to cope with being started a second time.

Actually, this is not much different from the rules above under Restartability for Delta Loads, as we use e.g. the Table Comparison Transform anyway. But there is one important issue: a script that truncates/deletes rows prior to loading. If the job is restarted at the point of failure, the script has already been executed successfully and is therefore skipped. To deal with that, those WorkFlows need the property Recover as a Unit turned on, so that they are treated as successfully executed only once everything inside them ran without a problem; otherwise they start again with the first object inside the WorkFlow.
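The skipped-script problem and the effect of Recover as a Unit can be reproduced with a toy model. The Python sketch below is a simplified simulation under stated assumptions, not the Data Services recovery mechanism itself: the set called completed stands in for the list kept in the repository, and Unit and run_with_recovery are hypothetical names.

```python
class Unit:
    """A group of steps that counts as done only when all of them succeeded."""
    def __init__(self, name, steps):
        self.name = name
        self.steps = steps  # list of (step_name, callable)

def run_with_recovery(plan, completed):
    """Run the plan, skipping everything already recorded in `completed`.

    plan items are either (name, callable) tuples or Unit objects.
    `completed` is a set of names that survives between runs.
    """
    for item in plan:
        if isinstance(item, Unit):
            if item.name in completed:
                continue
            # A unit always restarts with its first step.
            for _, step in item.steps:
                step()
            completed.add(item.name)
        else:
            name, step = item
            if name in completed:
                continue
            step()
            completed.add(name)
```

With a plan of two separate steps, truncate and load, a rerun after a failed load skips truncate and appends to the partially loaded table. Wrapping both steps in one Unit makes the rerun start with truncate again.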

