Introduction:
This document gives an overview of the standard recovery mechanism in Data Services.
Overview: Data Services has a built-in feature for recovering a job from a failed state. When recovery is enabled, a rerun of the job starts from the point of failure instead of from the beginning.
DS provides two types of recovery:
Recovery: By default, recovery works at the data flow level, i.e. the job always restarts from the data flow that raised the exception.
Recovery Unit: If you want recovery to apply to a set of actions as a whole, you can achieve this with the recovery unit option. Define all of the actions in one workflow and enable the recovery unit option under the workflow properties. In recovery mode, this workflow then runs from its beginning instead of from the point of failure.
When recovery is enabled, the software stores results from the following types of steps:
- Work flows
- Batch data flows
- Script statements
- Custom functions (stateless type only)
- SQL function
- exec function
- get_env function
- rand function
- sysdate function
- systime function
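As a rough illustration of what storing results means, here is a minimal Python sketch. It is not Data Services code: RecoveryStore and run_step are hypothetical names, and the sketch assumes that in recovery mode the stored result of a completed step is reused instead of being recomputed.

```python
import random


class RecoveryStore:
    """Toy stand-in for the recovery information kept between runs (hypothetical)."""

    def __init__(self):
        self.results = {}

    def run_step(self, name, func, recovery_mode=False):
        # In recovery mode, reuse the stored result of a step that already completed.
        if recovery_mode and name in self.results:
            return self.results[name]
        value = func()
        self.results[name] = value  # stored only after the step succeeds
        return value


store = RecoveryStore()
first = store.run_step("rand_step", random.random)
again = store.run_step("rand_step", random.random, recovery_mode=True)
print(first == again)  # the recovery run sees the same value as the first run
```

Under this assumption, a step such as rand or sysdate gives the recovery run the same value it produced in the original run, so the rerun behaves consistently with the failed one.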
Example:
This job loads data from a flat file into a temporary table. (The same load is repeated to raise a primary key exception.)
Running the job:
To recover a job from the point of failure, the job must first be executed with recovery enabled. This can be enabled under the execution properties.
The trace log below shows that recovery is enabled for this job.
The job failed at the 3rd DF in the 1st WF. Now I am running the job in recovery mode.
The trace log shows that the job is running in recovery mode, using the recovery information from the previous run, and starts from Data Flow 3, where the exception was raised.
DS provides default recovery at the data flow level.
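The behaviour seen in the two runs can be sketched with a small Python model. This is only a toy simulation, not Data Services code: run_job, the completed set, and the data flow names DF_1 to DF_4 are hypothetical and stand in for the job and its recovery information.

```python
def run_job(steps, completed, failing=None, recovery_mode=False):
    """Run steps in order and return the names that actually executed.

    steps:      ordered list of data flow names
    completed:  set of names that succeeded earlier (the recovery information)
    failing:    name of a step that raises an exception in this run, if any
    """
    if not recovery_mode:
        completed.clear()  # a normal run starts without recovery information
    executed = []
    for name in steps:
        if recovery_mode and name in completed:
            continue  # finished successfully in the previous run, so skip it
        executed.append(name)
        if name == failing:
            break  # the job stops at the first failure
        completed.add(name)
    return executed


steps = ["DF_1", "DF_2", "DF_3", "DF_4"]
completed = set()
first_run = run_job(steps, completed, failing="DF_3")
recovery_run = run_job(steps, completed, recovery_mode=True)
print(first_run)     # ['DF_1', 'DF_2', 'DF_3']
print(recovery_run)  # ['DF_3', 'DF_4']
```

The recovery run skips the data flows that completed in the first run and resumes at the one that failed.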
Recovery Unit:
With default recovery, the job always starts at the failed DF in the recovery run, irrespective of any dependent actions.
Example: Workflow WF_RECOVERY_UNIT has two Dataflows loading data from a flat file. If either DF fails, then both DFs have to run again.
To meet this kind of requirement, we can define all the activities in one workflow and make that workflow a recovery unit. When we run the job in recovery mode and any of the activities has failed, the workflow starts from the beginning.
To make a workflow a recovery unit, check the recovery unit option under the workflow properties.
Once this option is selected, the black "x" and green arrow symbol on the workspace diagram indicate that a work flow is a recovery unit.
Two Data Flows under WF_RECOVERY_UNIT
Running the job with recovery enabled: an exception is encountered at DF5.
Now running in recovery mode. The job uses the recovery information from the previous run. As per my requirement, the job should run all the activities defined under Work Flow WF_RECOVERY_UNIT instead of only the failed Dataflow.
The job now starts from the beginning of WF_RECOVERY_UNIT, and all the activities defined inside the workflow run from the beginning instead of starting from the failed DF (DF_RECOVERY_5).
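The difference between default recovery and a recovery unit can be sketched in Python as follows. This is a toy simulation under stated assumptions, not Data Services code: run_with_units is a hypothetical helper, and the names WF_1, DF_1 to DF_3, and DF_RECOVERY_4 are assumed for illustration (only WF_RECOVERY_UNIT and DF_RECOVERY_5 appear in the example above).

```python
def run_with_units(workflows, completed, failing=None, recovery_mode=False):
    """workflows: list of (wf_name, [data flow names], is_recovery_unit).

    Returns the data flows that actually executed in this run.
    """
    if not recovery_mode:
        completed.clear()  # a normal run starts without recovery information
    executed = []
    for wf_name, dataflows, is_unit in workflows:
        if is_unit and not all(df in completed for df in dataflows):
            # A recovery unit that did not fully succeed is redone from its start.
            completed.difference_update(dataflows)
        for df in dataflows:
            if recovery_mode and df in completed:
                continue  # completed earlier and not part of a failed unit
            executed.append(df)
            if df == failing:
                return executed  # the job stops at the first failure
            completed.add(df)
    return executed


job = [
    ("WF_1", ["DF_1", "DF_2", "DF_3"], False),
    ("WF_RECOVERY_UNIT", ["DF_RECOVERY_4", "DF_RECOVERY_5"], True),
]
completed = set()
run_with_units(job, completed, failing="DF_RECOVERY_5")
recovery_run = run_with_units(job, completed, recovery_mode=True)
print(recovery_run)  # ['DF_RECOVERY_4', 'DF_RECOVERY_5']
```

If the flag on WF_RECOVERY_UNIT is set to False in this model, the recovery run executes only DF_RECOVERY_5, which corresponds to the default data flow-level behaviour.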
Exceptions:
When you specify that a work flow or a data flow should execute only once, a job will never re-execute that work flow or data flow after it completes successfully, except if that work flow or data flow is contained within a recovery unit work flow that re-executes and has not completed successfully elsewhere outside the recovery unit.
It is recommended that you not mark a work flow or data flow as Execute only once when the work flow or a parent work flow is a recovery unit.