Wednesday, November 7, 2012

Change Data Capture (CDC)


 This article explains the Change Data Capture mechanism using Informatica Mapping Variable. We can use the Informatica Mapping Variable to extract the CDC data without using any other custom table. Here it goes.
Suppose We have an employee source table having a column namely LAST_UPDATED_DATE. Whenever there is any change in the status of the employee or when a new employee joins the organisation this column value also gets updated. Now we load this employee data in our data warehouse. To extract only the changed dataset we can use a custom etl load table that keeps a track of the last load date and runs the etl the next time etracting only those record sets having the LAST_UPDATED_DATE of the source table greater than the last etl load date time.
What is Informatica Variable
A Informatica Mapping Variable represents a value that can change through the session. The Integration Service saves the value of a mapping variable to the repository at the end of each successful session run. It uses that saved value the next time when we run the session.
At the beginning of a session, the Integration Service evaluates references to a variable to determine the start value. We can define the start value either as the initial default value or may define the variable name and value in the parameter file. Variable functions like SetMaxVariable, SetMinVariable, SetVariable, SetCountVariable are used in the mapping to change the value of the variable. At the end of a successful session, the Integration Service saves the final value of the variable to the repository. The next time we run the session, the Integration Service evaluates references to the variable to the saved value. To override the saved value, define the start value of the variable in the parameter file or assign a value in the pre-session variable assignment in the session properties.
Initial and Default Values of Informatica Variable
When we declare a mapping variable in a mapping, we can enter an initial value. The Integration Service uses the configured initial value for the mapping variable when the variable value is not defined in the parameter file or there is no saved variable value in the repository.
Default Values for Mapping Variables Based on Datatype:
§  String - Empty string
§  Numeric - 0
§  Datetime - 1/1/1
For example, we create a new mapping using an Datetime mapping variable, $$CDC_DT. Suppose we do not configure an initial value for the variable or define it in a parameter file. The first time when we run the session, the Integration Service uses the default value for Datetime datatypes i.e. 1/1/1.
The Integration Service holds two different values for a mapping variable during a session run.
§  Start value of a mapping variable: The start value is the value of the variable at the start of the session.
§  Current value of a mapping variable: The current value is the value of the variable as the session progresses.
When a session starts, the current value of a variable is the same as the start value. As the session progresses, the Integration Service calculates the current value using the variable function [SetMaxVariable, SetMinVariable, SetVariable or SetCountVariable] used in the mapping. Use variable functions only once for each mapping variable in a pipeline else it will give inconsistent results. The Integration Service evaluates the current value of a variable as each row passes through the mapping. The final current value for a variable is saved to the repository at the end of a successful session. When a session fails to complete, the Integration Service does not update the value of the variable in the repository. The Integration Service states the value saved to the repository for each mapping variable in the session log also.
Note: If any of the variable functions is not used to calculate the current value of a mapping variable, the start value of the variable is saved to the repository.
The precedence to check initial or start value of the variable by the integration server is as follows:
§  Value in parameter file.
§  Value in pre-session variable assignment.
§  Value saved in the repository.
§  Initial value.
§  Datatype default value.
Aggregation Type of Informatica Variable
When we declare a mapping variable in a mapping, we need to configure the aggregation type for the variable. The Integration Service uses the aggregate type of a mapping variable to determine the final current value of the mapping variable and saves the value into the repository. Three types of Aggregation available are CountMax and Min
Using Mapping Variable
Let us now use mapping variables to perform incremental reads of a source table
§  The session fails to complete.
§  The session is configured for a test load.
§  The session is a debug session.
§  The session runs in debug mode and is configured to discard session output.

No comments:

Post a Comment

Thank you :
- kareem