identification
and elimination of performance bottlenecks will obviously optimize session
performance. After tuning all the mapping bottlenecks, we can further optimize
session performance by increasing the number of pipeline partitions in the
session. Adding partitions can improve performance by utilizing more of the
system hardware while processing the session.
PowerCenter
Informatica Pipeline Partition:
The PowerCenter® Partitioning Option
increases the performance of PowerCenter through parallel data processing. This
option provides a thread-based architecture and automatic data partitioning
that optimizes parallel processing on multiprocessor and grid-based hardware
environments.
There are different Types of
Informatica Partitions, eg.
- · Database partitioning
- · Hash auto-keys
- · Hash user keys
- · Key range
- · Pass-through
- · Round-robin
Each mapping contains one or more
pipelines. A pipeline consists of a source qualifier, all the transformations
and the target. When the Integration Service runs the session, it can achieve
higher performance by partitioning the pipeline and performing the extract,
transformation, and load for each partition in parallel.
A partition is a pipeline stage that
executes in a single reader, transformation, or writer thread. The number of
partitions in any pipeline stage equals the number of threads in the stage. By
default, the Integration Service creates one partition in every pipeline stage.
If we have the Informatica Partitioning option, we can configure multiple
partitions for a single pipeline stage.
Setting partition attributes includes
partition points, the number of partitions, and the partition types. In the
session properties we can add or edit partition points. When we change
partition points we can define the partition type and add or delete partitions
(number of partitions).
We can set the following attributes to
partition a pipeline:
Partition
point:
Partition points mark thread boundaries
and divide the pipeline into stages. A stage is a section of a pipeline between
any two partition points. The Integration Service redistributes rows of data at
partition points. When we add a partition point, we increase the number of
pipeline stages by one. Increasing the number of partitions or partition points
increases the number of threads.
We cannot create partition points at
Source instances or at Sequence Generator transformations.
Number
of partitions:
A partition is a pipeline stage that
executes in a single thread. If we purchase the Partitioning option, we can set
the number of partitions at any partition point. When we add partitions, we
increase the number of processing threads, which can improve session
performance. We can define up to 64 partitions at any partition point in a
pipeline. When we increase or decrease the number of partitions at any
partition point, the Workflow Manager increases or decreases the number of
partitions at all partition points in the pipeline. The number of partitions
remains consistent throughout the pipeline. The Integration Service runs the
partition threads concurrently.
Partition
types:
The Integration Service creates a
default partition type at each partition point. If we have the Partitioning
option, we can change the partition type. The partition type controls how the
Integration Service distributes data among partitions at partition points.
We can define the following partition
types here: Database partitioning, Hash auto-keys, Hash user keys, Key range,
Pass-through, Round-robin.
Database
partitioning:
The Integration Service queries the
database system for table partition information. It reads partitioned data from
the corresponding nodes in the database.
Pass-through:
The Integration Service processes data
without redistributing rows among partitions. All rows in a single partition
stay in the partition after crossing a pass-through partition point. Choose
pass-through partitioning when we want to create an additional pipeline stage
to improve performance, but do not want to change the distribution of data
across partitions.
Round-robin:
The Integration Service distributes
data evenly among all partitions. Use round-robin partitioning where we want
each partition to process approximately the same numbers of rows i.e. load
balancing.
Hash
auto-keys:
The Integration Service uses a hash
function to group rows of data among partitions. The Integration Service groups
the data based on a partition key. The Integration Service uses all grouped or
sorted ports as a compound partition key. We may need to use hash auto-keys
partitioning at Rank, Sorter, and unsorted Aggregator transformations.
Hash
user keys:
The Integration Service uses a hash
function to group rows of data among partitions. We define the number of ports
to generate the partition key.
Key
range:
The Integration Service distributes
rows of data based on a port or set of ports that we define as the partition
key. For each port, we define a range of values. The Integration Service uses
the key and ranges to send rows to the appropriate partition. Use key range
partitioning when the sources or targets in the pipeline are partitioned by key
range.
Points to consider while using
Informatica partitions:
- · We cannot create a partition key for hash auto-keys, round-robin, or pass-through types partitioning
- · If you have bitmap index defined upon the target and you are using pass-through partitioning to, say Update the target table - the session might fail as bitmap index creates serious locking problem in this scenario
- · Partitioning considerably increases the total DTM buffer memory requirement for the job. Ensure you have enough free memory in order to avoid memory allocation failures
- · When you do pass-through partitioning, Informatica will try to establish multiple connection requests to the database server. Ensure that database is configured to accept high number of connection requests
- · As an alternative to partitioning, you may also use native database options to increase degree of parallelism of query processing. For example, in Oracle database you can either specify PARALLEL hint or alter the DOP of the table in subject.
- · If required you can even combine Informatica partitioning with native database level parallel options - e.g. you create 5 pass-through pipelines, each sending query to Oracle database with PARALLEL hint.
No comments:
Post a Comment
Thank you :
- kareem