Identification and elimination of
performance bottlenecks will
improve session performance. After tuning all the mapping
bottlenecks, we can further optimize session performance by increasing the
number of pipeline partitions in the session. Adding partitions can improve
performance by utilizing more of the system hardware while processing the
session.
Informatica PowerCenter Pipeline Partition:
The Partitioning Option increases
the performance of PowerCenter through parallel data processing. This option
provides a thread-based architecture and automatic data partitioning that
optimizes parallel processing on multiprocessor and grid-based hardware
environments.
There are different types of
Informatica partitions, e.g.:
- Database partitioning
- Hash auto-keys
- Hash user keys
- Key range
- Pass-through
- Round-robin
Informatica Pipeline Partitioning Explained
Each mapping contains one or more
pipelines. A pipeline consists of a source qualifier, all the transformations
and the target. When the Integration Service runs the session, it can achieve
higher performance by partitioning the pipeline and performing the extract,
transformation, and load for each partition in parallel.
A partition is a pipeline stage
that executes in a single reader, transformation, or writer thread. The number
of partitions in any pipeline stage equals the number of threads in the stage.
By default, the Integration Service creates one partition in every pipeline
stage. If we have the Informatica Partitioning option, we can configure
multiple partitions for a single pipeline stage.
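As a rough mental model (not PowerCenter code), the following Python sketch runs one thread per partition, each doing its slice of the extract-transform-load work; the function names and the toy "double each value" transformation are invented purely for illustration:

```python
import threading

def run_partition(rows, out):
    # One partition: reader -> transformation -> writer, collapsed into a
    # single function for brevity. The "transformation" just doubles each
    # value (a made-up stand-in for real mapping logic).
    out.extend(r * 2 for r in rows)

def run_session(data, partitions):
    # Split the source rows across partitions and process them
    # concurrently, one thread per partition.
    chunks = [data[i::partitions] for i in range(partitions)]
    results = [[] for _ in range(partitions)]
    threads = [threading.Thread(target=run_partition, args=(c, r))
               for c, r in zip(chunks, results)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(x for part in results for x in part)

print(run_session([1, 2, 3, 4], 2))  # [2, 4, 6, 8]
```

With one partition (the default) the same rows are processed by a single thread; adding partitions simply adds concurrent threads over disjoint slices of the data.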
The partition attributes we can set
include partition points, the number of partitions, and the partition type.
In the session properties we can add or edit partition points. When we change
partition points we can define the partition type and add or delete partitions
(the number of partitions).
We can set the following attributes
to partition a pipeline:
Partition point:
Partition points mark thread
boundaries and divide the pipeline into stages. A stage is a section of a
pipeline between any two partition points. The Integration Service redistributes
rows of data at partition points. When we add a partition point, we increase
the number of pipeline stages by one. Increasing the number of partitions or
partition points increases the number of threads.
We cannot create partition points
at Source instances or at Sequence Generator transformations.
Number of partitions:
A partition is a pipeline stage
that executes in a single thread. If we purchase the Partitioning option, we
can set the number of partitions at any partition point. When we add partitions,
we increase the number of processing threads, which can improve session
performance. We can define up to 64 partitions at any partition point in a
pipeline. When we increase or decrease the number of partitions at any
partition point, the Workflow Manager increases or decreases the number of
partitions at all partition points in the pipeline. The number of partitions
remains consistent throughout the pipeline. The Integration Service runs the
partition threads concurrently.
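The relationship between partition points, stages, and threads reduces to simple arithmetic. A minimal sketch (the function name is ours, and we treat each partition point as starting exactly one stage):

```python
MAX_PARTITIONS = 64  # PowerCenter's limit per partition point

def total_threads(partition_points, partitions):
    # Each partition point begins a new pipeline stage, and every stage
    # runs one thread per partition, so threads = stages * partitions.
    if not 1 <= partitions <= MAX_PARTITIONS:
        raise ValueError(f"partitions must be between 1 and {MAX_PARTITIONS}")
    stages = partition_points
    return stages * partitions

print(total_threads(3, 1))  # default session: 3 stages, 1 partition -> 3
print(total_threads(4, 2))  # add a partition point, use 2 partitions -> 8
```

This is why adding either a partition point or a partition increases the thread count, and why the memory and connection considerations discussed later grow with both.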
Partition types:
The Integration Service creates a
default partition type at each partition point. If we have the Partitioning
option, we can change the partition type. The partition type controls how the
Integration Service distributes data among partitions at partition points.
We can define the following
partition types here: Database partitioning, Hash auto-keys, Hash user keys,
Key range, Pass-through, Round-robin.
- Database partitioning:
The Integration Service queries the database system for
table partition information. It reads partitioned data from the corresponding
nodes in the database.
- Pass-through:
The Integration Service processes data without
redistributing rows among partitions. All rows in a single partition stay in
the partition after crossing a pass-through partition point. Choose
pass-through partitioning when we want to create an additional pipeline stage
to improve performance, but do not want to change the distribution of data
across partitions.
- Round-robin:
The Integration Service distributes data evenly among
all partitions. Use round-robin partitioning when we want each partition to
process approximately the same number of rows, i.e., load balancing.
- Hash auto-keys:
The Integration Service uses a hash function to group
rows of data among partitions. The Integration Service groups the data based on
a partition key. The Integration Service uses all grouped or sorted ports as a
compound partition key. We may need to use hash auto-keys partitioning at Rank,
Sorter, and unsorted Aggregator transformations.
- Hash user keys:
The Integration Service uses a hash function to group
rows of data among partitions. We define the ports that make up the
partition key.
- Key range:
The Integration Service distributes rows of data based
on a port or set of ports that we define as the partition key. For each port,
we define a range of values. The Integration Service uses the key and ranges to
send rows to the appropriate partition. Use key range partitioning when the
sources or targets in the pipeline are partitioned by key range.
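The routing behaviour of round-robin, hash user keys, and key range partitioning can be sketched in plain Python. The row layout, function names, and ranges below are invented for illustration; PowerCenter performs this redistribution internally at partition points:

```python
from itertools import cycle

def round_robin(rows, n):
    # Distribute rows evenly across n partitions, one at a time.
    parts = [[] for _ in range(n)]
    targets = cycle(range(n))
    for row in rows:
        parts[next(targets)].append(row)
    return parts

def hash_user_keys(rows, n, key):
    # Rows with the same partition-key value land in the same partition.
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts

def key_range(rows, ranges, key):
    # Route each row to the partition whose [lo, hi) range holds its key.
    parts = [[] for _ in ranges]
    for row in rows:
        for i, (lo, hi) in enumerate(ranges):
            if lo <= row[key] < hi:
                parts[i].append(row)
                break
    return parts

rows = [{"id": i, "dept": d} for i, d in enumerate("ABAB")]
print([len(p) for p in round_robin(rows, 2)])                     # [2, 2]
print([len(p) for p in key_range(rows, [(0, 2), (2, 4)], "id")])  # [2, 2]
```

Note how round-robin balances counts regardless of values, while the hash and key-range variants keep related rows together, which is what grouping transformations such as an unsorted Aggregator rely on.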
Points to consider while using Informatica partitions:
- We cannot create
a partition key for hash auto-keys, round-robin, or pass-through
partitioning.
- If you have a bitmap index defined on the target and you are using
pass-through partitioning to, say, update the target table, the session might
fail because the bitmap index creates serious locking problems in this
scenario.
- Partitioning
considerably increases the total DTM buffer memory requirement for the job.
Ensure you have enough free memory to avoid memory allocation failures.
- When you use
pass-through partitioning, Informatica will try to establish multiple
connections to the database server. Ensure that the database is configured
to accept a high number of connection requests.
- As an alternative
to partitioning, you may also use native database options to increase the
degree of parallelism of query processing. For example, in an Oracle database
you can either specify a PARALLEL hint or alter the DOP of the table in
question.
- If required, you
can even combine Informatica partitioning with native database-level parallel
options - e.g., create 5 pass-through pipelines, each sending a query to the
Oracle database with a PARALLEL hint.
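That last combination can be sketched by generating one SQL override per pass-through partition. The table name ORDERS, key column ORDER_ID, range boundaries, and parallel degree below are all hypothetical:

```python
def partition_queries(table, key, bounds, degree=4):
    # One SELECT per pass-through partition: each query carries an Oracle
    # PARALLEL hint and a non-overlapping key-range filter, so the
    # pipelines read disjoint slices of the table concurrently.
    return [
        f"SELECT /*+ PARALLEL({table}, {degree}) */ * "
        f"FROM {table} WHERE {key} >= {lo} AND {key} < {hi}"
        for lo, hi in zip(bounds, bounds[1:])
    ]

for q in partition_queries("ORDERS", "ORDER_ID", [0, 100, 200, 300, 400, 500]):
    print(q)
```

Six boundary values yield five non-overlapping queries, one per pass-through pipeline, so Informatica's partition-level parallelism and Oracle's query-level parallelism stack.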
Thank you,
- kareem