Database partitioning. The Integration Service queries the IBM DB2 or Oracle system for
table partition information. It reads partitioned data from the corresponding
nodes in the database. Use database partitioning with Oracle or IBM DB2 source
instances on a multi-node table space. Use database partitioning with DB2
targets.
Hash partitioning. Use hash partitioning when you want the Integration Service to
distribute rows to the partitions by group. For example, you need to sort items
by item ID, but you do not know how many items have a particular ID number.
You can use the following types of hash
partitioning:
·
Hash auto-keys.
The Integration Service uses all grouped or sorted ports as a compound
partition key. You may need to use hash auto-keys partitioning at Rank, Sorter,
and unsorted Aggregator transformations.
·
Hash user keys.
The Integration Service uses a hash function to group rows of data among
partitions. You define the number of ports to generate the partition key.
·
Key range.
You specify one or more ports to form a compound partition key. The Integration
Service passes data to each partition depending on the ranges you specify for
each port. Use key range partitioning where the sources or targets in the
pipeline are partitioned by key range.
·
Pass-through. The
Integration Service passes all rows at one partition point to the next
partition point without redistributing them. Choose pass-through partitioning
where you want to create an additional pipeline stage to improve performance,
but do not want to change the distribution of data across partitions.
·
Round-robin.
The Integration Service distributes data evenly among all partitions. Use
round-robin partitioning where you want each partition to process approximately
the same number of rows.
Setting Partition Types in the Pipeline
Source qualifier. To read data from the three flat
files concurrently, you must specify three partitions at the source qualifier.
Accept the default partition type, pass-through.
Filter transformation. Since the source files vary in
size, each partition processes a different amount of data. Set a partition
point at the Filter transformation, and choose round-robin partitioning to
balance the load going into the Filter transformation.
Sorter transformation. To eliminate overlapping groups
in the Sorter and Aggregator transformations, use hash auto-keys partitioning
at the Sorter transformation. This causes the Integration Service to group all
items with the same description into the same partition before the Sorter and
Aggregator transformations process the rows. You can delete the default
partition point at the Aggregator transformation.
Target. Since the target tables are
partitioned by key range, specify key
The
Integration Services creates a default partition type at each partition point.
If you have the Partitioning option, you can change the partition type. The
partition type controls how the Integration Service distributes data among
partitions at partition points.
·
When you configure
the partitioning information for a pipeline, you must define a partition type
at each partition point in the pipeline. The partition type determines how the
Integration Service redistributes data across partition points.
·
You can delete the
default partition point at the Aggregator transformation because hash auto-keys
partitioning at the Sorter transformation sends all rows that contain items
with the same description to the same partition. Therefore, the Aggregator
transformation receives data for all items with the same description in one
partition and can calculate the average costs and prices for this item
correctly.
·
When you use this
mapping in a session, you can increase session performance by defining
different partition types at the following partition points in the pipeline:
No comments:
Post a Comment
Thank you :
- kareem