Saturday, January 19, 2013

Partition Types Overview

Database partitioning. The Integration Service queries the IBM DB2 or Oracle system for table partition information. It reads partitioned data from the corresponding nodes in the database. Use database partitioning with Oracle or IBM DB2 source instances on a multi-node table space. Use database partitioning with DB2 targets. 

Hash partitioning. Use hash partitioning when you want the Integration Service to distribute rows to the partitions by group. For example, you need to sort items by item ID, but you do not know how many items have a particular ID number.

You can use the following types of hash partitioning:

·         Hash auto-keys. The Integration Service uses all grouped or sorted ports as a compound partition key. You may need to use hash auto-keys partitioning at Rank, Sorter, and unsorted Aggregator transformations.

·         Hash user keys. The Integration Service uses a hash function to group rows of data among partitions. You define the number of ports to generate the partition key.

·         Key range. You specify one or more ports to form a compound partition key. The Integration Service passes data to each partition depending on the ranges you specify for each port. Use key range partitioning where the sources or targets in the pipeline are partitioned by key range.

·         Pass-through. The Integration Service passes all rows at one partition point to the next partition point without redistributing them. Choose pass-through partitioning where you want to create an additional pipeline stage to improve performance, but do not want to change the distribution of data across partitions.

·         Round-robin. The Integration Service distributes data evenly among all partitions. Use round-robin partitioning where you want each partition to process approximately the same number of rows.


Setting Partition Types in the Pipeline

Source qualifier. To read data from the three flat files concurrently, you must specify three partitions at the source qualifier. Accept the default partition type, pass-through.

Filter transformation. Since the source files vary in size, each partition processes a different amount of data. Set a partition point at the Filter transformation, and choose round-robin partitioning to balance the load going into the Filter transformation.

Sorter transformation. To eliminate overlapping groups in the Sorter and Aggregator transformations, use hash auto-keys partitioning at the Sorter transformation. This causes the Integration Service to group all items with the same description into the same partition before the Sorter and Aggregator transformations process the rows. You can delete the default partition point at the Aggregator transformation.

Target. Since the target tables are partitioned by key range, specify key
The Integration Services creates a default partition type at each partition point. If you have the Partitioning option, you can change the partition type. The partition type controls how the Integration Service distributes data among partitions at partition points.


·         When you configure the partitioning information for a pipeline, you must define a partition type at each partition point in the pipeline. The partition type determines how the Integration Service redistributes data across partition points.

·         You can delete the default partition point at the Aggregator transformation because hash auto-keys partitioning at the Sorter transformation sends all rows that contain items with the same description to the same partition. Therefore, the Aggregator transformation receives data for all items with the same description in one partition and can calculate the average costs and prices for this item correctly.

·         When you use this mapping in a session, you can increase session performance by defining different partition types at the following partition points in the pipeline:


No comments:

Post a Comment

Thank you :
- kareem