Thursday, November 8, 2012

Informatica Tuning - Step by Step Approach


Informatica Tuning - Step by Step Approach
This is the first of the number of articles on the series of Data Warehouse Application performance tuning scheduled to come every week. This one is on Informatica performance tuning.
Source Query/ General Query Tuning
1.1 Calculate original query cost
1.2 Can the query be re-written to reduce cost?
§  Can IN clause be changed with EXISTS?
§  Can a UNION be replaced with UNION ALL if we are not using any DISTINCT cluase in query?
§  Is there a redundant table join that can be avoided?
§  Can we include additional WHERE clause to further limit data volume?
§  Is there a redundant column used in GROUP BY that can be removed?
§  Is there a redundant column selected in the query but not used anywhere in mapping?
1.3 Check if all the major joining columns are indexed
1.4 Check if all the major filter conditions (WHERE clause) are indexed
§  Can a function-based index improve performance further?
1.5 Check if any exclusive query hint reduce query cost
§  Check if parallel hint improves performance and reduce cost
1.6 Recalculate query cost
§  If query cost is reduced, use the changed query
Tuning Informatica LookUp
2.1 Redundant Lookup transformation
§  Is there a lookup which is no longer used in the mapping?
§  If there are consecutive lookups, can those be replaced inside a single lookup override?
2.2 LookUp conditions
§  Are all the lookup conditions indexed in database? (Uncached lookup only)
§  An unequal condition should always be mentioned after an equal condition
2.3 LookUp override query
§  Should follow all guidelines from 1. Source Query part above
2.4 There is no unnecessary column selected in lookup (to reduce cache size)
2.5 Cached/Uncached
§  Carefully consider whether the lookup should be cached or uncached
§  General Guidelines
§  Generally don't use cached lookup if lookup table size is > 300MB
§  Generally don't use cached lookup if lookup table row count > 20,000,00
§  Generally don't use cached lookup if driving table (source table) row count < 1000
2.6 Persistent Cache
§  If found out that a same lookup is cached and used in different mappings, Consider persistent cache
2.7 Lookup cache building
§  Consider "Additional Concurrent Pipeline" in session property to build cache concurrently
§  "Prebuild Lookup Cache" should be enabled, only if the lookup is surely called in the mapping
Tuning Informatica Joiner
3.1 Unless unavoidable, join database tables in database only (homogeneous join) and don't use joiner
3.2 If Informatica joiner is used, always use Sorter Rows and try to sort it in SQ Query itself using Order By (If Sorter Transformation is used then make sure Sorter has enough cache to perform 1-pass sort)
3.3 Smaller of two joining tables should be master
Tuning Informatica Aggregator
4.1 When possible, sort the input for aggregator from database end (Order By Clause)
4.2 If Input is not already sorted, use SORTER. If possible use SQ query to Sort the records.
Tuning Informatica Filter
5.1 Unless unavoidable, use filteration at source query in source qualifier
5.2 Use filter as much near to source as possible
Tuning Informatica Sequence Generator
6.1 Cache the sequence generator
Setting Correct Informatica Session Level Properties
7.1 Disable "High Precision" if not required (High Precision allows decimal upto 28 decimal points)
7.2 Use "Terse" mode for tracing level
7.3 Enable pipeline partitioning (Thumb Rule: Maximum No. of partitions = No. of CPU/1.2) (Also remember increasing partitions will multiply the cache memory requirement accordingly)
Tuning Informatica Expression
8.1 Use Variable to reduce the redundant calculation
8.2 Remove Default value " ERROR('transformation error')" for Output Column.
8.3 Try to reduce the Code complexity like Nested If etc.
8.4 Try to reduce the Unneccessary Type Conversion in Calculation

1 comment:

Thank you :
- kareem