Informatica Tuning - Step by Step Approach
This is the first of the number of
articles on the series of Data Warehouse Application performance tuning
scheduled to come every week. This one is on Informatica performance tuning.
Source Query/ General Query Tuning
1.1 Calculate original query cost
1.2 Can the query be re-written to
reduce cost?
§
Can IN clause be
changed with EXISTS?
§
Can a UNION be
replaced with UNION ALL if we are not using any DISTINCT cluase in query?
§
Is there a redundant
table join that can be avoided?
§
Can we include
additional WHERE clause to further limit data volume?
§
Is there a redundant
column used in GROUP BY that can be removed?
§
Is there a redundant
column selected in the query but not used anywhere in mapping?
1.3 Check if all the major joining
columns are indexed
1.4 Check if all the major filter
conditions (WHERE clause) are indexed
§
Can a function-based
index improve performance further?
1.5 Check if any exclusive query hint
reduce query cost
§
Check if parallel
hint improves performance and reduce cost
1.6 Recalculate query cost
§
If query cost is
reduced, use the changed query
Tuning Informatica LookUp
2.1 Redundant Lookup transformation
§
Is there a lookup
which is no longer used in the mapping?
§
If there are
consecutive lookups, can those be replaced inside a single lookup override?
2.2 LookUp conditions
§
Are all the lookup
conditions indexed in database? (Uncached lookup only)
§
An unequal condition
should always be mentioned after an equal condition
2.3 LookUp override query
§
Should follow all
guidelines from 1. Source Query part above
2.4 There is no unnecessary column
selected in lookup (to reduce cache size)
2.5 Cached/Uncached
§
Carefully consider
whether the lookup should be cached or uncached
§
General Guidelines
§
Generally don't use
cached lookup if lookup table size is > 300MB
§
Generally don't use
cached lookup if lookup table row count > 20,000,00
§
Generally don't use
cached lookup if driving table (source table) row count < 1000
2.6 Persistent Cache
§
If found out that a
same lookup is cached and used in different mappings, Consider persistent cache
2.7 Lookup cache building
§
Consider "Additional
Concurrent Pipeline" in session property to build cache concurrently
§
"Prebuild Lookup
Cache" should be enabled, only if the lookup is surely called in the
mapping
Tuning Informatica Joiner
3.1 Unless unavoidable, join database
tables in database only (homogeneous join) and don't use joiner
3.2 If Informatica joiner is used,
always use Sorter Rows and try to sort it in SQ Query itself using Order By (If
Sorter Transformation is used then make sure Sorter has enough cache to perform
1-pass sort)
3.3 Smaller of two joining tables
should be master
Tuning Informatica Aggregator
4.1 When possible, sort the input for
aggregator from database end (Order By Clause)
4.2 If Input is not already sorted, use
SORTER. If possible use SQ query to Sort the records.
Tuning Informatica Filter
5.1 Unless unavoidable, use filteration
at source query in source qualifier
5.2 Use filter as much near to source
as possible
Tuning Informatica Sequence Generator
6.1 Cache the sequence generator
Setting Correct Informatica Session Level
Properties
7.1 Disable "High Precision"
if not required (High Precision allows decimal upto 28 decimal points)
7.2 Use "Terse" mode for
tracing level
7.3 Enable pipeline partitioning (Thumb
Rule: Maximum No. of partitions = No. of CPU/1.2) (Also remember increasing
partitions will multiply the cache memory requirement accordingly)
Tuning Informatica Expression
8.1 Use Variable to reduce the
redundant calculation
8.2 Remove Default value "
ERROR('transformation error')" for Output Column.
8.3 Try to reduce the Code complexity
like Nested If etc.
8.4 Try to reduce the Unneccessary Type
Conversion in Calculation
please explain any real project
ReplyDelete