ETL Life Cycle
The typical real-life ETL cycle consists of the following execution steps (a minimal orchestration sketch follows the list):
1. Cycle initiation
2. Build reference data
3. Extract (from sources)
4. Validate
5. Transform (clean, apply business rules, check for
data integrity, create aggregates or disaggregates)
6. Stage (load into staging tables, if used)
7. Audit reports (for example, on compliance with business rules; in case of failure, these reports help to diagnose and repair)
8. Publish (to target tables)
9. Archive
10. Clean up
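As a rough illustration only, here is a minimal Python sketch of how these steps might be sequenced; the step names and the registry of callables are assumptions made for the example, not part of any particular tool.

```python
import logging
from datetime import datetime, timezone

# Hypothetical step functions registered by name; the names mirror the
# cycle steps above and are illustrative only.
STEP_ORDER = [
    "build_reference_data", "extract", "validate", "transform",
    "stage", "audit_reports", "publish", "archive", "clean_up",
]

def run_cycle(registry):
    """Run one ETL cycle, aborting at the first failing step."""
    run_id = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    logging.info("cycle %s initiated", run_id)
    for name in STEP_ORDER:
        try:
            registry[name](run_id)  # each step receives the run identifier
            logging.info("step %s completed", name)
        except Exception:
            logging.exception("step %s failed; aborting cycle %s", name, run_id)
            raise
    logging.info("cycle %s finished", run_id)

# Example: run_cycle({name: (lambda run_id: None) for name in STEP_ORDER})
```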
Best practices
Four-layered approach for ETL architecture design
- Functional layer: Core functional ETL processing
(extract, transform, and load).
- Operational management layer: Job-stream definition and management,
parameters, scheduling, monitoring, communication and alerting.
- Audit, balance and control (ABC) layer: Job-execution statistics, balancing and controls, reject and error handling, code management.
- Utility layer: Common components supporting all other layers (a brief code sketch of this layering follows).
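One way the four layers might be kept apart in code is sketched below; every function name here is an assumption chosen for illustration, and a real implementation would put each layer in its own module or service.

```python
from datetime import datetime, timezone

# Utility layer: common helpers shared by the other layers.
def utc_now():
    return datetime.now(timezone.utc).isoformat()

# Audit, balance and control (ABC) layer: job-execution statistics.
def record_stats(job_name, rows_in, rows_out):
    print(f"{utc_now()} {job_name}: in={rows_in} out={rows_out} rejects={rows_in - rows_out}")

# Functional layer: core transform logic (here, dropping incomplete rows).
def transform(rows):
    return [r for r in rows if r.get("amount") is not None]

# Operational management layer: job definition, sequencing, and monitoring.
def run_job(job_name, rows):
    out = transform(rows)
    record_stats(job_name, len(rows), len(out))
    return out
```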
Use file-based ETL processing where possible
- Storage costs relatively little
- Intermediate files serve multiple purposes (see the sketch after this list):
  - Used for testing and debugging
  - Used for restart and recovery processing
  - Used to calculate control statistics
- Helps to reduce dependencies - enables modular
programming.
- Allows flexibility for job execution and scheduling
- Better performance if coded properly, and can
take advantage of parallel processing capabilities when the need arises.
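A minimal sketch of file-based staging, assuming CSV intermediates in a local working directory (both assumptions for illustration): each stage's output can be re-read for restart, inspected for debugging, and its row count used as a simple control statistic.

```python
import csv
from pathlib import Path

STAGE_DIR = Path("work")  # assumed location for intermediate files

def write_stage(name, rows, fieldnames):
    """Persist one stage's output; the returned row count is a control statistic."""
    STAGE_DIR.mkdir(exist_ok=True)
    path = STAGE_DIR / f"{name}.csv"
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)

def read_stage(name):
    """Re-read a previous stage's output, e.g. to restart after a failure."""
    with (STAGE_DIR / f"{name}.csv").open(newline="") as f:
        return list(csv.DictReader(f))
```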
Use data-driven methods and minimize custom ETL coding
- Parameter-driven jobs, functions, and job-control
- Code definitions and mappings stored in the database
- Consider data-driven tables to support more complex code mappings and business-rule application (a mapping-table sketch follows this list).
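A hedged sketch of a data-driven code mapping: the code_map table, its columns, and the use of SQLite are assumptions made for illustration. The point is that the mapping lives in data, so new codes are handled by adding rows rather than editing ETL code.

```python
import sqlite3

def load_code_map(conn, mapping_name):
    """Read a named mapping from an assumed code_map(mapping_name, source_code, target_code) table."""
    rows = conn.execute(
        "SELECT source_code, target_code FROM code_map WHERE mapping_name = ?",
        (mapping_name,),
    )
    return dict(rows)

def apply_code_map(records, code_map, field):
    """Translate one field on each record; unknown codes pass through unchanged."""
    for rec in records:
        rec[field] = code_map.get(rec[field], rec[field])
    return records

# Example usage (assuming the table exists):
# conn = sqlite3.connect("etl_meta.db")
# country_map = load_code_map(conn, "country")
# apply_code_map(records, country_map, "country_code")
```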
Qualities of a good ETL architecture design:
- Performance
- Scalable
- Migratable
- Recoverable (run_id, ...)
- Operable (completion codes for phases, re-running from checkpoints, etc.; a checkpoint sketch follows this list)
- Auditable (in two dimensions: business
requirements and technical troubleshooting)
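To make the recoverability and operability points concrete, here is a sketch of run_id-based checkpointing, assuming a JSON file per run (the directory, file layout, and status codes are assumptions): each phase records a completion code, so a failed run can be resumed from the last successful phase.

```python
import json
from pathlib import Path

CHECKPOINT_DIR = Path("checkpoints")  # assumed location

def save_checkpoint(run_id, phase, status):
    """Record a completion code (e.g. "OK", "WARN", "FAIL") for one phase of a run."""
    CHECKPOINT_DIR.mkdir(exist_ok=True)
    path = CHECKPOINT_DIR / f"{run_id}.json"
    state = json.loads(path.read_text()) if path.exists() else {}
    state[phase] = status
    path.write_text(json.dumps(state, indent=2))

def completed_phases(run_id):
    """Return the phases already finished, so a rerun can skip them."""
    path = CHECKPOINT_DIR / f"{run_id}.json"
    if not path.exists():
        return set()
    state = json.loads(path.read_text())
    return {phase for phase, status in state.items() if status == "OK"}
```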
Thank you.
- kareem