· Staging area needs to clean
operational data before loading into data warehouse.
· Cleaning in the sense your
merging data which comes from different source.
· It’s the area where most of the
ETL is done
Data
Cleansing
· It is used to remove
duplications
· It is used to correct wrong
email addresses
· It is used to identify missing
data
· It used to convert the data
types
· It is used to capitalize name
& addresses.
Types of
Dimensions:
· There are three types of
Dimensions
· Confirmed Dimensions
· Junk Dimensions Garbage
Dimension
· Degenerative Dimensions
Slowly
changing Dimensions
· Garbage Dimension or Junk
Dimension
· Confirmed is something which
can be shared by multiple Fact Tables or multiple Data Marts.
· Junk Dimensions is grouping
flagged values
· Degenerative Dimension is
something dimensional in nature but exist fact table. (Invoice No)
Which
is neither fact nor strictly dimension attributes. These are useful for
some kind of analysis. These are kept as attributes in fact table called
degenerated dimension
Degenerate
dimension: A column
of the key section of the fact table that does not have the associated
dimension table but used for reporting and analysis, such column is
called degenerate dimension or line item dimension.
For
ex, we have a fact table with customer_id, product_id,
branch_id, employee_id, bill_no, and date in key section and price,
quantity, amount in measure section. In this fact table, bill_no from key
section is a single value; it has no associated dimension table. Instead of
creating a Separate dimension table for that single value, we can Include it in
fact table to improve performance. SO here the column, bill_no is a degenerate
dimension or line item dimension.
No comments:
Post a Comment
Thank you :
- kareem