Wednesday, January 30, 2013

Why need staging area for DWH?

·         Staging area needs to clean operational data before loading into data warehouse.
·         Cleaning in the sense your merging data which comes from different source.
·         It’s the area where most of the ETL is done

Data Cleansing
·         It is used to remove duplications
·         It is used to correct wrong email addresses
·         It is used to identify missing data
·         It used to convert the data types
·         It is used to capitalize name & addresses.

Types of Dimensions:
·         There are three types of Dimensions
·         Confirmed Dimensions
·         Junk Dimensions Garbage Dimension
·         Degenerative Dimensions

   Slowly changing Dimensions
·         Garbage Dimension or Junk Dimension
·         Confirmed is something which can be shared by multiple Fact Tables or multiple Data Marts.
·         Junk Dimensions is grouping flagged values
·         Degenerative Dimension is something dimensional in nature but exist fact table. (Invoice No)
            Which is neither fact nor strictly dimension attributes. These are useful for some kind of analysis. These are kept as attributes in fact table called degenerated dimension

Degenerate dimension: A column of the key section of the fact table that does not have the associated dimension table but used for reporting and analysis, such column is called degenerate dimension or line item dimension.

For ex, we have a fact table with customer_id, product_id, branch_id, employee_id, bill_no, and date in key section and price, quantity, amount in measure section. In this fact table, bill_no from key section is a single value; it has no associated dimension table. Instead of creating a Separate dimension table for that single value, we can Include it in fact table to improve performance. SO here the column, bill_no is a degenerate dimension or line item dimension.


No comments:

Post a Comment

Thank you :
- kareem