The "Slowly Changing Dimension" problem is a
common one particular to data warehousing. In a nutshell, this applies to cases
where the attribute for a record varies over time. We give an example below:
Christina
is a customer with ABC Inc. She first lived in Chicago, Illinois. So, the
original entry in the customer lookup table has the following record:
Customer Key
|
Name
|
State
|
1001
|
Christina
|
Illinois
|
At a
later date, she moved to Los Angeles, California on January, 2003. How should
ABC Inc. now modify its customer table to reflect this change? This is the
"Slowly Changing Dimension" problem.
There are
in general three ways to solve this type of problem, and they are categorized
as follows:
Slowly
Changing Dimension Type 1: The new
record replaces the original record. No trace of the old record exists.
In Type 1 Slowly Changing Dimension, the new information simply
overwrites the original information. In other words, no history is kept.
In our example, recall we
originally have the following table:
Customer Key
|
Name
|
State
|
1001
|
Christina
|
Illinois
|
After Christina moved from
Illinois to California, the new information replaces the new record, and we
have the following table:
Customer Key
|
Name
|
State
|
1001
|
Christina
|
California
|
Advantages:
- This is the easiest way
to handle the Slowly Changing Dimension problem, since there is no need to keep
track of the old information.
Disadvantages:
- All history is lost. By
applying this methodology, it is not possible to trace back in history. For example,
in this case, the company would not be able to know that Christina lived in
Illinois before.
Usage:
About 50% of the time.
When to use Type 1:
Type 1 slowly changing
dimension should be used when it is not necessary for the data warehouse to
keep track of historical changes.
Slowly
Changing Dimension Type 2: A new
record is added into the customer dimension table. Therefore, the customer is
treated essentially as two people.
In Type 2 Slowly Changing
Dimension, a new record is added to the table to represent the new information.
Therefore, both the original and the new record will be present. The newe
record gets its own primary key.
In our example, recall we
originally have the following table:
Customer Key
|
Name
|
State
|
1001
|
Christina
|
Illinois
|
After Christina moved from
Illinois to California, we add the new information as a new row into the table:
Customer Key
|
Name
|
State
|
1001
|
Christina
|
Illinois
|
1005
|
Christina
|
California
|
Advantages:
- This allows us to
accurately keep all historical information.
Disadvantages:
- This will cause the size
of the table to grow fast. In cases where the number of rows for the table is
very high to start with, storage and performance can become a concern.
- This necessarily
complicates the ETL process.
Usage:
About 50% of the time.
When to use Type 2:
Type 2 slowly changing
dimension should be used when it is necessary for the data warehouse to track
historical changes.
Slowly
Changing Dimension Type 3: The
original record is modified to reflect the change.
In Type 3 Slowly Changing
Dimension, there will be two columns to indicate the particular attribute of
interest, one indicating the original value, and one indicating the current
value. There will also be a column that indicates when the current value
becomes active.
In our example, recall we
originally have the following table:
Customer Key
|
Name
|
State
|
1001
|
Christina
|
Illinois
|
To accommodate Type 3
Slowly Changing Dimension, we will now have the following columns:
- Customer Key
- Name
- Original State
- Current State
- Effective Date
After Christina moved from
Illinois to California, the original information gets updated, and we have the
following table (assuming the effective date of change is January 15, 2003):
Customer Key
|
Name
|
Original State
|
Current State
|
Effective Date
|
1001
|
Christina
|
Illinois
|
California
|
15-JAN-2003
|
Advantages:
- This does not increase
the size of the table, since new information is updated.
- This allows us to keep
some part of history.
Disadvantages:
- Type 3 will not be able
to keep all history where an attribute is changed more than once. For example,
if Christina later moves to Texas on December 15, 2003, the California
information will be lost.
Usage:
Type 3 is rarely used in
actual practice.
When to use Type 3:
Type III slowly changing
dimension should only be used when it is necessary for the data warehouse to
track historical changes, and when such changes will only occur for a finite
number of time.
We next
take a look at each of the scenarios and how the data model and the data looks
like for each of them. Finally, we compare and contrast among the three
alternatives.
No comments:
Post a Comment
Thank you :
- kareem