What is Slowly Changing
Dimensions (SCD) ?
Slowly Changing Dimensions
Dimensions that change over time
are called Slowly Changing Dimensions. For instance, a product price changes
over time; People change their names for some reason; Country and State names
may change over time. These are a few examples of Slowly Changing Dimensions
since some changes are happening to them over a period of time.
Slowly Changing Dimensions are
often categorized into three types namely Type1, Type2 and Type3. The following
section deals with how to capture and handling these changes over time.
The "Product" table
mentioned below contains a product named, Product1 with Product ID being the
primary key. In the year 2004, the price of Product1 was $150 and over the
time, Product1's price changes from $150 to $350. With this information, let us
explain the three types of Slowly Changing Dimensions.
Product Price in 2004:
Product ID(PK) Year Product Name
Product Price
1 2004 Product1 $150
1.SCD TYPE1(Slowly Changing
Dimension) : contains current data.
2.SCD TYPE2(Slowly Changing
Dimension) : contains current data + complete historical data.
3.SCD TYPE3(Slowly Changing
Dimension) : contains current data + one type historical data.
Type 1: Overwriting the old
values.
In the year 2005, if the price of
the product changes to $250, then the old values of the columns
"Year" and "Product Price" have to be updated and replaced
with the new values. In this Type 1, there is no way to find out the old value
of the product "Product1" in year 2004 since the table now contains
only the new price and year information.
Product
Product ID(PK) Year Product Name
Product Price
1 2005 Product1 $250
Type 2: Creating an additional
record.
In this Type 2, the old values
will not be replaced but a new row containing the new values will be added to
the product table. So at any point of time, the difference between the old
values and new values can be retrieved and easily be compared. This would be
very useful for reporting purposes.
Product
Product ID(PK) Year Product Name
Product Price
1 2004 Product1 $150
1 2005 Product1 $250
The problem with the above mentioned
data structure is "Product ID" cannot store duplicate values of
"Product1" since "Product ID" is the primary key. Also, the
current data structure doesn't clearly specify the effective date and expiry
date of Product1 like when the change to its price happened. So, it would be
better to change the current data structure to overcome the above primary key
violation.
Product
Product ID(PK) Effective
DateTime(PK) Year Product Name
Product Price Expiry
DateTime
1 01-01-2004 12.00AM 2004
Product1 $150 12-31-2004 11.59PM
1 01-01-2005 12.00AM 2005
Product1 $250
In the changed Product table's
Data structure, "Product ID" and "Effective DateTime" are
composite primary keys. So there would be no violation of primary key
constraint. Addition of new columns, "Effective DateTime" and
"Expiry DateTime" provides the information about the product's
effective date and expiry date which adds more clarity and enhances the scope
of this table. Type2 approach may need additional space in the data base, since
for every changed record, an additional row has to be stored. Since dimensions
are not that big in the real world, additional space is negligible.
Type 3: Creating new fields.
In this Type 3, the latest update
to the changed values can be seen. Example mentioned below illustrates how to
add new columns and keep track of the changes. From that, we are able to see
the current price and the previous price of the product, Product1.
Product
Product ID(PK) Current
Year Product
Name Current
Product Price Old Product
Price Old Year
1 2005 Product1 $250 $150 2004
The problem with the Type 3
approach, is over years, if the product price continuously changes, then the
complete history may not be stored, only the latest change will be stored. For
example, in year 2006, if the product1's price changes to $350, then we would
not be able to see the complete history of 2004 prices, since the old values
would have been updated with 2005 product information.
Product
Product ID(PK) Year Product
Name Product
Price Old Product
Price Old Year
1 2006 Product1 $350 $250 2005
Example:
In order to store data,
over the years, many application designers in each branch have made their
individual decisions as to how an application and database should be built. So
source systems will be different in naming conventions, variable measurements,
encoding structures, and physical attributes of data. Consider a bank that has
got several branches in several countries, has millions of customers and the
lines of business of the enterprise are savings, and loans. The following
example explains how the data is integrated from source systems to target
systems.
Example of Source Data
System Name Attribute Name Column
Name Datatype Values
Source System 1 Customer
Application Date CUSTOMER_APPLICATION_DATE NUMERIC(8,0) 11012005
Source System 2 Customer
Application Date CUST_APPLICATION_DATE DATE 11012005
Source System 3 Application Date
APPLICATION_DATE DATE 01NOV2005
In the aforementioned example,
attribute name, column name, datatype and values are entirely different from
one source system to another. This inconsistency in data can be avoided by
integrating the data into a data warehouse with good standards.
Example of Target Data(Data
Warehouse)
Target System Attribute Name
Column Name Datatype Values
Record #1 Customer Application
Date CUSTOMER_APPLICATION_DATE DATE 01112005
Record #2 Customer Application
Date CUSTOMER_APPLICATION_DATE DATE 01112005
Record #3 Customer Application
Date CUSTOMER_APPLICATION_DATE DATE 01112005
In the above example of target
data, attribute names, column names, and datatypes are consistent throughout
the target system. This is how data from various source systems is integrated
and accurately stored into the data warehouse.
Very simple yet effective example of SCD2. now i got it how it is handled in my working place.
ReplyDeleteNice and very informative explanation
ReplyDeleteinformatica training, informatica training in bangalore , informatica online training , informatica online training in bangalore