Saturday, October 20, 2012

DWH Modeling Methodologies




Conformed facts

In addition to conformed dimensions, you need conformed facts. Conforming a fact really amounts to standardizing the definitions of terms across individual marts. Often, different divisions or departments use the same term in different ways. Does “revenue” refer to “gross revenue” or “adjusted revenue”? Does “units shipped” refer to cases of items or individual items?
Make certain your design team develops, early on, a uniform enterprise taxonomy—and enforce it.
Conformed dimensions
Conformed dimensions can be used to analyze facts from two or more data marts. Suppose you have a “shipping” data mart (telling you what you’ve shipped to whom and when) and a “sales” data mart (telling you who has purchased what and when). Both marts require a “customer” dimension and a “time” dimension. If they’re the samedimension, then you have conforming dimensions, allowing you to extract and manipulate facts relating to a particular customer from both marts, answering questions such as whether late shipments have affected sales to that customer.
Suppose now that you add a “marketing” data mart to help you analyze product promotions. Again, with conformed customer and time dimensions, you’re able to analyze the effects of a particular product promotion on sales. (Analyzing facts from more than one fact table in this way is termed “drilling across.” My previous article, “Thinking dimensionally aids business intelligence design and use,” explains the function of facts and dimensions.)

As this example shows, the very same conformed dimensions—in this case, time and customer dimensions—have meaning in the context of three independently developed data marts. These dimensions become enterprise property and can be used later in other marts as you evolve the enterprise data warehouse.

Semi Additive Facts
A semi additive fact is one where the measure can either have only a subset of aggregations applied to it, it you can count the measures but not sum them, or the measures are only additive over a subset of the dimensions.
Using our "daily_balances" fact above would be a good example of a semi additive fact. The daily balances can be aggregated by customer if the customer has multiple accounts to give the customers daily balance, however the balances could not be aggregated over time as adding last week’s balance onto this week’s balance would result in a nonsensical figure.
Non Additive Facts
A non-additive fact is one where the measure is non agreeable over any dimensions. These are commonly where percentages have been calculated and stored in the fact. Another example could be a profit margin on a sale, there is this figure other than at an individual sale level.
Additive Facts

A fully additive fact is one where the measures can be aggregated. 
For example, our Sales fact above would be fully additive as you can aggregate the sales amount over time, by product, by region or by salesman and still get the correct answer.

Transaction Grain
This is the most common type of fact. You would declare the grain of the fact, ie the level of detail and then this is what would be stored. For example, you may have a sales order fact, every time a new sales order a new row would be created in the fact table. alternatively, you may have a "monthly_sales" fact. At the end of every month you would aggregate up all the sales that happened in that month and record the single total value.

Snapshot Facts
The snapshot fact contains a reflection of the state of an entity at a given point in time. A classic example of this would be a "daily_balance" fact in a banking system. This would, on a daily basis record the balance of each account, it would NOT list the individual transactions that happened on the account.
Factless Facts

A Factless fact is where the fact does not store an actual numerical measure, the mere existence of a fact record indicates that an event has happened that you wish to track. The classic example of this would be an "Attendance" fact. If you had dimensions to record date, scheduled_course, instructor and delegate then you could create a fact table that held the permutations of these dimensions. From this you could evaluate the number of courses you run, the number of delegates, the number of courses by instructor etc.
I would never simply leave a Factless fact as a bare collection of foreign key columns I would always add a dummy measure column which would be set to 1 which you would then sum.

Accumulating Fact Table
An accumulating fact table is where all of the dimensional attributes are not available at the time of creation and the dimensions that are linked to a fact table change over time. The most common implementation of this is in the recording of dates against facts. 

Take a "Sales" fact, typical dates you may be intersted in when tracking an individual sale is maybe, order_date, ship_date, delivery_date and payment_date. These would not all be available when the fact is first created. Over time the fact record would accumulatemore relationships with the dimensions as the relevant date milestones were passed for the sale.

The differences between a logical data model and physical data model



Logical vs Physical Data Modeling

Logical Data Model
Physical Data Model
Represents business information and defines business rules
Represents the physical implementation of the model in a database.
Entity
Table
Attribute
Column
Primary Key
Primary Key Constraint
Alternate Key
Unique Constraint or Unique Index
Inversion Key Entry
Non Unique Index
Rule
Check Constraint, Default Value
Relationship
Foreign Key
Definition
Comment

Physical Data Modeling

Features of physical data model include:
·         Specification all tables and columns.
·         Foreign keys are used to identify relationships between tables.
·         Demoralization may occur based on user requirements.
·         Physical considerations may cause the physical data model to be quite different from the logical data model.
At this level, the data modeler will specify how the logical data model will be realized in the database schema.
The steps for physical data model design are as follows:
1.        Convert entities into tables.
2.        Convert relationships into foreign keys.
3.        Convert attributes into columns.
4.        Modeling is an efficient and effective way to represent the organization’s needs; It provides information in a graphical way to the members of an organization to understand and communicate the business rules and processes. Business Modeling and Data Modeling are the two important types of modeling.

Logical Data Model
Features of logical data model include:
·         Includes all entities and relationships among them.
·         All attributes for each entity are specified.
·         The primary key for each entity specified.
·         Foreign keys (keys identifying the relationship between different entities) are specified.
·         Normalization occurs at this level.
At this level, the data modeler attempts to describe the data in as much detail as possible, without regard to how they will be physically implemented in the database.
In data warehousing, it is common for the conceptual data model and the logical data model to be combined into a single step (deliverable).
The steps for designing the logical data model are as follows:
1.        Identify all entities.
2.        Specify primary keys for all entities.
3.        Find the relationships between different entities.
4.        Find all attributes for each entity.
5.        Resolve many-to-many relationships.
6.        Normalization. 

Conceptual Data Model

Features of conceptual data model include:
·         Includes the important entities and the relationships among them.
·         No attribute is specified.
·         No primary key is specified.
At this level, the data modeler attempts to identify the highest-level relationships among the different entities. 
What is Data modeling?
There are three levels of data modeling. They are conceptual, logical, and physical. This section will explain the difference among the three, the order with which each one is created, and how to go from one level to the other.

What is De Generated Dimension?
An item that is in the fact table but is stripped off of its description, because the description belongs in dimension table, is referred to as Degenerated Dimension.  Since it looks like dimension, but is really in fact table and has been degenerated of its description, hence is called degenerated dimension.
What is Junk Dimension?
A "junk" dimension is a collection of random transactional codes, flags and/or text attributes that are unrelated to any particular dimension. The junk dimension is simply a structure that provides a convenient place to store the junk attributes. A good example would be a trade fact in a company that brokers equity trades.
When you consolidate lots of small dimensions and instead of having 100s of small dimensions, that will have few records in them, cluttering your database with these mini ‘identifier’ tables, all records from all these small dimension tables are loaded into ONE dimension table and we call this dimension table Junk dimension table.  (Since we are storing all the junk in this one table) For example: a company might have handful of manufacture plants, handful of order types, and so on, so forth, and we can consolidate them in one dimension table called junked dimension table
It’s a dimension table which is used to keep junk attributes

What is Conformed Dimension?
Conformed Dimensions (CD): these dimensions are something that is built once in your model and can be reused multiple times with different fact tables.   
For example, consider a model containing multiple fact tables, representing different data marts.  Now look for a dimension that is common to these facts tables.  In this example let’s consider that the product dimension is common and hence can be reused by creating short cuts and joining the different fact tables. Some of the examples are time dimension, customer dimensions, product dimension.

No comments:

Post a Comment

Thank you :
- kareem