Conformed facts
In addition to conformed dimensions,
you need conformed facts. Conforming a fact really amounts to standardizing the
definitions of terms across individual marts. Often, different divisions or
departments use the same term in different ways. Does “revenue” refer to “gross
revenue” or “adjusted revenue”? Does “units shipped” refer to cases of
items or individual items?
Make certain your design
team develops, early on, a uniform enterprise taxonomy—and enforce it.
Conformed dimensions
Conformed
dimensions can be used to analyze facts from two or more data marts. Suppose
you have a “shipping” data mart (telling you what you’ve shipped to whom and
when) and a “sales” data mart (telling you who has purchased what and when).
Both marts require a “customer” dimension and a “time” dimension. If they’re
the samedimension, then you have conforming dimensions,
allowing you to extract and manipulate facts relating to a particular customer
from both marts, answering questions such as whether late shipments have
affected sales to that customer.
Suppose now that you add a “marketing”
data mart to help you analyze product promotions. Again, with conformed
customer and time dimensions, you’re able to analyze the effects of a
particular product promotion on sales. (Analyzing facts from more than one fact
table in this way is termed “drilling across.” My previous article, “Thinking
dimensionally aids business intelligence design and use,” explains the
function of facts and dimensions.)
As this example shows, the very same
conformed dimensions—in this case, time and customer dimensions—have meaning in
the context of three independently developed data marts. These
dimensions become enterprise property and can be used later in other marts as
you evolve the enterprise data warehouse.
Semi Additive Facts
A semi additive fact is one where the
measure can either have only a subset of aggregations applied to it, it you can
count the measures but not sum them, or the measures are only additive over a
subset of the dimensions.
Using our "daily_balances"
fact above would be a good example of a semi additive fact. The daily balances
can be aggregated by customer if the customer has multiple accounts to give the
customers daily balance, however the balances could not be aggregated over time
as adding last week’s balance onto this week’s balance would result in a
nonsensical figure.
Non Additive Facts
A non-additive fact is one where the
measure is non agreeable over any dimensions. These are commonly where
percentages have been calculated and stored in the fact. Another example could
be a profit margin on a sale, there is this figure other than at an individual
sale level.
Additive Facts
A fully additive fact is one where the
measures can be aggregated.
For example,
our Sales fact above would be fully additive as you can aggregate the sales amount
over time, by product, by region or by salesman and still get the correct
answer.
Transaction Grain
This is the most common type of fact.
You would declare the grain of the fact, ie the level of detail and then this
is what would be stored. For example, you may have a sales order fact, every
time a new sales order a new row would be created in the fact table. alternatively,
you may have a "monthly_sales" fact. At the end of every month you
would aggregate up all the sales that happened in that month and record the
single total value.
Snapshot Facts
The
snapshot fact contains a reflection of the state of an entity at a given point
in time. A classic example of this would be a "daily_balance" fact in
a banking system. This would, on a daily basis record the balance of each
account, it would NOT list the individual transactions that happened on the
account.
Factless
Facts
A Factless
fact is where the fact does not store an actual numerical measure, the mere existence
of a fact record indicates that an event has happened that you wish to
track. The classic example of this would be an "Attendance"
fact. If you had dimensions to record date, scheduled_course, instructor and
delegate then you could create a fact table that held the permutations of these
dimensions. From this you could evaluate the number of courses you run, the
number of delegates, the number of courses by instructor etc.
I
would never simply leave a Factless fact as a bare collection of foreign key
columns I would always add a dummy measure column which would be set to 1 which
you would then sum.
Accumulating Fact Table
An
accumulating fact table is where all of the dimensional attributes are not
available at the time of creation and the dimensions that are linked to a fact
table change over time. The most common implementation of this is in the
recording of dates against facts.
Take a
"Sales" fact, typical dates you may be intersted in when tracking an
individual sale is maybe, order_date, ship_date, delivery_date and
payment_date. These would not all be available when the fact is first created.
Over time the fact record would accumulatemore relationships with the
dimensions as the relevant date milestones were passed for the sale.
The
differences between a logical data model and physical data model
Logical vs Physical Data
Modeling
Logical
Data Model
|
Physical
Data Model
|
Represents
business information and defines business rules
|
Represents
the physical implementation of the model in a database.
|
Entity
|
Table
|
Attribute
|
Column
|
Primary
Key
|
Primary
Key Constraint
|
Alternate
Key
|
Unique
Constraint or Unique Index
|
Inversion
Key Entry
|
Non
Unique Index
|
Rule
|
Check
Constraint, Default Value
|
Relationship
|
Foreign
Key
|
Definition
|
Comment
|
Physical Data Modeling
Features of physical data model
include:
·
Specification
all tables and columns.
·
Foreign
keys are used to identify relationships between tables.
·
Demoralization
may occur based on user requirements.
·
Physical
considerations may cause the physical data model to be quite different from the
logical data model.
At this level, the data modeler will
specify how the logical data model will be realized in the database schema.
The steps for physical data model
design are as follows:
1. Convert
entities into tables.
2. Convert
relationships into foreign keys.
3. Convert
attributes into columns.
4. Modeling
is an efficient and effective way to represent the organization’s needs; It
provides information in a graphical way to the members of an organization to
understand and communicate the business rules and processes. Business Modeling
and Data Modeling are the two important types of modeling.
Logical Data Model
Features of logical data model include:
·
Includes
all entities and relationships among them.
·
All
attributes for each entity are specified.
·
The
primary key for each entity specified.
·
Foreign
keys (keys identifying the relationship between different entities) are
specified.
·
Normalization
occurs at this level.
At this level, the data modeler
attempts to describe the data in as much detail as possible, without regard to
how they will be physically implemented in the database.
In data warehousing, it is common for
the conceptual data model and the logical data model to be combined into a
single step (deliverable).
The steps for designing the logical
data model are as follows:
1. Identify
all entities.
2. Specify
primary keys for all entities.
3. Find
the relationships between different entities.
4. Find
all attributes for each entity.
5. Resolve
many-to-many relationships.
6. Normalization.
Conceptual
Data Model
Features of conceptual data model
include:
·
Includes
the important entities and the relationships among them.
·
No
attribute is specified.
·
No
primary key is specified.
At this level, the data modeler
attempts to identify the highest-level relationships among the different
entities.
What is Data modeling?
There are three levels of data
modeling. They are conceptual, logical, and physical. This section
will explain the difference among the three, the order with which each one is
created, and how to go from one level to the other.
What is De Generated
Dimension?
An item that is in the fact table but
is stripped off of its description, because the description belongs in
dimension table, is referred to as Degenerated Dimension. Since it looks
like dimension, but is really in fact table and has been degenerated of its
description, hence is called degenerated dimension.
What
is Junk Dimension?
A "junk" dimension is a
collection of random transactional codes, flags and/or text attributes that are
unrelated to any particular dimension. The junk dimension is simply a structure
that provides a convenient place to store the junk attributes. A good example
would be a trade fact in a company that brokers equity trades.
When you consolidate lots of small
dimensions and instead of having 100s of small dimensions, that will have few
records in them, cluttering your database with these mini ‘identifier’ tables,
all records from all these small dimension tables are loaded into ONE dimension
table and we call this dimension table Junk dimension table. (Since
we are storing all the junk in this one table) For example: a company might
have handful of manufacture plants, handful of order types, and so on, so
forth, and we can consolidate them in one dimension table called junked
dimension table
It’s a dimension table which is used to
keep junk attributes
What is Conformed Dimension?
Conformed Dimensions (CD): these
dimensions are something that is built once in your model and can be reused
multiple times with different fact tables.
For example, consider a model
containing multiple fact tables, representing different data marts. Now
look for a dimension that is common to these facts tables. In this
example let’s consider that the product dimension is common and hence can be
reused by creating short cuts and joining the different fact tables. Some of
the examples are time dimension, customer dimensions, product dimension.
No comments:
Post a Comment
Thank you :
- kareem