Data integrity refers to the validity of data,
meaning data is consistent and correct. In the data warehousing field, we
frequently hear the term, "Garbage In, Garbage Out." If there is no
data integrity in the data warehouse, any resulting report and analysis will
not be useful.
In a data warehouse or a data
mart, there are three areas where data integrity needs to be enforced:
Database level
We can enforce data
integrity at the database level. Common ways of enforcing data integrity
include:
Referential integrity
The relationship
between the primary key of one table and the foreign key of another table must
always be maintained. For example, a primary key cannot be deleted if there is
still a foreign key that refers to this primary key.
Primary key / Unique constraint
Primary keys and the
UNIQUE constraint are used to make sure every row in a table can be uniquely
identified.
Not NULL vs NULL-able
Columns identified as NOT NULL may not contain a NULL value.
Valid Values
Only allowed values
are permitted in the database. For example, if a column can only hold positive
integers, a value of '-1' is not allowed.
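The four database-level checks above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the tables dim_customer and fact_sales, their columns, and the helper violates() are hypothetical names, and other database engines use slightly different syntax and error types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables: one dimension and one fact table that refers to it.
conn.executescript("""
CREATE TABLE dim_customer (
    customer_id INTEGER PRIMARY KEY,            -- primary key
    email       TEXT NOT NULL UNIQUE            -- NOT NULL and UNIQUE
);
CREATE TABLE fact_sales (
    sale_id     INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
                REFERENCES dim_customer(customer_id),  -- referential integrity
    quantity    INTEGER NOT NULL CHECK (quantity > 0)  -- valid values
);
INSERT INTO dim_customer VALUES (1, 'a@example.com');
INSERT INTO fact_sales VALUES (100, 1, 3);
""")


def violates(sql):
    """Return True if the database rejects the statement with a constraint error."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


results = {
    # Deleting a primary key that a foreign key still refers to.
    "referential integrity": violates("DELETE FROM dim_customer WHERE customer_id = 1"),
    # Inserting a second row with the same email.
    "unique": violates("INSERT INTO dim_customer VALUES (2, 'a@example.com')"),
    # Inserting NULL into a NOT NULL column.
    "not null": violates("INSERT INTO dim_customer VALUES (3, NULL)"),
    # Inserting -1 where only positive integers are allowed.
    "valid values": violates("INSERT INTO fact_sales VALUES (101, 1, -1)"),
}

for rule, rejected in results.items():
    print(rule, "rejected" if rejected else "accepted")
```

Each statement breaks exactly one rule, so the database should reject all four while still accepting well-formed rows.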
ETL process
For each step of the
ETL process, data integrity checks should be put in place to ensure that source
data is the same as the data in the destination. The most common checks include
record counts and record sums.
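A record count and record sum check for one ETL step might look like the sketch below. It assumes the source and destination rows have already been read into lists of dictionaries. The function name reconcile and the field name amount are hypothetical, and Decimal is used so that sums of monetary values compare exactly.

```python
from decimal import Decimal


def reconcile(source_rows, dest_rows, amount_field):
    """Compare the record count and the sum of one numeric field
    between the source rows and the destination rows."""
    src_sum = sum((Decimal(str(r[amount_field])) for r in source_rows), Decimal(0))
    dst_sum = sum((Decimal(str(r[amount_field])) for r in dest_rows), Decimal(0))
    return {
        "source_count": len(source_rows),
        "dest_count": len(dest_rows),
        "count_match": len(source_rows) == len(dest_rows),
        "source_sum": src_sum,
        "dest_sum": dst_sum,
        "sum_match": src_sum == dst_sum,
    }


# Illustrative data: the destination is missing one row.
source = [
    {"id": 1, "amount": "10.50"},
    {"id": 2, "amount": "4.25"},
    {"id": 3, "amount": "7.00"},
]
destination = source[:2]

report = reconcile(source, destination, "amount")
if not (report["count_match"] and report["sum_match"]):
    print("ETL check failed:", report)
```

Matching counts and sums do not prove that every row is identical, but a mismatch in either is a cheap signal that the step lost or changed data.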
Access level
We need to ensure that
data is not altered by any unauthorized means either during the ETL process or
in the data warehouse. To do this, there need to be safeguards against
unauthorized access to data (including physical access to the servers), as well
as logging of all data access history. Data integrity can only be ensured if there
is no unauthorized access to the data.
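One small piece of this, keeping a history of changes to the data, can be sketched with database triggers. The example below uses Python's built-in sqlite3 module. The table fact_sales, the table audit_log, and the trigger names are hypothetical. It records only updates and deletes, not reads and not who made the change, so a real warehouse would rely on the database's own permission and auditing features for the rest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical fact table plus an audit table filled in by triggers.
conn.executescript("""
CREATE TABLE fact_sales (
    sale_id  INTEGER PRIMARY KEY,
    quantity INTEGER NOT NULL
);
CREATE TABLE audit_log (
    log_id       INTEGER PRIMARY KEY AUTOINCREMENT,
    changed_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    action       TEXT NOT NULL,
    sale_id      INTEGER NOT NULL,
    old_quantity INTEGER,
    new_quantity INTEGER
);
CREATE TRIGGER trg_sales_update AFTER UPDATE ON fact_sales
BEGIN
    INSERT INTO audit_log (action, sale_id, old_quantity, new_quantity)
    VALUES ('UPDATE', OLD.sale_id, OLD.quantity, NEW.quantity);
END;
CREATE TRIGGER trg_sales_delete AFTER DELETE ON fact_sales
BEGIN
    INSERT INTO audit_log (action, sale_id, old_quantity, new_quantity)
    VALUES ('DELETE', OLD.sale_id, OLD.quantity, NULL);
END;
""")

# Every change to fact_sales now leaves a row in audit_log.
conn.execute("INSERT INTO fact_sales VALUES (100, 3)")
conn.execute("UPDATE fact_sales SET quantity = 30 WHERE sale_id = 100")
conn.execute("DELETE FROM fact_sales WHERE sale_id = 100")
conn.commit()

log = conn.execute(
    "SELECT action, sale_id, old_quantity, new_quantity "
    "FROM audit_log ORDER BY log_id"
).fetchall()
for entry in log:
    print(entry)
```

The audit table is only trustworthy if ordinary users cannot write to it, which is a matter of access rights rather than of the triggers themselves.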
Thank you :
- kareem