Data WareHouse: Test Scenarios for a DW

I will discuss possible test scenarios for each of the components of the DW.

ETL
- Source-to-Target Mapping: Verify that, for each field (in each table) in the Target DB, there is one (and only one) source field. Verify that, corresponding source and target fields have matching data types, and matching lengths. Any mismatch here can result into disasters later on. For example, if a long source field is being copied into a shorter target field, truncation may occur.
One of the common pitfalls is decimal field. If the precision and scale are not matching correctly, the rounding error might occur. Like, source field has scale = 3, and target field has scale = 2, then the target field will round off the third decimal digit. So, 5.446 will result in 5.45 . Not so accurate.
Remember that, 'source' can consist of one (or a combination ) of the following: database tables, Access tables, EXCEL sheets, flat files etc. Hence, it is advisable to maintain a source-to-target mapping document.

- ETL Execution: Verify that the ETL execution is error-free. Common practice is that, the target database will have 2 tables: Audit and Error. After the ETL execution, the Audit field will store the following information: Rows extracted from source, Rows in Target before loading, rows in Target after loading. Error table will have the rows from the source, that didn't make it to the target, due to some error.
If there are any rows in the Error table, then one has to debug why those rows were dropped by the ETL.

- Row Count Reconciliation: This is the most important test of an ETL. The data in the source and target has to be identical. This is a little tricky subject. The data organization on source and target can be significantly different. On target side, it is a DW, comprising of Fact and Dimension tables, with FK and REF constraints. On source side, it can be almost anything. And these source-to-data configurations are different for different projects (to each its own), So, in general, there has to be a way, to determine that the data rows in a Fact table (or rather ALL the Fact tables) on the target side, and the data on the source side. This largely depends on the logic of the ETL and ETL transformations. And that's why, I call it a white-box testing. QA person has to know the ETL internals, in order to validate the data.

Database
Database tests, in the context of DW, include 2 major aspects: Referencial Integrity and Foreign Key constraints between Fact tables and Dimension tables.
- Check for Referencial Integrity constraint: If you try to delete a record from a dimentional table for which at least one row exists in a Fact table, it should stop you from doing so.
- Check for Foreign Key Constraint: If you try to update a key value in a Fact table, to a value that does not exist in the corresponding Dim table, it should stop you from doing so.
Exception: In some DW setups, it may happen that, Fact table may be refreshed daily, but Dim tables are refreshed weekly (or at a different frequency). In such exceptional cases, these kind of constraints should be used judiciously.

I shall cover the Reports Testing Scenarios in the next post.

Data WareHouse

Wednesday, August 19, 2009

Test Scenarios for a DW

No comments:

Post a Comment

My Blog List

About Me

Followers

Blog Archive