Reports
In a DW environment, there are various ways to create reports. A dedicated report-writing tool like Crystal Reports may be used, and many BI suites (like Business Objects) include tools such as WebI that enable users to create reports.
I will discuss reports testing in general terms, without getting into the details of any single tool.
Every report has a front end and a back end. The back end is, of course, the DW database; most reports retrieve data from Fact tables.
The front end is the body of the report, the part actually seen by the user: a chart (bar, line, pie, donut, etc.), a diagram (Venn, etc.), or just a two-dimensional table.
Each report has an associated query (or set of queries) that runs on the database and retrieves rows. The logic for processing and displaying the retrieved data is another aspect of a report. In a BO environment, it's the Universe that contains all this logic.
Based on the discussion so far, the following are possible test scenarios for reports in a DW environment:
- Queries: As with any software, a report will (or should) have a specification document. The document specifies which information is to be displayed, and in which format.
Verify that the query (or queries) associated with the report actually retrieves from the database the data (information) it is supposed to. This test is geared toward the functionality of the report query.
Additionally (if the scope provides for it), the query may be tested for performance (testing the joins, access paths, Explain plan, etc.).
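Both checks can be sketched in a few lines. This is a minimal illustration using SQLite and seeded test data; the table, columns, and report query are hypothetical stand-ins, not taken from any particular tool.

```python
import sqlite3

# Seed a throwaway database with known test data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (region TEXT, revenue REAL);
    INSERT INTO fact_sales VALUES ('East', 100.0), ('West', 250.0), ('East', 50.0);
""")

report_query = ("SELECT region, SUM(revenue) FROM fact_sales "
                "GROUP BY region ORDER BY region")

# Functional check: does the query retrieve what the spec says it should?
rows = conn.execute(report_query).fetchall()
assert rows == [('East', 150.0), ('West', 250.0)]

# Performance check (if in scope): inspect the access path the optimizer chose.
plan = conn.execute("EXPLAIN QUERY PLAN " + report_query).fetchall()
for step in plan:
    print(step)
```

In a real project the same pattern runs against a QA copy of the DW, with expected rows derived from the specification document.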
- Output: Output of a report should be tested for the following:
. Report Inputs
. Report Layout (bar, pie, donut, tabular, etc.)
. Filter conditions (corresponding to the WHERE clause)
. Grouping conditions (corresponding to the GROUP BY clause)
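A sketch of the last two checks, assuming a hypothetical report whose region input must land in the WHERE clause and whose grouping must match the GROUP BY in the spec:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (region TEXT, product TEXT, revenue REAL);
    INSERT INTO fact_sales VALUES
        ('East', 'A', 100.0), ('East', 'B', 40.0), ('West', 'A', 250.0);
""")

def run_report(region_filter):
    # Report input -> filter condition (WHERE); spec'd grouping -> GROUP BY.
    return conn.execute(
        "SELECT product, SUM(revenue) FROM fact_sales "
        "WHERE region = ? GROUP BY product ORDER BY product",
        (region_filter,),
    ).fetchall()

# The input 'East' must exclude the West rows entirely.
assert run_report('East') == [('A', 100.0), ('B', 40.0)]
assert run_report('West') == [('A', 250.0)]
```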
- Report Contents: This is the most important part of reports testing. As a user, I run a report, I see the data, I am happy. BUT how do I know that what I see IS what it should be? In other words, how do I know whether the data displayed in the report is accurate? A billion-dollar question.
The answer goes back to the ETL process that populated the DW database. If the ETL is designed properly and runs error-free, then we can have some level of confidence in the accuracy of the report. That is why it is almost always advisable to perform ETL QA before reports QA. I would go one step further and say that reports should not even be developed before the ETL is signed off. This may sound like paranoia, and may not always be possible in the real world.
There are two aspects to this issue. One, whether the data retrieved by the query is accurate. Two, whether the data-processing logic (built into the report) is accurate.
Hence, two sets of tests are required.
- The associated query is run on the DW database. Remember, the data retrieved by the query is RAW data; it has not yet been processed. This data needs to be compared to the source data (the source of the ETL, that is). This can be a little tricky, since the source can be almost any data structure. If the source is also a database, then the report query can be 'translated' to run on the source database, and the retrieved data can be compared with the data retrieved by the report query.
Again, remember that the two sets of data (source and target) may not be identical, since the whole ETL logic sits in between them. So, some level of manual judgement is involved here.
If there are any discrepancies at this step, they point to the ETL design (assuming that the report query is accurate).
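The comparison above can be sketched as follows, assuming both source and target are relational so the report query can be 'translated'. All table and column names here are illustrative:

```python
import sqlite3

# Source system (the ETL's input) with its own schema.
source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE orders (region TEXT, amount REAL);
    INSERT INTO orders VALUES ('East', 100.0), ('West', 250.0);
""")

# Target DW with the Fact table the report query reads.
target = sqlite3.connect(":memory:")
target.executescript("""
    CREATE TABLE fact_sales (region TEXT, revenue REAL);
    INSERT INTO fact_sales VALUES ('East', 100.0), ('West', 250.0);
""")

target_rows = set(target.execute(
    "SELECT region, SUM(revenue) FROM fact_sales GROUP BY region").fetchall())
source_rows = set(source.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region").fetchall())

# The two sets may legitimately differ because the ETL logic sits between
# them, so discrepancies are flagged for manual judgement, not auto-failed.
discrepancies = target_rows.symmetric_difference(source_rows)
print(discrepancies)
```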
- Once we are sure that the data we got is accurate, we need to know how it is processed in order to bring it to the final output format. For example, if we have a bar chart that shows percentages of revenue growth across regions, then we should process the raw data to calculate the required percentages and compare them with what we see in the report. This way, we can verify the report output.
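For the bar-chart example, the check amounts to recomputing the percentages from the raw rows and comparing them with the chart. The raw figures and the "displayed" values below are hypothetical, the latter standing in for numbers read off the rendered report:

```python
# Raw data retrieved by the report query: (last year, this year) per region.
raw = {'East': (100.0, 110.0), 'West': (200.0, 250.0)}

# Recompute the growth percentages the chart is supposed to show.
expected = {region: round((new - old) / old * 100, 1)
            for region, (old, new) in raw.items()}

# Values actually seen in the bar chart (read off the report by the tester).
displayed = {'East': 10.0, 'West': 25.0}

assert expected == displayed  # report output verified against raw data
```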
Thursday, August 20, 2009
Wednesday, August 19, 2009
Test Scenarios for a DW
I will discuss possible test scenarios for each of the components of the DW.
ETL
- Source-to-Target Mapping: Verify that, for each field (in each table) in the target DB, there is one (and only one) source field. Verify that corresponding source and target fields have matching data types and matching lengths. Any mismatch here can result in disasters later on. For example, if a long source field is copied into a shorter target field, truncation may occur.
One of the common pitfalls is the decimal field. If the precision and scale do not match correctly, rounding errors can occur. Say the source field has scale = 3 and the target field has scale = 2; then the target field will round off the third decimal digit, so 5.446 becomes 5.45. Not so accurate.
Remember that 'source' can consist of one (or a combination) of the following: database tables, Access tables, Excel sheets, flat files, etc. Hence, it is advisable to maintain a source-to-target mapping document.
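The scale-mismatch pitfall above is easy to reproduce with Python's decimal module (the exact rounding mode applied during a load depends on the target database; half-up is assumed here for illustration):

```python
from decimal import Decimal, ROUND_HALF_UP

# Source column has scale 3; target column has scale 2.
source_value = Decimal("5.446")
target_value = source_value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(target_value)  # 5.45 -- the third decimal digit is lost in the load
assert target_value != source_value  # a reconciliation test should catch this
```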
- ETL Execution: Verify that the ETL execution is error-free. Common practice is for the target database to have two tables: Audit and Error. After the ETL execution, the Audit table stores the following information: rows extracted from the source, rows in the target before loading, and rows in the target after loading. The Error table holds the rows from the source that didn't make it to the target due to some error.
If there are any rows in the Error table, then one has to debug why those rows were dropped by the ETL.
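A sketch of the post-run check, assuming the Audit/Error convention described above (actual table and column names will vary by project):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE audit (rows_extracted INT, rows_before INT, rows_after INT);
    CREATE TABLE error (row_data TEXT, reason TEXT);
    INSERT INTO audit VALUES (1000, 5000, 6000);
""")

extracted, before, after = conn.execute(
    "SELECT rows_extracted, rows_before, rows_after FROM audit").fetchone()
errors = conn.execute("SELECT COUNT(*) FROM error").fetchone()[0]

# Rows loaded plus rows errored should account for every row extracted.
assert after - before + errors == extracted
# Any row in the Error table means the ETL dropped data and needs debugging.
assert errors == 0
```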
- Row Count Reconciliation: This is the most important test of an ETL. The data in the source and target has to match. This is a tricky subject: the data organization on the source and target sides can be significantly different. On the target side there is a DW, comprising Fact and Dimension tables with FK and referential constraints. On the source side it can be almost anything, and these source-to-target configurations differ from project to project (to each its own). So, in general, there has to be a way to verify that the data rows in a Fact table (or rather ALL the Fact tables) on the target side reconcile with the data on the source side. This largely depends on the logic of the ETL and its transformations. And that's why I call it white-box testing: the QA person has to know the ETL internals in order to validate the data.
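This white-box dependence is easiest to see with a count check. In this sketch, the ETL is assumed to follow a hypothetical rule that source rows with a NULL region are filtered out, so the expected Fact count is derived *through* that rule rather than from the raw source count:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (region TEXT, amount REAL);
    INSERT INTO src_orders VALUES ('East', 10), (NULL, 20), ('West', 30);
    CREATE TABLE fact_sales (region TEXT, revenue REAL);
    INSERT INTO fact_sales VALUES ('East', 10), ('West', 30);
""")

# Expected count under the ETL's filter rule -- knowing that rule is the
# whole point of calling this white-box testing.
expected = conn.execute(
    "SELECT COUNT(*) FROM src_orders WHERE region IS NOT NULL").fetchone()[0]
actual = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
assert actual == expected
```

A naive `COUNT(*)` on the raw source (3 rows) would report a false failure here; only the rule-aware count (2 rows) reconciles.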
Database
Database tests, in the context of a DW, cover two major aspects: referential integrity and foreign key constraints between Fact tables and Dimension tables.
- Check the referential integrity constraint: If you try to delete a record from a Dimension table for which at least one row exists in a Fact table, the database should stop you from doing so.
- Check the foreign key constraint: If you try to update a key value in a Fact table to a value that does not exist in the corresponding Dim table, the database should stop you from doing so.
Exception: In some DW setups, the Fact table may be refreshed daily while the Dim tables are refreshed weekly (or at some other frequency). In such exceptional cases, these kinds of constraints should be used judiciously.
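Both constraint checks can be demonstrated with SQLite (where foreign-key enforcement must be switched on via a PRAGMA; most commercial databases enforce declared constraints by default). Table names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable enforcement
conn.executescript("""
    CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales (region_id INTEGER REFERENCES dim_region(region_id),
                             revenue REAL);
    INSERT INTO dim_region VALUES (1, 'East'), (2, 'West');
    INSERT INTO fact_sales VALUES (1, 100.0);
""")

# Referential integrity: deleting a Dim row still referenced by a Fact row
# must be rejected by the database.
try:
    conn.execute("DELETE FROM dim_region WHERE region_id = 1")
    ri_enforced = False
except sqlite3.IntegrityError:
    ri_enforced = True

# Foreign key: updating a Fact key to a value missing from the Dim table
# must be rejected as well.
try:
    conn.execute("UPDATE fact_sales SET region_id = 99")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

assert ri_enforced and fk_enforced
```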
I shall cover the Reports Testing Scenarios in the next post.