Many times, it is imperative to join two fact tables in a DW, because the reports (rather queries behind the reports) need it.
In general, it is not a good idea to join two fact tables. Reason is: When you join (inner join) two tables on a condition, only the rows that satisfy the condition are returned. So, if you join two fact tables, factA and factB, then, only those rows from factA would be returned, that have matching rows in factB, and non-matching rows would be dropped (and hence, won’t show up in reports). This may or may not be consistent with the business rules. For example, the situation where there is a row in factA with no matching row in factB may perfectly be a valid business situation. However, the join will drop the row. So, effectively we are eliminating a valid row from the report.
Work-arounds:
- Outer join may be a solution. An outer join will return ALL rows from factA. For non-matching rows, it will return null values for the factB fields. This may or may not be OK, depending on the business logic. For example, many times, reports are designed to fill in dummy values like ‘Unknown’, ‘Undefined’ etc, when they see a null value. So, the users may be fine with this. But again, if you are going to use a field from factB, to fetch another field from another table, then this strategy may not work.
- Another approach goes back to database (ETL) design. The ETL designer should be (made) aware of the reports that are going to be generated out of this system. And, the database should be so designed, that we won’t need to do a join.
In general, it is not a good idea to join two fact tables. Reason is: When you join (inner join) two tables on a condition, only the rows that satisfy the condition are returned. So, if you join two fact tables, factA and factB, then, only those rows from factA would be returned, that have matching rows in factB, and non-matching rows would be dropped (and hence, won’t show up in reports). This may or may not be consistent with the business rules. For example, the situation where there is a row in factA with no matching row in factB may perfectly be a valid business situation. However, the join will drop the row. So, effectively we are eliminating a valid row from the report.
Work-arounds:
- Outer join may be a solution. An outer join will return ALL rows from factA. For non-matching rows, it will return null values for the factB fields. This may or may not be OK, depending on the business logic. For example, many times, reports are designed to fill in dummy values like ‘Unknown’, ‘Undefined’ etc, when they see a null value. So, the users may be fine with this. But again, if you are going to use a field from factB, to fetch another field from another table, then this strategy may not work.
- Another approach goes back to database (ETL) design. The ETL designer should be (made) aware of the reports that are going to be generated out of this system. And, the database should be so designed, that we won’t need to do a join.
No comments:
Post a Comment