The typical fact table measures activities and is known as a transaction fact table. They support a wide variety of analytic possibilities and can be used to capture detailed information about a particular process. Certain facts cannot be studied easily using this kind of design, if at all.
This blog will outline the characteristics of a transaction fact table vs those of a snapshot fact table, and when the need for a snapshot fact table arises.
Transaction fact table
Transaction fact tables track events and their details. For example, tracking the sale of an item or a shipment of a product. By storing facts and giving them dimensional context, they allow the events to be studied individually and in aggregation. It is the facts themselves that measure the events e.g. the value of a sale or the quantity shipped. Each value recorded in the fact table describes the specific event represented by the row and nothing else.
Fact Table Properties
Declaring the grain is a crucial step in the design of every star schema, which identifies the level of detail presented by a row. Typically, the grain can be defined by referencing an actual transaction identifier at the lowest level (e.g., an order line) or can be specified in purely dimensional terms “orders by day, customer, product and salesperson”.
Transaction tables are sparse. This means that rows are only recorded for activities that take place, not for every possible combination of dimension values. For example, if salesperson x does not make a sale to customer x, nothing is recorded in the fact table, despite those persons existing. Otherwise, the fact table would have excessive growth and many extraneous rows of zero valued facts.
Transaction fact tables normally record additive facts. Most nonadditive measurements, for example ratios, can be broken down into additive facts that should instead be recorded in the fact table. This allows the granular data to be aggregated and analysed at any desired level of detail.
Snapshot Fact Tables
Sometimes, measuring the effect of a series of events is just as useful as measuring the individual events. These effects are known as status measurements.
For example, account balances and inventory levels are key metrics that a business may want to analyse.
When the measurement of status is important a transaction fact table is at best inefficient.
Instead, a snapshot fact table is called for!
The snapshot fact table samples the measurement in question at a predetermined interval making it easy to study these measurements without the need to aggregate a long sequence of events.
The challenge of studying status measurements in a transaction fact table
Some status measurements such as account balances can be constructed from history. For example, balances can be determined from all transaction history that contribute to that status measurement (withdrawals, payments, interest, fees etc.), doing so requires aggregating all transactions to that point in time. This is not an effective strategy.
Sometimes, transaction data is not stored:
You still may be tempted to track the measurements in the fact table that records transactions, but this is a poor solution and will not work for status measurements that cannot be calculated from corresponding transactions, if the corresponding transactions are purged or if they are not stored at all.
To convince you further:
Transaction tables are sparse. If there is no activity on a day, there is no row in which to record this important fact.
Some days will have more than one transaction, which will lead to double counts in queries.
There are workarounds for both problems. To address problem 1) you could record a row even for accounts with no activity, but this clutters up reports and is not the intention of a transaction fact table. The use of filtering and correlated subqueries can be used for analysis to address problem 2). However, using the snapshot model is a more elegant solution and less error prone.
The snapshot Model
As mentioned, the snapshot fact table samples the measurements in question at a predetermined interval making it easier to study.
Figure 1: key differences
Whereas the grain of a transaction fact table can be expressed by referring to something business related like an order line or transaction ID, the grain of a snapshot is almost always declared dimensionally.
The interval, combined with one or more dimensions is used to define the grain. Each row will contain a fact that records the status measurement.
Examples of declaring the grain:
- A snapshot that captures month-end snapshots for each bank account.
- Inventory level recorded for each store and product for a day.
Density vs Sparsity
In a snapshot, rows are recorded regardless of an activity taking place, to allow for easy analysis. So, snapshot fact tables are usually dense.
Status measurements are usually semi-additive, unlike the facts in a transaction fact table.
This means they cannot be summed meaningfully over a specific, non-additive dimension.
Taking account balances for example, for any given day it makes sense to add balances together from more than one account or create the total balance for all accounts within a region. However, unfortunately for us, the account balance is obviously not additive across time.
Semi-additive facts must be used carefully. When summing, the query used must be constrained by a unique row in the nonadditive dimension (usually time) or grouped by these rows. The same applies for subtotals and grand totals – do not sum across the nonadditive dimension.
Important Snapshot considerations
Coupling with a transaction fact table
Two aspects of the same process can be reflected by the use of a transaction and snapshot fact table. The transaction model allows for detailed analysis of the process activities, whilst the snapshot model sacrifices some detail in the way of individual transactions, for the flexible and powerful analysis of the effect of the transactions. Using the stars in conjunction with each other can pose powerful analysis opportunities.
When including both, it is recommended to use the transaction fact table as the data source for the snapshot fact table. This removes duplicating the ETL of the source data and ensures the data will be loaded consistently.
Other facts can support the Snapshot fact table
The snapshot fact table doesn’t need to be limited to storing facts that measure status. Including other facts that can be summarised over the snapshot period can prevent the need to refer to the transaction star.
Figure 2 shows a snapshot fact table for bank accounts, supplemented with various facts.
The redundant period-begin balance is added to aid queries and reports.
Additive, summary-level information on the activities that occurred during the period can also be included, for example the number of transactions.
You may be interested in recording non-additive facts. The average daily balance of an account can be used to analyse individual accounts within a period. Or it can be broken down into fully additive components: the number of days and a sum of account balances. The fact table stores the latter.
The snapshot fact table is also the perfect home for period-to-date measurements: it is sampled on a regular basis and is defined by a specific combination of dimensions, so in essence is a status measurement.
Figure 2: Snapshot fact table with multiple facts
Pay careful attention to slowly changing dimensions.
The dimensional references in each grain statement describe the circumstances in which a row is recorded in the fact table. These references describe unique natural key values. When these dimensions undergo type 2 slow changes, any given key may have more than one row in the dimension table. However, when the month-end snapshot is recorded, the fact table should contain only one row, using the current version at the end of the snapshot period, to avoid double counting.
This blog post discussed the key differences between a transaction fact table and the less well-known snapshot fact table, as well as some other important considerations.
Be sure to read Star Schema by Christopher Adamson as Snapshot Fact tables are just one of the many topics covered in this comprehensive book.