Forecast Reconciliation in Python | by Benton Tripp

In my experience, it is somewhat unusual to work with real time-series data that does not have underlying “levels”. For example, consider the data that might be produced via transactions at a grocery store. Transactions can be described at an individual product level, a shopper level, or at a store level. The products purchased might also be categorized into a specific type of products, and those categories might in turn fall into broader categories. For a business owner, this complex hierarchical structure can complicate making accurate and unbiased forecasts for their business. A data-driven person will most likely produce forecasts for each of the different levels, but they don’t always add up. This is where reconciliation becomes necessary.

The majority of my explanations throughout this post are based on the book Forecasting: Principles and Practice by Rob J. Hyndman and George Athanasopoulos [1]. It’s an excellent resource for forecasting in general and completely free online. I highly recommend that you spend some time reading Hyndman’s more detailed explanation of reconciling hierarchical and grouped forecasts at some point. It is not my intention to replace the book, although I will still give an explanation of forecast reconciliation. Instead, my goal in writing this post is to give whoever is reading it the background needed to develop their own forecast reconciliation code. The existing packages I have found are not consistently maintained, or they are not (in my opinion) at a stage in their development where they can be considered reliable in a production environment. But even if you do use one of the existing reconciliation packages, staring at a mathematical formula in a book or applying a pre-defined function that somebody else wrote is not the best way to gain a firm understanding of what is actually happening. Hopefully, by sharing a brief explanation of how one might write their own Python code to reconcile hierarchical forecasts, I can also provide some insight that might not be attained otherwise.

To begin, I will give a brief introduction to the differences between hierarchical and grouped time series (no, they are not quite the same thing). A time series is considered hierarchical when every series below the top level falls under exactly one parent. For example, consider the following hierarchy tree:

Sub-Category 1 falls exclusively under Category 1, Sub-Category 4 falls exclusively under Category 2, etc. This exclusivity is what defines a hierarchical time series. It is also important to note that mathematically, the sub-categories should add up to the category above them, and Total is the sum of all of the bottom-level categories. The mathematical expression looks like this:

T = C1 + C2 = SC1 + SC2 + SC3 + SC4 + SC5
C1 = SC1 + SC2 + SC3
C2 = SC4 + SC5

On the other hand, a time series is grouped when that exclusivity between sub-domains does not exist. For example, consider the same hierarchy tree but with only three sub-categories spread across each of the two primary categories:

The same hierarchy could also be described by the following tree:

This complexity of multiple arrangements of the hierarchy groups means that the original formulas used are no longer valid. Hyndman describes this structural concept as “not naturally disaggregat[ing] in a unique hierarchical manner.”

The difference between hierarchical and grouped time series is important to understand, because the summing matrix (explained in the next section) is dependent on the structure of the time series.

Reconciled — or coherent — forecasts are constructed from a few key components:

Base Forecasts:
Forecasts for every series at every hierarchical level, represented as an m x n matrix (m rows and n columns), where each column corresponds to one series in the hierarchy (at any level) and each row corresponds to one time period of the forecast horizon.

Summing matrix:
The summing matrix describes the hierarchical/grouped structure of the data. For hierarchical summing matrices, the number of columns matches the number of unique categories on the bottom level. The number of rows is determined by the total number of unique categories across all levels. The values of the summing matrix are binary: a 1 means that the bottom-level category (column) contributes to the category in that row, and a 0 means that it does not. Consider the example shared previously of hierarchical data with two categories and five sub-categories. The summing matrix would look like the following:
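As a minimal sketch, here is that summing matrix written out by hand with NumPy, using the structure from the formulas earlier in the post (C1 = SC1 + SC2 + SC3 and C2 = SC4 + SC5). The row order (Total, C1, C2, then the five sub-categories) and the bottom-level numbers are assumptions made purely for illustration.

```python
import numpy as np

# Rows (assumed order): Total, C1, C2, SC1, SC2, SC3, SC4, SC5
# Columns: the bottom-level series SC1, SC2, SC3, SC4, SC5
S = np.array([
    [1, 1, 1, 1, 1],  # Total = SC1 + SC2 + SC3 + SC4 + SC5
    [1, 1, 1, 0, 0],  # C1    = SC1 + SC2 + SC3
    [0, 0, 0, 1, 1],  # C2    = SC4 + SC5
    [1, 0, 0, 0, 0],  # SC1
    [0, 1, 0, 0, 0],  # SC2
    [0, 0, 1, 0, 0],  # SC3
    [0, 0, 0, 1, 0],  # SC4
    [0, 0, 0, 0, 1],  # SC5
])

# Made-up bottom-level values, only to show what multiplying by S does
bottom = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Multiplying S by the bottom level produces every level of the hierarchy
all_levels = S @ bottom
print(all_levels)
```

Each row of S is simply the recipe for building one series out of the bottom-level series, which is why the lower block of the matrix is an identity matrix.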

Because grouped time series data does not disaggregate in a “unique manner,” there is not one unique bottom level. This means that the summing matrix needs additional columns as well as additional rows. The grouped data from earlier in the post would look like this:

Mapping Matrix:
The key component of forecast reconciliation is the mapping matrix. This matrix varies depending on the reconciliation method used, but the principle remains the same. The mapping matrix turns the base forecasts into bottom-level forecasts, and the summing matrix then aggregates those back up through every level, so multiplying the summing matrix, the mapping matrix, and the base forecasts together (for either hierarchical or grouped time series) yields a set of coherent forecasts. The challenge then is finding an “optimal” mapping matrix, meaning one that reconciles the forecasts with the smallest forecast error variance.
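To make the role of the mapping matrix concrete, here is a small sketch using the simplest case, bottom-up reconciliation, where the mapping matrix just picks out the bottom-level base forecasts and ignores everything else. The hierarchy is the five sub-category example from earlier in the post; the base forecast numbers are made up and deliberately do not add up.

```python
import numpy as np

# Summing matrix for Total, C1, C2, SC1..SC5 (same example hierarchy as above)
S = np.vstack([
    [1, 1, 1, 1, 1],       # Total
    [1, 1, 1, 0, 0],       # C1
    [0, 0, 0, 1, 1],       # C2
    np.eye(5, dtype=int),  # SC1..SC5
])
n_total, n_bottom = S.shape

# Bottom-up mapping matrix: zeros for the aggregate series, identity for the
# bottom-level series, so M @ base simply extracts the bottom-level forecasts.
M = np.hstack([np.zeros((n_bottom, n_total - n_bottom)), np.eye(n_bottom)])

# Made-up, incoherent base forecasts for one time period:
# 58 + 95 != 155, and the five sub-categories sum to 150, not 155.
base = np.array([155.0, 58.0, 95.0, 10.0, 20.0, 30.0, 40.0, 50.0])

# Coherent forecasts = summing matrix @ mapping matrix @ base forecasts
reconciled = S @ M @ base
print(reconciled)
```

Bottom-up throws away the information in the upper-level forecasts entirely, which is the motivation for the minimum trace methods listed next: they choose a mapping matrix that uses all of the base forecasts.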

Many different reconciliation methods exist to find the optimal mapping matrix, and some are better than others. I will not go into detail on the majority of them, but I will still list a few. Reconciliation methods fall under two categories: single-level and minimum trace methods.

Some single-level methods include:

Bottom-Up
Top-Down
Middle-Out

And some minimum trace methods include:

Ordinary least squares (OLS)
Weighted least squares (WLS) with variance scaling
WLS with structural scaling

For a more rigorous explanation of the different minimum trace methods, along with a few additional approaches, feel free to check out the paper by Wickramasuriya, Athanasopoulos, and Hyndman [2].
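As a rough sketch of how the minimum trace methods listed above relate to each other: in Hyndman’s presentation they share one general form for the mapping matrix, M = (S' W^-1 S)^-1 S' W^-1, and differ only in the weight matrix W. OLS uses an identity matrix, and structural scaling uses a diagonal matrix holding the number of bottom-level series under each node. The helper name `mint_mapping` is my own, not part of any package, and variance scaling is left out because it needs the residuals of the base models.

```python
import numpy as np

def mint_mapping(S, W):
    """General minimum trace mapping matrix: (S' W^-1 S)^-1 S' W^-1."""
    W_inv = np.linalg.inv(W)
    # Solve a linear system rather than inverting S' W^-1 S explicitly
    return np.linalg.solve(S.T @ W_inv @ S, S.T @ W_inv)

# Example hierarchy: Total, C1, C2, SC1..SC5
S = np.vstack([
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    np.eye(5),
]).astype(float)

# OLS: every series gets the same weight
W_ols = np.eye(S.shape[0])
M_ols = mint_mapping(S, W_ols)

# WLS with structural scaling: weight = number of bottom-level series
# that feed into each row of S (the row sums of S)
W_struct = np.diag(S.sum(axis=1))
M_struct = mint_mapping(S, W_struct)

print(M_ols.shape, M_struct.shape)
```

Whatever W is chosen, forecasts that are already coherent pass through unchanged, because M @ S is the identity matrix in this form.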

As mentioned previously, there are a couple of existing tools/packages that can apply the majority of reconciliation methods. The aforementioned Rob Hyndman helped to develop an R package called fabletools [3] that in my opinion is one of the best forecasting packages, and the best tool out there for forecast reconciliation. An effort is also underway to develop a Python package called scikit-hts [4]. However, it is still pretty buggy and doesn’t have all of the capabilities that exist in fabletools.

I will not be sharing an explanation for coding all of the different methods available, nor will I be strictly showing all of the typical parts of machine learning (i.e. I won’t be splitting data into training/test sets, generating forecasts, looking at accuracy metrics, etc.). My intention is to demonstrate how one might go about programming their own reconciliation algorithm, so I am going to just be working with a sample dataset with pre-existing forecasts.

Below are plots of each of the hierarchical levels in the sample dataset:

The first steps to reconcile the forecasts in this dataset are to define the Base Forecast and Summing matrices. In order to do this, we need to define the bottom/middle/top levels of the hierarchical structure. I have found that having the data in the following format makes this process pretty simple to do:

Notice I have included date, parent, child, and forecast as columns in the dataset. Of course, how you choose to structure your data is completely up to you. But the methods I use to gather all of the information needed to reconcile the forecasts require the data to be formatted this way.
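For readers who want something to experiment with, here is a small made-up dataset in that four-column layout. Only the column names (date, parent, child, forecast) come from the post. The node names, the forecast values, and the choice to give the top level a missing parent are assumptions for this sketch.

```python
import pandas as pd

# child -> parent for the example hierarchy; the top level has no parent
# (representing that with None is an assumption of this sketch)
edges = {
    "Total": None,
    "C1": "Total", "C2": "Total",
    "SC1": "C1", "SC2": "C1", "SC3": "C1",
    "SC4": "C2", "SC5": "C2",
}

# Made-up base forecasts for the first period (they deliberately do not add up)
base_values = {
    "Total": 155.0, "C1": 58.0, "C2": 95.0,
    "SC1": 10.0, "SC2": 20.0, "SC3": 30.0, "SC4": 40.0, "SC5": 50.0,
}

dates = pd.date_range("2022-01-01", periods=2, freq="MS")

# One row per (date, series): long format
rows = [
    {"date": d, "parent": p, "child": c, "forecast": base_values[c] + i}
    for i, d in enumerate(dates)
    for c, p in edges.items()
]
df = pd.DataFrame(rows)
print(df.head(8))
```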

Next, we can go ahead and define the different hierarchies within this dataset, and restructure the data into the correct base forecast matrix structure:
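One possible way to do this with pandas, as a sketch rather than the exact code behind this post: treat a series with no parent as the top level, a series that is never a parent as the bottom level, and everything else as the middle, then pivot so that each row is a time period and each column is a series. The sample values and the top/middle/bottom column ordering are assumptions.

```python
import pandas as pd

# Made-up long-format data in the date / parent / child / forecast layout
names = ["Total", "C1", "C2", "SC1", "SC2", "SC3", "SC4", "SC5"]
parents = [None, "Total", "Total", "C1", "C1", "C1", "C2", "C2"]
df = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-01"] * 8 + ["2022-02-01"] * 8),
    "parent": parents * 2,
    "child": names * 2,
    "forecast": [155, 58, 95, 10, 20, 30, 40, 50,
                 160, 61, 97, 11, 21, 31, 41, 51],
})

# Top: series without a parent. Bottom: series that are never a parent.
top = sorted(set(df.loc[df["parent"].isna(), "child"]))
bottom = sorted(set(df["child"]) - set(df["parent"].dropna()))
middle = sorted(set(df["child"]) - set(top) - set(bottom))

# Column order used for the base forecast matrix (and later for the rows of
# the summing matrix): top, then middle, then bottom
columns = top + middle + bottom

# Base forecast matrix: one row per time period, one column per series
bf = (
    df.pivot(index="date", columns="child", values="forecast")[columns]
    .to_numpy(dtype=float)
)
print(columns)
print(bf)
```

Keeping the bottom-level series as the last columns is a convenience: it matches the identity block at the bottom of the summing matrix.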

From the restructured dataset, defining the summing matrix can easily be accomplished using the following code:
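A sketch of one way to build the summing matrix from the parent/child pairs, assuming the same made-up hierarchy and column order as above: for each bottom-level series, walk up through its parents and mark a 1 in every row it contributes to. The variable names here are illustrative only.

```python
import numpy as np
import pandas as pd

# Unique parent/child pairs from the long-format data (made-up example)
edges = pd.DataFrame({
    "parent": [None, "Total", "Total", "C1", "C1", "C1", "C2", "C2"],
    "child": ["Total", "C1", "C2", "SC1", "SC2", "SC3", "SC4", "SC5"],
})

# child -> parent lookup, skipping the top level (which has no parent)
linked = edges.dropna(subset=["parent"])
parent_of = dict(zip(linked["child"], linked["parent"]))

# Row order of the summing matrix; must match the base forecast columns
columns = ["Total", "C1", "C2", "SC1", "SC2", "SC3", "SC4", "SC5"]

# Bottom level: series that never appear as a parent
all_parents = set(parent_of.values())
bottom = [c for c in columns if c not in all_parents]

sm = np.zeros((len(columns), len(bottom)), dtype=int)
for j, leaf in enumerate(bottom):
    node = leaf
    while node is not None:
        # This leaf contributes to itself and to each of its ancestors
        sm[columns.index(node), j] = 1
        node = parent_of.get(node)  # None once we move past the top level

print(sm)
```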

Finally, we can define our mapping matrix. For this post, I will be demonstrating how to reconcile the forecasts using the Ordinary Least Squares method. The other methods can be implemented without major adjustments to the code now that you have the Base Forecast and Summing matrices defined. If you would like to use any of the other methods, you can refer to the forecast reconciliation section of Hyndman’s book [1] for the different optimal reconciliation approaches.

Using the OLS reconciliation method, we will use the following formula:

Let the summing matrix sm = S, the base forecast matrix bf = F, and the forecast horizon = h. Then, for the mapping matrix M:
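For reference, this is the standard OLS mapping matrix as it appears in Forecasting: Principles and Practice (where it is called G), written with the symbols defined above. It is the textbook expression rather than anything specific to this dataset.

```latex
M = \left(S^{\top} S\right)^{-1} S^{\top}
```

Note that M depends only on the structure of the hierarchy (through S), not on the forecasts themselves, so it is the same for every period of the forecast horizon.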

And the reconciled forecast output R:
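For a single time period, the reconciled forecasts are S times M times that period’s base forecasts. Assuming the base forecast matrix is laid out as described earlier (rows are time periods, columns are series), all periods can be reconciled at once as R = F (S M)^T. Below is a minimal NumPy sketch of that calculation; the hierarchy and forecast values are the made-up example used throughout, not the sample dataset from this post.

```python
import numpy as np

# Summing matrix for Total, C1, C2, SC1..SC5 (example hierarchy)
sm = np.vstack([
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    np.eye(5),
]).astype(float)

# Made-up base forecasts: 2 time periods (rows) x 8 series (columns)
bf = np.array([
    [155, 58, 95, 10, 20, 30, 40, 50],
    [160, 61, 97, 11, 21, 31, 41, 51],
], dtype=float)

# OLS mapping matrix M = (S'S)^-1 S', computed with a linear solve
# instead of an explicit matrix inverse
M = np.linalg.solve(sm.T @ sm, sm.T)

# Reconciled forecasts, same shape as bf: each row is S @ M @ (row of bf)
R = bf @ (sm @ M).T

print(np.round(R, 2))
```

Unlike bottom-up, every reconciled value here is a blend of all eight base forecasts, so the bottom-level numbers shift slightly as well.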

You should also double-check that everything adds up coherently using the formulas given earlier in the post:

T = C1 + C2 = SC1 + SC2 + SC3 + SC4 + SC5
C1 = SC1 + SC2 + SC3
C2 = SC4 + SC5
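Rather than checking each sum by hand, the summing matrix can do the check for you: forecasts are coherent exactly when rebuilding every level from the bottom-level columns reproduces the full matrix. The helper below is a sketch under the assumption that the bottom-level series are the last columns of the forecast matrix; the name `is_coherent` and the sample numbers are my own.

```python
import numpy as np

def is_coherent(forecasts, S, atol=1e-8):
    """True if every row of `forecasts` equals S applied to its bottom level.

    Assumes the bottom-level series are the last S.shape[1] columns.
    """
    n_bottom = S.shape[1]
    bottom = forecasts[:, -n_bottom:]
    return bool(np.allclose(forecasts, bottom @ S.T, atol=atol))

# Example hierarchy: Total, C1, C2, SC1..SC5
S = np.vstack([
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    np.eye(5),
]).astype(float)

# Made-up base forecasts that do not add up
bf = np.array([[155, 58, 95, 10, 20, 30, 40, 50]], dtype=float)

# OLS-reconciled version of the same forecasts
M = np.linalg.solve(S.T @ S, S.T)
R = bf @ (S @ M).T

print(is_coherent(bf, S))  # base forecasts
print(is_coherent(R, S))   # reconciled forecasts
```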

And that’s it! You have successfully reconciled the forecasts using the ordinary least squares method.

*All images unless otherwise stated are my own.

[1] Rob J. Hyndman and George Athanasopoulos, Forecasting: Principles and Practice 3rd ed. (2022), https://otexts.com/fpp3/
[2] Shanika L. Wickramasuriya, George Athanasopoulos, and Rob Hyndman; Optimal Forecast Reconciliation for Hierarchical and Grouped Time Series Through Trace Minimization (2019); https://doi.org/10.1080/01621459.2018.1448825
[3] Mitchell O’Hara-Wild, Rob Hyndman, Earo Wang; fabletools (2022); https://fabletools.tidyverts.org/reference/reconcile.html
[4] Carlo Mazzaferro, scikit-hts (2019), https://scikit-hts.readthedocs.io/en/latest/readme.html