Forecast averaging example

Especially in economics/econometrics, modellers do not believe their models reflect reality as it is. No, the yield curve does NOT follow a three-factor Nelson-Siegel model, the relation between a stock and its underlying factors is NOT linear, and volatility does NOT follow a GARCH(1,1) process, nor GARCH(?,?) for that matter. We simply look at the world, and try to find an apt description of what we see.
A case in point from the yield curve literature is the recent progression towards the “shadow rate” modelling approach, which tackles the zero lower bound problem. This class of models was nowhere to be found before policy rates actually reached that special level. Mind you, model development is often dictated not by our understanding but by the arrival of new data which does not fit existing perceptions well. This is not to criticize, but to emphasize: models should be viewed globally as approximations. We don’t really realize reality. Some may even go as far as to argue that reality has no underlying model (or data generating process). As Hansen writes in Challenges for econometric model selection:

“models should be viewed as approximations, and econometric theory should take this seriously”

All theory naturally follows the lines of “if this is the process, then we show convergence to the true parameter”. Convergence is important, but that is a big IF. Whether or not there is such a process, such a true model, we don’t know what it is. Again, especially in the social sciences. Furthermore, even if there is one true DGP, you can bet on it changing over time.

This discussion gives rise to the combination of models or, when dealing with the future, the combination of forecasts. If we don’t know the underlying truth, combining different choices or different modelling approaches may yield better results. I consider an intelligent combination of forecasts to weakly dominate (at least not worse than, often better than) a single choice made a priori. This holds in the usually prevailing situations where the true underlying process is unknown, unstable, or both.

How does it work?

Let’s generate some fictitious data, and forecast it using three different models: simple regression (OLS), boosting and random forest. Once the three forecasts are obtained, we can average them.
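
The steps just described can be sketched as follows. This is only an illustration, not the code behind the results discussed in this post: the data generating process, the sample split and the scikit-learn estimators with their settings are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
T, n_train = 600, 400

# Hypothetical data: a linear part, a nonlinear part, an interaction, plus noise.
X = rng.normal(size=(T, 3))
y = (X[:, 0] + np.sin(2 * X[:, 1]) + 0.5 * X[:, 1] * X[:, 2]
     + rng.normal(scale=0.5, size=T))

X_train, y_train = X[:n_train], y[:n_train]
X_test, y_test = X[n_train:], y[n_train:]

# The three forecasting methods (settings are illustrative assumptions).
models = {
    "ols": LinearRegression(),
    "boosting": GradientBoostingRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

# One column of out-of-sample forecasts per method.
forecasts = np.column_stack(
    [m.fit(X_train, y_train).predict(X_test) for m in models.values()]
)

# Simple (equal-weight) average of the three forecasts.
average = forecasts.mean(axis=1)


def rmse(pred):
    """Root mean squared error against the held-out target."""
    return float(np.sqrt(np.mean((y_test - pred) ** 2)))


for name, col in zip(models, forecasts.T):
    print(name, round(rmse(col), 3))
print("average", round(rmse(average), 3))
```

Which method comes out on top depends on the assumed data generating process, which is exactly the point of averaging.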

The most accurate method in this case is boosting. In other cases, however, depending on the situation, random forest would be better than boosting. If we use constrained least squares we achieve almost the most accurate results, and this without the need to choose between boosting and random forest beforehand. That is the idea, and it is important. Picking up on the introductory discussion, we just don’t know which model will deliver the best results and when it will do so.

Appendix
Formally, say y_t is your target, \widehat{y}_{i,t} is a forecast at time t from method i, where for example i = 1 is OLS, i = 2 is boosting and i = 3 is Random Forest. You can just take the average of the forecasts:

    \[\frac{\sum^3_{i=1} \widehat{y}_{i,t} }{3}.\]

Though not very intelligent, this simple average often performs very well.
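
As a minimal sketch, assuming the forecasts are stored in an array with one row per period and one column per method (the function name and the numbers are made up for illustration):

```python
import numpy as np


def simple_average(forecasts):
    """Equal-weight combination; `forecasts` has shape (T, n_methods)."""
    forecasts = np.asarray(forecasts, dtype=float)
    return forecasts.mean(axis=1)


# Hypothetical forecasts from three methods over two periods.
example = np.array([[1.0, 2.0, 3.0],
                    [2.0, 2.0, 5.0]])
combined = simple_average(example)
```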

In OLS averaging we simply regress the target on the forecasts, and the resulting coefficients serve as the weights:

    \[\widehat{y}^{combined}_t = \widehat{w}_{0t} + \sum_{i = 1}^3 \widehat{w}_{it} \widehat{y}_{i,t}.\]
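
A rough sketch of this regression with NumPy. The function name and the simulated data are assumptions, and for brevity the weights are estimated once on the whole sample rather than re-estimated at every t:

```python
import numpy as np


def ols_weights(y, forecasts):
    """Regress the target on the forecasts (with intercept).

    Returns (w0, w): the intercept and one weight per forecast.
    """
    y = np.asarray(y, dtype=float)
    F = np.asarray(forecasts, dtype=float)
    X = np.column_stack([np.ones(len(y)), F])  # add the intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1:]


# Hypothetical target and three noisy forecasts of it.
rng = np.random.default_rng(0)
y = rng.normal(size=200)
F = np.column_stack(
    [y + rng.normal(scale=s, size=200) for s in (0.5, 1.0, 1.5)]
)

w0, w = ols_weights(y, F)
combined = w0 + F @ w
```

The weights here are unrestricted: they need not be positive and need not sum to one.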

This is quite unstable. All forecasts are chasing the same target, so they are likely to be correlated, which makes it hard to estimate the coefficients. A decent way to stabilize the coefficients is to use constrained optimization, whereby you solve the least squares problem under the following constraints:

    \[w_{0t} = 0 \quad \text{and}  \quad \sum_{i = 1}^3 w_{it} = 1, \qquad \forall t.\]
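
With only these two equality constraints the problem can be solved by substitution: write the last weight as one minus the others, and regress the target minus the last forecast on the other forecasts minus the last forecast, with no intercept. A sketch under that reading (the function name and the data are assumptions; adding a non-negativity restriction on the weights would require a quadratic programming solver instead):

```python
import numpy as np


def cls_weights(y, forecasts):
    """Least squares weights with no intercept and weights summing to one."""
    y = np.asarray(y, dtype=float)
    F = np.asarray(forecasts, dtype=float)
    last = F[:, -1]
    # y - f_n = sum_{i<n} w_i (f_i - f_n) + error, after substituting
    # w_n = 1 - sum_{i<n} w_i into the combination.
    Z = F[:, :-1] - last[:, None]
    free, *_ = np.linalg.lstsq(Z, y - last, rcond=None)
    return np.append(free, 1.0 - free.sum())


# Hypothetical target and three noisy forecasts of it.
rng = np.random.default_rng(1)
y = rng.normal(size=200)
F = np.column_stack(
    [y + rng.normal(scale=s, size=200) for s in (0.5, 1.0, 1.5)]
)

w = cls_weights(y, F)
combined = F @ w
```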

Another way is to average the forecasts according to how accurate they have been up until that point, based on some metric like root MSE. We invert the RMSE shares so that the more accurate forecast (lower RMSE) gets more weight:

    \[w_{it} =  \frac{\left(\frac{RMSE_{i,t}}{\sum_{j = 1}^3 RMSE_{j,t}}\right)^{-1}}{\sum_{k = 1}^3 \left(\frac{RMSE_{k,t}}{\sum_{j = 1}^3 RMSE_{j,t}}\right)^{-1}} = \frac{\frac{1}{RMSE_{i,t}}}{\sum_{j=1}^3\frac{1}{RMSE_{j,t}}}.\]
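
A small numerical sketch of the right-hand side of this formula. The function name and the numbers are made up; note that a forecast with zero RMSE would cause a division by zero, which this sketch does not handle:

```python
import numpy as np


def inverse_rmse_weights(y, forecasts):
    """Weights proportional to 1 / RMSE of each forecast."""
    y = np.asarray(y, dtype=float)
    F = np.asarray(forecasts, dtype=float)
    rmse = np.sqrt(np.mean((y[:, None] - F) ** 2, axis=0))
    inv = 1.0 / rmse
    return inv / inv.sum()


# Hypothetical target, and forecasts that are always off by 1, 2 and 4.
y = np.array([1.0, 2.0, 3.0, 4.0])
F = y[:, None] + np.array([1.0, 2.0, 4.0])

w = inverse_rmse_weights(y, F)
# RMSEs are 1, 2, 4, so the weights are (1, 1/2, 1/4) / 1.75 = (4/7, 2/7, 1/7).
```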

You can plot the weights of the individual methods:
[Figures: CLS_weights, IMSE_weights, LS_weights]

Here is the forecast averaging function.
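
The original function is not reproduced in this text. As a stand-in, here is a minimal Python sketch of what such a function might look like. Everything in it is an assumption made for illustration: the name `forecast_averaging`, the `scheme` and `min_obs` arguments, and the choice of an expanding window in which the weights applied at time t are estimated from observations before t only.

```python
import numpy as np


def forecast_averaging(y, forecasts, scheme="simple", min_obs=20):
    """Combine forecasts with weights re-estimated on an expanding window.

    Returns (combined, weights). Rows before `min_obs` are NaN because
    there is not yet enough history to estimate the weights.
    """
    y = np.asarray(y, dtype=float)
    F = np.asarray(forecasts, dtype=float)
    T, n = F.shape
    weights = np.full((T, n), np.nan)

    for t in range(min_obs, T):
        y_past, F_past = y[:t], F[:t]  # information available before t
        if scheme == "simple":
            w = np.full(n, 1.0 / n)
        elif scheme == "inverse_rmse":
            rmse = np.sqrt(np.mean((y_past[:, None] - F_past) ** 2, axis=0))
            w = (1.0 / rmse) / np.sum(1.0 / rmse)
        elif scheme == "cls":
            # No intercept, weights sum to one (solved by substitution).
            last = F_past[:, -1]
            free, *_ = np.linalg.lstsq(
                F_past[:, :-1] - last[:, None], y_past - last, rcond=None
            )
            w = np.append(free, 1.0 - free.sum())
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        weights[t] = w

    combined = np.sum(weights * F, axis=1)
    return combined, weights


# Hypothetical target and three noisy forecasts of it.
rng = np.random.default_rng(2)
y = rng.normal(size=150)
F = np.column_stack(
    [y + rng.normal(scale=s, size=150) for s in (0.5, 1.0, 1.5)]
)

results = {s: forecast_averaging(y, F, scheme=s)
           for s in ("simple", "inverse_rmse", "cls")}
```

The `weights` array returned for each scheme has one row per period, so its columns can be plotted against time to see how the weight of each method evolves.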

4 comments on “Forecast averaging example”

  1. Awesome post! We are trying to catch a distribution that at best we can only approximate and that in the worst case might change its property over time but we never know when, how and if… Have you ever tried a dynamic approach for the weights, e.g. state space models?
