- Introduction
- Combination schemes
- Examples
- Discussion and takeaways
- Manual can be found here

Option 1 = Agree

Option 2 = Disagree

Simple enough:

\begin{equation} f^{combined} = \frac{\sum_{i = 1}^P f_i }{P} \end{equation}
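As a minimal sketch of the simple average in Python with NumPy: the array name `forecasts` and its values are made up for illustration, with one row per period and one column for each of the $P$ individual forecasts.

```python
import numpy as np

# Hypothetical forecasts: 3 periods (rows) from P = 4 models (columns).
forecasts = np.array([
    [10.0, 12.0, 11.0, 13.0],
    [20.0, 18.0, 19.0, 23.0],
    [30.0, 33.0, 29.0, 28.0],
])

# Simple combination: equal weight 1/P on every individual forecast.
f_combined = forecasts.mean(axis=1)
print(f_combined)
```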

But should we?

Yes we should

It works

- Biases
- Model risk

in different circumstances and/or at different points in time:

Source: __Forecasting day-ahead electricity prices__

So, just as we don't bet on *one* horse when investing, we don't bet on *one* horse here either

It works in forecasting in the same manner it works when *investing*

*That is the idea, but how to combine?*

$$ y_t = {\alpha} + \sum_{i = 1}^P {\beta_i} f_{i,t} +\varepsilon_t, $$

The combined forecast is then given by:

$$f^{comb} = \widehat{\alpha} + \sum_{i = 1}^P \widehat{\beta}_i f_i,$$
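The regression-based combination above can be sketched with NumPy's least-squares solver. The helper name `ols_combine` and the toy data are assumptions for illustration; the target is generated without noise so that the estimated weights are easy to check by eye.

```python
import numpy as np

def ols_combine(F, y):
    """Least-squares fit of y on a constant and the forecasts in F (T x P)."""
    X = np.column_stack([np.ones(len(y)), F])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1:]  # alpha_hat, beta_hat

# Hypothetical training data: T = 6 periods, P = 2 individual forecasts.
F_train = np.array([[1.0, 2.0], [2.0, 1.5], [3.0, 3.5],
                    [4.0, 3.0], [5.0, 5.5], [6.0, 5.0]])
y_train = 0.5 + 0.7 * F_train[:, 0] + 0.3 * F_train[:, 1]  # noise-free target

alpha_hat, beta_hat = ols_combine(F_train, y_train)

# Combine a new pair of forecasts with the estimated weights.
f_new = np.array([7.0, 6.0])
f_comb = alpha_hat + beta_hat @ f_new
```

Note that the weights are unrestricted here: they need not be positive or sum to one, and the intercept absorbs any common bias in the individual forecasts.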

$$y_t = {\alpha} + \sum_{i = 1}^P {\beta_i} f_{i,t} +\varepsilon_t,$$ (as before)

But minimise the absolute loss function $\sum_t |\varepsilon_t|$ instead of the squared loss function $\sum_t \varepsilon_t^2$
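Unlike the squared loss, the absolute loss has no closed-form solution. One way to approximate it, sketched below, is iteratively reweighted least squares (IRLS); this is a swapped-in illustration technique, not necessarily the solver behind the results in these slides. The function name `lad_combine`, the iteration count, the floor `eps`, and the data are all assumptions.

```python
import numpy as np

def lad_combine(F, y, n_iter=100, eps=1e-6):
    """Approximate the LAD fit of y on a constant and F by IRLS."""
    X = np.column_stack([np.ones(len(y)), F])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # start from the OLS fit
    for _ in range(n_iter):
        resid = y - X @ coef
        # Weighting squared residuals by 1/|e_t| mimics the absolute loss;
        # eps keeps the weights finite when a residual is (near) zero.
        sw = 1.0 / np.sqrt(np.maximum(np.abs(resid), eps))
        coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1:]  # alpha_hat, beta_hat

# Hypothetical data: two forecasts and a target with two large outliers.
f1 = np.linspace(0.0, 10.0, 30)
f2 = f1 + 3.0 * np.sin(f1)
F = np.column_stack([f1, f2])
y = 0.5 + 0.6 * f1 + 0.4 * f2
y[5] += 20.0
y[17] -= 15.0

alpha_lad, beta_lad = lad_combine(F, y)
```

In this toy setup the absolute loss largely ignores the two outliers, so the estimates stay close to the coefficients used to generate the clean observations, whereas a squared-loss fit would be pulled towards the outliers.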

$$y_t = {\alpha} + \sum_{i = 1}^P {\beta_i} f_{i,t} +\varepsilon_t,$$

Minimise the squared loss function $\sum_t \varepsilon_t^2$, but under additional constraints: $\beta_i \geq 0 \; \forall i$, or $\sum_{i = 1}^P \beta_i = 1$, or both.

Compute the mean squared error of each individual forecast:

$$ MSE_i = \frac{1}{T}\sum_{t=1}^{T}(f_{i,t} - y_{t})^{2}, $$

and combine the forecasts based on how well each individual is doing:

$$ f^c = \sum_{i=1}^P w_i f_i, \quad \mbox{where} \quad w_i = \frac{\left(\frac{MSE_{i}}{\sum_{j = 1}^P MSE_{j}}\right)^{-1}}{\sum_{k = 1}^P \left(\frac{MSE_{k}}{\sum_{j = 1}^P MSE_{j}}\right)^{-1}} = \frac{\frac{1}{MSE_{i}}}{\sum_{j=1}^P\frac{1}{MSE_{j}}} $$
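The inverse-MSE weighting can be sketched as follows. The helper name `inverse_mse_weights` and the data are hypothetical; the forecasts are built so that their errors are constant, which makes the weights easy to verify by hand.

```python
import numpy as np

def inverse_mse_weights(F, y):
    """Weights proportional to 1 / MSE_i, normalised to sum to one."""
    mse = np.mean((F - y[:, None]) ** 2, axis=0)  # MSE_i for each column i
    inv = 1.0 / mse
    return inv / inv.sum()

# Hypothetical data: forecast 1 misses by 1 every period, forecast 2 by 2.
y = np.array([10.0, 12.0, 11.0, 13.0])
F = np.column_stack([y + 1.0, y - 2.0])

w = inverse_mse_weights(F, y)  # MSE = [1, 4], so w = [0.8, 0.2]
f_c = F @ w                    # combined forecast for each period
```

For simplicity the weights are computed and applied on the same sample here; in practice one would presumably estimate them on a training window and apply them to later forecasts.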

$$ f^c = \sum_{i=1}^P w_i f_i, \quad \mbox{where} \qquad $$

$w_i = 1 \quad \mbox{if} \quad MSE_{i} < MSE_{j} \quad \forall j \neq i $

$ w_i = 0 \quad \mbox{otherwise} $
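Picking the best individual forecast amounts to putting all the weight on the column with the smallest MSE. A minimal sketch, with a hypothetical helper name and made-up data:

```python
import numpy as np

def best_individual_weights(F, y):
    """Weight 1 on the forecast with the lowest MSE, 0 on all others."""
    mse = np.mean((F - y[:, None]) ** 2, axis=0)
    w = np.zeros(F.shape[1])
    w[np.argmin(mse)] = 1.0  # a tie goes to the first minimum
    return w

# Hypothetical data: three forecasts with constant errors of 1, -0.5 and 2.
y = np.array([10.0, 12.0, 11.0, 13.0])
F = np.column_stack([y + 1.0, y - 0.5, y + 2.0])

w = best_individual_weights(F, y)  # the second forecast has the lowest MSE
f_c = F @ w                        # identical to the second forecast
```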

There are **14** attributes in each case of the dataset. They are:

- CRIM - per capita crime rate by town
- ZN - proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS - proportion of non-retail business acres per town.
- CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)
- NOX - nitric oxides concentration (parts per 10 million)
- RM - average number of rooms per dwelling
- AGE - proportion of owner-occupied units built prior to 1940
- DIS - weighted distances to five Boston employment centres
- RAD - index of accessibility to radial highways
- TAX - full-value property-tax rate per $10,000
- PTRATIO - pupil-teacher ratio by town
- B - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
- LSTAT - % lower status of the population
- MEDV - Median value of owner-occupied homes in $1000's

Description of the Boston dataset. Source: U.S. Census Service

| Individual forecasts | RMSE |
|---|---|
| Linear | 4.73 |
| Principal component regression | 7.62 |
| Boosting | 3.85 |
| Random forests | 3.26 |
| Support vector machine | 3.06 |
| Neural network | 3.97 |

| Forecast combinations | RMSE |
|---|---|
| Simple | 3.64 |
| OLS | 2.77 |
| LAD | 2.77 |
| Variance based | 3.2 |
| CLS | 2.95 |
| BI | 3.06 |

“The current system emphasizes data on spending, but the bureau also collects data on income. In theory the two should match perfectly - a penny spent is a penny earned by someone else. But estimates of the two measures can diverge widely” [Aruoba et al., 2015]

$$ D_t = (1-\lambda) \sum_{s=0}^{\infty} \lambda^{s} \varepsilon_{t-s}\varepsilon^{\prime}_{t-s} = (1-\lambda)\varepsilon_{t}\varepsilon^{\prime}_{t}+\lambda D_{t-1} $$
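The recursive form on the right-hand side is convenient to compute. The sketch below assumes that $\varepsilon_t$ is a vector of errors (one entry per series), that the recursion starts from $D_0 = 0$, and that $\lambda = 0.9$; none of these are specified in the text, and the function name and data are made up.

```python
import numpy as np

def ewma_error_cov(errors, lam=0.9):
    """Run D_t = (1 - lam) * e_t e_t' + lam * D_{t-1}, starting from D_0 = 0."""
    P = errors.shape[1]
    D = np.zeros((P, P))
    for e in errors:  # one row of errors per period, oldest first
        D = (1.0 - lam) * np.outer(e, e) + lam * D
    return D

# Hypothetical errors for P = 2 series over 3 periods.
errors = np.array([[1.0, -0.5],
                   [0.2, 0.4],
                   [-0.3, 0.1]])
D = ewma_error_cov(errors, lam=0.9)
```

A larger $\lambda$ gives a longer memory, while a smaller $\lambda$ lets $D_t$ react faster to recent errors.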

- Interpretation is lost
- Does not always add value (garbage in $\Rightarrow$ garbage out)

- Good "hedge" against wrong modelling choices
- No consensus on the best approach
- Simple average is very robust
- Useful in a changing environment where structural breaks are likely