Multivariate volatility forecasting (1)


When hopping from univariate volatility forecasts to multivariate volatility forecast, we need to understand that now we have to forecast not only the univariate volatility element, which we already know how to do, but also the covariance elements, which we do not know how to do, yet. Say you have two series, then this covariance element is the off-diagonal of the 2 by 2 variance-covariance matrix. The precise term we should use is “variance-covariance matrix”, since the matrix consists of the variance elements on the diagonal and the covariance elements on the off-diagonal. But since it is very tiring to read\write “variance-covariance matrix”, it is commonly referred to as the covariance matrix, or sometimes less formally as var-covar matrix.

If you are still reading this, I can imagine you have some experience in modelling correlation. The link between correlation and covariance is direct, given that you know the variance \sigma^2 of the individual series:

    \[cor(x1,x2) = \frac{cov(x1,x2)}{\sigma_{x1} \sigma_{x2} }.\]

So when I first looked at this subject I did not understand why do we need the colossal literature describing all kinds of multivariate models. Why not model all the off-diagonal individually, using for example a rolling window of the sample pairwise correlation? quickly replied, you want to have a valid correlation matrix, which means symmetric (easy enough to impose) and positive-semidefinite, also known as nonnegative-definite (not easy to impose).

First, why the nonnegative-definite property is important, and second, why it is not easy to impose. Think about nonnegative-definite imposition as the multivariate equivalent to the positivity imposition on volatility in the univariate case. You don’t want to have your machine spit out negative volatility do you? In the univariate case multiplying \sigma by any squared number, we trivially remain in positive territory. In a higher dimension, ensuring “positivity” of the covariance involves multiplying, not by a squared scalar, but by a “squared” vector.

Denote XC as the centered random variable X, so XC=x-\mu_x. Now by definition COV.MAT=E[(XC)(XC^t)] is a covariance matrix, clearly nonnegative-definite. Now if we multiply the matrix by a “squared” vector, v^t E[(XC)(XC^t)] v we can “insert” the vector into the expectation (because (1) the vector is not a random variable, and (2) linearity of the expectation operator). We (should) still get nonnegative-definite matrix: E[(v^tXC)((v^tXC)^t)]. It does not matter which vector v you use, since it is “squared”. This is the origin to the algebraic reverse: A symmetric matrix \Sigma of size N is positive semi-definite if and only if v^t \Sigma v \geq 0 for all possible vectors v (of size N of course).

If we model the covariance entries individually, pairwisely, and “patch” them together into a matrix, placing each pairwise covariance in the right place (for example the covariance between variable 1 and variable 3 in entries \{1,3\} and \{3,1\}, there is no guarantee that we eventually end up with a nonnegative-definite matrix. Since there is no covariance matrix which is nonnegative-definite, then what we can potentially get an invalid covariance matrix. Practitioners, free from a tedious academic refereeing process, may get away with this theoretical blunder. However, in addition there are other problems, computational in nature. A nonnegative-definite matrix can have zero or negative determinant. In many Bayesian application we would like to use precision matrix instead of the covariance matrix. To compute precision matrix, we simply invert the covariance matrix, but this means that we divide by the determinant, so a determinant which is zero creates a problem if you are not Chuck Norris, who can supposedly divide by zero as the rumor goes.

The main building block in the literature is the GARCH process. Say we have a random variable X1, we can model its volatility using:

(1)   \begin{equation*} \sigma^2_t = \omega + \alpha \varepsilon^2_{t-1} + \beta \sigma^2_{t-1}.  \end{equation*}

Easy enough to understand. What is important for the volatility today is the volatility yesterday \sigma^2_{t-1}, with specific emphasis on the shock yesterday, \varepsilon^2_{t-1}. Remember that if E[\varepsilon_{t-1}] = 0, then \varepsilon^2_{t-1} is simply an estimate of the variance of \varepsilon_{t-1} only, without accounting for anything dating back past t-1.

Up the dimension

Now, add another random variable X2. You now have 2 volatilities and a covariance term. But why not extend this process in the same way Vector Auto Regression (VAR) extends Auto regression? Enter VEC model:

(2)   \begin{equation*}  vec(\Sigma_t) = vec(\Omega) + A vec(\varepsilon_{t-1} \varepsilon^ \prime_{t-1}) + B vec(\Sigma _{t-1}), \end{equation*}

where vec is a vectorize operator, taking a matrix and stacking it as a vector. Naturally, because of the symmetry of the matrix, we don’t need all the coefficients, so for clarity here is a better representation:

(3)   \begin{equation*}  \sigma_{t}  =  \begin{pmatrix}   \sigma_{11,t} \\   \sigma_{21,t}\\   \sigma_{22,t}  \end{pmatrix} =    \begin{pmatrix}   \alpha_{11} & \alpha_{12} & \alpha_{13} \\   \alpha_{21} & \alpha_{22} & \alpha_{23} \\  \alpha_{31} & \alpha_{31} & \alpha_{33}  \end{pmatrix}  \begin{pmatrix}   \varepsilon^2_{1,t-1} \\   \varepsilon_{1,t-1} \varepsilon_{2,t-1} \\   \varepsilon^2_{2,t-1}  \end{pmatrix} +   \begin{pmatrix}   \beta_{11} & \beta_{12} & \beta_{13} \\   \beta_{21} & \beta_{22} & \beta_{23} \\  \beta_{31} & \beta_{31} & \beta_{33}  \end{pmatrix}    \begin{pmatrix}   \sigma_{11,t-1} \\   \sigma_{21,t-1}\\   \sigma_{22,t-1}  \end{pmatrix}. \end{equation*}

The intuition behind this model is the same as that underlying the VAR. Maybe when volatility of equities is high, volatility in bonds is lower, maybe when volatility in bonds is high, covariance with equities is high, etc.

A potential problem with this model, also analogues to the VAR, is that maybe volatilities are independent processes, meaning that only the diagonal of A and B matters, in which case we simply trouble the model with unnecessary estimation noise. Another computational issue mentioned before, is that since we do not model the matrix-process per se, but the three terms one by one, we don’t ensure the result is a valid covariance matrix, in particular no nonnegative-definite constraint is imposed. The BEKK model (Baba, Engle, Kraft and Kroner, 1990) makes this advancement, the curious reader is referred to the original paper. There is a good reason not to discuss those “first-generation” models in length. They are very difficult to estimate for more than a handful of variables. I have not tried those personally, it is a hearsay from those unfortunate souls who did. For those models, even if one is successful in estimation, the complications in estimation casts a heavy shadow on the results as far as practitioners are concerned.


A surge in the literature followed the next sizable step delivered by Engle (2002) in his seminal paper: “Dynamic Conditional Correlation: A Simple Class of Multivariate Generalized Autoregressive Conditional Heteroskedasticity Models”. From the abstract: “These (models) have the flexibility of univariate GARCH models coupled with parsimonious parametric models for the correlations.” Key entry point to this class of conditional correlation models is to realize that

(4)   \begin{equation*} \Sigma_t = D_t R_t D_t, \end{equation*}

with D_t being a matrix with the volatility of the individual series (now estimated individually) on the diagonal and zeros off-diagonal. This is just a manipulation in matrix form of the usual equation we started with: \mathrm{corr}(X,Y)={\mathrm{cov}(X,Y) \over \sigma_X \sigma_Y}, since R_t=D^{-1}_t \Sigma_t D^{-1}_t. Now few things are in place:

  • You separate the diagonal from the off-diagonal in \Sigma. You can “fill-in” the diagonal using the usual univariate GARCH estimates. The off diagonal is given by the correlation matrix, on which we can now decide. When we assume a constant correlation matrix (CCC), meaning R_t = R, we can naturally use the sample correlation matrix. We can assume the R matrix is time-varying and estimate it using rolling window or exponentially decaying weights or in another way.
  • Because of the quadratic form \Sigma_t = D_t R_t D_t and because R is a correlation matrix, we are certain to get a valid covariance matrix, which is time varying even when we use constant correlation matrix.
  • Because of this separation, diagonal from off-diagonal, we can actually handle many variables, very much unlike the “first-generation” class of models. This, I think, is the main reason for the acceptance and popularity of the model.
  • Now we estimate.

    Estimation using R

    Let’s get some data. We pull the last few years of three ETF’s. The SPY (tracking the S&P 500), TLT and IEF (tracking long and medium term US bonds respectively).

    Now to demonstrate how to construct covariance matrix using the CCC and DCC models. We first get the univariate volatilities. We need them, they sit on the diagonal of the diagonal matrix D_t. We estimate them using asymmetric GARCH with heavy tails.

    Now, once we have the meat in the sandwich \Sigma_t = D_t R_t D_t, we are able to create the CCC and DCC-based covariance matrices. For the CCC (constant conditional correlation), we use the sample correlation matrix, and for the DCC (dynamic..), we use correlation matrix which is estimated based on a moving window of say 3 months.


    results are annualized and multiplied by 100 to move to percentage points for better readability. Plot it:

    DCC-based covariance
    In the upper panel we have the diagonal of the covariance matrix. We see (1) the medium term bonds have the lowest volatility as expected, (2) SPY is volatile, and the variance of variance is also high (spiky-looking variance series). (3) the variance of the long end of the curve is higher than the variance of the medium term, which is a stylized fact in the yield curve literature. (4) Interestingly, the volatility of the long term bonds in US has been on the rise, probably a heightened sense of alert towards an impending(?) Fed hike in policy rates.

    In the lower panel we have the covariance terms, all three of them, once estimated assuming CCC (solid) and once estimate assuming DCC (broken). For the covariance between the medium and long term bonds it does not matter much if you assume constant or dynamic correlation matrix. It does matter for the covariance terms of the SPY with the bonds however. For example, a DCC-based covariance matrix argue for almost zero covariance between stocks and bonds during mid 2013, while CCC-based suggest a more negative covariance during that period. Which is it, constant or dynamic, potentially matters a great deal for cross asset portfolio construction.

    Side notes

  • We rely in estimation heavily on the code written by Alexios Ghalanos.
  • There is a more complete package than the one I used here, which is more suitable for multivariate estimation [library(rmgarch)]. What I did here has a didactic flavor and is not as efficient as can be.
  • 4 comments on “Multivariate volatility forecasting (1)”

    1. Super interesting post. Where, in the R Code, is the “TT” variable defined? (that would be the number of rows for the Uvolatility_fit object).



    2. What you have implemented is not a “genuine” DCC but, opposite, an approximation of the DCC model by Engle (2002). In fact, you are not estimating dynamic conditional correlations by maximum likelihood, whether you calibrate them with a rolling estimator.
      One of the elements leading to the diffusion of DCC is the possiblity of adopting a multi-stage estimation approach: first the variances on the marginals by univariate GARCH; then, the correlation dynamic which is driven by only two parameters in the scalar DCC representation.
      Then, you should anticipate my question: why didn’t you implement the estimation of the DCC model?
      If you tackle this issue, I would also suggest to have a look at Aielli (2013), JBES.

      1. Thanks for the reference. From there (page 287) : “… the joint quasi-maximum-likelihood (QML) estimation of the DCC model is infeasible for large N. As a feasible estimator, Engle (2002) suggested a three-step procedure called DCC estimator.” As you write: first the variances on the marginals by univariate GARCH. In the code, this is “Uvolatility_fit” (U for univariate, second code snippet) and only then accounting for dynamic correlation.

    Leave a Reply

    Your email address will not be published. Required fields are marked *