Many statistical methods are problematic to use for volatility modelling, precisely because their results are not guaranteed to be valid. The new reparametrization opens the door for all those methods.

Say you have a 3 by 3 correlation matrix; its 3 off-diagonal elements can now, courtesy of the new proposal, be stacked and modelled/estimated/predicted **individually**. Once you are done, you are assured of getting a valid correlation matrix. This is in contrast to working with the matrix “as a whole” (à la exponentially weighted moving average), or imposing assumptions on the underlying process (à la GARCH) so as to guarantee a valid result (reminder: symmetric and positive semi-definite).

The most straightforward method I can think of, one that was never before relevant for volatility modelling, is linear (or non-linear) regression and variants thereof. There was never any point in applying a regression (or ridge regression, or LASSO) to the individual entries of the covariance matrix if, at the end of it, the result would not be invertible. But now it is possible, using the newly proposed parameterization.
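As a rough sketch of what this unlocks (my own illustration, with simulated data rather than real volatilities): treat each transformed off-diagonal parameter as its own time series, fit a separate lag-1 regression to each, and only at the very end map the forecasts back to a correlation matrix.

```r
# Hypothetical illustration with simulated data: each of the
# d = n*(n-1)/2 transformed parameters gets its own regression.
set.seed(1)
T_obs <- 100
d <- 3                                   # a 3 by 3 matrix has 3 free parameters
gamma_t <- matrix(rnorm(T_obs * d), T_obs, d)

gamma_hat <- apply(gamma_t, 2, function(g) {
  fit <- lm(g[-1] ~ g[-T_obs])           # regress each series on its own lag
  sum(coef(fit) * c(1, g[T_obs]))        # one-step-ahead forecast
})
# gamma_hat (length d) can then be fed to the paper's inverse mapping
# (the GFT_inverse_mapping function below), which guarantees the
# forecasts assemble into a valid correlation matrix.
```

Any per-element model would do here; the lag-1 regression is just the simplest stand-in for "regression and variants thereof".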

The paper “A New Parametrization of Correlation Matrices” is short and clear. Code is included, and the numerical algorithm is fast enough for practical purposes. The code from the paper is below.

You need to have the packages `expm` and `fBasics` installed to be able to execute this code.

```r
library(expm)
library(fBasics)

GFT_inverse_mapping <- function(gamma_in, tol_value) {
  C <- matrix(, nrow = 0, ncol = 0)
  iter_number <- -1
  # Check if input is of proper format: gamma is of suitable length
  # and tolerance value belongs to a proper interval
  n <- 0.5 * (1 + sqrt(1 + 8 * length(gamma_in)))
  if (!(is.vector(gamma_in) && n %% 1 == 0)) {
    stop("Dimension of 'gamma' is incorrect")
  } else if (!(tol_value >= 10^(-10) && tol_value <= 10^(-4))) {
    stop("Incorrect tolerance value")
  } else {
    # Place elements from gamma into off-diagonal parts
    # and put zeros on the main diagonal of the nxn symmetric matrix A
    A <- matrix(0, nrow = n, ncol = n)
    A[upper.tri(A, diag = FALSE)] <- gamma_in
    A <- A + t(A)
    # Read properties of the input matrix
    diag_vec <- diag(A)
    # Iterative algorithm to get the proper diagonal vector
    dist <- sqrt(n)
    while (dist > sqrt(n) * tol_value) {
      diag_delta <- log(diag(expm(A)))
      diag_vec <- diag_vec - diag_delta
      diag(A) <- diag_vec
      dist <- norm(diag_delta, type = "2")  # Euclidean norm of the update
      iter_number <- iter_number + 1
    }
    # Get a unique reciprocal correlation matrix
    C <- expm(A)
    diag(C) <- rep(1, n)
  }
  return(list(C = C, iter_number = iter_number))
}
```

**Example**

Assume you have a 5 by 5 correlation matrix, so 10 off-diagonal elements. Simulate the off-diagonal elements from a normal distribution, and verify that the mapping proposed in the aforementioned paper indeed results in a valid correlation matrix.

```r
TT <- 10
off_diag <- rnorm(TT, mean = 0, sd = 3)
C <- GFT_inverse_mapping(off_diag, tol_value = 10^(-10))
C
isPositiveDefinite(C$C)
```

```
$C
        [,1]   [,2]   [,3]   [,4]   [,5]
[1,]  1.0000  0.817 -0.820  0.856 0.0351
[2,]  0.8170  1.000 -0.427  0.959 0.6037
[3,] -0.8203 -0.427  1.000 -0.415 0.4164
[4,]  0.8557  0.959 -0.415  1.000 0.4941
[5,]  0.0351  0.604  0.416  0.494 1.0000

$iter_number
[1] 77

[1] TRUE
```
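As an extra check (my own addition, not from the paper): the forward mapping is simply the off-diagonal part of the matrix logarithm, so applying `logm()` from the `expm` package to the reconstructed matrix should return, approximately, the vector we started from.

```r
# Round-trip check: recover the transformed parameters from the
# reconstructed correlation matrix via the matrix logarithm.
A_back <- logm(C$C)
gamma_back <- A_back[upper.tri(A_back, diag = FALSE)]
max(abs(gamma_back - off_diag))  # should be small
```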

If you don’t get a positive definite result, try reducing the tolerance level `tol_value`.

* See here for the rationale.

I recently published what I hope is an easy read for all of you modern-statistics ~~geeks~~ lovers, explaining the thrust behind this machine-learning class of models.

You can download the two-pager from Significance, specifically here (subscription required).

You may also like a previous Machine learning is simply statistics post.


Using the bootstrap standard deviation as a plug-in estimate is a common mistake. Why? For the special case of the mean it’s not a mistake, so perhaps people presumed that if it’s good for the mean, it’s good for other statistics as well (it is not). More importantly, it’s an easy mistake to make because it’s so straightforward to do. The second point the authors make is to prove that if you use an estimate based on the second moment of the bootstrap, you are being overly conservative.
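To see why the mean is the benign special case, here is a quick sanity check of my own (not from the paper): for the sample mean, the bootstrap standard deviation lands right next to the textbook standard error, sd(x)/sqrt(n).

```r
# Sanity check: for the sample mean, the bootstrap standard deviation
# is a fine plug-in for the standard error.
set.seed(42)
n <- 1000
x <- rexp(n, rate = 1)                           # Exp(1), so the true sd is 1
boot_means <- replicate(2000, mean(sample(x, replace = TRUE)))
sd(boot_means)                                   # bootstrap standard error
sd(x) / sqrt(n)                                  # plug-in standard error; theory says 1/sqrt(n)
```

The two numbers agree closely; it is for statistics other than the mean, like the median below, that the agreement breaks down.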

To illustrate the mistake we are talking about – using the bootstrap standard deviation as a stand-in for the real standard deviation – consider the median as our statistic of interest; for this special case, Wikipedia tells us the asymptotic variance of the estimate. The following figure shows the asymptotic variance versus the bootstrap variance used as an estimate (I use the median of an exponential distribution; replication code for the figure is at the end of this post). The dashed green line is the bootstrap estimate, while the density shows the distribution of the asymptotic variance of the estimate. You can see that the bootstrap estimate is on the high side.

Why should we consider this to be good news for the bootstrap? I’m glad you asked.

Any inference that wrongly relied on the bootstrap standard deviation ended in one of two ways: (1) rejection of the null hypothesis, or (2) failure to reject the null hypothesis. The Econometrica paper shows that the bootstrap-based variance estimate is an **overestimate**, which means a **rejection is harder to achieve**. So all those hypotheses that were rejected remain certified, since going back and correcting the mistake would only make them easier to reject. Those researchers who failed to reject now have a chance for a test-retake, knowing that their inference was overly conservative. Personally speaking, it is easy to use the bootstrap standard deviation as an estimate, so I think I will simply keep doing that, even though I know I am theoretically wrong. Next time I need inference, I would not mind knowing that I am being overly conservative. In my opinion it’s a cheap price to pay for the bootstrap’s computational convenience.

```r
library(magrittr)

TT <- 500
ratee <- 1
x <- rexp(TT, rate = ratee)

# bootstrap
rr <- 1000
boot_mat <- matrix(ncol = rr, nrow = TT)
for (i in seq_len(rr)) {
  boot_mat[, i] <- sample(x, replace = TRUE)
}
boot_med <- apply(boot_mat, 2, median)
density(boot_med) %>% plot
abline(v = log(2) / ratee, lty = "dashed", lwd = 3)
boot_med %>% var

# asymptotic variance of the sample median: 1 / (4 * n * f(m)^2)
Asym_var <- 1 / (4 * TT * (ratee * exp(-ratee * boot_med))^2)
density(Asym_var, adjust = 3) %>% plot(main = "", ylab = "", yaxt = "n", lwd = 2)
abline(v = var(boot_med), lty = "dashed", lwd = 3, col = "green")
abline(v = mean(Asym_var), lty = "dashed", lwd = 3)
```

Asking good questions is very important. Most researchers are feedback-thirsty; they are heavily invested in the conference, potentially traveling hundreds, even thousands, of kilometers to attend. Bad questions are therefore annoying, not only for the presenter but also from the audience’s standpoint: any question that makes it through the gate inevitably comes at the expense of other dormant questions from the rest of the audience. Below are a few thoughts I put on paper many years ago and have decided to share here, as they are as relevant as ever.

**Bad practice: being inconsiderate**

**Good practice: be helpful**

Both of the above are useful for the presenter, as they help her figure out which direction questions come from, so that she can better clarify things in the actual text.

Although written with conference sessions in mind, most of the above is relevant in general, and can be concisely summarized: be considerate and helpful.

As Paul R. Halmos wrote, “Do, please, as I say, and not as I do.” I too have trouble following my own advice, but awareness is always a solid first step.

The way to automate your workflow startup process is via the command `shell.exec`. Here is how you can use it to open whatever it is you need:

```r
library(magrittr)

a_pdf <- "path to pdf"
shell.exec(a_pdf)

a_tex_file <- "path to tex"
shell.exec(a_tex_file)

shell.exec("Path to your note taking program.exe file")
shell.exec("Path to you-get-the-idea.exe file")
```

I imagine you don’t often move files around once they are saved where they should be saved, so those paths are fairly fixed. You can use a tip given in a previous post to quickly reverse the backslashes before pasting the path into your code editor.

You can open multiple files with the same application (e.g. multiple PDFs). You can also rework the code for a bit more elegance:

```r
voila <- list(a_pdf, a_tex_file, "application1.exe", "application2.exe")
voila %>% lapply(shell.exec)
```

Open your default browser with the pages you use most. These few lines should help you feel comfortable clearing your web cache and the data saved by aggressive browsers; your starting point is here:

```r
url1 <- "https://something something 1"
url2 <- "https://something something 2"
url3 <- "https://take me back to my gmail please"
url_list <- list(url1, url2, url3)
url_list %>% lapply(browseURL)
```

For Python users, the `subprocess` module does the same:

```python
import subprocess
subprocess.call([r'C:\Program Files\Mozilla Firefox\firefox.exe'])
```