Advances in post-model-selection inference (2)

In the previous post we reviewed a way to handle the problem of inference after model selection. I recently read another related paper which approaches this complicated issue from a different angle. The paper, titled 'A significance test for the lasso', is a real step forward in this area. The authors develop the asymptotic distribution of a test statistic for the coefficients, accounting for the selection step. A description of the tough problem they successfully tackle can be found here.

The usual way to test whether a variable (say variable j) adds value to your regression is the F-test. We compute the regression once excluding variable j and once including it. Then we compare the two sums of squared errors; the distribution of the resulting statistic is known, F or \chi^2 depending on your initial assumptions, hence the F-test or \chi^2-test. These are by far the most common tests to check if a variable should or should not be included. The problem arises if you search for variable j beforehand.
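For concreteness, here is a minimal sketch of that nested-model comparison in R (the data and variable names are made up purely for illustration):

    # Simulated data, purely for illustration
    set.seed(1)
    n  <- 100
    x1 <- rnorm(n); x2 <- rnorm(n); xj <- rnorm(n)
    y  <- 1 + 0.5 * x1 + rnorm(n)
    fit0 <- lm(y ~ x1 + x2)        # regression excluding variable j
    fit1 <- lm(y ~ x1 + x2 + xj)   # regression including variable j
    anova(fit0, fit1)              # F-test on the drop in the sum of squared errors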

If you chose variable j from a pool of possible variables to add, the statistic is no longer distributed the way you think, neither F nor \chi^2. Continuing with the usual significance testing, an F-test for example, after model selection is a blunt mistake. You think you have a 95% confidence level, but in fact it is much, much lower. In the paper 'A significance test for the lasso' the authors show how to use the LASSO solution to construct a quantity (their equation (5)) which has a known asymptotic distribution. I replicate their simulation settings. We have 10 possible variables, none of which matters (in the simulation we know it, but in reality we do not). Meaning the process is actually:

    \[ y_t = \varepsilon_t. \]

We check which of the 10 available variables is best and test whether it is significant. Have a look at the next figure, which replicates Figure 1 of the paper:
[Figure: quantile-quantile plots of the naive statistic (upper panel) and of the covariance test statistic (lower panel)]
The upper panel shows a quantile-quantile plot of the statistic computed in the usual, erroneous way. You can see it is not at all distributed the way it should be. The lower panel shows a quantile-quantile plot of the test statistic the authors develop, which follows an Exp(1) distribution. Check the type-I error rate of the naive approach:
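Here is a minimal sketch (my own quick simulation, in the spirit of the paper's n = 100, p = 10 null setting): select the most correlated of the 10 noise variables, then test it the usual way.

    set.seed(1)
    n <- 100; p <- 10; nsim <- 2000
    pvals <- replicate(nsim, {
      x <- matrix(rnorm(n * p), n, p)
      y <- rnorm(n)                                # none of the variables matters
      j <- which.max(abs(cor(x, y)))               # the greedy selection step
      summary(lm(y ~ x[, j]))$coefficients[2, 4]   # naive p-value for the chosen one
    })
    mean(pvals < 0.05)   # empirical type-I error, far above the nominal 5%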

The code to generate the figure is below. But let's have a look at an empirical example first. Let's use the idea to check what is important for Industrial Production (U.S. data). Consider the interest rate and the unemployment rate, their squares and their interaction. We can now test whether anything is useful, accounting for the fact that we a priori select whatever best fits the data.

A natural start is the usual linear regression:
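A sketch of that regression; the series ip, rate and unemp are placeholders standing in for the actual U.S. data, with columns ordered to match the variable numbering used later in the post:

    # Placeholder data; substitute the real Industrial Production,
    # interest rate and unemployment rate series here:
    set.seed(1)
    n     <- 200
    rate  <- rnorm(n); unemp <- rnorm(n); ip <- rnorm(n)
    # 1 = unemployment, 2 = interest rate, 3 = rate^2,
    # 4 = unemployment^2, 5 = interaction
    x <- cbind(unemp = unemp, rate = rate, rate2 = rate^2,
               unemp2 = unemp^2, rate_unemp = rate * unemp)
    ols <- lm(ip ~ x)
    summary(ols)   # summary with significance asterisks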

Looks good, but is it really the case that we need all those asterisks?

Enter the covariance test. This is the test they develop in the paper. What makes it a valid test is that they derive the asymptotic distribution (don't ask me how), which turns out to be exponential.

[Figure: lasso regularization path for the Industrial Production regression]
The plot shows the order in which the variables enter. The unemployment rate is first (variable 1), followed by its square (variable 4), followed by the interest rate (variable 2), etc. By the way, this sequence goes by the fancy name 'regularization path'.
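Under the same placeholder setup as above, the path plot comes straight from the lars package:

    library(lars)
    fit <- lars(x, ip)   # lasso path via least angle regression
    plot(fit)            # order in which the variables enter the active set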

Now what we do is test whether the next entry is significant, given the variables already in the regression, and also accounting for the greedy selection step:
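A sketch using the covTest package, which implements the test (again on the placeholder data):

    library(covTest)
    # One p-value per step along the path, each compared against an Exp(1) null:
    covTest(fit, x, ip)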

The first one is important, but the square is not. However, once they are both in, the interest rate is important; the rest is not. From this, depending on how parsimonious you want to be, you can use variable 1, or the first 3 (but not only 1 and 2). The authors write about this somewhat confusing non-transitive relation:

‘For a general X, however, a given predictor variable may enter the active set more than once along the lasso path, since it may leave the active set at some point. In this case we treat each entry as a separate problem. Therefore, our test is specific to a step in the path, and not to a predictor variable at large.’

So, those p-values should not be taken jointly but as individual conditional tests. A warm recommendation to read the paper. A link to a working version is given below. It was eventually published in The Annals of Statistics.

Code for replicating the figure:
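A minimal sketch of my replication, assuming \sigma^2 = 1 is known; at the first lasso step the covariance test statistic simplifies to T_1 = \lambda_1(\lambda_1 - \lambda_2), where \lambda_1, \lambda_2 are the first two knots of the lars path:

    library(lars)

    set.seed(1)
    n <- 100; p <- 10; nsim <- 1000
    naive <- covtest <- numeric(nsim)

    for (i in 1:nsim) {
      x <- matrix(rnorm(n * p), n, p)
      x <- scale(x) / sqrt(n - 1)      # columns centered, unit norm
      y <- rnorm(n)                    # null model: y is pure noise
      # Naive statistic: drop in RSS from adding the single best variable,
      # wrongly referred to a chi-squared(1) distribution
      rss0 <- sum((y - mean(y))^2)
      rss1 <- min(sapply(1:p, function(j) sum(resid(lm(y ~ x[, j]))^2)))
      naive[i] <- rss0 - rss1
      # Covariance test at the first step: T1 = lambda1 * (lambda1 - lambda2)
      lam <- lars(x, y, normalize = FALSE)$lambda
      covtest[i] <- lam[1] * (lam[1] - lam[2])
    }

    par(mfrow = c(2, 1))
    qqplot(qchisq(ppoints(nsim), df = 1), naive,
           xlab = "Chi-squared(1) quantiles", ylab = "Naive statistic")
    abline(0, 1)
    qqplot(qexp(ppoints(nsim)), covtest,
           xlab = "Exp(1) quantiles", ylab = "Covariance test statistic")
    abline(0, 1)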

The paper:
A significance test for the lasso
