Bootstrapping time series – R code

Bootstrapping in its general form (“ordinary” bootstrap) relies on IID observations which staples the theory backing it. However, time series are a different animal and bootstrapping time series requires somewhat different procedure to preserve dependency structure.

In what follows I use the Misery index as the time series for demonstration. Inflation and unemployment numbers concerning the Eurozone (17 countries) can be found here. In order to bootstrap time series, here are two things you can do:

1. Bootstrap based on IID innovations
The idea is to estimate the model, and then use the residuals that are, by construction, close to being independent. Bootstrap these residuals and “back out” the observations using your estimated parameters. Like so:

This is an example of one realization:

One Bootstrap Realization

One Bootstrap Realization

As you can see the variance of the realization seems a bit high compared with the original. For a real application I would try to get something closer to real series, this can be achieved via:

That’s that for the first option, quite simple but..
– you need to specify a model so that you can obtain the residuals.
– might be that even after “scaled.res = scale(ar1$res)” IID assumption is still false, which brings us to the second option:

2. Block Bootstrap (or MBB for moving block bootstrapping)
Essentially, we cannot sample the data directly because we lose the dependency structure. Solution is to sample whole blocks and concatenate them, in contrast to a single observation at a time.

Good idea yet with its own issues. For example, what’s best block size? too small or too large block size will fail to properly replicate the behavior of the original series and will mess up inference. Are blocks allowed to overlap? How boundaries should be treated, where there are no observation left to “complete” the block. For more details:

Resampling Methods for Dependent Data (Springer Series in Statistics)

Price: $157.72

5.0 out of 5 stars (2 customer reviews)

31 used & new available from $157.72

Here I show the Politis and Romano Idea (1994) AKA stationary bootstrap, applicable to stationary and weakly dependent data. Blocks are of a random size. Block size is a random variable in itself coming from a geometric distribution with some expected block size which should be determined by you.
There is more than one way to code this, I think the following is the easiest:

Thanks for reading.

1. As usual, there are packages out there and while I did dive in the code of function “tsboot”, it’s hard to figure out the subject without coding it, and even harder (impossible) to customize it to do what you need (also as usual). You don’t need to trust me on this one, type “page(tsboot)“ and have a look. “boot” is an excellent package, but as such, very general.
2. Here I demonstrated the methods using AR(p) models, but it is of course applicable to whatever model you can generate residuals from.


Subsampling (Springer Series in Statistics)

Price: $94.01

5.0 out of 5 stars (2 customer reviews)

38 used & new available from $94.01

New Introduction to Multiple Time Series Analysis

Price: $48.97

4.8 out of 5 stars (11 customer reviews)

63 used & new available from $45.97

3 thoughts on “Bootstrapping time series – R code

  1. Hello,

    Your article is interesting, but i have one question especially for the stationnary bootstrap, the choose of block length is crucial how do you determine it just by using acf?


    • For the stationary bootstrap we need only the AVERAGE block size, so it is (somewhat) less crucial. Similar issue can be found when determining the window size in smoothing procedures. Possible solution is to tailor the choice to what you need using data-driven methods, e.g., if you deal with forecasting, use a validation set and see which Block Length has best out-of-sample performance and use that for the test set.

  2. Thank you very much for sharing this. I would like to point out that for the sieve bootstrap, the column-wise loop can be vectorized, resulting in much faster code:[,] <- sample(na.omit(scaled.res),(n-numlags)*R, replace = T)[1:numlags,] <- b1 # for the first obs we plug the original data

    for (j in (numlags+1):n)[j,] <- ar1$x.intercept + ar1$ar%*[(j-1):(j-numlags),,drop=FALSE] +[(j-numlags),]

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">