Random Forests (RF from here onwards) is a widely used pure-prediction algorithm. This post assumes good familiarity with RF. If you are not familiar with this algorithm, stop here and see the first reference below for an easy tutorial. If you used RF before and you are familiar with it, then you probably encountered those “importance of the variables” plots. We start with a brief explanation of those plots, and the concept of importance scores calculation. Main takeaway from the post: don’t use those importance scores plots, because they are simply misleading. Those importance plots are simply a wrong turn taken by our human tendency to look for reason, whether it’s there or it’s not there.
Courtesy of R Consortium, you can view my forecast combination talk (16 mins) given in France few months ago, below.
The useR! 2019 held in Toulouse ended couple of days ago.
The R Journal is the open access, refereed journal of the R project for statistical computing. It features short to medium length articles covering topics that should be of interest to users or developers of R.
Christoph Weiss, Gernot Roetzer and myself have joined forces to write an R package and the accompanied paper: Forecast Combinations in R using the ForecastComb Package, which is now published in the R journal. Below you can find a few of my thoughts about the journey towards publication in the R journal, and a few words about working with a small team of three, from three different locations.
Just finished reading the paper Stock Market’s Price Movement Prediction With LSTM Neural Networks. The abstract attractively reads: “The results that were obtained are promising, getting up to an average of 55.9% of accuracy when predicting if the price of a particular stock is going to go up or not in the near future.”, I took the bait. You shouldn’t.
The evaluation of volatility models is gracefully complicated by the fact that, unlike other time series, even the realization is not observable. Two researchers would never disagree about what was yesterday’s stock price, but they can easily disagree about what was yesterday’s stock volatility. Because we don’t observe volatility directly, each of us uses own proxy of choice. There are many ways to skin this cat (more on volatility proxy here).
In a previous post Univariate volatility forecast evaluation we considered common ways in which we can evaluate how good is our volatility model, dealing with one time-series at a time. But how do we evaluate, or compare two models in a multivariate settings, with two covariance matrices?
Perhaps it is the different jargon used in different disciplines, not sure. But for some reason, the terms ‘predictions’, ‘forecasts’ and ‘projections’ are frequently used interchangeably.
One of my Ph.D papers was published recently. It deals with yield curve forecasting.
Here is the code for applying the Nelson-Siegel model to any yield curve.
“The Fed is certainly moving forward with plans to normalize interest rates.” We keep on hearing that, we believed it in the past and we believe it now. We believe that the Fed believes and that, in fact, this means something.
Should we become more suspicious and less trusting given history? Let’s take a look.
Overfitting is strongly related to variable selection. It is a common problem and a tough one, best explained by way of example.
In the past, I wrote about robust regression. This is an important tool which handles outliers in the data. Roger Koenker is a substantial contributor in this area. His website is full of useful information and code so visit when you have time for it. The paper which drew my attention is “Quantile Autoregression” found under his research tab, it is a significant extension to the time series domain. Here you will find short demonstration for stuff you can do with quantile autoregression in R.
Five months ago I generated forecasts for the Eurozone Misery index. I used the built-in “FitAR” package in R. Using different models differing in their memory length (how many lags were considered for each model) 24 months ahead forecasts were generated. Might be interesting to see how accurate are the forecasts. The previous post is updated and few bugs corrected in the code. The updated data is public and can be found here. It is the sum of inflation rate and unemployment rate in the Euro-zone area.