R vs MATLAB (Round 3)

At least for me, R wins by faR. MATLAB has its own way of doing things, which, to be honest, can probably be defended from many angles. Here are a few examples of not-so-subtle differences between R and MATLAB:

  • Writing default arguments in MATLAB is exhausting compared to the amazingly trivial R-way of doing it:

    # In R
    mysum <- function(x, y = 2) {
      x + y
    }
    mysum(6)
    [1] 8
    # Elegant
    #---------------
    % In MATLAB
    function [result] = mysum(x, y)
      if nargin < 2
        y = 2;
      end
      result = x + y;
    end
    >> mysum(6)

    ans =

         8
    % Irritating, especially if you have
    % many of those you wish to assign defaults to
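    A nice bonus of the R way (a small extra sketch, not from the original post): defaults are evaluated lazily, so a default can even depend on another argument:

    # Default for y depends on x, resolved only when the function is called
    mysum2 <- function(x, y = x/2) {
      x + y
    }
    mysum2(6)     # y defaults to 3, returns 9
    mysum2(6, 2)  # explicit y, returns 8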
  • It is my preference not to clutter the directory with too many files (it is a mess as it is). R lets you write all the functions in one script, say ‘Proj_Functions.R’; you then source that script and voilà* (see the sketch below). MATLAB forces each function to have its own file. One can claim it is better organized that way; I prefer to be more compact with my directories.
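    A minimal sketch of the one-file workflow (the file names are hypothetical):

    # All project functions live in one file; source() loads them all at once
    source("Proj_Functions.R")
    # Several such files can be sourced in one line (see the footnote below):
    FilesToLoad <- c("Proj_Functions.R", "Proj_Plots.R")
    sapply(FilesToLoad, source, .GlobalEnv)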
  • MATLAB, in a way, forces you to properly document your functions. I often forget the arguments of a function I wrote. In R, type the function’s name (without parentheses) to print its body to the screen and remember what you did; in MATLAB, type help ‘function_name’. If you wrote the help section, there you have it; if not, go open the file (though you can do that relatively quickly with edit ‘function_name’).
  • R lets you keep on writing your code, while MATLAB forces you to organize it for better readability. See what I mean:

    # In R this is possible:
    c(1:10)[1:4]
    [1] 1 2 3 4
    #---------------
    % In MATLAB this is not possible:
    [1:10](1:4)
    % Error: Unbalanced or unexpected parenthesis or bracket
    % You need to:
    temp = [1:10]
    temp(1:4)
  • Again, I see the rationale behind these restrictions; shortcuts can cause issues down the road. But shortcuts are perhaps not bad where appropriate; it is a matter of preference.

  • Another thing I find convenient in R and awkward in MATLAB is extracting a single result from a function that returns multiple outputs. For example, the function ?dm.test in the ‘forecast’ package returns 6 different outputs, including the statistic and the p-value. If you only need the p-value, you can use the “$” operator to extract it. In contrast, in MATLAB you need to explicitly mark the outputs you do not need:

    # In R
    temp <- dm.test(residuals(f1), residuals(f2), h = 1)$p.value
    #---------------
    % In MATLAB (not an actual MATLAB dm.test function, just an illustration)
    [~, ~, ~, pvalue] = dm_test(residuals1, residuals2, h)
    % The "~" sign means: skip this output,
    % until you get to the p-value, which is output number 4 in this illustrative function
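    To find out what a returned object carries in R (a minimal sketch; f1 and f2 are assumed to be two fitted models from the forecast package):

    library(forecast)
    res <- dm.test(residuals(f1), residuals(f2), h = 1)
    names(res)   # lists all the components the function returns
    res$p.value  # extract only the one you need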
  • In MATLAB, you can forget about mixing different classes in the same matrix; in R it is a breeze, we just use a data.frame:

    mat <- data.frame(string=rep('string',2),numeric=c(1,1))
    mat
     string numeric
    1 string       1
    2 string       1

    I don’t think this is possible in MATLAB, only via workarounds, probably using the ‘cell’ class.
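    To see why the data.frame is doing real work here (a minimal sketch): an R matrix, like a MATLAB matrix, holds a single class, so mixing silently coerces everything:

    # A matrix coerces the numbers into strings:
    cbind(string = rep('string', 2), numeric = c(1, 1))
    # A data.frame keeps each column's own class:
    str(data.frame(string = rep('string', 2), numeric = c(1, 1)))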

  • I like to work slowly and run each line separately. R lets you do that without selecting the line each time; it is enough that the cursor is on it.
  • Printing to the screen is well thought out in R, less so in MATLAB:

    ### In R 
     
    # Row Vector: 
    t(c(1:100))
         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18]
    [1,]    1    2    3    4    5    6    7    8    9    10    11    12    13    14    15    16    17    18
         [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35]
    [1,]    19    20    21    22    23    24    25    26    27    28    29    30    31    32    33    34    35
         [,36] [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48] [,49] [,50] [,51] [,52]
    [1,]    36    37    38    39    40    41    42    43    44    45    46    47    48    49    50    51    52
         [,53] [,54] [,55] [,56] [,57] [,58] [,59] [,60] [,61] [,62] [,63] [,64] [,65] [,66] [,67] [,68] [,69]
    [1,]    53    54    55    56    57    58    59    60    61    62    63    64    65    66    67    68    69
         [,70] [,71] [,72] [,73] [,74] [,75] [,76] [,77] [,78] [,79] [,80] [,81] [,82] [,83] [,84] [,85] [,86]
    [1,]    70    71    72    73    74    75    76    77    78    79    80    81    82    83    84    85    86
         [,87] [,88] [,89] [,90] [,91] [,92] [,93] [,94] [,95] [,96] [,97] [,98] [,99] [,100]
    [1,]    87    88    89    90    91    92    93    94    95    96    97    98    99    100
    #Fine
    # Column Vector: 
    > c(1:100)
      [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
     [26]  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50
     [51]  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75
     [76]  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100
    # Elegant
    #---------------
    ### In MATLAB 
    # Row Vector: 
    >> [1:100]
     
    ans =
     
      Columns 1 through 20
     
         1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20
     
      Columns 21 through 40
     
        21    22    23    24    25    26    27    28    29    30    31    32    33    34    35    36    37    38    39    40
     
      Columns 41 through 60
     
        41    42    43    44    45    46    47    48    49    50    51    52    53    54    55    56    57    58    59    60
     
      Columns 61 through 80
     
        61    62    63    64    65    66    67    68    69    70    71    72    73    74    75    76    77    78    79    80
     
      Columns 81 through 100
     
        81    82    83    84    85    86    87    88    89    90    91    92    93    94    95    96    97    98    99   100
    #ok.. sort of.
    # Column Vector: 
    [1:100]'
    1
    2
    .
    .
    .
    99
    100
     # distasteful

Got to enjoy the little things, eh?

As a final word, I predict that in the future R will simply dominate all other paid competing software. Think about it: how the hell can these firms compete with the open-source community? Thinking about labor productivity, those commercial firms need to be more efficient than the thousands (it will be tens of thousands) of programmers who are so passionate about giving back that they work for free. Moving forward, pushing the boundaries, for free. Don’t let it go over your head; see how the workflow is smoothed slowly but surely, from the clunky carriage it used to be to the glistening Porsche it is going to be. Works in progress include (but are not limited to) RStudio (Hadley Wickham), knitr (Yihui Xie) and googleVis (Markus Gesmann). Recently I used some tools that were developed so that laymen like myself can make their code public: devtools. The R team, talented and experienced professionals, keeps lubricating those tools to allow the safe service of thousands of packages written to facilitate applications with whatever state-of-the-art statistical and econometric techniques you can think of. Books are slowly becoming free as well: a good example is Forecasting: Principles and Practice, which demonstrates the potential of the platform (the book is excellent, as expected). Here are some other books posted by Francis Diebold.

Monetary compensation is secondary at best. There are many areas where ‘old-school’ business plans have to be reconsidered. The future is here, it is just not widely distributed yet (William Gibson). There are (way) too many altruistic, selfless professionals giving their time for anyone to stop it.

*You can source a few scripts at once: sapply(FilesToLoad, source, .GlobalEnv).

Bootstrap Criticism (example)

In a previous post I underlined an inherent feature of the non-parametric bootstrap: its heavy reliance on the (single) realization of the data. This feature is not a bad one per se; we just need to be aware of the limitations. From comments made on the other post regarding this, I gathered that a more concrete example can help push the point across.
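A minimal sketch of the point (a toy illustration, not the concrete example of the full post): every bootstrap resample is drawn from the same single realization, so the resampled statistics inherit that realization's quirks:

    set.seed(1)
    x <- rnorm(50)   # the single realization we happen to observe
    # 1000 non-parametric bootstrap resamples, all drawn from x itself
    boot_means <- replicate(1000, mean(sample(x, replace = TRUE)))
    mean(x)           # whatever quirks this sample has...
    mean(boot_means)  # ...the bootstrap centers on them, not on the true mean of 0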
Continue reading

Detecting bubbles in real time

Recently, we hear a lot about a housing bubble forming in the UK. It would be great to have a formal test for identifying a bubble evolving in real time; I am not familiar with any such test. However, we can still do something to help us gauge whether what we are seeing is indeed a bubbly process, which is bound to end badly.
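One simple way to gauge it (a minimal sketch with simulated data; not necessarily the method of the full post) is to track a rolling AR(1) coefficient and watch for it creeping above 1, the hallmark of an explosive, bubbly process:

    set.seed(42)
    n <- 300
    eps <- rnorm(n)
    p <- numeric(n)
    for (t in 2:n) {
      rho <- if (t <= 200) 1 else 1.03   # random walk first, explosive afterwards
      p[t] <- rho * p[t - 1] + eps[t]
    }
    window <- 60
    rolling_ar <- sapply(window:(n - 1), function(t) {
      coef(lm(p[(t - window + 2):(t + 1)] ~ p[(t - window + 1):t]))[2]
    })
    plot(rolling_ar, type = "l"); abline(h = 1, lty = 2)  # crossing 1 flags explosiveness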
Continue reading

Comments on Comments in R

When you are busy with a lengthy project, like writing a paper, you create many objects along the way. Every time you log into the project, you need to remember what is what. In the past, at each new working session I used to rerun the script anew and follow what each line does until I got back the objects I needed and could continue working. Apart from helping you remember what you are doing, this is very useful for reproducibility, at least given your data, in the sense that you are sure nothing is overwritten via the console and it is all there. Those days are over.
Continue reading

Omitted Variable Bias

Frequently, we see the term ‘control variables’: the researcher introduces dozens of explanatory variables she has no interest in. This is done in order to avoid the so-called ‘Omitted Variable Bias’.
In general, the OLS estimator has great properties, not the least important of which is that for a finite number of observations you can faithfully retrieve the marginal effect of X on Y, that is, E(\widehat{\beta}) = \beta. This is very much not the case when a variable that should be included in the model is left out. As in my previous posts about multicollinearity and heteroskedasticity, I only try to provide the intuition, since you are probably familiar with the result itself.
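A quick simulation makes the bias tangible (a minimal sketch, not taken from the full post): when the omitted x2 is correlated with x1, its effect gets loaded onto \widehat{\beta}_1:

    set.seed(7)
    n <- 1000
    x1 <- rnorm(n)
    x2 <- 0.8 * x1 + rnorm(n)    # belongs in the model, correlated with x1
    y  <- x1 + x2 + rnorm(n)     # true beta1 = 1
    coef(lm(y ~ x1 + x2))["x1"]  # close to 1
    coef(lm(y ~ x1))["x1"]       # close to 1 + 1 * 0.8 = 1.8: the omitted variable bias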
Continue reading

Stocks with upside potential

THIS IS NOT INVESTMENT ADVICE. ACTING BASED ON THIS POST MAY, AND IN ALL PROBABILITY WILL, CAUSE MONETARY LOSS.

Quantile regression is now established as an important econometric tool. Unlike mean regression (OLS), the target is not the mean given x but some quantile given x. You can use it to find stocks that present good upside potential. You may think this has to do with the beta of a stock, but beta is OLS-related and symmetric: a high-beta stock rewards you with an upside swing when the market spikes, but, symmetrically, you can suffer a large drawdown when the market drops. That is not upside potential.
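A minimal sketch of the idea (simulated returns, not the analysis of the full post), using the quantreg package: compare a stock's market sensitivity in the right tail with that in the left tail:

    library(quantreg)

    set.seed(3)
    market <- rnorm(500, sd = 0.01)
    # Hypothetical stock: disperses more when the market rises,
    # i.e. genuine upside potential rather than symmetric beta
    stock <- market + (0.5 + 100 * pmax(market, 0)) * rnorm(500, sd = 0.005)

    coef(rq(stock ~ market, tau = 0.9))["market"]  # sensitivity in the right tail
    coef(rq(stock ~ market, tau = 0.1))["market"]  # sensitivity in the left tail
    coef(lm(stock ~ market))["market"]             # the single, symmetric OLS beta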
Continue reading

Bayesian vs. Frequentist in Practice

Rivers of ink have been spilled over the ‘Bayesian vs. Frequentist’ dispute. Most of us were trained as Frequentists, probably because the computational power needed for Bayesian analysis was not around when the syllabi of our statistics/econometrics courses were formed. In this age of tablets and fast internet connections, your training does not matter much; you can easily move between the two approaches by engaging the right webpages/communities. I will not talk about the ideological differences between the two, or which approach is more appealing and why; Larry Wasserman has already given an excellent review.
Continue reading

Understanding Multicollinearity

Roughly speaking, multicollinearity occurs when two or more regressors are highly correlated. As with heteroskedasticity, students often know what it means, how to detect it, and how to cope with it, but not why it is so. From Wikipedia: “In this situation (Multicollinearity) the coefficient estimates may change erratically in response to small changes in the model or the data.” The Wikipedia entry continues to discuss detection, implications and remedies. Here I try to provide the intuition.
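The Wikipedia quote is easy to reproduce (a minimal sketch, not from the full post): with two nearly identical regressors, even a tiny perturbation of the data can move the coefficients substantially:

    set.seed(11)
    n  <- 100
    x1 <- rnorm(n)
    x2 <- x1 + rnorm(n, sd = 0.01)   # almost a copy of x1
    y  <- x1 + x2 + rnorm(n)

    coef(lm(y ~ x1 + x2))   # large offsetting coefficients are typical

    # A small perturbation of y shifts them noticeably:
    y2 <- y + rnorm(n, sd = 0.1)
    coef(lm(y2 ~ x1 + x2))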
Continue reading