# R vs MATLAB (Round 3)

At least for me, R by faR. MATLAB has its own way of doing things, which to be honest can probably be defended from many angles. Here are a few examples of not-so-subtle differences between R and MATLAB:

• Writing default arguments in MATLAB is exhausting compared to the amazingly trivial R-way of doing it:

    # In R
    mysum <- function(x, y = 2) {
      x + y
    }
    mysum(6)
    [1] 8
    # Elegant
    #---------------
    # In MATLAB
    function [result] = mysum(x,y)
    if nargin < 2
        y = 2 ;
    end
    result = x + y ;
    end
    >> mysum(6)
    ans =
         8
    # Irritating, especially if you have
    # many of those you wish to assign defaults to

• I prefer not to clutter the directory with too many files (it is a mess as it is). R lets you write all the functions in one script, say ‘Proj_Functions.R’; you then source that script and voilà*. MATLAB forces each function to have its own file. One can claim it is better organized that way, but I prefer to be more compact with my directories.
• MATLAB, in a way, forces you to properly document your functions. I often forget the arguments of a function I wrote. In R, type the function’s name to print it to the screen and remember what you did. In MATLAB, type help ‘function_name’: if you wrote the help section, there you have it; if not, go open the file (though you can do that relatively quickly with edit ‘function_name’).
• R lets you keep on writing your code, while MATLAB forces you to organize it for better readability. See what I mean:

    # In R this is possible:
    c(1:10)[1:4]
    [1] 1 2 3 4
    #---------------
    # In MATLAB this is not possible:
    [1:10](1:4)
    % [1:10](1:4)
    %       |
    % Error: Unbalanced or unexpected parenthesis or bracket
    # You need to:
    temp = [1:10]
    temp(1:4)

• Again, I see the rationale behind these restrictions: shortcuts can cause issues down the road. Still, they are perhaps not so bad where appropriate; it is a matter of preference.

• Another thing which I find convenient in R and awkward in MATLAB is extracting a single result from a function with multiple outputs. For example, the function dm.test in the ‘forecast’ package returns 6 different outputs, including the statistic and the p.value. If you only need the p.value, you can use the “$” operator to extract it. In contrast, in MATLAB you need to state explicitly which outputs you do not need:

    temp = dm.test(residuals(f1),residuals(f2),h=1)$p.value
    # this is not the dm.test function in MATLAB, just an illustration.
    [~,~,~,p.value] = dm.test(residuals1,residuals2,h)
    # The "~" sign means to skip this output
    # until you get to the p.value which is output number 4 in the function

• In MATLAB, you can forget about mixing different classes in the same matrix. In R this is a breeze; we just use data.frame:

    mat <- data.frame(string=rep('string',2),numeric=c(1,1))
    mat
      string numeric
    1 string       1
    2 string       1

I don’t think this is possible in MATLAB; there are only workarounds, probably using the ‘cell’ class.

• I like to work slowly and run each line separately. R lets you do that without selecting the line each time; it is enough that the cursor is there.
• Printing to screen is well thought out in R, less so in MATLAB:

    ### In R
    # Row Vector:
    t(c(1:100))
         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18]
    [1,]    1    2    3    4    5    6    7    8    9    10    11    12    13    14    15    16    17    18
         [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35]
    [1,]    19    20    21    22    23    24    25    26    27    28    29    30    31    32    33    34    35
         [,36] [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48] [,49] [,50] [,51] [,52]
    [1,]    36    37    38    39    40    41    42    43    44    45    46    47    48    49    50    51    52
         [,53] [,54] [,55] [,56] [,57] [,58] [,59] [,60] [,61] [,62] [,63] [,64] [,65] [,66] [,67] [,68] [,69]
    [1,]    53    54    55    56    57    58    59    60    61    62    63    64    65    66    67    68    69
         [,70] [,71] [,72] [,73] [,74] [,75] [,76] [,77] [,78] [,79] [,80] [,81] [,82] [,83] [,84] [,85] [,86]
    [1,]    70    71    72    73    74    75    76    77    78    79    80    81    82    83    84    85    86
         [,87] [,88] [,89] [,90] [,91] [,92] [,93] [,94] [,95] [,96] [,97] [,98] [,99] [,100]
    [1,]    87    88    89    90    91    92    93    94    95    96    97    98    99    100
    # Fine
    # Column Vector:
    > c(1:100)
     [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
    [26]  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50
    [51]  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75
    [76]  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100
    # Elegant
    #---------------
    ### In MATLAB
    # Row Vector:
    >> [1:100]
    ans =
      Columns 1 through 20
         1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20
      Columns 21 through 40
        21    22    23    24    25    26    27    28    29    30    31    32    33    34    35    36    37    38    39    40
      Columns 41 through 60
        41    42    43    44    45    46    47    48    49    50    51    52    53    54    55    56    57    58    59    60
      Columns 61 through 80
        61    62    63    64    65    66    67    68    69    70    71    72    73    74    75    76    77    78    79    80
      Columns 81 through 100
        81    82    83    84    85    86    87    88    89    90    91    92    93    94    95    96    97    98    99   100
    # ok.. sort of.
    # Column Vector:
    [1:100]'
    1
    2
    .
    .
    .
    99
    100
    # distasteful

Got to enjoy the little things, eh?

As a final word, I predict that in the future R will simply dominate all the paid competing software. Think about it: how the hell can these firms compete with the open source community? In terms of labor productivity, those commercial firms need to be more efficient than the thousands (it will be tens of thousands) of programmers who are so passionate about giving back that they work for free. Moving forward, pushing the boundaries, for free. Don’t let it go over your head; see how the workflow is smoothed, slowly but surely, from the clunky carriage it used to be to the glistening Porsche it is going to be. Works in progress include (but are not limited to) RStudio (Hadley), knitr (Yihui Xie) and googleVis (Markus). Recently I used some tools that were developed so that laymen like myself can make their code public: devtools. The R team, talented and experienced professionals, keeps those tools lubricated to allow the safe service of thousands of packages, written to facilitate applications of whatever state-of-the-art statistical and econometric techniques you can think of. Books are slowly becoming free as well: a good example is Forecasting: Principles and Practice, which demonstrates the potential of the platform (the book is excellent, as expected). Here are some other books posted by Francis Diebold.

Monetary compensation is secondary at best. There are many areas where ‘old-school’ business plans have to be reconsidered. The future is here, it is just not widely distributed yet (William Gibson). There are (way) too many altruistic selfless professionals giving their time for anyone to stop it.

*You can source several scripts at once: sapply(FilesToLoad,source,.GlobalEnv).
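The footnote's one-liner can be tried out with a small, self-contained sketch. The directory and the two helper files below are hypothetical and are created on the fly in a temporary location; `lapply` with `invisible` is used instead of `sapply` only to keep the console quiet.

```r
# Hypothetical setup: write two tiny function files into a temporary directory
proj_dir <- tempfile("proj_")
dir.create(proj_dir)
writeLines("add_two <- function(x) x + 2", file.path(proj_dir, "helpers_a.R"))
writeLines("times_three <- function(x) x * 3", file.path(proj_dir, "helpers_b.R"))

# Collect every .R file in that directory ...
FilesToLoad <- list.files(proj_dir, pattern = "\\.R$", full.names = TRUE)

# ... and source each one into the global environment
invisible(lapply(FilesToLoad, source, local = .GlobalEnv))

# Functions from both files are now available
add_two(6)
times_three(add_two(6))
```

Passing the environment through the `local` argument by name does the same job as the positional `.GlobalEnv` in the one-liner above.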

# Non-linear beta

If you look up AMZN on Google Finance you can see the beta is 0.93. I already wrote in the past about this elusive concept. Beta is supposed to reflect the risk of an instrument with respect to, for example, the market. However, you can estimate this measure in all kinds of ways.
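To make "all kinds of ways" concrete, here is a minimal sketch on simulated returns. Nothing here is real AMZN data, the true slope of 0.9 and the 100-observation window are arbitrary assumptions, and this is not necessarily the estimator the full post goes on to discuss.

```r
set.seed(7)
n <- 500

# Simulated daily returns (an assumption; no real market data is used)
market <- rnorm(n, mean = 0, sd = 0.01)
stock  <- 0.9 * market + rnorm(n, mean = 0, sd = 0.02)

# Beta as the OLS slope of stock returns on market returns
beta_ols <- unname(coef(lm(stock ~ market))["market"])

# The same number written as a covariance ratio
beta_cov <- cov(stock, market) / var(market)

# Beta re-estimated on the most recent 100 observations only
recent <- (n - 99):n
beta_recent <- unname(coef(lm(stock[recent] ~ market[recent]))[2])

c(ols = beta_ols, cov_ratio = beta_cov, recent_window = beta_recent)
```

The first two numbers coincide by construction, while the windowed estimate generally differs, which is one simple way in which a single quoted beta hides estimation choices.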

# Bias vs. Consistency

The concepts of unbiasedness and consistency, as well as the relation between the two, are tough to get one’s head around, especially (but not only) for undergraduate students. My aim here is to help with this. We start with a short explanation of the two concepts and follow with an illustration.
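One possible illustration, as a rough sketch rather than the one from the full post: the variance estimator that divides by n is biased but consistent, while "use only the first observation" is an unbiased but inconsistent estimator of the mean. The function names, sample sizes and number of replications are all assumptions made for this example.

```r
set.seed(1)

# Biased but consistent: variance estimator dividing by n instead of n - 1
var_n <- function(x) mean((x - mean(x))^2)

# Unbiased but inconsistent: estimate the mean by the first observation only
first_obs <- function(x) x[1]

run_sim <- function(n, reps = 5000) {
  draws <- replicate(reps, {
    x <- rnorm(n, mean = 0, sd = 1)  # true mean 0, true variance 1
    c(var_n = var_n(x), first_obs = first_obs(x))
  })
  c(n = n,
    mean_var_n   = mean(draws["var_n", ]),    # expected value is (n - 1) / n
    sd_first_obs = sd(draws["first_obs", ]))  # spread does not shrink with n
}

results <- t(sapply(c(5, 50, 500), run_sim))
results
```

As n grows, the average of the first estimator moves toward the true variance of 1 (the bias vanishes), whereas the spread of the second estimator stays put no matter how much data arrives.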

# Bootstrap Criticism (example)

In a previous post I underlined an inherent feature of the non-parametric Bootstrap: its heavy reliance on the (single) realization of the data. This feature is not a bad one per se; we just need to be aware of the limitations. From comments made on the other post regarding this, I gathered that a more concrete example can help get this point across.
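A small sketch of that reliance, under assumed settings (standard normal data, 20 observations, 2000 resamples) and not necessarily the example used in the full post: two independent realizations of the same process give two bootstrap distributions, each anchored to its own sample.

```r
set.seed(42)

# Non-parametric bootstrap of the sample mean: resample the data with replacement
boot_means <- function(x, B = 2000) {
  replicate(B, mean(sample(x, replace = TRUE)))
}

# Two independent realizations from the same process (true mean is 0)
sample_a <- rnorm(20)
sample_b <- rnorm(20)

boot_a <- boot_means(sample_a)
boot_b <- boot_means(sample_b)

# Each bootstrap distribution is centred on its own sample mean, not on 0
rbind(
  a = c(sample_mean = mean(sample_a), boot_centre = mean(boot_a)),
  b = c(sample_mean = mean(sample_b), boot_centre = mean(boot_b))
)
```

Whatever is peculiar about the one sample at hand is carried over into every resample, which is the limitation to keep in mind.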

# Detecting bubbles in real time

Recently, we have been hearing a lot about a housing bubble forming in the UK. It would be great to have a formal test for identifying a bubble as it evolves in real time, but I am not familiar with any such test. However, we can still do something to help us gauge whether what we are seeing is indeed a bubbly process, which is bound to end badly.

# Bootstrap criticism

The title reads Bootstrap criticism, but in fact it should be Non-parametric bootstrap criticism. I am all in favour of Bootstrapping, but I point here to a major drawback.

# My favourite statistician

We are all standing on the shoulders of giants. Bradley Efron is one such giant. He earned that standing with the invention of the bootstrap in 1979 and later with his very influential 2004 paper on Least Angle Regression (and the accompanying software written in R).

# R vs Matlab (round 2)

R takes it. I prefer coding in R over Matlab. I feel R understands that I do not like to type too much. A few examples:

When you are busy with a lengthy project, like writing a paper, you create many objects along the way. Every time you log into the project, you need to remember what is what. In the past, at the start of each new working session I used to rerun the script anew and follow what each line was doing until I got back the objects I needed and could continue working. Apart from helping you remember what you are doing, this is very useful for reproducibility, at least given your data, in the sense that you are sure nothing was overwritten from the console and it is all there. Those days are over.

# Omitted Variable Bias

Frequently, we see the term ‘control variables’. The researcher introduces dozens of explanatory variables she has no interest in. This is done in order to avoid the so-called ‘Omitted Variable Bias’.
In general, the OLS estimator has great properties, not the least important of which is that for a finite number of observations you can faithfully retrieve the marginal effect of X on Y, that is, the estimator is unbiased: $E(\hat{\beta}) = \beta$. This is very much not the case when you have a variable that should be included in the model but is left out. As in my previous posts about Multicollinearity and heteroskedasticity, I only try to provide the intuition since you are probably familiar with the result itself.
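For intuition, here is a minimal simulation sketch. The data-generating process is an assumption made up for this example: y depends on both x and z, and x is correlated with z. With these numbers the short regression's slope converges to 2 + 3 · Cov(x, z)/Var(x) = 2 + 3 · 0.5/1.25 = 3.2 rather than to the true effect of 2.

```r
set.seed(123)
n <- 10000

# Assumed data-generating process: x is correlated with z, and both drive y
z <- rnorm(n)
x <- 0.5 * z + rnorm(n)
y <- 1 + 2 * x + 3 * z + rnorm(n)  # true marginal effect of x on y is 2

full_model  <- lm(y ~ x + z)  # z included as a control
short_model <- lm(y ~ x)      # z omitted

# Compare the estimated effect of x with and without the control
c(with_z = unname(coef(full_model)["x"]),
  without_z = unname(coef(short_model)["x"]))
```

The omitted z does not vanish; its effect is loaded onto x in proportion to how strongly the two move together, which is why the bias disappears only if x and z are uncorrelated or z has no effect on y.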

# Stocks with upside potential

THIS IS NOT INVESTMENT ADVICE. ACTING BASED ON THIS POST MAY, AND IN ALL PROBABILITY WILL, CAUSE MONETARY LOSS.

Quantile regression is now established as an important econometric tool. Unlike mean regression (OLS), the target is not the mean given x but some quantile given x. You can use it to find stocks that present good upside potential. You may think it has to do with the beta of a stock, but the beta is OLS-related, and is symmetric. A high-beta stock rewards you with an upside swing if the market spikes but, symmetrically, you can suffer a large drawdown when the market drops. This is not upside potential.