Background

  • We want the code to be interpretable by ourselves and by other humans (e.g. in code handovers).
  • Clear code makes errors easier to spot.
  • Clear code makes for an easy review.

The usual metric for “good” code is efficiency: good = efficient. Here the metric is different: good = friendly, as in the literate programming paradigm.

“Code is more often read than written.” — Guido van Rossum

“It doesn’t matter how good your software is, because if the documentation is not good enough, people will not use it.” - Daniele Procida

Code style

Style your code so that it is pleasant to look at. Use the following coding conventions*:

  • Make your names meaningful and consistent with what may be found elsewhere.

If you are working with cross-sectional time series, it is customary to use the letter T for the time dimension and a letter like P or K for the cross-section (firms, people, countries) dimension. Don’t be creative. If the letter T is “occupied”, use a workaround like TT, tT, or T0.
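A minimal Python sketch of this naming convention, assuming a made-up panel of revenues stored as a plain list of rows. The data and the names `revenue_panel` and `mean_revenue_by_firm` are illustrative only.

```python
# Hypothetical panel: one row per period, one column per firm.
revenue_panel = [
    [10.0, 20.0, 30.0],  # period 1
    [11.0, 21.0, 31.0],  # period 2
    [12.0, 22.0, 32.0],  # period 3
    [13.0, 23.0, 33.0],  # period 4
]

T = len(revenue_panel)     # time dimension (number of periods)
K = len(revenue_panel[0])  # cross-section dimension (number of firms)

# Mean revenue per firm over time; the loop indices reuse the same letters.
mean_revenue_by_firm = [
    sum(revenue_panel[t][k] for t in range(T)) / T
    for k in range(K)
]
```

Because the indices t and k mirror T and K, a reader can tell at a glance which dimension each loop runs over.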

  • Name your dimensions clearly.

If your object is a matrix, name the columns (e.g. with the variables’ names) and name the rows (e.g. with an index or with dates).
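In plain Python the same idea can be sketched by keeping the labels next to the values. The dates, variable names and figures below are made up for illustration; with a library such as pandas the labels would instead live in the DataFrame's index and columns.

```python
# Hypothetical data: rows are labelled by date, columns by variable name.
column_names = ["gdp_growth", "inflation"]
row_names = ["2020-12-31", "2021-12-31", "2022-12-31"]
values = [
    [-3.1, 0.5],
    [5.9, 2.6],
    [2.1, 8.0],
]

# Build a labelled table so that every value is looked up by name, not by position.
table = {
    date: dict(zip(column_names, row))
    for date, row in zip(row_names, values)
}

inflation_2021 = table["2021-12-31"]["inflation"]
```

A lookup by name documents itself, whereas `values[1][1]` forces the reader to remember the layout.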

  • Try to report as you go along

For example, use commands like dim in R or shape in Python to print the dimensions of the object to the console. Even create plots where it makes sense.
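A minimal sketch of reporting as you go, using only built-in Python. The helper `describe` and the data are hypothetical; with NumPy or pandas objects one would print the shape attribute instead.

```python
def describe(name, rows):
    """Return a one-line summary of a rectangular list of rows."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    return f"{name}: {n_rows} rows x {n_cols} columns"


raw = [[1, 2, None], [4, 5, 6], [7, None, 9]]
print(describe("raw", raw))

# Drop rows with missing values, then report again to see what was lost.
clean = [row for row in raw if None not in row]
print(describe("clean", clean))
```

Printing the dimensions before and after a filtering step makes it obvious when a step silently drops more data than expected.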

  • Use comments
    • Comments do not burden the RAM, so they are a very economical way to enhance the readability of your code.
    • You can comment out commands which are optional for the reader, e.g. printing or tentative plots.
    • If you save a CSV for example, which needs to be later loaded, comment clearly on what should be returned. For example:
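A minimal Python sketch of such comments, assuming a hypothetical file firm_returns.csv with made-up rows. The file name, columns and values are illustrative only.

```python
import csv
import os
import tempfile

# Hypothetical output location; a temporary directory keeps the sketch self-contained.
output_dir = tempfile.mkdtemp()
csv_path = os.path.join(output_dir, "firm_returns.csv")

rows = [
    {"date": "2021-01-31", "firm": "A", "ret": 0.012},
    {"date": "2021-01-31", "firm": "B", "ret": -0.004},
]

# Saves firm_returns.csv: one row per (date, firm), with columns date, firm, ret.
with open(csv_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["date", "firm", "ret"])
    writer.writeheader()
    writer.writerows(rows)

# Loading should return 2 rows with the columns date, firm, ret.
# Note: csv reads every field back as a string, so ret must be converted to float.
with open(csv_path, newline="") as f:
    loaded = list(csv.DictReader(f))
```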
  • Split your code
    • A lot can be done with one-liners, but one-liners are less readable. Trade elegance for readability. When possible, make your code more modular.
    • You can use a convention which suits you for variables which are “temporary” (which you only need for the sake of readability). For example, in Python you can prefix the name with an underscore, _, or with temp_.
    • Splitting your code also helps to avoid super long and unreadable lines.

For example:

Or in Python:

* There is a difference between coding for research and coding for operation. This document proposes good common coding practice in general, rather than for operational code specifically.

Functions

  • Documentation of a function:
    • What are the inputs? What is the class (or type)? What are the dimensions (if relevant)?
    • What is the function doing? When algorithms are used, especially complicated ones, it can be useful to explain how the algorithm works or how it is implemented within your code. It may also be appropriate to describe why a specific algorithm was selected over another.
    • What is the output? What are the dimensions (if relevant)?

In R:

In Python:
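A minimal sketch of a docstring that answers the three questions above. The function `column_means` and the layout of the docstring sections are assumptions made for illustration, not a prescribed format.

```python
def column_means(matrix):
    """Compute the mean of each column of a numeric matrix.

    Inputs
    ------
    matrix : list of lists of float, dimension T x K
        T rows (e.g. time periods) and K columns (e.g. variables).
        All rows must have the same length.

    What the function does
    ----------------------
    Sums each column over the T rows and divides by T. A plain
    arithmetic mean is used because it is the simplest summary;
    no handling of missing values is attempted.

    Output
    ------
    list of float, length K
        The k-th element is the mean of column k.
    """
    T = len(matrix)
    K = len(matrix[0])
    return [sum(matrix[t][k] for t in range(T)) / T for k in range(K)]
```

Calling `help(column_means)` in an interactive session shows this text, so the reader does not need to open the source file.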

  • Setting default arguments

Set default arguments only if it is strictly obvious what the argument should be. Otherwise force the user to explicitly specify the choice.
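One way to force an explicit choice in Python is to make the argument keyword-only and give it no default. The sketch below assumes a hypothetical function `trim` and made-up bounds.

```python
def trim(values, *, lower, upper, sort_output=False):
    """Keep only the values inside [lower, upper].

    lower and upper have no defaults on purpose: there is no obviously
    right cut-off, so the caller must choose. sort_output defaults to
    False because leaving the order untouched is the unsurprising choice.
    """
    kept = [v for v in values if lower <= v <= upper]
    return sorted(kept) if sort_output else kept


trimmed = trim([5, -100, 3, 250, 4], lower=0, upper=10)

# Calling trim([5, -100, 3]) without the bounds raises a TypeError.
```

The bare `*` in the signature also means the bounds must be passed by name, which keeps the call site readable.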

  • Add checks and assertions

Try to prevent situations where the user is unaware of some particular software behaviour. E.g., if the user wants the mean of a vector, make sure that the input really is a vector:

In R:

In Python:
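A minimal sketch, assuming that a "vector" here means a flat list or tuple of numbers. The function name `vector_mean` and the exact error messages are illustrative.

```python
from numbers import Real


def vector_mean(x):
    """Return the arithmetic mean of a non-empty, flat list or tuple of numbers."""
    if not isinstance(x, (list, tuple)):
        raise TypeError(f"x must be a list or tuple, got {type(x).__name__}")
    if len(x) == 0:
        raise ValueError("x must not be empty")
    for element in x:
        # bool is a subclass of int, so it is excluded explicitly.
        if isinstance(element, bool) or not isinstance(element, Real):
            raise TypeError(f"x must contain only numbers, got {element!r}")
    return sum(x) / len(x)
```

With these checks a nested list (a matrix) or a string fails loudly with a clear message instead of producing a confusing error or a wrong number further down the line.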

  • Avoid nested functions

Unless there is a very good reason for it, don’t create a function inside another function. It is much more complicated to understand, and to debug. DO NOT CREATE A MONSTER MOTHER FUNCTION which does everything in “one click”.
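As a sketch of the alternative, the steps can be written as small top-level functions and combined in a few visible lines. The function names and the data below are hypothetical.

```python
def parse_prices(lines):
    """Turn raw text lines into floats, skipping blank lines."""
    return [float(line) for line in lines if line.strip()]


def simple_returns(prices):
    """Compute period-on-period simple returns."""
    return [p1 / p0 - 1 for p0, p1 in zip(prices, prices[1:])]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Each step is a top-level function that can be called, tested and
# debugged on its own; the "pipeline" is just three visible lines.
raw_lines = ["100", "", "110", "121"]
prices = parse_prices(raw_lines)
returns = simple_returns(prices)
average_return = mean(returns)
```

If the result looks wrong, each intermediate object (`prices`, `returns`) can be inspected directly, which is not possible when everything is hidden inside one function.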

References