1 We start with a few needed concepts

What is an Integrated Development Environment?

It is a software application that bundles tools (buttons, syntax highlighting, and so on) to make writing and working with code easier.


What is a compiler?

A program that translates source code written in a high-level programming language into low-level machine code (binary: 0,0,1,0,1,1,…)


What are Packages (R) and Modules (Python)?

Code libraries (think folders, simply)
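To make the "think folders" picture concrete, here is a minimal Python sketch. It uses the standard-library `json` package purely as an illustration (it is not used elsewhere in this demo) and assumes a regular Python installation where the library lives as files on disk:

```python
import json
import os

# A package is a directory that holds Python source files.
package_dir = os.path.dirname(json.__file__)
print(package_dir)

# The modules (files) inside that directory, for example decoder.py and encoder.py.
module_files = sorted(f for f in os.listdir(package_dir) if f.endswith(".py"))
print(module_files)
```

The printed path differs from machine to machine, which is already a small hint at why environments matter.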


What is an environment?

The state of a computer, determined by the combination of installed software, basic hardware, and the programs that are running

Mismatched environments are a rich source of hard-to-find bugs!
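One way to spot such mismatches is to print a small snapshot of the environment and compare it between machines. The sketch below is only an illustration: the helper name `describe_environment` is hypothetical, and the package list is an assumption based on the libraries used later in this demo.

```python
import platform
import sys
from importlib import metadata


def describe_environment(packages):
    """Return a small snapshot of the pieces that define this environment."""
    snapshot = {
        "python": platform.python_version(),
        "executable": sys.executable,
        "os": platform.platform(),
    }
    for name in packages:
        try:
            snapshot[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            snapshot[name] = "not installed"
    return snapshot


# Compare this output between two machines to spot mismatches.
env = describe_environment(["pandas", "numpy", "matplotlib"])
for key, value in env.items():
    print(f"{key}: {value}")
```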


2 IDE of choice?

“INVEST IN PEOPLE NOT BUSINESSES” WARREN BUFFETT

Meet J.J. Allaire. He is one of the most influential people you have never heard of:

J.J. is the founder of RStudio and the creator of the reticulate package, which we rely on.

3 Whisky - It’s already Five O’Clock Somewhere

86 malt whiskies are scored from 0 to 4 on 12 taste categories, including sweetness, smokiness, and nuttiness. The data also include the coordinates of the distilleries.

4 R and Python = Rython (demo)

library(data.table)

whisky <- fread("http://outreach.mathstat.strath.ac.uk/outreach/nessie/datasets/whiskies.txt",
    data.table = FALSE)
class(whisky)
> [1] "data.frame"
dim(whisky)
> [1] 86 17
head(whisky, 3)
>   RowID Distillery Body Sweetness Smoky Medicinal Tobacco Honey Spicy Winey
> 1     1  Aberfeldy    2         2     2         0       0     2     1     2
> 2     2   Aberlour    3         3     1         0       0     4     3     2
> 3     3     AnCnoc    1         3     2         0       0     2     0     0
>   Nutty Malty Fruity Floral  Postcode Latitude Longitude
> 1     2     2      2      2 \tPH15 2EB   286580    749680
> 2     2     3      3      2 \tAB38 9PJ   326340    842570
> 3     2     2      3      2  \tAB5 5LI   352960    839320
tail(whisky, 3)
>    RowID   Distillery Body Sweetness Smoky Medicinal Tobacco Honey Spicy Winey
> 84    84    Tomintoul    0         3     1         0       0     2     2     1
> 85    85      Tormore    2         2     1         0       0     1     0     1
> 86    86 Tullibardine    2         3     0         0       1     0     2     1
>    Nutty Malty Fruity Floral Postcode Latitude Longitude
> 84     1     2      1      2 AB37 9AQ   315100    825560
> 85     2     1      0      0 PH26 3LR   315180    834960
> 86     1     2      2      1  PH4 1QG   289690    708850
library(reticulate)
r_to_py(whisky, convert = FALSE)
>     RowID    Distillery  Body  ...    Postcode  Latitude  Longitude
> 0       1     Aberfeldy     2  ...  \tPH15 2EB    286580     749680
> 1       2      Aberlour     3  ...  \tAB38 9PJ    326340     842570
> 2       3        AnCnoc     1  ...   \tAB5 5LI    352960     839320
> 3       4        Ardbeg     4  ...  \tPA42 7EB    141560     646220
> 4       5       Ardmore     2  ...  \tAB54 4NH    355350     829140
> ..    ...           ...   ...  ...         ...       ...        ...
> 81     82     Tobermory     1  ...    PA75 6NR    150450     755070
> 82     83       Tomatin     2  ...    IV13 7YT    279120     829630
> 83     84     Tomintoul     0  ...    AB37 9AQ    315100     825560
> 84     85       Tormore     2  ...    PH26 3LR    315180     834960
> 85     86  Tullibardine     2  ...     PH4 1QG    289690     708850
> 
> [86 rows x 17 columns]

import pandas

# type(r.whisky) # only ok in interactive mode

print( type(r.whisky) )
> <class 'pandas.core.frame.DataFrame'>
r.whisky.shape
> (86, 17)
r.whisky.head(3)
>    RowID Distillery  Body  Sweetness  ...  Floral    Postcode  Latitude  Longitude
> 0      1  Aberfeldy     2          2  ...       2  \tPH15 2EB    286580     749680
> 1      2   Aberlour     3          3  ...       2  \tAB38 9PJ    326340     842570
> 2      3     AnCnoc     1          3  ...       2   \tAB5 5LI    352960     839320
> 
> [3 rows x 17 columns]
r.whisky.tail(3)
# dir(r.whisky)
>     RowID    Distillery  Body  Sweetness  ...  Floral  Postcode  Latitude  Longitude
> 83     84     Tomintoul     0          3  ...       2  AB37 9AQ    315100     825560
> 84     85       Tormore     2          2  ...       0  PH26 3LR    315180     834960
> 85     86  Tullibardine     2          3  ...       1   PH4 1QG    289690     708850
> 
> [3 rows x 17 columns]
r.whisky.head()
>    RowID Distillery  Body  Sweetness  ...  Floral    Postcode  Latitude  Longitude
> 0      1  Aberfeldy     2          2  ...       2  \tPH15 2EB    286580     749680
> 1      2   Aberlour     3          3  ...       2  \tAB38 9PJ    326340     842570
> 2      3     AnCnoc     1          3  ...       2   \tAB5 5LI    352960     839320
> 3      4     Ardbeg     4          1  ...       0  \tPA42 7EB    141560     646220
> 4      5    Ardmore     2          2  ...       1  \tAB54 4NH    355350     829140
> 
> [5 rows x 17 columns]

I am not sure what I want to look at. Let’s see which columns we have:

pandas.set_option('display.max_columns', None) # to show all columns
r.whisky.describe()
>            RowID       Body  Sweetness      Smoky  Medicinal    Tobacco  \
> count  86.000000  86.000000  86.000000  86.000000  86.000000  86.000000   
> mean   43.500000   2.069767   2.290698   1.534884   0.546512   0.116279   
> std    24.969982   0.930410   0.717287   0.863613   0.990032   0.322439   
> min     1.000000   0.000000   1.000000   0.000000   0.000000   0.000000   
> 25%    22.250000   2.000000   2.000000   1.000000   0.000000   0.000000   
> 50%    43.500000   2.000000   2.000000   1.000000   0.000000   0.000000   
> 75%    64.750000   2.000000   3.000000   2.000000   1.000000   0.000000   
> max    86.000000   4.000000   4.000000   4.000000   4.000000   1.000000   
> 
>            Honey      Spicy      Winey      Nutty      Malty     Fruity  \
> count  86.000000  86.000000  86.000000  86.000000  86.000000  86.000000   
> mean    1.244186   1.383721   0.976744   1.465116   1.802326   1.802326   
> std     0.853175   0.784686   0.932760   0.821730   0.629094   0.779438   
> min     0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   
> 25%     1.000000   1.000000   0.000000   1.000000   1.000000   1.000000   
> 50%     1.000000   1.000000   1.000000   2.000000   2.000000   2.000000   
> 75%     2.000000   2.000000   1.000000   2.000000   2.000000   2.000000   
> max     4.000000   3.000000   4.000000   4.000000   3.000000   3.000000   
> 
>           Floral       Latitude     Longitude  
> count  86.000000      86.000000  8.600000e+01  
> mean    1.697674  287247.162791  8.026597e+05  
> std     0.855017   67889.046814  8.802422e+04  
> min     0.000000  126680.000000  5.542600e+05  
> 25%     1.000000  265672.500000  7.556975e+05  
> 50%     2.000000  319515.000000  8.398850e+05  
> 75%     2.000000  328630.000000  8.507700e+05  
> max     4.000000  381020.000000  1.009260e+06
colnames(whisky)
>  [1] "RowID"      "Distillery" "Body"       "Sweetness"  "Smoky"     
>  [6] "Medicinal"  "Tobacco"    "Honey"      "Spicy"      "Winey"     
> [11] "Nutty"      "Malty"      "Fruity"     "Floral"     "Postcode"  
> [16] "Latitude"   "Longitude"
summary(whisky)
>      RowID       Distillery             Body        Sweetness        Smoky     
>  Min.   : 1.0   Length:86          Min.   :0.00   Min.   :1.00   Min.   :0.00  
>  1st Qu.:22.2   Class :character   1st Qu.:2.00   1st Qu.:2.00   1st Qu.:1.00  
>  Median :43.5   Mode  :character   Median :2.00   Median :2.00   Median :1.00  
>  Mean   :43.5                      Mean   :2.07   Mean   :2.29   Mean   :1.53  
>  3rd Qu.:64.8                      3rd Qu.:2.00   3rd Qu.:3.00   3rd Qu.:2.00  
>  Max.   :86.0                      Max.   :4.00   Max.   :4.00   Max.   :4.00  
>    Medicinal       Tobacco          Honey          Spicy          Winey     
>  Min.   :0.00   Min.   :0.000   Min.   :0.00   Min.   :0.00   Min.   :0.00  
>  1st Qu.:0.00   1st Qu.:0.000   1st Qu.:1.00   1st Qu.:1.00   1st Qu.:0.00  
>  Median :0.00   Median :0.000   Median :1.00   Median :1.00   Median :1.00  
>  Mean   :0.55   Mean   :0.116   Mean   :1.24   Mean   :1.38   Mean   :0.98  
>  3rd Qu.:1.00   3rd Qu.:0.000   3rd Qu.:2.00   3rd Qu.:2.00   3rd Qu.:1.00  
>  Max.   :4.00   Max.   :1.000   Max.   :4.00   Max.   :3.00   Max.   :4.00  
>      Nutty          Malty         Fruity        Floral      Postcode        
>  Min.   :0.00   Min.   :0.0   Min.   :0.0   Min.   :0.0   Length:86         
>  1st Qu.:1.00   1st Qu.:1.0   1st Qu.:1.0   1st Qu.:1.0   Class :character  
>  Median :2.00   Median :2.0   Median :2.0   Median :2.0   Mode  :character  
>  Mean   :1.47   Mean   :1.8   Mean   :1.8   Mean   :1.7                     
>  3rd Qu.:2.00   3rd Qu.:2.0   3rd Qu.:2.0   3rd Qu.:2.0                     
>  Max.   :4.00   Max.   :3.0   Max.   :3.0   Max.   :4.0                     
>     Latitude        Longitude      
>  Min.   :126680   Min.   : 554260  
>  1st Qu.:265672   1st Qu.: 755698  
>  Median :319515   Median : 839885  
>  Mean   :287247   Mean   : 802660  
>  3rd Qu.:328630   3rd Qu.: 850770  
>  Max.   :381020   Max.   :1009260
# Could also use skim or describe:
# library(psych); library(skimr)
# describeData(whisky); describeFast(whisky); skim(whisky)

What do we see?

  • Not too many “Medicinal” whiskies.

  • Tobacco was an idiotic question

5 Short pause: why are we talking about this?

R and Python

R and Python are friends


  • Scripting is very similar. Know one (approximately), know all.
  • Each scripting language has its own (dis)advantages.
  • With the right IDE, you can use both with relative ease.
  • Be opportunistic about it!
  • Examples

Collaboration

No need to code from scratch: reuse what already exists (R is strong among academics, while Python has a better ecosystem in some areas)

Right tool for the right job (are you going to research? deploy? visualize? report?)

6 Rython demo continued

Abusing fundamental statistics, let’s check the linear correlation:

import matplotlib.pyplot as plt
plt.matshow(r.whisky.iloc[:,2:14].corr())
plt.xticks(fontsize=14, rotation=45)
plt.yticks(fontsize=14)
cb = plt.colorbar()
cb.ax.tick_params(labelsize=14)
plt.title('Simple correlation', fontsize=16);
plt.show()

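Note that Python indexing starts at zero and excludes the end of a slice, so iloc[:, 2:14] in Python selects the same 12 taste columns as whisky[, 3:14] in R.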
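The indexing difference can be checked without loading any data. The sketch below uses only the column names reported by colnames(whisky) in this demo; the variable names are illustrative.

```python
# Column names as reported by colnames(whisky) in this demo.
columns = [
    "RowID", "Distillery", "Body", "Sweetness", "Smoky", "Medicinal",
    "Tobacco", "Honey", "Spicy", "Winey", "Nutty", "Malty", "Fruity",
    "Floral", "Postcode", "Latitude", "Longitude",
]

# Python: zero-based and end-exclusive, so 2:14 covers positions 2..13.
taste_python = columns[2:14]

# R's 3:14 is one-based and end-inclusive; emulate it by shifting the start.
r_start, r_end = 3, 14
taste_r = columns[r_start - 1:r_end]

print(taste_python[0], taste_python[-1], len(taste_python))  # Body Floral 12
print(taste_python == taste_r)  # True
```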

library(magrittr)  # provides the %>% pipe used below
library(corrplot)
whisky[, 3:14] %>%
    cor %>%
    corrplot(type = "upper", order = "hclust", tl.col = "black", tl.srt = 45)

Let’s find the most “nothing special” whisky:

# The most 'standard'
tmpdf <- whisky[, 3:14]
tmpvar <- (apply(tmpdf, 1, mean))/apply(tmpdf, 1, sd)
tmpvar %>%
    min
> [1] 0.846
tmpmin <- which.min(tmpvar)
whisky[tmpmin, -c(1, 15:17)]
>     Distillery Body Sweetness Smoky Medicinal Tobacco Honey Spicy Winey Nutty
> 46 Glenfiddich    1         3     1         0       0     0     0     0     0
>    Malty Fruity Floral
> 46     2      2      2
import numpy as np
import pprint
tmpdf = r.whisky.iloc[:,2:14]
# axis=1 applies the function to each row (across its columns)
tmpvar = tmpdf.apply(np.mean, axis=1) / tmpdf.apply(np.std, axis=1)
tmpvar.min()
> 0.8835412617927486
tmpmin = tmpvar.idxmin()
tmpmin
> 45
tmpidx = [0, 14, 15, 16]  # same as [0, *range(14, 17)]
tmpvec = r.whisky.columns.isin(r.whisky.columns[tmpidx])
tmpdf = r.whisky.loc[:, ~tmpvec]
pprint.pprint(tmpdf.iloc[tmpmin, :])  # a single row is a Series and prints vertically; tmpdf.iloc[[tmpmin], :] would give a one-row DataFrame
> Distillery    Glenfiddich
> Body                    1
> Sweetness               3
> Smoky                   1
> Medicinal               0
> Tobacco                 0
> Honey                   0
> Spicy                   0
> Winey                   0
> Nutty                   0
> Malty                   2
> Fruity                  2
> Floral                  2
> Name: 45, dtype: object
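Both languages pick Glenfiddich, but the minimum ratio differs: 0.846 in R versus 0.8835 in Python. The likely reason is the standard deviation convention: R's sd() divides by n - 1, whereas np.std divides by n by default (ddof=0). A small check with the standard library, using Glenfiddich's scores copied from the output above:

```python
import statistics

# Glenfiddich's 12 taste scores (Body ... Floral), copied from the output above.
scores = [1, 3, 1, 0, 0, 0, 0, 0, 0, 2, 2, 2]

mean = statistics.mean(scores)

# Sample standard deviation (divides by n - 1), which is what R's sd() computes.
ratio_sample = mean / statistics.stdev(scores)

# Population standard deviation (divides by n), NumPy's np.std default (ddof=0).
ratio_population = mean / statistics.pstdev(scores)

print(round(ratio_sample, 3))      # 0.846
print(round(ratio_population, 4))  # 0.8835
```

To reproduce the R number in Python, one option is np.std(x, ddof=1); pandas' own std method also uses ddof=1 by default.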

Let’s find what we like:

In Python
r.whisky[ r.whisky['Smoky'] > 2 ]
>     RowID     Distillery  Body  Sweetness  Smoky  Medicinal  Tobacco  Honey  \
> 3       4         Ardbeg     4          1      4          4        0      0   
> 18     19        Bowmore     2          2      3          1        0      2   
> 21     22       Caol Ila     3          1      4          2        1      0   
> 23     24      Clynelish     3          2      3          3        1      0   
> 34     35    GlenGarioch     2          1      3          0        0      0   
> 53     54  Highland Park     2          2      3          1        0      2   
> 57     58      Lagavulin     4          1      4          4        1      0   
> 58     59       Laphroig     4          2      4          4        1      0   
> 77     78       Talisker     4          2      3          3        0      1   
> 
>     Spicy  Winey  Nutty  Malty  Fruity  Floral    Postcode  Latitude  \
> 3       2      0      1      2       1       0  \tPA42 7EB    141560   
> 18      2      1      1      1       1       2  \tPA43 7GS    131330   
> 21      2      0      2      1       1       1  \tPA46 7RL    142920   
> 23      2      0      1      1       2       0   \tKW9 6LB    290250   
> 34      3      1      0      2       2       2    AB51 0ES    381020   
> 53      1      1      1      2       1       1    KW15 1SU    345340   
> 57      1      2      1      1       1       0    PA42 7DZ    140430   
> 58      0      1      1      1       0       0    PA42 7DU    138680   
> 77      3      0      1      2       2       0    IV47 8SR    137950   
> 
>     Longitude  
> 3      646220  
> 18     659720  
> 21     670040  
> 23     904230  
> 34     827590  
> 53    1009260  
> 57     645730  
> 58     645160  
> 77     831770
r.whisky[ (r.whisky['Smoky'] > 2) & ( r.whisky['Spicy'] > 2 ) ]
>     RowID   Distillery  Body  Sweetness  Smoky  Medicinal  Tobacco  Honey  \
> 34     35  GlenGarioch     2          1      3          0        0      0   
> 77     78     Talisker     4          2      3          3        0      1   
> 
>     Spicy  Winey  Nutty  Malty  Fruity  Floral  Postcode  Latitude  Longitude  
> 34      3      1      0      2       2       2  AB51 0ES    381020     827590  
> 77      3      0      1      2       2       0  IV47 8SR    137950     831770
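The parentheses around each condition in the combined filter are not optional. In Python, & binds more tightly than >, so leaving them out changes the meaning. A plain-integer sketch, using Talisker's Smoky and Spicy scores from the table above:

```python
smoky, spicy = 3, 3  # Talisker's scores from the table above

# With parentheses: each comparison is evaluated first, then combined.
with_parens = (smoky > 2) & (spicy > 2)

# Without parentheses: & binds tighter than >, so this is read as
# smoky > (2 & spicy) > 2, i.e. 3 > 2 > 2, which is False.
without_parens = smoky > 2 & spicy > 2

print(with_parens, without_parens)  # True False
```

With pandas columns instead of integers, the unparenthesized form typically raises an error instead of returning a wrong answer, which is at least easier to notice.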
In R
library(dplyr)
filter(whisky, Smoky > 2)
>   RowID    Distillery Body Sweetness Smoky Medicinal Tobacco Honey Spicy Winey
> 1     4        Ardbeg    4         1     4         4       0     0     2     0
> 2    19       Bowmore    2         2     3         1       0     2     2     1
> 3    22      Caol Ila    3         1     4         2       1     0     2     0
> 4    24     Clynelish    3         2     3         3       1     0     2     0
> 5    35   GlenGarioch    2         1     3         0       0     0     3     1
> 6    54 Highland Park    2         2     3         1       0     2     1     1
> 7    58     Lagavulin    4         1     4         4       1     0     1     2
> 8    59      Laphroig    4         2     4         4       1     0     0     1
> 9    78      Talisker    4         2     3         3       0     1     3     0
>   Nutty Malty Fruity Floral  Postcode Latitude Longitude
> 1     1     2      1      0 \tPA42 7EB   141560    646220
> 2     1     1      1      2 \tPA43 7GS   131330    659720
> 3     2     1      1      1 \tPA46 7RL   142920    670040
> 4     1     1      2      0  \tKW9 6LB   290250    904230
> 5     0     2      2      2  AB51 0ES   381020    827590
> 6     1     2      1      1  KW15 1SU   345340   1009260
> 7     1     1      1      0  PA42 7DZ   140430    645730
> 8     1     1      0      0  PA42 7DU   138680    645160
> 9     1     2      2      0  IV47 8SR   137950    831770
filter(whisky, Smoky > 2 & Spicy > 2)
>   RowID  Distillery Body Sweetness Smoky Medicinal Tobacco Honey Spicy Winey
> 1    35 GlenGarioch    2         1     3         0       0     0     3     1
> 2    78    Talisker    4         2     3         3       0     1     3     0
>   Nutty Malty Fruity Floral Postcode Latitude Longitude
> 1     0     2      2      2 AB51 0ES   381020    827590
> 2     1     2      2      0 IV47 8SR   137950    831770

Talisker is a long-time favorite, but I will definitely check out Glen Garioch.

Select particular variables
In Python
r.whisky[['Distillery']].head(3)
>   Distillery
> 0  Aberfeldy
> 1   Aberlour
> 2     AnCnoc
In R
select(whisky, "Distillery") %>%
    head(3)
>   Distillery
> 1  Aberfeldy
> 2   Aberlour
> 3     AnCnoc
# Or
whisky[, "Distillery"] %>%
    head(3)
> [1] "Aberfeldy" "Aberlour"  "AnCnoc"
# Or
whisky[, 2] %>%
    head(3)
> [1] "Aberfeldy" "Aberlour"  "AnCnoc"
Arranging and summarizing
In Python
# In a terminal: pip install tabulate
import tabulate
tmphead = r.whisky.columns.astype('str')
print(tabulate.tabulate(r.whisky.sort_values("Nutty").head(10), headers=tmphead, tablefmt="pretty"))
> +----+-------+--------------------+------+-----------+-------+-----------+---------+-------+-------+-------+-------+-------+--------+--------+----------+----------+-----------+
> |    | RowID |     Distillery     | Body | Sweetness | Smoky | Medicinal | Tobacco | Honey | Spicy | Winey | Nutty | Malty | Fruity | Floral | Postcode | Latitude | Longitude |
> +----+-------+--------------------+------+-----------+-------+-----------+---------+-------+-------+-------+-------+-------+--------+--------+----------+----------+-----------+
> | 13 |  14   |      Benriach      |  2   |     2     |   1   |     0     |    0    |   2   |   2   |   0   |   0   |   2   |   3    |   2    | IV30 8SJ |  323450  |  858380   |
> | 32 |  33   | GlenDeveronMacduff |  2   |     3     |   1   |     1     |    1    |   1   |   1   |   2   |   0   |   2   |   0    |   1    | AB4 3JT  |  372120  |  860400   |
> | 47 |  48   |    Glenkinchie     |  1   |     2     |   1   |     0     |    0    |   1   |   2   |   0   |   0   |   2   |   2    |   2    | EH34 5ET |  344380  |  666690   |
> | 5  |   6   |    ArranIsleOf     |  2   |     3     |   1   |     1     |    0    |   1   |   1   |   1   |   0   |   1   |   1    |   2    | KA27 8HJ |  194050  |  649950   |
> | 80 |  81   |     Teaninich      |  2   |     2     |   2   |     1     |    0    |   0   |   2   |   0   |   0   |   0   |   2    |   2    | IV17 0XB |  265360  |  869120   |
> | 45 |  46   |    Glenfiddich     |  1   |     3     |   1   |     0     |    0    |   0   |   0   |   0   |   0   |   2   |   2    |   2    | AB55 4DH |  332680  |  840840   |
> | 69 |  70   |    RoyalBrackla    |  2   |     3     |   2   |     1     |    1    |   1   |   2   |   1   |   0   |   2   |   3    |   2    | IV12 5QY |  286040  |  851320   |
> | 34 |  35   |    GlenGarioch     |  2   |     1     |   3   |     0     |    0    |   0   |   3   |   1   |   0   |   2   |   2    |   2    | AB51 0ES |  381020  |  827590   |
> | 11 |  12   |      Belvenie      |  3   |     2     |   1   |     0     |    0    |   3   |   2   |   1   |   0   |   2   |   2    |   2    | AB55 4DH |  332680  |  840840   |
> | 59 |  60   |      Linkwood      |  2   |     3     |   1   |     0     |    0    |   1   |   1   |   2   |   0   |   1   |   3    |   2    | IV30 3RD |  322640  |  861040   |
> +----+-------+--------------------+------+-----------+-------+-----------+---------+-------+-------+-------+-------+-------+--------+--------+----------+----------+-----------+
print(tabulate.tabulate(r.whisky.sort_values("Nutty", ascending= False).head(10), headers= tmphead, tablefmt= "pretty") )
> +----+-------+----------------+------+-----------+-------+-----------+---------+-------+-------+-------+-------+-------+--------+--------+----------+----------+-----------+
> |    | RowID |   Distillery   | Body | Sweetness | Smoky | Medicinal | Tobacco | Honey | Spicy | Winey | Nutty | Malty | Fruity | Floral | Postcode | Latitude | Longitude |
> +----+-------+----------------+------+-----------+-------+-----------+---------+-------+-------+-------+-------+-------+--------+--------+----------+----------+-----------+
> | 31 |  32   |    Edradour    |  2   |     3     |   1   |     0     |    0    |   2   |   1   |   1   |   4   |   2   |   2    |   2    | PH16 5JP |  295960  |  757940   |
> | 75 |  76   |   Strathisla   |  2   |     2     |   1   |     0     |    0    |   2   |   2   |   2   |   3   |   3   |   3    |   2    | AB55 3BS |  340754  |  848623   |
> | 61 |  62   |    Longmorn    |  3   |     2     |   1   |     0     |    0    |   1   |   1   |   1   |   3   |   3   |   2    |   3    | IV30 3SJ |  322640  |  861040   |
> | 10 |  11   |   Balmenach    |  4   |     3     |   2   |     0     |    0    |   2   |   1   |   3   |   3   |   0   |   1    |   2    | PH26 3PF |  307750  |  827170   |
> | 39 |  40   |   GlenScotia   |  2   |     2     |   2   |     2     |    0    |   1   |   0   |   1   |   2   |   2   |   1    |   1    | PA28 6DS |  172090  |  621010   |
> | 73 |  74   |    Speyside    |  2   |     2     |   1   |     0     |    0    |   1   |   0   |   1   |   2   |   2   |   2    |   2    | PH21 1NS |  278740  |  800600   |
> | 71 |  72   |     Scapa      |  2   |     2     |   1   |     1     |    0    |   2   |   1   |   1   |   2   |   2   |   2    |   2    | KW15 1SE |  342850  |  1008930  |
> | 70 |  71   | RoyalLochnagar |  3   |     2     |   2   |     0     |    0    |   2   |   2   |   2   |   2   |   2   |   3    |   1    | AB35 5TB |  326140  |  794370   |
> | 68 |  69   |  OldPulteney   |  2   |     1     |   2   |     2     |    1    |   0   |   1   |   1   |   2   |   2   |   2    |   2    | KW1 5BA  |  336730  |  950130   |
> | 67 |  68   | OldFettercairn |  1   |     2     |   2   |     0     |    1    |   2   |   2   |   1   |   2   |   3   |   1    |   1    | AB30 1YE |  370860  |  772900   |
> +----+-------+----------------+------+-----------+-------+-----------+---------+-------+-------+-------+-------+-------+--------+--------+----------+----------+-----------+
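The ten "least nutty" rows printed here are not the same ten that the R version of this step returns. More than ten whiskies score 0 for Nutty, so the result depends on how ties are ordered, and the default pandas sort does not promise to keep tied rows in their original order. The sketch below shows what a stable sort does, using Python's built-in sorted() on a few rows taken from the tables in this demo:

```python
# (RowID, Distillery, Nutty) for a few rows taken from the tables in this demo.
rows = [
    (6, "ArranIsleOf", 0),
    (12, "Belvenie", 0),
    (14, "Benriach", 0),
    (32, "Edradour", 4),
    (76, "Strathisla", 3),
]

# sorted() is stable: rows with equal Nutty keep their original order.
ascending = sorted(rows, key=lambda row: row[2])
print([name for _, name, _ in ascending])

# reverse=True also preserves the original order among ties.
descending = sorted(rows, key=lambda row: row[2], reverse=True)
print([name for _, name, _ in descending])
```

In pandas, passing kind="mergesort" to sort_values should give the same tie behavior, at some cost in speed on large data.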
r.whisky["Nutty"].agg(['mean', 'min', 'count'])
> mean      1.465116
> min       0.000000
> count    86.000000
> Name: Nutty, dtype: float64
r.whisky.agg({ "Nutty": ['mean', 'min', 'count'] } )
>            Nutty
> mean    1.465116
> min     0.000000
> count  86.000000
r.whisky.agg({ "Nutty": ['mean', 'min', 'count'], "Body": ['mean', 'min', 'count'] } )
>            Nutty       Body
> mean    1.465116   2.069767
> min     0.000000   0.000000
> count  86.000000  86.000000
In R
colnames(whisky)
>  [1] "RowID"      "Distillery" "Body"       "Sweetness"  "Smoky"     
>  [6] "Medicinal"  "Tobacco"    "Honey"      "Spicy"      "Winey"     
> [11] "Nutty"      "Malty"      "Fruity"     "Floral"     "Postcode"  
> [16] "Latitude"   "Longitude"
whisky %>%
    arrange(Nutty) %>%
    head(10)
>    RowID         Distillery Body Sweetness Smoky Medicinal Tobacco Honey Spicy
> 1      6        ArranIsleOf    2         3     1         1       0     1     1
> 2     12           Belvenie    3         2     1         0       0     3     2
> 3     14           Benriach    2         2     1         0       0     2     2
> 4     17           Bladnoch    1         2     1         0       0     0     1
> 5     33 GlenDeveronMacduff    2         3     1         1       1     1     1
> 6     35        GlenGarioch    2         1     3         0       0     0     3
> 7     46        Glenfiddich    1         3     1         0       0     0     0
> 8     48        Glenkinchie    1         2     1         0       0     1     2
> 9     60           Linkwood    2         3     1         0       0     1     1
> 10    70       RoyalBrackla    2         3     2         1       1     1     2
>    Winey Nutty Malty Fruity Floral  Postcode Latitude Longitude
> 1      1     0     1      1      2  KA27 8HJ   194050    649950
> 2      1     0     2      2      2 \tAB55 4DH   332680    840840
> 3      0     0     2      3      2 \tIV30 8SJ   323450    858380
> 4      1     0     2      2      3  \tDG8 9AB   242260    554260
> 5      2     0     2      0      1   AB4 3JT   372120    860400
> 6      1     0     2      2      2  AB51 0ES   381020    827590
> 7      0     0     2      2      2  AB55 4DH   332680    840840
> 8      0     0     2      2      2  EH34 5ET   344380    666690
> 9      2     0     1      3      2  IV30 3RD   322640    861040
> 10     1     0     2      3      2  IV12 5QY   286040    851320
whisky %>%
    arrange(Nutty, Body) %>%
    head(10)
>    RowID         Distillery Body Sweetness Smoky Medicinal Tobacco Honey Spicy
> 1     17           Bladnoch    1         2     1         0       0     0     1
> 2     46        Glenfiddich    1         3     1         0       0     0     0
> 3     48        Glenkinchie    1         2     1         0       0     1     2
> 4      6        ArranIsleOf    2         3     1         1       0     1     1
> 5     14           Benriach    2         2     1         0       0     2     2
> 6     33 GlenDeveronMacduff    2         3     1         1       1     1     1
> 7     35        GlenGarioch    2         1     3         0       0     0     3
> 8     60           Linkwood    2         3     1         0       0     1     1
> 9     70       RoyalBrackla    2         3     2         1       1     1     2
> 10    73           Speyburn    2         4     1         0       0     2     1
>    Winey Nutty Malty Fruity Floral  Postcode Latitude Longitude
> 1      1     0     2      2      3  \tDG8 9AB   242260    554260
> 2      0     0     2      2      2  AB55 4DH   332680    840840
> 3      0     0     2      2      2  EH34 5ET   344380    666690
> 4      1     0     1      1      2  KA27 8HJ   194050    649950
> 5      0     0     2      3      2 \tIV30 8SJ   323450    858380
> 6      2     0     2      0      1   AB4 3JT   372120    860400
> 7      1     0     2      2      2  AB51 0ES   381020    827590
> 8      2     0     1      3      2  IV30 3RD   322640    861040
> 9      1     0     2      3      2  IV12 5QY   286040    851320
> 10     0     0     2      1      2  AB38 7AG   326930    851430
whisky %>%
    summarise(minnutty = min(Nutty))
>   minnutty
> 1        0
whisky %>%
    summarise(mnutty = mean(Nutty), mbody = mean(Body))
>   mnutty mbody
> 1   1.47  2.07

7 Summary

  • We have shown how to use Rython

  • Practical differences between R and Python are mainly indenting and indexing

  • We went over some basic data-wrangling functions:

    • skimming through data
    • correlation
    • filtering, selecting, summarizing etc.
  • We have some new whisky ideas to try out (ok you got me, that was the real purpose of the analysis)