What is an Integrated Development Environment?
It is software that bundles tools (buttons, syntax highlighting and so on) which make it easier to work on software applications
What is a compiler?
A program that translates source code written in a high-level programming language into low-level machine code (binary: 0,0,1,0,1,1,…)
What are Packages (R) and Modules (Python)?
Code libraries (think folders, simply)
What is an environment?
State of a computer, determined by a combination of software, basic hardware, and which programs are running
Misaligned environments are a rich source of difficult bugs!
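One way to make environment mismatches visible is to record the interpreter and package versions next to your results. The sketch below is only illustrative: the helper name `describe_environment` and the choice of packages to report are assumptions, not part of the original analysis.

```python
import platform
import sys
from importlib import metadata


def describe_environment(packages=("pandas", "numpy")):
    """Return a small dict describing the running Python environment.

    Hypothetical helper: the name and the default package list are
    assumptions made for this illustration.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment
            versions[name] = None
    return {
        "executable": sys.executable,         # which interpreter is running
        "python": platform.python_version(),  # interpreter version string
        "os": platform.system(),              # operating system name
        "packages": versions,
    }


env = describe_environment()
for key, value in env.items():
    print(f"{key}: {value}")
```

On the R side, `reticulate::py_config()` reports which Python interpreter reticulate is bound to, which is a good first check when the two languages seem to disagree.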
“INVEST IN PEOPLE NOT BUSINESSES” WARREN BUFFETT
Meet J.J. Allaire. He is one of the most influential people you have never heard of:
J.J. is the founder of RStudio and the creator of the reticulate package, which we rely on.
86 malt whiskies are scored from 0 to 4 in 12 taste categories, including sweetness, smoky and nutty. The data also contain the coordinates of the distilleries.
library(data.table)
whisky <- fread("http://outreach.mathstat.strath.ac.uk/outreach/nessie/datasets/whiskies.txt",
data.table = F)
class(whisky)
> [1] "data.frame"
dim(whisky)
> [1] 86 17
head(whisky, 3)
> RowID Distillery Body Sweetness Smoky Medicinal Tobacco Honey Spicy Winey
> 1 1 Aberfeldy 2 2 2 0 0 2 1 2
> 2 2 Aberlour 3 3 1 0 0 4 3 2
> 3 3 AnCnoc 1 3 2 0 0 2 0 0
> Nutty Malty Fruity Floral Postcode Latitude Longitude
> 1 2 2 2 2 \tPH15 2EB 286580 749680
> 2 2 3 3 2 \tAB38 9PJ 326340 842570
> 3 2 2 3 2 \tAB5 5LI 352960 839320
tail(whisky, 3)
> RowID Distillery Body Sweetness Smoky Medicinal Tobacco Honey Spicy Winey
> 84 84 Tomintoul 0 3 1 0 0 2 2 1
> 85 85 Tormore 2 2 1 0 0 1 0 1
> 86 86 Tullibardine 2 3 0 0 1 0 2 1
> Nutty Malty Fruity Floral Postcode Latitude Longitude
> 84 1 2 1 2 AB37 9AQ 315100 825560
> 85 2 1 0 0 PH26 3LR 315180 834960
> 86 1 2 2 1 PH4 1QG 289690 708850
library(reticulate)
r_to_py(whisky, convert = F)
> RowID Distillery Body ... Postcode Latitude Longitude
> 0 1 Aberfeldy 2 ... \tPH15 2EB 286580 749680
> 1 2 Aberlour 3 ... \tAB38 9PJ 326340 842570
> 2 3 AnCnoc 1 ... \tAB5 5LI 352960 839320
> 3 4 Ardbeg 4 ... \tPA42 7EB 141560 646220
> 4 5 Ardmore 2 ... \tAB54 4NH 355350 829140
> .. ... ... ... ... ... ... ...
> 81 82 Tobermory 1 ... PA75 6NR 150450 755070
> 82 83 Tomatin 2 ... IV13 7YT 279120 829630
> 83 84 Tomintoul 0 ... AB37 9AQ 315100 825560
> 84 85 Tormore 2 ... PH26 3LR 315180 834960
> 85 86 Tullibardine 2 ... PH4 1QG 289690 708850
>
> [86 rows x 17 columns]
import pandas
# type(r.whisky) # only ok in interactive mode
print( type(r.whisky) )
> <class 'pandas.core.frame.DataFrame'>
r.whisky.shape
> (86, 17)
r.whisky.head(3)
> RowID Distillery Body Sweetness ... Floral Postcode Latitude Longitude
> 0 1 Aberfeldy 2 2 ... 2 \tPH15 2EB 286580 749680
> 1 2 Aberlour 3 3 ... 2 \tAB38 9PJ 326340 842570
> 2 3 AnCnoc 1 3 ... 2 \tAB5 5LI 352960 839320
>
> [3 rows x 17 columns]
r.whisky.tail(3)
# dir(r.whisky)
> RowID Distillery Body Sweetness ... Floral Postcode Latitude Longitude
> 83 84 Tomintoul 0 3 ... 2 AB37 9AQ 315100 825560
> 84 85 Tormore 2 2 ... 0 PH26 3LR 315180 834960
> 85 86 Tullibardine 2 3 ... 1 PH4 1QG 289690 708850
>
> [3 rows x 17 columns]
r.whisky.head()
> RowID Distillery Body Sweetness ... Floral Postcode Latitude Longitude
> 0 1 Aberfeldy 2 2 ... 2 \tPH15 2EB 286580 749680
> 1 2 Aberlour 3 3 ... 2 \tAB38 9PJ 326340 842570
> 2 3 AnCnoc 1 3 ... 2 \tAB5 5LI 352960 839320
> 3 4 Ardbeg 4 1 ... 0 \tPA42 7EB 141560 646220
> 4 5 Ardmore 2 2 ... 1 \tAB54 4NH 355350 829140
>
> [5 rows x 17 columns]
I am not sure what I want to look at. Let’s see which columns we have:
pandas.set_option('display.max_columns', None) # to show all columns
r.whisky.describe()
> RowID Body Sweetness Smoky Medicinal Tobacco \
> count 86.000000 86.000000 86.000000 86.000000 86.000000 86.000000
> mean 43.500000 2.069767 2.290698 1.534884 0.546512 0.116279
> std 24.969982 0.930410 0.717287 0.863613 0.990032 0.322439
> min 1.000000 0.000000 1.000000 0.000000 0.000000 0.000000
> 25% 22.250000 2.000000 2.000000 1.000000 0.000000 0.000000
> 50% 43.500000 2.000000 2.000000 1.000000 0.000000 0.000000
> 75% 64.750000 2.000000 3.000000 2.000000 1.000000 0.000000
> max 86.000000 4.000000 4.000000 4.000000 4.000000 1.000000
>
> Honey Spicy Winey Nutty Malty Fruity \
> count 86.000000 86.000000 86.000000 86.000000 86.000000 86.000000
> mean 1.244186 1.383721 0.976744 1.465116 1.802326 1.802326
> std 0.853175 0.784686 0.932760 0.821730 0.629094 0.779438
> min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
> 25% 1.000000 1.000000 0.000000 1.000000 1.000000 1.000000
> 50% 1.000000 1.000000 1.000000 2.000000 2.000000 2.000000
> 75% 2.000000 2.000000 1.000000 2.000000 2.000000 2.000000
> max 4.000000 3.000000 4.000000 4.000000 3.000000 3.000000
>
> Floral Latitude Longitude
> count 86.000000 86.000000 8.600000e+01
> mean 1.697674 287247.162791 8.026597e+05
> std 0.855017 67889.046814 8.802422e+04
> min 0.000000 126680.000000 5.542600e+05
> 25% 1.000000 265672.500000 7.556975e+05
> 50% 2.000000 319515.000000 8.398850e+05
> 75% 2.000000 328630.000000 8.507700e+05
> max 4.000000 381020.000000 1.009260e+06
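Note that `describe()` summarizes only numeric columns by default, which is why Distillery and Postcode are missing from the output above. A minimal sketch of how to list every column, or to include the text columns in the summary, using three rows copied from the earlier output rather than the full data set:

```python
import pandas as pd

# Three rows copied from the head() output above (not the full data set)
sample = pd.DataFrame(
    {
        "Distillery": ["Aberfeldy", "Aberlour", "AnCnoc"],
        "Body": [2, 3, 1],
    }
)

# Counterpart of colnames(whisky) in R
print(sample.columns.tolist())

# Default: numeric columns only
numeric_summary = sample.describe()

# include="all" also reports count/unique/top/freq for text columns
full_summary = sample.describe(include="all")

print(numeric_summary.columns.tolist())
print(full_summary.columns.tolist())
```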
colnames(whisky)
> [1] "RowID" "Distillery" "Body" "Sweetness" "Smoky"
> [6] "Medicinal" "Tobacco" "Honey" "Spicy" "Winey"
> [11] "Nutty" "Malty" "Fruity" "Floral" "Postcode"
> [16] "Latitude" "Longitude"
summary(whisky)
> RowID Distillery Body Sweetness Smoky
> Min. : 1.0 Length:86 Min. :0.00 Min. :1.00 Min. :0.00
> 1st Qu.:22.2 Class :character 1st Qu.:2.00 1st Qu.:2.00 1st Qu.:1.00
> Median :43.5 Mode :character Median :2.00 Median :2.00 Median :1.00
> Mean :43.5 Mean :2.07 Mean :2.29 Mean :1.53
> 3rd Qu.:64.8 3rd Qu.:2.00 3rd Qu.:3.00 3rd Qu.:2.00
> Max. :86.0 Max. :4.00 Max. :4.00 Max. :4.00
> Medicinal Tobacco Honey Spicy Winey
> Min. :0.00 Min. :0.000 Min. :0.00 Min. :0.00 Min. :0.00
> 1st Qu.:0.00 1st Qu.:0.000 1st Qu.:1.00 1st Qu.:1.00 1st Qu.:0.00
> Median :0.00 Median :0.000 Median :1.00 Median :1.00 Median :1.00
> Mean :0.55 Mean :0.116 Mean :1.24 Mean :1.38 Mean :0.98
> 3rd Qu.:1.00 3rd Qu.:0.000 3rd Qu.:2.00 3rd Qu.:2.00 3rd Qu.:1.00
> Max. :4.00 Max. :1.000 Max. :4.00 Max. :3.00 Max. :4.00
> Nutty Malty Fruity Floral Postcode
> Min. :0.00 Min. :0.0 Min. :0.0 Min. :0.0 Length:86
> 1st Qu.:1.00 1st Qu.:1.0 1st Qu.:1.0 1st Qu.:1.0 Class :character
> Median :2.00 Median :2.0 Median :2.0 Median :2.0 Mode :character
> Mean :1.47 Mean :1.8 Mean :1.8 Mean :1.7
> 3rd Qu.:2.00 3rd Qu.:2.0 3rd Qu.:2.0 3rd Qu.:2.0
> Max. :4.00 Max. :3.0 Max. :3.0 Max. :4.0
> Latitude Longitude
> Min. :126680 Min. : 554260
> 1st Qu.:265672 1st Qu.: 755698
> Median :319515 Median : 839885
> Mean :287247 Mean : 802660
> 3rd Qu.:328630 3rd Qu.: 850770
> Max. :381020 Max. :1009260
# Could also use skim() from skimr or the describe functions from psych:
# library(psych)
# library(skimr)
# describeData(whisky)
# describeFast(whisky)
# skim(whisky)
What do we see?
Not too many “Medicinal” whiskies.
Tobacco was an idiotic question
Collaboration
No need to code from scratch (R is strong amongst academics, Python has a better ecosystem in some areas)
Right tool for the right job (are you going to research? deploy? visualize? report?)
Abusing fundamental statistics, let’s check the linear correlation:
import matplotlib.pyplot as plt
plt.matshow(r.whisky.iloc[:,2:14].corr())
plt.xticks(fontsize=14, rotation=45)
plt.yticks(fontsize=14)
cb = plt.colorbar()
cb.ax.tick_params(labelsize=14)
plt.title('Simple correlation', fontsize=16);
plt.show()
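Note that Python indexes from zero and excludes the end of a slice, so iloc[:, 2:14] in Python selects the same 12 taste columns as whisky[, 3:14] in R.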
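The off-by-one between the two languages can be checked without any data, using only the column names printed by `colnames(whisky)`. This is a small sketch in plain Python; the variable names are chosen for illustration.

```python
# Column names as printed by colnames(whisky) in this document
columns = [
    "RowID", "Distillery", "Body", "Sweetness", "Smoky", "Medicinal",
    "Tobacco", "Honey", "Spicy", "Winey", "Nutty", "Malty", "Fruity",
    "Floral", "Postcode", "Latitude", "Longitude",
]

# Python: zero-based start, end excluded, so 2:14 covers positions 2..13
python_slice = columns[2:14]

# R's 3:14 is one-based with both ends included; emulate it with an offset
r_start, r_end = 3, 14
r_style_slice = columns[r_start - 1:r_end]

print(python_slice[0], python_slice[-1], len(python_slice))  # Body Floral 12
print(python_slice == r_style_slice)                         # True
```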
library(corrplot)
library(dplyr)  # provides the %>% pipe used below
whisky[, 3:14] %>%
cor %>%
corrplot(type = "upper", order = "hclust", tl.col = "black", tl.srt = 45)
Let’s find the most “nothing special” whisky:
# The most 'standard'
tmpdf <- whisky[, 3:14]
tmpvar <- (apply(tmpdf, 1, mean))/apply(tmpdf, 1, sd)
tmpvar %>%
min
> [1] 0.846
tmpmin <- which.min(tmpvar)
whisky[tmpmin, -c(1, 15:17)]
> Distillery Body Sweetness Smoky Medicinal Tobacco Honey Spicy Winey Nutty
> 46 Glenfiddich 1 3 1 0 0 0 0 0 0
> Malty Fruity Floral
> 46 2 2 2
import numpy as np
import pprint
tmpdf = r.whisky.iloc[:,2:14]
# axis=1 applies the function to each row, i.e. across the columns
tmpvar = tmpdf.apply(np.mean, axis=1) / tmpdf.apply(np.std, axis=1)
tmpvar.min()
> 0.8835412617927486
tmpmin = tmpvar.idxmin()
tmpmin
> 45
tmpidx = [0, 14, 15, 16]  # equivalently [0, *range(14, 17)]
tmpvec = r.whisky.columns.isin(r.whisky.columns[tmpidx])
tmpdf = r.whisky.loc[:, ~tmpvec ]
pprint.pprint(tmpdf.iloc[tmpmin, :])  # tmpdf.iloc[[tmpmin], :] would print it as a row
> Distillery Glenfiddich
> Body 1
> Sweetness 3
> Smoky 1
> Medicinal 0
> Tobacco 0
> Honey 0
> Spicy 0
> Winey 0
> Nutty 0
> Malty 2
> Fruity 2
> Floral 2
> Name: 45, dtype: object
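Both languages pick Glenfiddich, but the minimum ratio differs (0.846 in R, 0.884 in Python). The likely reason is the denominator of the standard deviation: R's `sd()` divides by n - 1, while `np.std` divides by n unless `ddof=1` is passed. The sketch below checks this with the standard library only, using Glenfiddich's 12 scores as printed above.

```python
import statistics

# Glenfiddich's 12 taste scores, copied from the output above
# (Body, Sweetness, Smoky, ..., Floral)
scores = [1, 3, 1, 0, 0, 0, 0, 0, 0, 2, 2, 2]

mean = statistics.mean(scores)

# Population standard deviation (divide by n): the np.std default
ratio_population = mean / statistics.pstdev(scores)

# Sample standard deviation (divide by n - 1): what R's sd() computes
ratio_sample = mean / statistics.stdev(scores)

print(round(ratio_population, 3))  # 0.884, matches the Python result
print(round(ratio_sample, 3))      # 0.846, matches the R result
```

To reproduce the R number inside pandas, `tmpdf.std(axis=1)` should work, because pandas uses `ddof=1` by default.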
Let’s find what we like:
r.whisky[ r.whisky['Smoky'] > 2 ]
> RowID Distillery Body Sweetness Smoky Medicinal Tobacco Honey \
> 3 4 Ardbeg 4 1 4 4 0 0
> 18 19 Bowmore 2 2 3 1 0 2
> 21 22 Caol Ila 3 1 4 2 1 0
> 23 24 Clynelish 3 2 3 3 1 0
> 34 35 GlenGarioch 2 1 3 0 0 0
> 53 54 Highland Park 2 2 3 1 0 2
> 57 58 Lagavulin 4 1 4 4 1 0
> 58 59 Laphroig 4 2 4 4 1 0
> 77 78 Talisker 4 2 3 3 0 1
>
> Spicy Winey Nutty Malty Fruity Floral Postcode Latitude \
> 3 2 0 1 2 1 0 \tPA42 7EB 141560
> 18 2 1 1 1 1 2 \tPA43 7GS 131330
> 21 2 0 2 1 1 1 \tPA46 7RL 142920
> 23 2 0 1 1 2 0 \tKW9 6LB 290250
> 34 3 1 0 2 2 2 AB51 0ES 381020
> 53 1 1 1 2 1 1 KW15 1SU 345340
> 57 1 2 1 1 1 0 PA42 7DZ 140430
> 58 0 1 1 1 0 0 PA42 7DU 138680
> 77 3 0 1 2 2 0 IV47 8SR 137950
>
> Longitude
> 3 646220
> 18 659720
> 21 670040
> 23 904230
> 34 827590
> 53 1009260
> 57 645730
> 58 645160
> 77 831770
r.whisky[ (r.whisky['Smoky'] > 2) & ( r.whisky['Spicy'] > 2 ) ]
> RowID Distillery Body Sweetness Smoky Medicinal Tobacco Honey \
> 34 35 GlenGarioch 2 1 3 0 0 0
> 77 78 Talisker 4 2 3 3 0 1
>
> Spicy Winey Nutty Malty Fruity Floral Postcode Latitude Longitude
> 34 3 1 0 2 2 2 AB51 0ES 381020 827590
> 77 3 0 1 2 2 0 IV47 8SR 137950 831770
library(dplyr)
filter(whisky, Smoky > 2)
> RowID Distillery Body Sweetness Smoky Medicinal Tobacco Honey Spicy Winey
> 1 4 Ardbeg 4 1 4 4 0 0 2 0
> 2 19 Bowmore 2 2 3 1 0 2 2 1
> 3 22 Caol Ila 3 1 4 2 1 0 2 0
> 4 24 Clynelish 3 2 3 3 1 0 2 0
> 5 35 GlenGarioch 2 1 3 0 0 0 3 1
> 6 54 Highland Park 2 2 3 1 0 2 1 1
> 7 58 Lagavulin 4 1 4 4 1 0 1 2
> 8 59 Laphroig 4 2 4 4 1 0 0 1
> 9 78 Talisker 4 2 3 3 0 1 3 0
> Nutty Malty Fruity Floral Postcode Latitude Longitude
> 1 1 2 1 0 \tPA42 7EB 141560 646220
> 2 1 1 1 2 \tPA43 7GS 131330 659720
> 3 2 1 1 1 \tPA46 7RL 142920 670040
> 4 1 1 2 0 \tKW9 6LB 290250 904230
> 5 0 2 2 2 AB51 0ES 381020 827590
> 6 1 2 1 1 KW15 1SU 345340 1009260
> 7 1 1 1 0 PA42 7DZ 140430 645730
> 8 1 1 0 0 PA42 7DU 138680 645160
> 9 1 2 2 0 IV47 8SR 137950 831770
filter(whisky, Smoky > 2 & Spicy > 2)
> RowID Distillery Body Sweetness Smoky Medicinal Tobacco Honey Spicy Winey
> 1 35 GlenGarioch 2 1 3 0 0 0 3 1
> 2 78 Talisker 4 2 3 3 0 1 3 0
> Nutty Malty Fruity Floral Postcode Latitude Longitude
> 1 0 2 2 2 AB51 0ES 381020 827590
> 2 1 2 2 0 IV47 8SR 137950 831770
Talisker is a long-time favorite, but will definitely check out Glen Garioch.
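A detail worth spelling out in the pandas filter: each comparison needs its own parentheses, because `&` binds more tightly than `>` in Python. `DataFrame.query` avoids the issue by taking the condition as a string, which reads closer to `dplyr::filter`. The sketch below uses a handful of rows copied from the outputs above, not the full data set.

```python
import pandas as pd

# A few rows copied from the outputs in this document (not the full data set)
sample = pd.DataFrame(
    {
        "Distillery": ["Aberfeldy", "Aberlour", "Ardbeg", "GlenGarioch",
                       "Lagavulin", "Talisker"],
        "Smoky": [2, 1, 4, 3, 4, 3],
        "Spicy": [1, 3, 2, 3, 1, 3],
    }
)

# Boolean mask: parentheses around each comparison are required
mask = (sample["Smoky"] > 2) & (sample["Spicy"] > 2)
by_mask = sample[mask]

# Same filter written as a query string
by_query = sample.query("Smoky > 2 and Spicy > 2")

print(by_mask["Distillery"].tolist())  # ['GlenGarioch', 'Talisker']
print(by_mask.equals(by_query))        # True
```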
Select particular variables
r.whisky[['Distillery']].head(3)
> Distillery
> 0 Aberfeldy
> 1 Aberlour
> 2 AnCnoc
select(whisky, "Distillery") %>%
head(3)
> Distillery
> 1 Aberfeldy
> 2 Aberlour
> 3 AnCnoc
# Or
whisky[, "Distillery"] %>%
head(3)
> [1] "Aberfeldy" "Aberlour" "AnCnoc"
# Or
whisky[, 2] %>%
head(3)
> [1] "Aberfeldy" "Aberlour" "AnCnoc"
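The R outputs above differ in shape: `select()` returns a data frame, while `whisky[, "Distillery"]` returns a plain vector. pandas makes a similar distinction between double and single brackets. A small sketch, using the first three distilleries from the output rather than the full data set:

```python
import pandas as pd

# First three rows, copied from the head() output earlier in this document
sample = pd.DataFrame(
    {"Distillery": ["Aberfeldy", "Aberlour", "AnCnoc"], "Body": [2, 3, 1]}
)

as_frame = sample[["Distillery"]]  # list of names -> DataFrame, like select()
as_series = sample["Distillery"]   # single name -> Series, like whisky[, "Distillery"]
by_position = sample.iloc[:, 0]    # zero-based column position -> Series

print(type(as_frame).__name__)     # DataFrame
print(type(as_series).__name__)    # Series
print(as_series.tolist())          # ['Aberfeldy', 'Aberlour', 'AnCnoc']
```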
Arranging and summarizing
# In the terminal: pip install tabulate
import tabulate
tmphead= r.whisky.columns.astype('str')
print(tabulate.tabulate(r.whisky.sort_values("Nutty").head(10), headers= tmphead,tablefmt= "pretty") )
> +----+-------+--------------------+------+-----------+-------+-----------+---------+-------+-------+-------+-------+-------+--------+--------+----------+----------+-----------+
> | | RowID | Distillery | Body | Sweetness | Smoky | Medicinal | Tobacco | Honey | Spicy | Winey | Nutty | Malty | Fruity | Floral | Postcode | Latitude | Longitude |
> +----+-------+--------------------+------+-----------+-------+-----------+---------+-------+-------+-------+-------+-------+--------+--------+----------+----------+-----------+
> | 13 | 14 | Benriach | 2 | 2 | 1 | 0 | 0 | 2 | 2 | 0 | 0 | 2 | 3 | 2 | IV30 8SJ | 323450 | 858380 |
> | 32 | 33 | GlenDeveronMacduff | 2 | 3 | 1 | 1 | 1 | 1 | 1 | 2 | 0 | 2 | 0 | 1 | AB4 3JT | 372120 | 860400 |
> | 47 | 48 | Glenkinchie | 1 | 2 | 1 | 0 | 0 | 1 | 2 | 0 | 0 | 2 | 2 | 2 | EH34 5ET | 344380 | 666690 |
> | 5 | 6 | ArranIsleOf | 2 | 3 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 2 | KA27 8HJ | 194050 | 649950 |
> | 80 | 81 | Teaninich | 2 | 2 | 2 | 1 | 0 | 0 | 2 | 0 | 0 | 0 | 2 | 2 | IV17 0XB | 265360 | 869120 |
> | 45 | 46 | Glenfiddich | 1 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 2 | AB55 4DH | 332680 | 840840 |
> | 69 | 70 | RoyalBrackla | 2 | 3 | 2 | 1 | 1 | 1 | 2 | 1 | 0 | 2 | 3 | 2 | IV12 5QY | 286040 | 851320 |
> | 34 | 35 | GlenGarioch | 2 | 1 | 3 | 0 | 0 | 0 | 3 | 1 | 0 | 2 | 2 | 2 | AB51 0ES | 381020 | 827590 |
> | 11 | 12 | Belvenie | 3 | 2 | 1 | 0 | 0 | 3 | 2 | 1 | 0 | 2 | 2 | 2 | AB55 4DH | 332680 | 840840 |
> | 59 | 60 | Linkwood | 2 | 3 | 1 | 0 | 0 | 1 | 1 | 2 | 0 | 1 | 3 | 2 | IV30 3RD | 322640 | 861040 |
> +----+-------+--------------------+------+-----------+-------+-----------+---------+-------+-------+-------+-------+-------+--------+--------+----------+----------+-----------+
print(tabulate.tabulate(r.whisky.sort_values("Nutty", ascending= False).head(10), headers= tmphead, tablefmt= "pretty") )
> +----+-------+----------------+------+-----------+-------+-----------+---------+-------+-------+-------+-------+-------+--------+--------+----------+----------+-----------+
> | | RowID | Distillery | Body | Sweetness | Smoky | Medicinal | Tobacco | Honey | Spicy | Winey | Nutty | Malty | Fruity | Floral | Postcode | Latitude | Longitude |
> +----+-------+----------------+------+-----------+-------+-----------+---------+-------+-------+-------+-------+-------+--------+--------+----------+----------+-----------+
> | 31 | 32 | Edradour | 2 | 3 | 1 | 0 | 0 | 2 | 1 | 1 | 4 | 2 | 2 | 2 | PH16 5JP | 295960 | 757940 |
> | 75 | 76 | Strathisla | 2 | 2 | 1 | 0 | 0 | 2 | 2 | 2 | 3 | 3 | 3 | 2 | AB55 3BS | 340754 | 848623 |
> | 61 | 62 | Longmorn | 3 | 2 | 1 | 0 | 0 | 1 | 1 | 1 | 3 | 3 | 2 | 3 | IV30 3SJ | 322640 | 861040 |
> | 10 | 11 | Balmenach | 4 | 3 | 2 | 0 | 0 | 2 | 1 | 3 | 3 | 0 | 1 | 2 | PH26 3PF | 307750 | 827170 |
> | 39 | 40 | GlenScotia | 2 | 2 | 2 | 2 | 0 | 1 | 0 | 1 | 2 | 2 | 1 | 1 | PA28 6DS | 172090 | 621010 |
> | 73 | 74 | Speyside | 2 | 2 | 1 | 0 | 0 | 1 | 0 | 1 | 2 | 2 | 2 | 2 | PH21 1NS | 278740 | 800600 |
> | 71 | 72 | Scapa | 2 | 2 | 1 | 1 | 0 | 2 | 1 | 1 | 2 | 2 | 2 | 2 | KW15 1SE | 342850 | 1008930 |
> | 70 | 71 | RoyalLochnagar | 3 | 2 | 2 | 0 | 0 | 2 | 2 | 2 | 2 | 2 | 3 | 1 | AB35 5TB | 326140 | 794370 |
> | 68 | 69 | OldPulteney | 2 | 1 | 2 | 2 | 1 | 0 | 1 | 1 | 2 | 2 | 2 | 2 | KW1 5BA | 336730 | 950130 |
> | 67 | 68 | OldFettercairn | 1 | 2 | 2 | 0 | 1 | 2 | 2 | 1 | 2 | 3 | 1 | 1 | AB30 1YE | 370860 | 772900 |
> +----+-------+----------------+------+-----------+-------+-----------+---------+-------+-------+-------+-------+-------+--------+--------+----------+----------+-----------+
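In the first table above, the whiskies tied at Nutty = 0 do not appear in their original row order. This is probably because `sort_values` uses quicksort by default, which does not preserve the order of ties. The sketch below, on a few rows copied from the tables rather than the full data set, shows a stable sort and a sort on two keys.

```python
import pandas as pd

# A few rows copied from the tables above (not the full data set)
sample = pd.DataFrame(
    {
        "Distillery": ["ArranIsleOf", "Balmenach", "Benriach", "Edradour",
                       "Glenfiddich", "Glenkinchie"],
        "Body": [2, 4, 2, 2, 1, 1],
        "Nutty": [0, 3, 0, 4, 0, 0],
    }
)

# kind="stable" keeps tied rows in their original order
stable = sample.sort_values("Nutty", kind="stable")

# Sorting on several keys: Nutty first, then Body within each Nutty value
two_keys = sample.sort_values(["Nutty", "Body"])

print(stable["Distillery"].tolist())
print(two_keys[["Distillery", "Nutty", "Body"]])
```

Passing a list of columns plays the role of `arrange(Nutty, Body)` in dplyr.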
r.whisky["Nutty"].agg(['mean', 'min', 'count'])
> mean 1.465116
> min 0.000000
> count 86.000000
> Name: Nutty, dtype: float64
r.whisky.agg({ "Nutty": ['mean', 'min', 'count'] } )
> Nutty
> mean 1.465116
> min 0.000000
> count 86.000000
r.whisky.agg({ "Nutty": ['mean', 'min', 'count'], "Body": ['mean', 'min', 'count'] } )
> Nutty Body
> mean 1.465116 2.069767
> min 0.000000 0.000000
> count 86.000000 86.000000
colnames(whisky)
> [1] "RowID" "Distillery" "Body" "Sweetness" "Smoky"
> [6] "Medicinal" "Tobacco" "Honey" "Spicy" "Winey"
> [11] "Nutty" "Malty" "Fruity" "Floral" "Postcode"
> [16] "Latitude" "Longitude"
whisky %>%
arrange(Nutty) %>%
head(10)
> RowID Distillery Body Sweetness Smoky Medicinal Tobacco Honey Spicy
> 1 6 ArranIsleOf 2 3 1 1 0 1 1
> 2 12 Belvenie 3 2 1 0 0 3 2
> 3 14 Benriach 2 2 1 0 0 2 2
> 4 17 Bladnoch 1 2 1 0 0 0 1
> 5 33 GlenDeveronMacduff 2 3 1 1 1 1 1
> 6 35 GlenGarioch 2 1 3 0 0 0 3
> 7 46 Glenfiddich 1 3 1 0 0 0 0
> 8 48 Glenkinchie 1 2 1 0 0 1 2
> 9 60 Linkwood 2 3 1 0 0 1 1
> 10 70 RoyalBrackla 2 3 2 1 1 1 2
> Winey Nutty Malty Fruity Floral Postcode Latitude Longitude
> 1 1 0 1 1 2 KA27 8HJ 194050 649950
> 2 1 0 2 2 2 \tAB55 4DH 332680 840840
> 3 0 0 2 3 2 \tIV30 8SJ 323450 858380
> 4 1 0 2 2 3 \tDG8 9AB 242260 554260
> 5 2 0 2 0 1 AB4 3JT 372120 860400
> 6 1 0 2 2 2 AB51 0ES 381020 827590
> 7 0 0 2 2 2 AB55 4DH 332680 840840
> 8 0 0 2 2 2 EH34 5ET 344380 666690
> 9 2 0 1 3 2 IV30 3RD 322640 861040
> 10 1 0 2 3 2 IV12 5QY 286040 851320
whisky %>%
arrange(Nutty, Body) %>%
head(10)
> RowID Distillery Body Sweetness Smoky Medicinal Tobacco Honey Spicy
> 1 17 Bladnoch 1 2 1 0 0 0 1
> 2 46 Glenfiddich 1 3 1 0 0 0 0
> 3 48 Glenkinchie 1 2 1 0 0 1 2
> 4 6 ArranIsleOf 2 3 1 1 0 1 1
> 5 14 Benriach 2 2 1 0 0 2 2
> 6 33 GlenDeveronMacduff 2 3 1 1 1 1 1
> 7 35 GlenGarioch 2 1 3 0 0 0 3
> 8 60 Linkwood 2 3 1 0 0 1 1
> 9 70 RoyalBrackla 2 3 2 1 1 1 2
> 10 73 Speyburn 2 4 1 0 0 2 1
> Winey Nutty Malty Fruity Floral Postcode Latitude Longitude
> 1 1 0 2 2 3 \tDG8 9AB 242260 554260
> 2 0 0 2 2 2 AB55 4DH 332680 840840
> 3 0 0 2 2 2 EH34 5ET 344380 666690
> 4 1 0 1 1 2 KA27 8HJ 194050 649950
> 5 0 0 2 3 2 \tIV30 8SJ 323450 858380
> 6 2 0 2 0 1 AB4 3JT 372120 860400
> 7 1 0 2 2 2 AB51 0ES 381020 827590
> 8 2 0 1 3 2 IV30 3RD 322640 861040
> 9 1 0 2 3 2 IV12 5QY 286040 851320
> 10 0 0 2 1 2 AB38 7AG 326930 851430
whisky %>%
summarise(minnutty = min(Nutty))
> minnutty
> 1 0
whisky %>%
summarise(mnutty = mean(Nutty), mbody = mean(Body))
> mnutty mbody
> 1 1.47 2.07
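`summarise()` lets you name the output columns directly. One way to get a similarly shaped one-row result in pandas is to build it explicitly, as in this sketch. The four rows are copied from the outputs in this document, so the means differ from the full-data values, and the names `mnutty` and `mbody` simply mirror the R code.

```python
import pandas as pd

# Four rows copied from this document: Aberfeldy, Aberlour, AnCnoc, Glenfiddich
sample = pd.DataFrame({"Nutty": [2, 2, 2, 0], "Body": [2, 3, 1, 1]})

# One-row summary with explicitly named columns, like summarise() in dplyr
summary = pd.DataFrame(
    {
        "mnutty": [sample["Nutty"].mean()],
        "mbody": [sample["Body"].mean()],
    }
)

print(summary)
```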
We have shown how to use Rython
Practical differences between R and Python are mainly indenting and indexing
We went over some basic data-wrangling functions: filtering, selecting, arranging and summarizing
We have some new whisky ideas to try out (ok you got me, that was the real purpose of the analysis)