7 Use R efficiently with data.table!

## TL;DR

The data.table package, developed by Matt Dowle, is a game changer for many data scientists.

Learn about it… and use it always, by default:

library(data.table)
dtIris <- data.table(iris) # creates a new data.table from iris, or
df <- iris; dtIris <- setDT(df) # converts df to a data.table by reference
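
The two ways of creating a data.table shown above can be tried on a small example. This is a minimal sketch: dfToy and its columns are hypothetical names, used only for illustration.

```r
library(data.table)

# data.table() builds a new data.table from the built-in iris data.frame.
dtIris <- data.table(iris)

# setDT() converts an existing data.frame in place (by reference).
# dfToy is a hypothetical toy data.frame used only for illustration.
dfToy <- data.frame(id = 1:3, value = c(10, 20, 30))
setDT(dfToy)
is.data.table(dfToy)

# := adds or modifies a column by reference, without reassigning dfToy.
dfToy[, doubled := value * 2]
print(dfToy)
```

Note that no assignment (`<-`) is needed after setDT() or `:=`; both modify the object they are given.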

Where to learn:

7.1 data.table vs. dplyr

data.table (“computer language”) way vs. dplyr (“English language”) way

The best: no wasted computations and no new memory allocations. Only the matching rows are touched, and they are updated in place.
dtLocations %>% .[WLOC == 4313, WLOC := 4312]

The table is not copied, but computations are done on ALL rows.
dtLocations %>% .[, WLOC := ifelse(WLOC == 4313, 4312, WLOC)]

The worst: computations are done on ALL rows. Furthermore, the data is copied from one memory location to another instead of being updated in place. (Imagine your data has 1 million cells, of which only 10 need to be changed!)
dtLocations <- dtLocations %>% mutate(WLOC = ifelse(WLOC == 4313, 4312, WLOC))

NB: dtLocations %>% .[] is the same as dtLocations[], so you can use data.table syntax in pipes.
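
The three approaches above can be compared on a toy table. This is a sketch under stated assumptions: the table contents are made up, the names dtBest, dtAllRows and dtCopy are hypothetical, and the pipe is dropped in favour of the equivalent dt[...] form so that only data.table is required. The dplyr variant runs only if dplyr is installed.

```r
library(data.table)

# Hypothetical stand-ins for dtLocations (made-up values).
dtBest    <- data.table(WLOC = c(4311, 4313, 4313, 4315))
dtAllRows <- data.table(WLOC = c(4311, 4313, 4313, 4315))

# 1. Subset-assign: only the rows where WLOC == 4313 are updated, in place.
dtBest[WLOC == 4313, WLOC := 4312]

# 2. In-place update, but ifelse() is evaluated for every row.
dtAllRows[, WLOC := ifelse(WLOC == 4313, 4312, WLOC)]

# 3. dplyr: every row is evaluated and the result must be assigned back.
if (requireNamespace("dplyr", quietly = TRUE)) {
  dtCopy <- data.table(WLOC = c(4311, 4313, 4313, 4315))
  dtCopy <- dplyr::mutate(dtCopy, WLOC = ifelse(WLOC == 4313, 4312, WLOC))
}

# All variants end up with the same values; they differ in how much work is done.
print(dtBest)
print(dtAllRows)
```

On four rows the difference is invisible; it only starts to matter when the table is large and few rows change.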

There has been considerable effort to marry the data.table package with the dplyr package. Here are some notable projects:

https://github.com/tidyverse/dtplyr (Version: 1.1.0, Published: 2021-02-20. From Hadley himself; I still found it quite cumbersome, though…)
https://github.com/asardaes/table.express (Version: 0.3.1, Published: 2019-09-07. Somewhat easier?)
https://github.com/markfairbanks/tidytable (Version: 0.6.2, Published: 2021-05-18. Seems to be the best supported of the three?)