disk.frame supports data.table syntax

As an example, I will use .N from the data.table package to count the rows for each combination of year and month within each chunk.
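A minimal sketch of such a call is shown here. It assumes that the nycflights13 package is installed and that flights.df is built from it with as.disk.frame; the setup lines and the worker count are assumptions for illustration, not something the text specifies.

```r
library(disk.frame)
library(data.table)

# Assumption: two background workers are enough for this illustration
setup_disk.frame(workers = 2)

# Assumption: flights.df is created from the nycflights13 flights data
flights.df <- as.disk.frame(nycflights13::flights)

# Count rows per (year, month) within each chunk;
# keep = loads only the two columns the expression needs
counts_per_chunk <- flights.df[, .N, .(year, month), keep = c("year", "month")]

# The same (year, month) pair can appear once per chunk,
# so sum the per-chunk counts to get overall totals
counts_total <- counts_per_chunk[, .(N = sum(N)), .(year, month)]
```

Because .N is evaluated chunk by chunk, the first result may hold several rows for the same year and month; the second aggregation collapses them.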

All data.table syntax is supported. In addition, disk.frame can load only the columns required for the analysis through the keep = option. For a count by year and month, only the year and month variables are required, hence keep = c("year", "month") is used.

Alternatively, we can use the srckeep function to achieve the same result, e.g.

srckeep(flights.df, c("year", "month"))[,.N, .(year, month)]

External variables are captured

disk.frame sends the computation to background workers, which are distinct and separate R sessions. Typically, the variables available in your current R session aren’t visible in those other sessions. disk.frame uses the future package’s ability to detect global variables to figure out which variables are in use, and then sends them to the background workers so that they have access to them as well.
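This behaviour can be sketched as follows. The value of y and the body of some_fn are placeholders chosen for illustration, and the setup again assumes that nycflights13 is installed.

```r
library(disk.frame)
library(data.table)

# Assumption: two background workers, each a separate R session
setup_disk.frame(workers = 2)
flights.df <- as.disk.frame(nycflights13::flights)

# Both objects exist only in the current (calling) session
y <- 42                      # placeholder value
some_fn <- function(x) x     # placeholder function

# The expression is evaluated once per chunk on the workers;
# y and some_fn are detected as globals and shipped along
res <- flights.df[, some_fn(y)]
```

The expression runs once per chunk, so res should hold one copy of the value for each chunk of flights.df.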

In the example flights.df[, some_fn(y)], where some_fn and y are defined only in the current session, neither is defined in the background workers’ environments, yet disk.frame still manages to evaluate the code.