R/shard.r
shard.Rd
Shard a data.frame/data.table or disk.frame into chunks and save them as a disk.frame.
`distribute` is an alias for `shard`
shard(
df,
shardby,
outdir = tempfile(fileext = ".df"),
...,
nchunks = recommend_nchunks(df),
overwrite = FALSE
)
distribute(...)
df: A data.frame/data.table or disk.frame. If a disk.frame, then rechunk(df, ...) is run.
shardby: The column(s) to shard the data by.
outdir: The output directory of the disk.frame.
...: Not used.
nchunks: The number of chunks.
overwrite: If TRUE, the chunks are overwritten.
# shard the iris data.frame by Species so that rows with the same Species are in the same chunk
iris.df = shard(iris, "Species")
# clean up iris.df
delete(iris.df)
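
The idea behind sharding, that rows with the same `shardby` value always land in the same chunk, can be sketched in base R. This is a minimal illustration only and not how disk.frame assigns rows internally: `toy_shard` is a hypothetical helper, the chunk assignment by order of first appearance is an assumption made for simplicity, and the chunks are kept in an in-memory list rather than written to `outdir`.

```r
# Illustrative only: `toy_shard` is a hypothetical helper, NOT part of disk.frame.
# It assigns each row to a chunk based on its key, so equal keys share a chunk.
toy_shard <- function(df, shardby, nchunks) {
  # Build one key string per row from the sharding column(s)
  key <- do.call(paste, c(unname(as.list(df[shardby])), sep = "\r"))
  # Map each distinct key to a chunk id in 1..nchunks (assumed scheme:
  # order of first appearance, wrapped around with modulo)
  chunk_id <- (match(key, unique(key)) - 1L) %% nchunks + 1L
  # Split the rows by chunk id; each list element stands in for one chunk
  split(df, chunk_id)
}

chunks <- toy_shard(iris, "Species", nchunks = 2)

# List the Species values held by each chunk; no value appears in two chunks
lapply(chunks, function(ch) unique(as.character(ch$Species)))
```

With fewer chunks than distinct keys, several keys share a chunk, but a single key is never split across chunks. That is the property the `shard` example above relies on.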