Increase or decrease the number of chunks in the disk.frame
rechunk(
  df,
  nchunks,
  outdir = attr(df, "path", exact = TRUE),
  shardby = NULL,
  overwrite = TRUE,
  shardby_function = "hash",
  sort_splits = NULL,
  desc_vars = NULL
)
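df: the disk.frame to rechunk
nchunks: the number of chunks the result should have
outdir: the output directory; defaults to the path of df, so the rechunk is done in place
overwrite: whether to overwrite the output directory
shardby_function: how rows are split into chunks: "hash" for a hash function or "sort" for semi-sorted chunks
sort_splits: for the "sort" shardby function, a data frame with the split values
desc_vars: for the "sort" shardby function, the variables to sort in descending order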
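The difference between the "hash" and "sort" options of shardby_function can be pictured with a small base R sketch. This is an illustration only and not disk.frame's internal code: the modulo "hash" and the quartile split points are assumptions chosen to keep the example short, and the names toy_hash_chunk, inner_splits, and sort_chunk are hypothetical.

```r
# Illustration only: NOT how disk.frame assigns rows internally.
speed <- cars$speed
n_chunks <- 4

# "hash"-style assignment: a toy hash (value modulo n_chunks).
# Equal keys always land in the same chunk, but chunks are not ordered by key.
toy_hash_chunk <- (speed %% n_chunks) + 1

# "sort"-style assignment: split values define contiguous key ranges,
# so each chunk holds one range of speeds (semi-sorted chunks).
inner_splits <- quantile(speed, probs = c(0.25, 0.5, 0.75), names = FALSE)
sort_chunk <- findInterval(speed, inner_splits) + 1

# Inspect the result: rows per chunk, and the speed range in each sorted chunk.
table(toy_hash_chunk)
tapply(speed, sort_chunk, range)
```

In both schemes, rows with the same speed end up in the same chunk; only the "sort" scheme also keeps the chunks ordered by speed.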
# create a disk.frame with 2 chunks in tempdir()
cars.df = as.disk.frame(cars, nchunks = 2)

# re-chunking cars.df to 3 chunks, done "in-place" to the same folder as cars.df
rechunk(cars.df, 3)

new_path = tempfile(fileext = ".df")
# re-chunking cars.df to 4 chunks, shard by speed, and done "out-of-place" to a new directory
cars2.df = rechunk(cars.df, 4, outdir = new_path, shardby = "speed")
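One way to confirm that a rechunk did what was asked is to compare the chunk count and row count before and after. The following is a minimal sketch, assuming the disk.frame package is installed and that its nchunks(), nrow(), and delete() functions behave as their names suggest; the variable names are illustrative.

```r
library(disk.frame)

# start from a disk.frame with 2 chunks in tempdir()
cars.df <- as.disk.frame(cars, nchunks = 2)
chunks_before <- nchunks(cars.df)

# rechunk in place to 3 chunks and keep the returned disk.frame
cars.df <- rechunk(cars.df, 3)
chunks_after <- nchunks(cars.df)

# rechunking should change the chunk count but not the number of rows
rows_after <- nrow(cars.df)
print(c(before = chunks_before, after = chunks_after, rows = rows_after))

# clean up the files written to disk
delete(cars.df)
```

Deleting the disk.frame at the end removes its folder from disk, which keeps tempdir() from filling up when the example is run repeatedly.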