Function reference • disk.frame

All functions
`add_chunk()`	Add a chunk to the disk.frame
`as.data.frame(<disk.frame>)`	Convert disk.frame to data.frame by collecting all chunks
`as.data.table(<disk.frame>)`	Convert disk.frame to data.table by collecting all chunks
`as.disk.frame()`	Make a data.frame into a disk.frame
`bind_rows.disk.frame()`	Bind rows
`chunk_summarize()` `chunk_summarise()` `chunk_group_by()` `chunk_ungroup()`	Group by, summarize, and ungroup within each chunk of the disk.frame
`cmap()` `cmap_dfr()` `cimap()` `cimap_dfr()` `lazy()` `delayed()` `clapply()`	Apply the same function to all chunks
`cmap2()` `map_by_chunk_id()`	Apply a function to corresponding chunks of two disk.frames
`collect(<disk.frame>)` `collect_list()` `collect(<summarized_disk.frame>)`	Bring the disk.frame into R
`colnames()` `names(<disk.frame>)`	Return the column names of the disk.frame
`compute(<disk.frame>)`	Force computations. The results are stored in a folder.
`create_chunk_mapper()`	Create a function that applies to each chunk of a disk.frame
`csv_to_disk.frame()`	Convert CSV file(s) to disk.frame format
`delete()`	Delete a disk.frame
`dfglm()`	Fit generalized linear models (glm) with disk.frame
`df_ram_size()`	Get the size of RAM in gigabytes
`disk.frame()`	Create a disk.frame from a folder
`disk.frame_to_parquet()`	A function to convert a disk.frame to parquet format
`select(<disk.frame>)` `rename(<disk.frame>)` `filter(<disk.frame>)` `mutate(<disk.frame>)` `transmute(<disk.frame>)` `arrange(<disk.frame>)` `chunk_arrange()` `distinct(<disk.frame>)` `chunk_distinct()` `glimpse(<disk.frame>)`	The dplyr verbs implemented for disk.frame
`evalparseglue()`	Helper function to evalparse some `glue::glue` string
`find_globals_recursively()`	Find globals in an expression by searching through the chain
`foverlaps.disk.frame()`	Apply data.table's foverlaps to the disk.frame
`gen_datatable_synthetic()`	Generate synthetic dataset for testing
`get_chunk()`	Obtain one chunk by chunk id
`get_chunk_ids()`	Get the chunk IDs and file names
`get_partition_paths()`	Get the partitioning structure of a folder
`groups(<disk.frame>)`	The shard keys of the disk.frame
`summarise(<grouped_disk.frame>)` `summarize(<grouped_disk.frame>)` `group_by(<disk.frame>)` `summarize(<disk.frame>)` `summarise(<disk.frame>)`	A function to parse the summarize function
`head(<disk.frame>)` `tail(<disk.frame>)`	Head and tail of the disk.frame
`is_disk.frame()`	Checks if a folder is a disk.frame
`anti_join(<disk.frame>)` `full_join(<disk.frame>)` `inner_join(<disk.frame>)` `left_join(<disk.frame>)` `semi_join(<disk.frame>)`	Performs join/merge for disk.frames
`make_glm_streaming_fn()`	A streaming function for speedglm
`merge(<disk.frame>)`	Merge function for disk.frames
`move_to()` `copy_df_to()`	Move or copy a disk.frame to another location
`nchunks()` `nchunk()`	Returns the number of chunks in a disk.frame
`nrow()` `ncol()`	Number of rows or columns
`var_df.chunk_agg.disk.frame()` `var_df.collected_agg.disk.frame()` `sd_df.chunk_agg.disk.frame()` `sd_df.collected_agg.disk.frame()` `mean_df.chunk_agg.disk.frame()` `mean_df.collected_agg.disk.frame()` `sum_df.chunk_agg.disk.frame()` `sum_df.collected_agg.disk.frame()` `min_df.chunk_agg.disk.frame()` `min_df.collected_agg.disk.frame()` `max_df.chunk_agg.disk.frame()` `max_df.collected_agg.disk.frame()` `median_df.chunk_agg.disk.frame()` `median_df.collected_agg.disk.frame()` `n_df.chunk_agg.disk.frame()` `n_df.collected_agg.disk.frame()` `length_df.chunk_agg.disk.frame()` `length_df.collected_agg.disk.frame()` `any_df.chunk_agg.disk.frame()` `any_df.collected_agg.disk.frame()` `all_df.chunk_agg.disk.frame()` `all_df.collected_agg.disk.frame()` `n_distinct_df.chunk_agg.disk.frame()` `n_distinct_df.collected_agg.disk.frame()` `quantile_df.chunk_agg.disk.frame()` `quantile_df.collected_agg.disk.frame()` `IQR_df.chunk_agg.disk.frame()` `IQR_df.collected_agg.disk.frame()`	One-stage group-by aggregation functions (chunk-level and collected-level)
`overwrite_check()`	Check if the outdir exists or not
`partition_filter()`	Filter the dataset based on folder partitions
`play()`	Play the recorded lazy operations
`print(<disk.frame>)`	Print disk.frame
`pull(<disk.frame>)`	Pull a column from table similar to `dplyr::pull`.
`purrr_as_mapper()`	Used to convert a function to purrr syntax if needed
`rbindlist.disk.frame()`	rbindlist disk.frames together
`rechunk()`	Increase or decrease the number of chunks in the disk.frame
`recommend_nchunks()`	Recommend number of chunks based on input size
`remove_chunk()`	Removes a chunk from the disk.frame
`sample_frac(<disk.frame>)`	Sample a fraction of rows from a disk.frame
`setup_disk.frame()`	Set up disk.frame environment
`shard()` `distribute()`	Shard a data.frame/data.table or disk.frame into chunks and save them as a disk.frame
`shardkey()`	Returns the shardkey (not implemented yet)
`shardkey_equal()`	Compare two disk.frame shardkeys
`show_ceremony()` `ceremony_text()` `show_boilerplate()` `insert_ceremony()`	Show the code to setup disk.frame
`split_string_into_df()`	Turn a string of the form /partition1=val/partition2=val2 into a data.frame
`srckeep()`	Keep only the variables from the input listed in selections
`[`(<disk.frame>)	`[` interface for disk.frame using the fst backend
`tbl_vars(<disk.frame>)` `group_vars(<disk.frame>)`	Column names for RStudio auto-complete
`write_disk.frame()` `output_disk.frame()`	Write disk.frame to disk
`zip_to_disk.frame()`	Read and convert every CSV file within a zip file to disk.frame format
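
As a rough orientation, the sketch below chains a few of the functions listed above: `setup_disk.frame()`, `as.disk.frame()`, `nchunks()`, the dplyr verbs with `collect()`, `cmap_dfr()`, and `delete()`. It is a minimal illustration, not the package's canonical workflow. The worker count, the chunk count, the output folder name, and the use of the built-in `mtcars` data are all assumptions chosen for the example.

```r
library(disk.frame)
library(dplyr)

# Start background workers; the worker count is an arbitrary choice here.
setup_disk.frame(workers = 2)

# Split a built-in data.frame into 4 chunks stored in a temporary folder.
# The folder name "mtcars.df" is only an example.
cars_df <- as.disk.frame(
  mtcars,
  outdir = file.path(tempdir(), "mtcars.df"),
  nchunks = 4,
  overwrite = TRUE
)

# Number of chunks on disk and total rows across all chunks.
nchunks(cars_df)
nrow(cars_df)

# dplyr verbs are applied chunk by chunk; collect() brings the result into R.
heavy <- cars_df %>%
  filter(wt > 3) %>%
  select(mpg, wt) %>%
  collect()

# Apply a function to every chunk and row-bind the per-chunk results.
per_chunk <- cmap_dfr(cars_df, ~ data.frame(rows = nrow(.x)))

# Remove the on-disk folder when finished.
delete(cars_df)
```

Because `filter()` and `select()` run within each chunk, only the rows and columns that survive are brought into memory by `collect()`.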
