All functions

add_chunk()

Add a chunk to the disk.frame

as.data.frame(<disk.frame>)

Convert disk.frame to data.frame by collecting all chunks

as.data.table(<disk.frame>)

Convert disk.frame to data.table by collecting all chunks

as.disk.frame()

Make a data.frame into a disk.frame

chunk_summarize() chunk_summarise() chunk_group_by() chunk_ungroup()

Group by within each chunk of the disk.frame

cmap() cmap_dfr() cimap() cimap_dfr() lazy() delayed() chunk_lapply() map() imap_dfr() imap() map_dfr(<disk.frame>) map_dfr(<default>)

Apply the same function to all chunks

cmap2() map2() map_by_chunk_id()

Apply a function to two disk.frames with `cmap2`

collect(<disk.frame>) collect_list() collect(<summarized_disk.frame>)

Bring the disk.frame into R

colnames() names(<disk.frame>)

Return the column names of the disk.frame

compute(<disk.frame>)

Compute without writing

create_chunk_mapper()

Create a function that applies to each chunk of a disk.frame

create_dplyr_mapper()

Kept for backwards compatibility; to be removed in 0.3

csv_to_disk.frame()

Convert CSV file(s) to disk.frame format

delete()

Delete a disk.frame

dfglm()

Fit generalized linear models (glm) with disk.frame

df_ram_size()

Get the size of RAM in gigabytes

disk.frame()

Create a disk.frame from a folder

select(<disk.frame>) rename(<disk.frame>) filter(<disk.frame>) filter_all.disk.frame() filter_if.disk.frame() filter_at.disk.frame() mutate(<disk.frame>) transmute(<disk.frame>) arrange(<disk.frame>) chunk_arrange() tally.disk.frame() count.disk.frame() add_count.disk.frame() add_tally.disk.frame() do(<disk.frame>) chunk_group_by_all.disk.frame() chunk_group_by_at.disk.frame() chunk_group_by_if.disk.frame() mutate_all.disk.frame() mutate_at.disk.frame() mutate_if.disk.frame() rename_all.disk.frame() rename_at.disk.frame() rename_if.disk.frame() select_all.disk.frame() select_at.disk.frame() select_if.disk.frame() chunk_summarise_all() chunk_summarise_at() chunk_summarize_all() chunk_summarize_at() chunk_summarize_if() distinct(<disk.frame>) chunk_distinct() glimpse(<disk.frame>)

The dplyr verbs implemented for disk.frame

evalparseglue()

Helper function to parse and evaluate a `glue::glue` string

foverlaps.disk.frame()

Apply data.table's foverlaps to the disk.frame

gen_datatable_synthetic()

Generate synthetic dataset for testing

get_chunk()

Obtain one chunk by chunk id

get_chunk_ids()

Get the chunk IDs and file names

groups(<disk.frame>)

The shard keys of the disk.frame

summarise(<grouped_disk.frame>) summarize(<grouped_disk.frame>) group_by(<disk.frame>) summarize(<disk.frame>) summarise(<disk.frame>)

A function to parse the summarize function

hard_arrange()

Perform a hard arrange

hard_group_by()

Perform a hard group

head(<disk.frame>) tail(<disk.frame>)

Head and tail of the disk.frame

is_disk.frame()

Checks if a folder is a disk.frame

anti_join(<disk.frame>) full_join(<disk.frame>) inner_join(<disk.frame>) left_join(<disk.frame>) semi_join(<disk.frame>)

Performs join/merge for disk.frames

make_glm_streaming_fn()

A streaming function for speedglm

merge(<disk.frame>)

Merge function for disk.frames

move_to() copy_df_to()

Move or copy a disk.frame to another location

nchunks() nchunk()

Returns the number of chunks in a disk.frame

nrow() ncol()

Number of rows or columns

var_df.chunk_agg.disk.frame() var_df.collected_agg.disk.frame() sd_df.chunk_agg.disk.frame() sd_df.collected_agg.disk.frame() mean_df.chunk_agg.disk.frame() mean_df.collected_agg.disk.frame() sum_df.chunk_agg.disk.frame() sum_df.collected_agg.disk.frame() min_df.chunk_agg.disk.frame() min_df.collected_agg.disk.frame() max_df.chunk_agg.disk.frame() max_df.collected_agg.disk.frame() median_df.chunk_agg.disk.frame() median_df.collected_agg.disk.frame() n_df.chunk_agg.disk.frame() n_df.collected_agg.disk.frame() length_df.chunk_agg.disk.frame() length_df.collected_agg.disk.frame() any_df.chunk_agg.disk.frame() any_df.collected_agg.disk.frame() all_df.chunk_agg.disk.frame() all_df.collected_agg.disk.frame() n_distinct_df.chunk_agg.disk.frame() n_distinct_df.collected_agg.disk.frame() quantile_df.chunk_agg.disk.frame() quantile_df.collected_agg.disk.frame() IQR_df.chunk_agg.disk.frame() IQR_df.collected_agg.disk.frame()

One-stage aggregation functions

overwrite_check()

Check if the outdir exists or not

print(<disk.frame>)

Print disk.frame

rbindlist.disk.frame()

rbindlist disk.frames together

rechunk()

Increase or decrease the number of chunks in the disk.frame

recommend_nchunks()

Recommend number of chunks based on input size

remove_chunk()

Removes a chunk from the disk.frame

sample_frac(<disk.frame>)

Sample a fraction of rows from a disk.frame

setup_disk.frame()

Set up disk.frame environment

shard() distribute()

Shard a data.frame/data.table or disk.frame into chunks and save it into a disk.frame

shardkey()

Returns the shardkey (not implemented yet)

shardkey_equal()

Compare two disk.frame shardkeys

show_ceremony() ceremony_text() show_boilerplate() insert_ceremony()

Show the code to set up disk.frame

srckeep() srckeepchunks()

Keep only the variables from the input listed in selections

`[`(<disk.frame>)

`[` interface for disk.frame using the fst backend

tbl_vars(<disk.frame>)

Column names for RStudio auto-complete

write_disk.frame() output_disk.frame()

Write disk.frame to disk

zip_to_disk.frame()

Read and convert every CSV file within a zip file to disk.frame format
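
Several of the entries above can be combined into a short round trip. The following is a minimal sketch, assuming the `disk.frame` package is installed and that the functions behave as their one-line descriptions suggest. The `outdir` path, the use of the built-in `mtcars` data, and the choice of four chunks are illustrative assumptions, not taken from this reference.

```r
library(disk.frame)

# Hypothetical output folder inside the session's temporary directory.
outdir <- file.path(tempdir(), "mtcars_example.df")

# as.disk.frame(): split an in-memory data.frame into chunks stored on disk.
# nchunks = 4 is an arbitrary choice for this toy dataset.
df <- as.disk.frame(mtcars, outdir = outdir, nchunks = 4, overwrite = TRUE)

# nchunks(), nrow(), names(): inspect the disk.frame without loading all of it.
n_chunks <- nchunks(df)
n_rows   <- nrow(df)
columns  <- names(df)

# get_chunk(): load a single chunk by its chunk id.
first_chunk <- get_chunk(df, 1)

# collect(): bring every chunk back into R as one table.
result <- collect(df)

# delete(): remove the disk.frame folder once it is no longer needed.
delete(df)
```

For data that starts out as files rather than an in-memory table, `csv_to_disk.frame()` or `zip_to_disk.frame()` would presumably take the place of `as.disk.frame()` in the first step.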