All functions

add_chunk()

Add a chunk to the disk.frame

as.data.frame(<disk.frame>)

Convert disk.frame to data.frame by collecting all chunks

as.data.table(<disk.frame>)

Convert disk.frame to data.table by collecting all chunks

as.disk.frame()

Make a data.frame into a disk.frame

bind_rows.disk.frame()

Bind the rows of disk.frames

chunk_summarize() chunk_summarise() chunk_group_by() chunk_ungroup()

Apply dplyr's summarize, group_by, and ungroup within each chunk

cmap() cmap_dfr() cimap() cimap_dfr() lazy() delayed() clapply()

Apply the same function to all chunks

cmap2() map_by_chunk_id()

`cmap2` applies a function to two disk.frames

collect(<disk.frame>) collect_list() collect(<summarized_disk.frame>)

Bring the disk.frame into R

colnames() names(<disk.frame>)

Return the column names of the disk.frame

compute(<disk.frame>)

Force computations. The results are stored in a folder.

create_chunk_mapper()

Create a function that applies to each chunk of a disk.frame

csv_to_disk.frame()

Convert CSV file(s) to disk.frame format

delete()

Delete a disk.frame

dfglm()

Fit generalized linear models (glm) with disk.frame

df_ram_size()

Get the size of RAM in gigabytes

disk.frame()

Create a disk.frame from a folder

disk.frame_to_parquet()

Convert a disk.frame to parquet format

select(<disk.frame>) rename(<disk.frame>) filter(<disk.frame>) mutate(<disk.frame>) transmute(<disk.frame>) arrange(<disk.frame>) chunk_arrange() distinct(<disk.frame>) chunk_distinct() glimpse(<disk.frame>)

The dplyr verbs implemented for disk.frame

evalparseglue()

Helper function to parse and evaluate a `glue::glue` string

find_globals_recursively()

Find globals in an expression by searching through the chain

foverlaps.disk.frame()

Apply data.table's foverlaps to the disk.frame

gen_datatable_synthetic()

Generate synthetic dataset for testing

get_chunk()

Obtain one chunk by chunk id

get_chunk_ids()

Get the chunk IDs and file names

get_partition_paths()

Get the partitioning structure of a folder

groups(<disk.frame>)

The shard keys of the disk.frame

summarise(<grouped_disk.frame>) summarize(<grouped_disk.frame>) group_by(<disk.frame>) summarize(<disk.frame>) summarise(<disk.frame>)

Group by and summarize a disk.frame

head(<disk.frame>) tail(<disk.frame>)

Head and tail of the disk.frame

is_disk.frame()

Checks if a folder is a disk.frame

anti_join(<disk.frame>) full_join(<disk.frame>) inner_join(<disk.frame>) left_join(<disk.frame>) semi_join(<disk.frame>)

Performs join/merge for disk.frames

make_glm_streaming_fn()

A streaming function for speedglm

merge(<disk.frame>)

Merge function for disk.frames

move_to() copy_df_to()

Move or copy a disk.frame to another location

nchunks() nchunk()

Returns the number of chunks in a disk.frame

nrow() ncol()

Number of rows or columns

var_df.chunk_agg.disk.frame() var_df.collected_agg.disk.frame() sd_df.chunk_agg.disk.frame() sd_df.collected_agg.disk.frame() mean_df.chunk_agg.disk.frame() mean_df.collected_agg.disk.frame() sum_df.chunk_agg.disk.frame() sum_df.collected_agg.disk.frame() min_df.chunk_agg.disk.frame() min_df.collected_agg.disk.frame() max_df.chunk_agg.disk.frame() max_df.collected_agg.disk.frame() median_df.chunk_agg.disk.frame() median_df.collected_agg.disk.frame() n_df.chunk_agg.disk.frame() n_df.collected_agg.disk.frame() length_df.chunk_agg.disk.frame() length_df.collected_agg.disk.frame() any_df.chunk_agg.disk.frame() any_df.collected_agg.disk.frame() all_df.chunk_agg.disk.frame() all_df.collected_agg.disk.frame() n_distinct_df.chunk_agg.disk.frame() n_distinct_df.collected_agg.disk.frame() quantile_df.chunk_agg.disk.frame() quantile_df.collected_agg.disk.frame() IQR_df.chunk_agg.disk.frame() IQR_df.collected_agg.disk.frame()

One-stage aggregation functions

overwrite_check()

Check if the outdir exists or not

partition_filter()

Filter the dataset based on folder partitions

play()

Play the recorded lazy operations

print(<disk.frame>)

Print disk.frame

pull(<disk.frame>)

Pull a column from a table, similar to `dplyr::pull`.

purrr_as_mapper()

Used to convert a function to purrr syntax if needed

rbindlist.disk.frame()

rbindlist disk.frames together

rechunk()

Increase or decrease the number of chunks in the disk.frame

recommend_nchunks()

Recommend number of chunks based on input size

remove_chunk()

Removes a chunk from the disk.frame

sample_frac(<disk.frame>)

Sample a fraction of rows from a disk.frame

setup_disk.frame()

Set up disk.frame environment

shard() distribute()

Shard a data.frame/data.table or disk.frame into chunks and save it into a disk.frame

shardkey()

Returns the shardkey (not implemented yet)

shardkey_equal()

Compare two disk.frame shardkeys

show_ceremony() ceremony_text() show_boilerplate() insert_ceremony()

Show the code to set up disk.frame

split_string_into_df()

Turn a string of the form /partition1=val/partition2=val2 into a data.frame

srckeep()

Keep only the variables from the input listed in selections

`[`(<disk.frame>)

`[` interface for disk.frame using the fst backend

tbl_vars(<disk.frame>) group_vars(<disk.frame>)

Column names for RStudio auto-complete

write_disk.frame() output_disk.frame()

Write disk.frame to disk

zip_to_disk.frame()

`zip_to_disk.frame` is used to read and convert every CSV file within the zip file to disk.frame format