Distributed DataFusion Benchmarks

Generating Benchmarking data

Generate datasets into benchmarks/data/.

# TPC-H (default: SCALE_FACTOR=1, PARTITIONS=16 - override by setting these environment variables)
./gen-tpch.sh

# TPC-DS (only SCALE_FACTOR=1 is supported)
./gen-tpcds.sh
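
As an illustration of the override mentioned in the TPC-H comment, the sketch below sets both variables before calling the generator. The values 10 and 32 are arbitrary examples rather than recommendations, and the fallback message is an addition of this sketch so it can be run outside the benchmarks directory.

```shell
#!/usr/bin/env bash
# Sketch: override the TPC-H defaults (SCALE_FACTOR=1, PARTITIONS=16)
# through environment variables. The values below are illustrative only.
set -euo pipefail

export SCALE_FACTOR=10
export PARTITIONS=32

if [ -x ./gen-tpch.sh ]; then
  # The generator reads SCALE_FACTOR and PARTITIONS from the environment.
  ./gen-tpch.sh
else
  # Fallback added for this sketch; not part of the benchmark scripts.
  echo "gen-tpch.sh not found; would run with SCALE_FACTOR=${SCALE_FACTOR} PARTITIONS=${PARTITIONS}"
fi
```

The name of the resulting directory under benchmarks/data/ for a non-default scale factor is not stated here, so check that directory before passing a name to --dataset.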

Running Benchmarks in single-node mode

After generating the data with the commands above, run the benchmarks with:

WORKERS=0 ./benchmarks/run.sh --threads 2 --dataset tpch_sf1

--threads: Number of threads the Tokio runtime uses to execute the binary. A small value such as 2 is recommended: it throttles each individual process running queries, which makes it possible to simulate how adding throttled workers speeds up the queries.
--dataset: Dataset directory name under benchmarks/data/ (e.g. tpch_sf1, tpcds_sf1).
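
Because --dataset takes a directory name under benchmarks/data/, a quick pre-flight check can catch a missing dataset before a run starts. The following is a minimal sketch; `dataset_exists` and `DATA_ROOT` are hypothetical names introduced for illustration and are not part of run.sh.

```shell
#!/usr/bin/env bash
# Sketch: confirm a dataset directory exists before launching a benchmark.
# dataset_exists and DATA_ROOT are hypothetical helpers, not part of run.sh.
set -euo pipefail

DATA_ROOT="${DATA_ROOT:-benchmarks/data}"

dataset_exists() {
  # Succeeds only if DATA_ROOT/<name> is a directory.
  [ -d "${DATA_ROOT}/$1" ]
}

for name in tpch_sf1 tpcds_sf1; do
  if dataset_exists "${name}"; then
    echo "${name}: found"
  else
    echo "${name}: missing, generate it first"
  fi
done
```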

Running Benchmarks in distributed mode

The same script is used for running distributed benchmarks:

WORKERS=8 ./benchmarks/run.sh --threads 2 --dataset tpch_sf1 --files-per-task 2

WORKERS: Environment variable that sets the number of localhost workers used to run the queries.
--threads: Number of Tokio runtime threads for each individual worker and for the benchmarking binary.
--dataset: Dataset directory name under benchmarks/data/.
--files-per-task: How many files each distributed task will handle.
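
To compare single-node and distributed runs, the same flags can be swept over several worker counts. The sketch below assumes only the options documented above; the worker counts, the `sweep` function, and the `DRY_RUN` variable are hypothetical additions. With the default `DRY_RUN=1` it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: sweep WORKERS while keeping --threads fixed, to see how adding
# throttled workers changes query times. DRY_RUN and sweep are hypothetical
# helpers for this example and are not part of run.sh.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
DATASET="${DATASET:-tpch_sf1}"

sweep() {
  local workers args
  for workers in 0 2 4 8; do
    args=(--threads 2 --dataset "${DATASET}")
    # --files-per-task is only documented for distributed runs.
    if [ "${workers}" -gt 0 ]; then
      args+=(--files-per-task 2)
    fi
    if [ "${DRY_RUN}" = "1" ]; then
      echo "WORKERS=${workers} ./benchmarks/run.sh ${args[*]}"
    else
      WORKERS="${workers}" ./benchmarks/run.sh "${args[@]}"
    fi
  done
}

sweep
```

Set `DRY_RUN=0` to execute the runs from the repository root once the dataset has been generated.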
