Generate the datasets into `benchmarks/data/`:

```shell
# TPC-H (defaults: SCALE_FACTOR=1, PARTITIONS=16; override them by setting these environment variables)
./gen-tpch.sh
# TPC-DS (only SCALE_FACTOR=1 is supported)
./gen-tpcds.sh
```

After generating the data with the commands above, run the benchmarks with:
```shell
WORKERS=0 ./benchmarks/run.sh --threads 2 --dataset tpch_sf1
```

- `--threads`: The number of threads the Tokio runtime uses to execute the binary. Set `--threads` to something small, like `2`, to throttle each individual process running queries and to simulate how adding throttled workers can speed up the queries.
- `--dataset`: Dataset directory name under `benchmarks/data/` (e.g. `tpch_sf1`, `tpcds_sf1`).
The same script is used for running distributed benchmarks:
```shell
WORKERS=8 ./benchmarks/run.sh --threads 2 --dataset tpch_sf1 --files-per-task 2
```

- `WORKERS`: Environment variable that sets the number of localhost workers used in the query.
- `--threads`: Sets the number of Tokio runtime threads for each individual worker and for the benchmarking binary.
- `--dataset`: Dataset directory name under `benchmarks/data/`.
- `--files-per-task`: How many files each distributed task handles.
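To see how adding throttled workers changes query times, one option is to sweep `WORKERS` while keeping `--threads` fixed. The script below is a minimal sketch, not part of the repository: `run_bench`, the `BENCH_*` variables and the `DRY_RUN` switch are hypothetical names chosen for illustration. It assumes it is run from the repository root (like the commands above) and that `benchmarks/data/tpch_sf1` has already been generated. By default it only prints the commands it would run; set `DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash
# Hypothetical sweep over the number of localhost workers (not part of the repo).
# Assumes: run from the repository root, dataset already generated.
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes it.

BENCH_DATASET="${BENCH_DATASET:-tpch_sf1}"
BENCH_THREADS="${BENCH_THREADS:-2}"
BENCH_FILES_PER_TASK="${BENCH_FILES_PER_TASK:-2}"
DRY_RUN="${DRY_RUN:-1}"

# Build the run.sh invocation for one worker count, mirroring the
# single-process (WORKERS=0) and distributed examples above.
run_bench() {
  local workers="$1"
  local cmd=(./benchmarks/run.sh --threads "$BENCH_THREADS" --dataset "$BENCH_DATASET")
  # --files-per-task is only passed for distributed runs, as in the examples.
  if [ "$workers" -gt 0 ]; then
    cmd+=(--files-per-task "$BENCH_FILES_PER_TASK")
  fi
  if [ "$DRY_RUN" = "1" ]; then
    # Print the command line instead of running it.
    echo "WORKERS=$workers ${cmd[*]}"
  else
    # Pass WORKERS only to the run.sh process via its environment.
    WORKERS="$workers" "${cmd[@]}"
  fi
}

for workers in 0 2 4 8; do
  run_bench "$workers"
done
```

Keeping `--threads` at `2` for every run follows the recommendation above, so that differences between runs come from the number of workers rather than from extra threads per process.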