This is the same CLI as the upstream DataFusion CLI (https://github.com/apache/datafusion/blob/main/datafusion-cli/), extended with distributed execution capabilities.
The CLI can be installed from source by cloning this repository:

    git clone https://github.com/datafusion-contrib/datafusion-distributed
    cd datafusion-distributed

And running:

    cargo install --path cli

The CLI can be invoked by running:

    datafusion-distributed-cli

The best way to try distributed queries is to run them against the Parquet files in the
testdata directory. For maximum parallelism, set the following config option:

    SET distributed.files_per_task = 1;

Otherwise, usage is exactly the same as the original CLI: https://datafusion.apache.org/user-guide/cli/usage.html
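
As a rough sketch of what a session might look like, the statements below register a Parquet dataset as an external table and query it with the setting described above. The table name `example` and the path `testdata/<dataset>` are placeholders introduced only for illustration; substitute a real file or directory from the testdata directory. The statements themselves are standard DataFusion SQL.

```sql
-- Sketch only: `example` and `testdata/<dataset>` are hypothetical placeholders.
-- Replace the location with an actual Parquet file or directory in testdata.

-- One file per task, as suggested above for maximum parallelism.
SET distributed.files_per_task = 1;

-- Register the Parquet data as a table.
CREATE EXTERNAL TABLE example
STORED AS PARQUET
LOCATION 'testdata/<dataset>';

-- Inspect the plan before running the query.
EXPLAIN SELECT count(*) FROM example;

-- Run the query.
SELECT count(*) FROM example;
```

Running `EXPLAIN` first is a convenient way to check how a query is planned before executing it.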