Distributed DataFusion CLI

This is the same CLI as https://github.com/apache/datafusion/blob/main/datafusion-cli/, but enriched with distributed execution capabilities.

Installation

The CLI can be installed from source by cloning this repository:

git clone https://github.com/datafusion-contrib/datafusion-distributed
cd datafusion-distributed

Then install it with Cargo:

cargo install --path cli

Usage

The CLI can be invoked by running:

datafusion-distributed-cli

The easiest way to try distributed queries is to run them against the Parquet files in the testdata directory. For maximum parallelism, set the following config option:

SET distributed.files_per_task = 1;
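As an illustration, a session might set that option and then register and query a set of Parquet files. This is a sketch under stated assumptions. The directory `testdata/example/` and the table name `t` are hypothetical placeholders, not paths known to exist in the repository. The relative path assumes the CLI was started from the repository root. `CREATE EXTERNAL TABLE` and `EXPLAIN` are standard DataFusion SQL, not features specific to the distributed CLI.

```sql
-- One file per task, as recommended above.
SET distributed.files_per_task = 1;

-- Hypothetical location: substitute a directory of Parquet files from testdata.
-- With one file per task, a directory holding several files yields several tasks.
CREATE EXTERNAL TABLE t STORED AS PARQUET LOCATION 'testdata/example/';

-- Inspect the physical plan first, then run the query itself.
EXPLAIN SELECT count(*) FROM t;
SELECT count(*) FROM t;
```

Pointing the table at a directory rather than a single file matters here. With `files_per_task = 1`, a single file would presumably leave only one task and no parallelism to observe.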

Otherwise, usage is exactly the same as in the original CLI: https://datafusion.apache.org/user-guide/cli/usage.html