eachsaj/benchmark_morphai

TPC-H Parquet dataset on macOS

There are two practical ways to generate a TPC-H dataset in Parquet format on macOS.

The first is tpchgen-rs, a Rust implementation of the TPC-H data generator that can write Parquet directly, so no separate conversion step is needed. Parquet is a columnar format well suited to analytical queries and compact storage.

The second is dbgen, the official TPC-H data generator. It writes pipe-delimited text files (.tbl), one per table, rather than Parquet. To convert them, you can use a database engine such as DuckDB, which reads delimited files directly and can write the result out as Parquet.
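The dbgen route can be sketched as a small shell script. This is a minimal sketch, not a tested pipeline: it assumes dbgen has already written its .tbl files to the current directory and that the duckdb command-line binary is on PATH. The function names tbl_to_parquet_sql and convert_all are hypothetical helpers introduced here for illustration. Because the .tbl files carry no header row, the resulting Parquet columns get DuckDB's auto-generated names unless you supply the schema yourself.

```shell
#!/bin/sh
# Sketch: convert dbgen's pipe-delimited .tbl files to Parquet with the DuckDB CLI.
# Assumptions: .tbl files are in the current directory; `duckdb` is on PATH.

# Print the DuckDB statement that copies one cleaned, pipe-delimited file to Parquet.
tbl_to_parquet_sql() {
    printf "COPY (SELECT * FROM read_csv('%s.csv', delim='|', header=false)) TO '%s.parquet' (FORMAT parquet);\n" "$1" "$1"
}

# Convert every .tbl file found in the current directory.
convert_all() {
    for tbl in ./*.tbl; do
        [ -e "$tbl" ] || continue    # no .tbl files present: nothing to do
        name=$(basename "$tbl" .tbl)
        # dbgen ends each row with a trailing '|'; strip it so it is not
        # read as an extra empty column.
        sed 's/|$//' "$tbl" > "$name.csv"
        duckdb -c "$(tbl_to_parquet_sql "$name")"
    done
}

if command -v duckdb >/dev/null 2>&1; then
    convert_all
else
    echo "duckdb not found on PATH; skipping conversion" >&2
fi
```

The intermediate .csv files are only needed for the conversion and can be deleted afterwards.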

For background, the official TPC-H specification describes the schema, the data characteristics, and dbgen. It does not cover tpchgen-rs; that tool's own README documents its command-line options, such as the scale factor, output directory, and output format used below.

Both methods produce a TPC-H dataset in Parquet format, so choose whichever fits your workflow. The procedure below uses tpchgen-rs.

Tool: https://github.com/clflushopt/tpchgen-rs

Prerequisite

Install Rust (this provides the cargo command used in the procedure)

Procedure

cargo install tpchgen-cli
tpchgen-cli --scale-factor 1 --output-dir ./ --format parquet
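To sanity-check the output, it helps to know roughly how many rows each table should hold at a given scale factor. The sketch below encodes the table cardinalities defined by the TPC-H specification. The function name expected_rows is a hypothetical helper, and it assumes an integer scale factor. The lineitem figure is approximate: at scale factor 1 the table holds 6,001,215 rows rather than exactly 6,000,000.

```shell
#!/bin/sh
# Sketch: expected TPC-H row counts for an integer scale factor (SF).
# Usage: expected_rows <scale-factor> <table>
expected_rows() {
    sf=$1
    case "$2" in
        region)   echo 5 ;;                    # fixed size, independent of SF
        nation)   echo 25 ;;                   # fixed size, independent of SF
        supplier) echo $((sf * 10000)) ;;
        customer) echo $((sf * 150000)) ;;
        part)     echo $((sf * 200000)) ;;
        partsupp) echo $((sf * 800000)) ;;
        orders)   echo $((sf * 1500000)) ;;
        lineitem) echo $((sf * 6000000)) ;;    # approximate; see note above
        *) echo "unknown table: $2" >&2; return 1 ;;
    esac
}

# Print the expected counts for the scale factor used in the procedure (1).
for t in region nation supplier customer part partsupp orders lineitem; do
    printf '%-9s %s\n' "$t" "$(expected_rows 1 "$t")"
done
```

Comparing these figures with the row counts of the generated Parquet files is a quick way to confirm that generation completed.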

Generated Data Model

image

Image Source

To download the generated Parquet files

Please use the link
