Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Rust tools for working with CSV files: scrubcsv, catcsv, fixed2csv, geochunk, hashcsv.

faradayio/csv-tools

Repository files navigation

Miscellaneous Faraday CSV tools

This repository contains tools for manipulating CSV files, all written in Rust. Most of these are a page or two long, and they're intended to run as quickly as possible without requiring heroic optimization. If you like these, you will probably also like xsv, which provides an extensive toolkit of CSV-processing utilities written in Rust.

  • catcsv: Concatenate directory of CSV files into a single CSV stream, decompressing as needed.
  • fixed2csv: Convert fixed-width fields to CSV.
  • geochunk: Add a column to a CSV file that groups records into similarly-sized chunks based on US ZIP codes and census data.
  • scrubcsv: Turn messy, slightly corrupt CSV files into something clean and standardized.
  • hashcsv: Add a new column to a CSV file, containing a hash of the other columns. Useful for de-duplicating.
  • geocode-csv (separate repo): Geocode CSV files in bulk using the Smarty API (or other APIs). This is in separate repo beause it depends on tokio and networking and a more complicated build system.

Releasing

To create a new release:

  1. Update version numbers in the respective Cargo.toml files
  2. Update changelogs in each tool's CHANGELOG.md
  3. Commit and push your changes to main
  4. Install dependencies (if not already done):
    pdm install
  5. Run the release script:
    pdm run python release.py

The script will:

  • Read current versions from Cargo.toml files
  • Create and push git tags (e.g., catcsv_v1.0.1)
  • Trigger GitHub Actions workflows to build binaries
  • Create GitHub releases with binaries attached

Each tool is released independently based on its version in Cargo.toml.

Manual release process

If you prefer to do it manually:

  1. Create tags:

    git tag catcsv_v1.0.1
    git push origin catcsv_v1.0.1
  2. Trigger the workflow:

    gh workflow run ci-catcsv.yml --ref catcsv_v1.0.1

Current coding standards

In general, this repository should contain standard modern Rust code, formatting using cargo fmt and the supplied settings. The code should have no warnings when run with clippy.

These tools were written over several years, and they represent a history of Rust at Faraday. The following dependencies should be replaced if we get the chance:

  • structopt: Upgrade to the lastest clap, which includes it.
  • docopt: Replace with clap's new structopt support.
  • error_chain and failure: Replace with anyhow (plus thiserror if we need specific custom error types).

In general, it's a good idea to update any older code to match the newest code.

About

Rust tools for working with CSV files: scrubcsv, catcsv, fixed2csv, geochunk, hashcsv.

Topics

Resources

Stars

Watchers

Forks

Packages

No packages published

Contributors 2

  •  
  •