Cross-Sensor Calibration

Cross-Sensor Calibration provides a Python pipeline for processing NEON Airborne Observation Platform hyperspectral flight lines and resampling them to emulate alternate sensors in a reproducible, scriptable workflow.

Environment setup

Option A (conda, recommended for GDAL/rasterio users):

conda env create -f environment.yaml
conda activate cross-sensor-cal

Option B (pip, lightweight):

python -m venv cscal-env
source cscal-env/bin/activate  # or Windows equivalent
pip install .

We test on Python 3.10 in GitHub Actions (Linux x86_64) and periodically validate Python 3.11 builds on the same platform. Other operating systems may work, but Linux is the documented baseline.

Quickstart (CLI)

Run the full cross-sensor pipeline on one or more NEON flight lines:

cscal-pipeline \
  --base-folder output_demo \
  --site-code NIWO \
  --year-month 2023-08 \
  --product-code DP1.30006.001 \
  --flight-lines NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance \
                 NEON_D13_NIWO_DP1_L020-1_20230815_directional_reflectance \
  --max-workers 2

This will:

Download each NEON .h5 flight line (with a live download progress bar).
Convert the cube to ENVI using canonical <flight_stem>_envi.img/.hdr names.
Compute and apply BRDF + topographic correction.
Convolve to multiple sensor bandpasses (Landsat TM/ETM+/OLI, OLI2, Micasense, etc.).
Export reflectance products and per-sensor tables to ENVI + Parquet.

Output structure:

output_demo/
    <flight_stem>.h5                       # raw NEON flightline (safe to delete later)
    <flight_stem>/                         # all derived products for that line
        <flight_stem>_envi.img/.hdr/.parquet
        <flight_stem>_brdfandtopo_corrected_envi.img/.hdr/.json/.parquet
        <flight_stem>_landsat_tm_envi.img/.hdr/.parquet
        ...
        <flight_stem>_qa.png               # QA summary figure (see below)

After the pipeline finishes you can generate QA plots summarizing each flight line:

cscal-qa --base-folder output_demo

That command creates <flight_stem>_qa.png inside each per-flightline folder.

Parallel execution from the CLI

--max-workers N (defaults to 2) bounds parallelism.
Each worker processes one flight line in isolation inside its own subdirectory.
Logs from each worker are prefixed with the flight line ID for readability.
Memory warning: each hyperspectral cube can consume tens of GB in memory, so avoid setting --max-workers higher than your hardware can support.

Quickstart (Python API)

As of v2.2 the pipeline automatically downloads NEON HDF5 cubes, streams live progress bars, and writes every derived product into a dedicated per-flightline folder.

Install the base package (includes Rasterio, GeoPandas, Spectral, Ray, and other runtime dependencies):

pip install cross-sensor-cal

The legacy full extra remains available for compatibility and currently resolves to the same dependency set as the base install.

Quickstart Example

from cross_sensor_cal.pipelines.pipeline import go_forth_and_multiply
from pathlib import Path

go_forth_and_multiply(
    base_folder=Path("output_fresh"),
    site_code="NIWO",
    year_month="2023-08",
    product_code="DP1.30006.001",
    flight_lines=[
        "NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance",
        "NEON_D13_NIWO_DP1_L020-1_20230815_directional_reflectance",
    ],
    max_workers=2,  # Run flightlines in parallel
)

This automatically:

Downloads the required NEON HDF5 flightlines with live progress bars.
Converts each cube to ENVI with per-tile progress updates.
Builds BRDF + topo correction JSON.
Applies corrections, performs cross-sensor convolution, and exports Parquet tables.
Writes every derived product into a per-flightline subfolder while leaving the raw .h5 next to it for easy cleanup.

Example Output Layout

output_fresh/
├── NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance.h5
├── NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance/
│   ├── ..._envi.img/.hdr/.parquet
│   ├── ..._brdfandtopo_corrected_envi.img/.hdr/.json/.parquet
│   ├── ..._landsat_tm_envi.img/.hdr/.parquet
│   ├── ..._micasense_envi.img/.hdr/.parquet
│   └── NIWO_brdf_model.json
├── NEON_D13_NIWO_DP1_L020-1_20230815_directional_reflectance.h5
└── NEON_D13_NIWO_DP1_L020-1_20230815_directional_reflectance/
    └── <same pattern>

Reproducibility guarantees

Deterministic staging: Each step writes files with canonical names. Downstream steps never guess paths.
Checkpointing / idempotence: Stages skip work if valid outputs already exist (✅ ... already complete ... (skipping)).
Crash-safe restarts: You can re-run cscal-pipeline on the same folder after an interruption; it will resume from what's missing.
Per-flightline isolation: Each flight line has its own subdirectory. This allows parallel execution and makes it clear which outputs belong together.
Ephemeral HDF5: The original NEON .h5 stays at the top level and can be deleted later if you only want corrected/derived products.
QA panels: After processing, cscal-qa generates a multi-panel summary figure per flight line, to visually confirm that each step (export, correction, convolution, parquet) completed successfully. QA figures are re-generated on every run to reflect the current pipeline settings. The spectral panel uses unitless reflectance (0–1) and shades VIS/NIR/SWIR regions for readability.

Parallel Execution

By default the pipeline processes multiple flight lines in sequence. To speed up workflows, set max_workers in go_forth_and_multiply() to run several in parallel. Each worker operates on its own subfolder and logs are prefixed with the flightline ID:

[NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance] 🚀 Processing ...
[NEON_D13_NIWO_DP1_L020-1_20230815_directional_reflectance] 🚀 Processing ...

Feature availability by install type:

Feature	Base	`[full]`
Core array ops (NumPy/Scipy)	✅	✅
Raster I/O (Rasterio)	✅	✅
Vector I/O/ops (GeoPandas)	✅	✅
ENVI/HDR (Spectral)	✅	✅
HDF5 (h5py)	✅	✅
Ray parallelism	✅	✅

Replace SITE with a NEON site code and FLIGHT_LINE with an actual line identifier.

Pipeline overview

Cross-Sensor Calibration processes every flight line through an idempotent five-stage flow. Each stage streams a tqdm-style progress bar, logs with a scoped [flight_stem] prefix, and writes artifacts using canonical names from get_flight_paths():

flowchart LR
    D[Download .h5]
    E[Export ENVI]
    J[Build BRDF+topo JSON]
    C[Correct reflectance]
    R[Resample + Parquet]
    D --> E --> J --> C --> R

Download – stage_download_h5() ensures every NEON .h5 exists before heavy processing begins. Downloads show live byte counters and leave the .h5 beside the derived folder for easy cleanup.
ENVI export – Converts the HDF5 cube into <flight_stem>_envi.img/.hdr with chunk-level progress bars and restart-safe validation.
BRDF + topo JSON – Computes correction parameters once and reuses valid JSON on reruns.
BRDF + topo correction – Applies the physical correction to produce the canonical <flight_stem>_brdfandtopo_corrected_envi.img/.hdr, updating the progress bar for each processed chunk.
Sensor resampling + Parquet – Convolves the corrected cube to per-sensor ENVI pairs and emits Parquet summaries into the same per-flightline folder.

Helper utilities such as get_flight_paths(base_folder, flight_stem) and _scoped_log_prefix(prefix) keep each worker isolated, ensure consistent filenames, and make the parallel logs readable.

Running the pipeline

from pathlib import Path
from cross_sensor_cal.pipelines.pipeline import go_forth_and_multiply

go_forth_and_multiply(
    base_folder=Path("output_tester"),
    site_code="NIWO",
    year_month="2023-08",
    product_code="DP1.30006.001",
    flight_lines=[
        "NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance",
        "NEON_D13_NIWO_DP1_L020-1_20230815_directional_reflectance",
    ],
    max_workers=4,
)

This executes the download → ENVI → BRDF+topo → resample pipeline for every flight line, streaming progress bars along the way. After the last worker finishes, the pipeline logs ✅ All requested flightlines processed..

Idempotent / restart-safe

You can safely rerun the same command. The pipeline is stage-aware and restart-safe:

If a stage already produced a valid output, that stage logs a ✅ ... (skipping) message and returns immediately.
If an output is missing or looks corrupted/empty, only that stage is recomputed.
If you crashed halfway through a long run, you can rerun the same call to resume where work is still needed.

A realistic rerun for one flight line now looks like (progress bars omitted for brevity):

[NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance] 🚀 Processing ...
[NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance] 📥 stage_download_h5() found existing .h5 (skipping)
[NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance] 🔎 ENVI export target is ..._envi.img / ..._envi.hdr
[NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance] ✅ ENVI export already complete -> ..._envi.img / ..._envi.hdr (skipping heavy export)
[NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance] ✅ Correction JSON already complete -> ..._brdfandtopo_corrected_envi.json (skipping)
[NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance] ✅ BRDF+topo correction already complete -> ..._brdfandtopo_corrected_envi.img / ..._brdfandtopo_corrected_envi.hdr (skipping)
[NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance] 🎯 Convolving corrected reflectance
[NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance] ✅ Wrote landsat_tm product -> ..._landsat_tm_envi.img / ..._landsat_tm_envi.hdr
[NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance] ✅ Wrote micasense product -> ..._micasense_envi.img / ..._micasense_envi.hdr
[NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance] 📊 Sensor convolution summary | succeeded=['landsat_tm', 'micasense'] skipped=['landsat_etm+', 'landsat_oli', 'landsat_oli2'] failed=[]
[NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance] 🎉 Finished pipeline

After all requested flight lines finish, the run concludes with ✅ All requested flightlines processed.

Memory safety

The new pipeline will NOT keep re-loading 20+ GB hyperspectral cubes into memory on every rerun. The ENVI export step now checks for an existing, valid ENVI pair before doing any heavy work. If it's already there, it logs "✅ ... skipping heavy export" and moves on.

Data products

After a successful run you should see, for each flight line:

<base_folder>/<flight_stem>.h5 at the workspace root for easy cleanup or archival.
<base_folder>/<flight_stem>/ containing every derived artifact:
- <flight_stem>_envi.img/.hdr/.parquet (uncorrected ENVI export + summary).
- <flight_stem>_brdfandtopo_corrected_envi.img/.hdr/.json/.parquet (canonical corrected cube).
- <flight_stem>_<sensor>_envi.img/.hdr/.parquet for each simulated sensor.
- Support files such as NIWO_brdf_model.json generated during correction.

Quality Assurance (QA) panels

QA panel example coming soon.

Panel A – Raw ENVI RGB: Confirms the uncorrected export renders with sensible color balance and spatial alignment.
Panel B – Patch-mean spectrum: Compares raw vs. BRDF+topo corrected spectra with a difference trace to verify the correction stage modified reflectance as expected.
Panel C – Corrected NIR preview: Displays a high-NIR band from the corrected cube to quickly spot artifacts such as striping or nodata gaps.
Panel D – Sensor thumbnails: Shows downsampled previews for each convolved sensor product to confirm bandstacks were generated.
Panel E – Parquet summary: Lists the Parquet sidecars and file sizes so you can confirm tabular exports are present and non-empty.

Generate these summaries at any time with cscal-qa --base-folder <output_dir>.

The _brdfandtopo_corrected_envi suffix remains the canonical "final" reflectance for analysis and downstream comparisons; all scientific semantics are unchanged from previous releases.

QA dashboard summaries

To review QA performance across many flightlines at once, run:

cscal-qa-dashboard --base-folder output_fresh

This aggregates every *_qa_metrics.parquet file, computes per-flightline statistics, writes qa_dashboard_summary.parquet, and renders a companion plot (qa_dashboard_summary.png). Flag rates above 25% are marked with ⚠️ for quick triage.

Pipeline stages

Each stage uses get_flight_paths() to discover its inputs/outputs and performs restart-safe validation before doing work. Valid ENVI pairs or JSON artifacts are reused rather than recomputed, ensuring reruns only fill in missing or corrupted pieces. _scoped_log_prefix() keeps the console readable when several flightlines run concurrently.

Sensor convolution / resampling behavior

The final stage turns the corrected reflectance cube (*_brdfandtopo_corrected_envi.img/.hdr) into simulated sensor products (e.g. Landsat-style band stacks).
Each target sensor is attempted independently. Missing/unknown sensor definitions are logged with a warning and skipped.
Each simulated sensor writes an ENVI .img/.hdr pair named <flight_stem>_<sensor>_envi.*. GeoTIFFs are no longer emitted by this stage.
If a sensor product already exists on disk and validates as an ENVI pair, it is skipped with a ✅ ... already complete ... (skipping) log.
At the end of the stage, the pipeline logs a summary of which sensors succeeded, which were skipped (already done), and which failed.
The pipeline only raises a runtime error if all sensors failed to produce usable output for that flight line. Otherwise, partial success is allowed and the pipeline continues.

This enforced order prevents earlier bugs where convolution could run on uncorrected data.

Developer notes

process_one_flightline() is now the canonical per-flightline workflow.
go_forth_and_multiply() orchestrates downloads, per-flightline workers, and options like max_workers.
get_flight_paths() is the single source of truth for naming and layout of:
- the .h5 input,
- the per-flightline working directory,
- the uncorrected ENVI export,
- the correction JSON,
- the corrected ENVI (*_brdfandtopo_corrected_envi.*),
- the per-sensor convolution outputs and Parquet summaries.
All pipeline stages call get_flight_paths() instead of guessing filenames. If file naming changes, update get_flight_paths(), not each stage.
Each stage validates its outputs (non-empty files, parseable JSON, etc.). If outputs are valid, that stage logs "✅ ... skipping" and returns immediately.
If outputs are missing or corrupted, that stage recomputes them.
This is what makes the pipeline resumable after a crash or partial run.

Install

The recommended setup commands are documented in Environment setup. For a quick pip-based installation use:

pip install cross-sensor-cal

cross-sensor-cal[full] remains available as an alias for teams with existing automation but currently resolves to the same dependency set.

Documentation

Browse the full documentation site at earthlab.github.io/cross-sensor-cal. The site is built with MkDocs Material and automatically deployed to GitHub Pages.

Key entry points:

Support Matrix

Python versions	OS (CI)	Notes
3.10, 3.11	Linux x86_64	Tested in GitHub Actions (CI).
3.10, 3.11	macOS / other	Community supported.

Citation

If you use this software, please cite:

cross-sensor-cal (v2.2.0): NEON hyperspectral cross-sensor harmonization pipeline.
See CITATION.cff in this repository for full author list and metadata.

License

Distributed under the GPLv3 License. Full citation details are available in CITATION.cff.

Name		Name	Last commit message	Last commit date
Latest commit History 411 Commits
.github/workflows		.github/workflows
Datasets		Datasets
__pycache__		__pycache__
bin		bin
constraints		constraints
data		data
deprecated		deprecated
docs		docs
scripts		scripts
src/cross_sensor_cal		src/cross_sensor_cal
tests		tests
.DS_Store		.DS_Store
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CITATION.cff		CITATION.cff
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
Raster_processing.ipynb		Raster_processing.ipynb
environment.yaml		environment.yaml
gocmd		gocmd
mkdocs.yml		mkdocs.yml
publication_checklist.md		publication_checklist.md
pyproject.toml		pyproject.toml
pytest.ini		pytest.ini
requirements-dev.txt		requirements-dev.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Cross-Sensor Calibration

Environment setup

Quickstart (CLI)

Parallel execution from the CLI

Quickstart (Python API)

Quickstart Example

Example Output Layout

Reproducibility guarantees

Parallel Execution

Pipeline overview

Running the pipeline

Idempotent / restart-safe

Memory safety

Data products

Quality Assurance (QA) panels

QA dashboard summaries

Pipeline stages

Sensor convolution / resampling behavior

Developer notes

Install

Documentation

Support Matrix

Citation

License

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors 7

Uh oh!

Languages

License

earthlab/cross-sensor-cal

Folders and files

Latest commit

History

Repository files navigation

Cross-Sensor Calibration

Environment setup

Quickstart (CLI)

Parallel execution from the CLI

Quickstart (Python API)

Quickstart Example

Example Output Layout

Reproducibility guarantees

Parallel Execution

Pipeline overview

Running the pipeline

Idempotent / restart-safe

Memory safety

Data products

Quality Assurance (QA) panels

QA dashboard summaries

Pipeline stages

Sensor convolution / resampling behavior

Developer notes

Install

Documentation

Support Matrix

Citation

License

About

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors 7

Uh oh!

Languages

Packages