Cross-Sensor Calibration provides a Python pipeline for processing NEON Airborne Observation Platform hyperspectral flight lines and resampling them to emulate alternate sensors in a reproducible, scriptable workflow.
Option A (conda, recommended for GDAL/rasterio users):
conda env create -f environment.yaml
conda activate cross-sensor-calOption B (pip, lightweight):
python -m venv cscal-env
source cscal-env/bin/activate # or Windows equivalent
pip install .We test on Python 3.10 in GitHub Actions (Linux x86_64) and periodically validate Python 3.11 builds on the same platform. Other operating systems may work, but Linux is the documented baseline.
Run the full cross-sensor pipeline on one or more NEON flight lines:
cscal-pipeline \
--base-folder output_demo \
--site-code NIWO \
--year-month 2023-08 \
--product-code DP1.30006.001 \
--flight-lines NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance \
NEON_D13_NIWO_DP1_L020-1_20230815_directional_reflectance \
--max-workers 2This will:
- Download each NEON
.h5flight line (with a live download progress bar). - Convert the cube to ENVI using canonical
<flight_stem>_envi.img/.hdrnames. - Compute and apply BRDF + topographic correction.
- Convolve to multiple sensor bandpasses (Landsat TM/ETM+/OLI, OLI2, Micasense, etc.).
- Export reflectance products and per-sensor tables to ENVI + Parquet.
Output structure:
output_demo/
<flight_stem>.h5 # raw NEON flightline (safe to delete later)
<flight_stem>/ # all derived products for that line
<flight_stem>_envi.img/.hdr/.parquet
<flight_stem>_brdfandtopo_corrected_envi.img/.hdr/.json/.parquet
<flight_stem>_landsat_tm_envi.img/.hdr/.parquet
...
<flight_stem>_qa.png # QA summary figure (see below)
After the pipeline finishes you can generate QA plots summarizing each flight line:
cscal-qa --base-folder output_demoThat command creates <flight_stem>_qa.png inside each per-flightline folder.
--max-workers N(defaults to2) bounds parallelism.- Each worker processes one flight line in isolation inside its own subdirectory.
- Logs from each worker are prefixed with the flight line ID for readability.
- Memory warning: each hyperspectral cube can consume tens of GB in memory, so avoid
setting
--max-workershigher than your hardware can support.
As of v2.2 the pipeline automatically downloads NEON HDF5 cubes, streams live progress bars, and writes every derived product into a dedicated per-flightline folder.
Install the base package (includes Rasterio, GeoPandas, Spectral, Ray, and other runtime dependencies):
pip install cross-sensor-calThe legacy
fullextra remains available for compatibility and currently resolves to the same dependency set as the base install.
from cross_sensor_cal.pipelines.pipeline import go_forth_and_multiply
from pathlib import Path
go_forth_and_multiply(
base_folder=Path("output_fresh"),
site_code="NIWO",
year_month="2023-08",
product_code="DP1.30006.001",
flight_lines=[
"NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance",
"NEON_D13_NIWO_DP1_L020-1_20230815_directional_reflectance",
],
max_workers=2, # Run flightlines in parallel
)This automatically:
- Downloads the required NEON HDF5 flightlines with live progress bars.
- Converts each cube to ENVI with per-tile progress updates.
- Builds BRDF + topo correction JSON.
- Applies corrections, performs cross-sensor convolution, and exports Parquet tables.
- Writes every derived product into a per-flightline subfolder while leaving the raw
.h5next to it for easy cleanup.
output_fresh/
├── NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance.h5
├── NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance/
│ ├── ..._envi.img/.hdr/.parquet
│ ├── ..._brdfandtopo_corrected_envi.img/.hdr/.json/.parquet
│ ├── ..._landsat_tm_envi.img/.hdr/.parquet
│ ├── ..._micasense_envi.img/.hdr/.parquet
│ └── NIWO_brdf_model.json
├── NEON_D13_NIWO_DP1_L020-1_20230815_directional_reflectance.h5
└── NEON_D13_NIWO_DP1_L020-1_20230815_directional_reflectance/
└── <same pattern>
- Deterministic staging: Each step writes files with canonical names. Downstream steps never guess paths.
- Checkpointing / idempotence: Stages skip work if valid outputs already exist
(
✅ ... already complete ... (skipping)). - Crash-safe restarts: You can re-run
cscal-pipelineon the same folder after an interruption; it will resume from what's missing. - Per-flightline isolation: Each flight line has its own subdirectory. This allows parallel execution and makes it clear which outputs belong together.
- Ephemeral HDF5: The original NEON
.h5stays at the top level and can be deleted later if you only want corrected/derived products. - QA panels: After processing,
cscal-qagenerates a multi-panel summary figure per flight line, to visually confirm that each step (export, correction, convolution, parquet) completed successfully. QA figures are re-generated on every run to reflect the current pipeline settings. The spectral panel uses unitless reflectance (0–1) and shades VIS/NIR/SWIR regions for readability.
By default the pipeline processes multiple flight lines in sequence. To speed up
workflows, set max_workers in go_forth_and_multiply() to run several in
parallel. Each worker operates on its own subfolder and logs are prefixed with
the flightline ID:
[NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance] 🚀 Processing ...
[NEON_D13_NIWO_DP1_L020-1_20230815_directional_reflectance] 🚀 Processing ...
Feature availability by install type:
| Feature | Base | [full] |
|---|---|---|
| Core array ops (NumPy/Scipy) | ✅ | ✅ |
| Raster I/O (Rasterio) | ✅ | ✅ |
| Vector I/O/ops (GeoPandas) | ✅ | ✅ |
| ENVI/HDR (Spectral) | ✅ | ✅ |
| HDF5 (h5py) | ✅ | ✅ |
| Ray parallelism | ✅ | ✅ |
Replace SITE with a NEON site code and FLIGHT_LINE with an actual line identifier.
Cross-Sensor Calibration processes every flight line through an idempotent
five-stage flow. Each stage streams a tqdm-style progress bar, logs with a
scoped [flight_stem] prefix, and writes artifacts using canonical names from
get_flight_paths():
flowchart LR
D[Download .h5]
E[Export ENVI]
J[Build BRDF+topo JSON]
C[Correct reflectance]
R[Resample + Parquet]
D --> E --> J --> C --> R
- Download –
stage_download_h5()ensures every NEON.h5exists before heavy processing begins. Downloads show live byte counters and leave the.h5beside the derived folder for easy cleanup. - ENVI export – Converts the HDF5 cube into
<flight_stem>_envi.img/.hdrwith chunk-level progress bars and restart-safe validation. - BRDF + topo JSON – Computes correction parameters once and reuses valid JSON on reruns.
- BRDF + topo correction – Applies the physical correction to produce the
canonical
<flight_stem>_brdfandtopo_corrected_envi.img/.hdr, updating the progress bar for each processed chunk. - Sensor resampling + Parquet – Convolves the corrected cube to per-sensor ENVI pairs and emits Parquet summaries into the same per-flightline folder.
Helper utilities such as get_flight_paths(base_folder, flight_stem) and
_scoped_log_prefix(prefix) keep each worker isolated, ensure consistent
filenames, and make the parallel logs readable.
from pathlib import Path
from cross_sensor_cal.pipelines.pipeline import go_forth_and_multiply
go_forth_and_multiply(
base_folder=Path("output_tester"),
site_code="NIWO",
year_month="2023-08",
product_code="DP1.30006.001",
flight_lines=[
"NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance",
"NEON_D13_NIWO_DP1_L020-1_20230815_directional_reflectance",
],
max_workers=4,
)This executes the download → ENVI → BRDF+topo → resample pipeline for every
flight line, streaming progress bars along the way. After the last worker
finishes, the pipeline logs ✅ All requested flightlines processed..
You can safely rerun the same command. The pipeline is stage-aware and restart-safe:
- If a stage already produced a valid output, that stage logs a
✅ ... (skipping)message and returns immediately. - If an output is missing or looks corrupted/empty, only that stage is recomputed.
- If you crashed halfway through a long run, you can rerun the same call to resume where work is still needed.
A realistic rerun for one flight line now looks like (progress bars omitted for brevity):
[NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance] 🚀 Processing ...
[NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance] 📥 stage_download_h5() found existing .h5 (skipping)
[NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance] 🔎 ENVI export target is ..._envi.img / ..._envi.hdr
[NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance] ✅ ENVI export already complete -> ..._envi.img / ..._envi.hdr (skipping heavy export)
[NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance] ✅ Correction JSON already complete -> ..._brdfandtopo_corrected_envi.json (skipping)
[NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance] ✅ BRDF+topo correction already complete -> ..._brdfandtopo_corrected_envi.img / ..._brdfandtopo_corrected_envi.hdr (skipping)
[NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance] 🎯 Convolving corrected reflectance
[NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance] ✅ Wrote landsat_tm product -> ..._landsat_tm_envi.img / ..._landsat_tm_envi.hdr
[NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance] ✅ Wrote micasense product -> ..._micasense_envi.img / ..._micasense_envi.hdr
[NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance] 📊 Sensor convolution summary | succeeded=['landsat_tm', 'micasense'] skipped=['landsat_etm+', 'landsat_oli', 'landsat_oli2'] failed=[]
[NEON_D13_NIWO_DP1_L019-1_20230815_directional_reflectance] 🎉 Finished pipeline
After all requested flight lines finish, the run concludes with
✅ All requested flightlines processed.
The new pipeline will NOT keep re-loading 20+ GB hyperspectral cubes into memory on every rerun. The ENVI export step now checks for an existing, valid ENVI pair before doing any heavy work. If it's already there, it logs "✅ ... skipping heavy export" and moves on.
After a successful run you should see, for each flight line:
<base_folder>/<flight_stem>.h5at the workspace root for easy cleanup or archival.<base_folder>/<flight_stem>/containing every derived artifact:<flight_stem>_envi.img/.hdr/.parquet(uncorrected ENVI export + summary).<flight_stem>_brdfandtopo_corrected_envi.img/.hdr/.json/.parquet(canonical corrected cube).<flight_stem>_<sensor>_envi.img/.hdr/.parquetfor each simulated sensor.- Support files such as
NIWO_brdf_model.jsongenerated during correction.
QA panel example coming soon.
- Panel A – Raw ENVI RGB: Confirms the uncorrected export renders with sensible color balance and spatial alignment.
- Panel B – Patch-mean spectrum: Compares raw vs. BRDF+topo corrected spectra with a difference trace to verify the correction stage modified reflectance as expected.
- Panel C – Corrected NIR preview: Displays a high-NIR band from the corrected cube to quickly spot artifacts such as striping or nodata gaps.
- Panel D – Sensor thumbnails: Shows downsampled previews for each convolved sensor product to confirm bandstacks were generated.
- Panel E – Parquet summary: Lists the Parquet sidecars and file sizes so you can confirm tabular exports are present and non-empty.
Generate these summaries at any time with cscal-qa --base-folder <output_dir>.
The _brdfandtopo_corrected_envi suffix remains the canonical "final"
reflectance for analysis and downstream comparisons; all scientific semantics
are unchanged from previous releases.
To review QA performance across many flightlines at once, run:
cscal-qa-dashboard --base-folder output_freshThis aggregates every *_qa_metrics.parquet file, computes per-flightline
statistics, writes qa_dashboard_summary.parquet, and renders a companion plot
(qa_dashboard_summary.png). Flag rates above 25% are marked with
Each stage uses get_flight_paths() to discover its inputs/outputs and performs
restart-safe validation before doing work. Valid ENVI pairs or JSON artifacts
are reused rather than recomputed, ensuring reruns only fill in missing or
corrupted pieces. _scoped_log_prefix() keeps the console readable when several
flightlines run concurrently.
- The final stage turns the corrected reflectance cube
(
*_brdfandtopo_corrected_envi.img/.hdr) into simulated sensor products (e.g. Landsat-style band stacks). - Each target sensor is attempted independently. Missing/unknown sensor definitions are logged with a warning and skipped.
- Each simulated sensor writes an ENVI
.img/.hdrpair named<flight_stem>_<sensor>_envi.*. GeoTIFFs are no longer emitted by this stage. - If a sensor product already exists on disk and validates as an ENVI pair, it is skipped with
a
✅ ... already complete ... (skipping)log. - At the end of the stage, the pipeline logs a summary of which sensors succeeded, which were skipped (already done), and which failed.
- The pipeline only raises a runtime error if all sensors failed to produce usable output for that flight line. Otherwise, partial success is allowed and the pipeline continues.
This enforced order prevents earlier bugs where convolution could run on uncorrected data.
-
process_one_flightline()is now the canonical per-flightline workflow. -
go_forth_and_multiply()orchestrates downloads, per-flightline workers, and options likemax_workers. -
get_flight_paths()is the single source of truth for naming and layout of:- the
.h5input, - the per-flightline working directory,
- the uncorrected ENVI export,
- the correction JSON,
- the corrected ENVI (
*_brdfandtopo_corrected_envi.*), - the per-sensor convolution outputs and Parquet summaries.
All pipeline stages call
get_flight_paths()instead of guessing filenames. If file naming changes, updateget_flight_paths(), not each stage. - the
-
Each stage validates its outputs (non-empty files, parseable JSON, etc.). If outputs are valid, that stage logs "✅ ... skipping" and returns immediately.
If outputs are missing or corrupted, that stage recomputes them.
This is what makes the pipeline resumable after a crash or partial run.
The recommended setup commands are documented in Environment setup. For a quick pip-based installation use:
pip install cross-sensor-cal
cross-sensor-cal[full]remains available as an alias for teams with existing automation but currently resolves to the same dependency set.
Browse the full documentation site at earthlab.github.io/cross-sensor-cal. The site is built with MkDocs Material and automatically deployed to GitHub Pages.
Key entry points:
- Overview
- Quickstart
- Stage 01 Raster Processing
- Stage 02 Sorting
- Stage 03 Pixel Extraction
- Stage 04 Spectral Library
- Stage 05 MESMA
| Python versions | OS (CI) | Notes |
|---|---|---|
| 3.10, 3.11 | Linux x86_64 | Tested in GitHub Actions (CI). |
| 3.10, 3.11 | macOS / other | Community supported. |
If you use this software, please cite:
cross-sensor-cal (v2.2.0): NEON hyperspectral cross-sensor harmonization pipeline.
SeeCITATION.cffin this repository for full author list and metadata.
Distributed under the GPLv3 License. Full citation details are available in CITATION.cff.