Add xarray_to_cdf support for uncompressed epoch variables #329
ISTP/SPDF validation requires epoch variables to be written without gzip compression. Previously, xarray_to_cdf applied the caller-provided compression value uniformly to every variable, so calls such as xarray_to_cdf(..., compression=6) also compressed ISTP epoch variables named epoch or epoch_N. Update xarray_to_cdf to select compression per variable. ISTP epoch variables now always receive Compress=0, while non-epoch variables keep the requested compression level. Add compression_skip_vars so callers can also force additional named variables to be written uncompressed. Add regression tests covering default epoch behavior and explicit caller-provided compression skips.
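The selection rule described above can be sketched in a few lines. This is only an illustration of the behavior, not the code in the pull request: `looks_like_istp_epoch` and `plan_compression` are hypothetical names, and the sketch assumes that `epoch_N` means `epoch_` followed by ASCII digits and that the name check is case-insensitive.

```python
from typing import Dict, Iterable


def looks_like_istp_epoch(name: str) -> bool:
    """Hypothetical check: True for 'epoch' or 'epoch_<digits>' (case-insensitive)."""
    lowered = name.lower()
    if lowered == "epoch":
        return True
    suffix = lowered[len("epoch_"):]
    # Require the 'epoch_' prefix and a non-empty, purely ASCII-digit suffix.
    return lowered.startswith("epoch_") and suffix.isascii() and suffix.isdigit()


def plan_compression(
    var_names: Iterable[str],
    compression: int,
    compression_skip_vars: Iterable[str] = (),
) -> Dict[str, int]:
    """Map each variable name to the compression level it would be written with."""
    skip = set(compression_skip_vars)
    plan = {}
    for name in var_names:
        if looks_like_istp_epoch(name) or name in skip:
            plan[name] = 0  # epoch and caller-skipped variables stay uncompressed
        else:
            plan[name] = compression  # everything else keeps the requested level
    return plan


if __name__ == "__main__":
    print(plan_compression(["epoch", "epoch_1", "flux", "quality"], 6, ["quality"]))
```

With `compression=6` and `quality` in the skip list, only `flux` would be planned for gzip level 6; the other three variables map to 0.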
Pull request overview
Updates cdflib.xarray.xarray_to_cdf() to satisfy ISTP/SPDF guidance by ensuring epoch variables are written uncompressed, even when gzip compression is enabled for other variables.
Changes:
- Add `_variable_compression()` to centralize per-variable compression selection.
- Automatically disable compression for variables named `epoch` or `epoch_N`, plus any caller-specified `compression_skip_vars`.
- Add tests verifying epoch variables are uncompressed by default under `compression=6`, and that the skip list forces additional variables to be uncompressed.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `cdflib/xarray/xarray_to_cdf.py` | Adds per-variable compression selection and a new `compression_skip_vars` API to ensure ISTP epoch variables are written without gzip compression. |
| `tests/test_xarray_reader_writer.py` | Adds regression tests asserting epoch variables get `Compress == 0` while other variables retain the requested compression, plus skip-list behavior. |
    def _variable_compression(var_name: str, compression: int, compression_skip_vars: List[str]) -> int:
        if _is_istp_epoch_variable(var_name) or var_name in compression_skip_vars:
            return 0

        return compression
Optional: _variable_compression() calls _is_istp_epoch_variable() for each variable written, and _is_istp_epoch_variable() recompiles its regex patterns on every call. Consider precompiling the regex(es) at module scope (or using a single precompiled pattern) to avoid repeated compilation overhead when writing datasets with many variables.
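One way the suggestion could look is sketched below. It is a standalone approximation rather than the library's code: the pattern, the constant `_ISTP_EPOCH_PATTERN`, and the un-prefixed function names are assumptions made for illustration, including the case-insensitive match and the reading of `epoch_N` as `epoch_` plus digits.

```python
import re
from typing import Sequence

# Compiled once at import time instead of on every call.
# Assumption: ISTP epoch names are "epoch" or "epoch_<digits>", case-insensitive.
_ISTP_EPOCH_PATTERN = re.compile(r"epoch(?:_[0-9]+)?", re.IGNORECASE)


def is_istp_epoch_variable(var_name: str) -> bool:
    """Return True when the whole name matches the precompiled epoch pattern."""
    return _ISTP_EPOCH_PATTERN.fullmatch(var_name) is not None


def variable_compression(
    var_name: str, compression: int, compression_skip_vars: Sequence[str]
) -> int:
    """Return 0 for epoch or skipped variables, else the requested level."""
    if is_istp_epoch_variable(var_name) or var_name in compression_skip_vars:
        return 0
    return compression
```

Python's `re` module already caches recently compiled patterns internally, so the saving is likely modest; a module-level pattern mainly makes the intent explicit and avoids the cache lookup on each call.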
Summary
This updates `xarray_to_cdf()` so ISTP epoch variables named `epoch` or `epoch_N` are written without gzip compression, even when variable compression is enabled for the rest of the CDF.

This addresses an ISTP/SPDF validation issue found in IMAP-generated CDFs:
When CDF files were run through the CDF checker command-line tool, it reported errors like:
Changes
- Add `_variable_compression()` to centralize per-variable compression selection.
- Force `Compress=0` for ISTP epoch variables matching `epoch` or `epoch_N`.
- Add `compression_skip_vars` for callers that need to force additional variables to be written uncompressed.
- Non-epoch variables keep the requested `compression` value.

Validation
Added tests covering:
- Default behavior under `compression=6`:
  - `epoch` `Compress == 0`
  - `epoch_1` `Compress == 0`
  - non-epoch variables `Compress == 6`
- Variables listed in `compression_skip_vars` are written with `Compress == 0`

Local verification: