scope-ml uses machine learning to classify light curves from the Zwicky Transient Facility (ZTF) and the Vera C. Rubin Observatory (LSST). The documentation is hosted at https://zwickytransientfacility.github.io/scope-ml/. For local preview, install mkdocs-material and run mkdocs serve.
Feature generation includes period-finding (Conditional Entropy, Analysis of Variance, Lomb-Scargle, FPW) and Fourier decomposition via the periodfind library. Fourier features are computed using a batched weighted linear least-squares solver with BIC model selection, replacing the previous per-source scipy.optimize.curve_fit loop.
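The NumPy sketch below illustrates the technique named above. It fits a weighted linear least-squares Fourier series to a whole batch of light curves and uses BIC to choose the number of harmonics for each source. It is not scope-ml's implementation. The function names, the interleaved coefficient layout `[c0, cos1, sin1, cos2, sin2, ...]`, and the convention of padding light curves to a common length with zero weights are all assumptions made for illustration. The period of each source is taken as given, for example from a prior period-finding step.

```python
import numpy as np


def fourier_design(t, period, n_harm):
    """Design matrix with columns [1, cos1, sin1, cos2, sin2, ...], where
    cosk = cos(k * 2*pi * t / P) and sink = sin(k * 2*pi * t / P).

    t: (B, N) observation times, period: (B,) periods.
    Returns an array of shape (B, N, 1 + 2 * n_harm).
    """
    omega = 2.0 * np.pi / period                          # (B,)
    k = np.arange(1, n_harm + 1)                          # (K,)
    arg = omega[:, None, None] * t[:, :, None] * k        # (B, N, K)
    trig = np.stack([np.cos(arg), np.sin(arg)], axis=-1)  # (B, N, K, 2)
    trig = trig.reshape(t.shape[0], t.shape[1], 2 * n_harm)
    ones = np.ones(t.shape + (1,))
    return np.concatenate([ones, trig], axis=-1)


def fit_fourier_bic(t, y, w, period, max_harm=5):
    """Fit every light curve with 1..max_harm harmonics and keep, per source,
    the order with the lowest BIC = chi2 + p * ln(n).

    w holds inverse variances (1/sigma^2) for real points and 0 for padding.
    Returns (best_n_harm (B,), coefficients (B, 1 + 2*max_harm), best_bic (B,)).
    """
    n_valid = np.count_nonzero(w > 0, axis=1)             # real points per source
    n_src = t.shape[0]
    best_bic = np.full(n_src, np.inf)
    best_k = np.zeros(n_src, dtype=int)
    best_coef = np.zeros((n_src, 1 + 2 * max_harm))
    for n_harm in range(1, max_harm + 1):
        X = fourier_design(t, period, n_harm)             # (B, N, p)
        p = X.shape[-1]
        # Weighted normal equations (X^T W X) beta = X^T W y, one system per source
        xtwx = np.einsum("bni,bn,bnj->bij", X, w, X)
        xtwy = np.einsum("bni,bn,bn->bi", X, w, y)
        coef = np.linalg.solve(xtwx, xtwy[..., None])[..., 0]  # (B, p)
        resid = y - np.einsum("bni,bi->bn", X, coef)
        chi2 = np.sum(w * resid**2, axis=1)
        bic = chi2 + p * np.log(n_valid)
        better = bic < best_bic
        best_bic[better] = bic[better]
        best_k[better] = n_harm
        best_coef[better, :p] = coef[better]
        best_coef[better, p:] = 0.0                       # clear unused harmonics
    return best_k, best_coef, best_bic
```

All sources in the batch share one design-matrix shape. As a result, each harmonic order needs a single batched `np.linalg.solve` call rather than one nonlinear `curve_fit` per source. Each source needs at least `1 + 2 * max_harm` valid points for its normal equations to be solvable.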
The pipeline supports running against Rubin Data Preview 1 (DP1) data stored as local parquet files, bypassing the TAP API entirely. This is the recommended approach for large-scale feature generation.
You need three gzip-compressed parquet files downloaded from the Rubin Science Platform and placed in a single directory:
```
/path/to/dp1_data/
    Object.parquet.gzip
    ForcedSource.parquet.gzip
    Visit.parquet.gzip
```
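A quick pre-flight check can confirm that the three files listed above are present before scope-ml is pointed at the directory. This is a minimal sketch. The helper name `missing_dp1_files` is hypothetical and is not part of scope-ml.

```python
from pathlib import Path

# File names expected in the local DP1 data directory
REQUIRED_DP1_FILES = (
    "Object.parquet.gzip",
    "ForcedSource.parquet.gzip",
    "Visit.parquet.gzip",
)


def missing_dp1_files(data_dir):
    """Return the required DP1 file names that are not regular files in data_dir."""
    root = Path(data_dir)
    return [name for name in REQUIRED_DP1_FILES if not (root / name).is_file()]
```

For example, `missing_dp1_files("/path/to/dp1_data/")` returns an empty list when the directory is complete.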
Tell scope-ml to use local files instead of the TAP API by setting data_path in config.yaml:
```yaml
rubin:
  data_path: /path/to/dp1_data/
```

Or set the environment variable:

```bash
export RUBIN_DATA_PATH=/path/to/dp1_data/
```

When `data_path` (or `RUBIN_DATA_PATH`) is set, all Rubin commands automatically use the local parquet backend (`RubinLocalClient`) instead of the TAP client. No token is needed for local mode.
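The sketch below shows the selection rule described above: local mode is used whenever either setting is present. It is an illustration only. The helper names are hypothetical, and the strings `"local"` and `"tap"` stand in for `RubinLocalClient` and the TAP client. It also assumes that `config.yaml` takes precedence over `RUBIN_DATA_PATH` when both are set, which may not match scope-ml's actual precedence. The `config` argument is the parsed `config.yaml`, for example from `yaml.safe_load`.

```python
import os


def resolve_rubin_data_path(config):
    """Return the local DP1 directory if one is configured, else None.

    Assumption: config["rubin"]["data_path"] is checked before the
    RUBIN_DATA_PATH environment variable.
    """
    rubin_cfg = config.get("rubin") or {}  # an empty "rubin:" key parses as None
    return rubin_cfg.get("data_path") or os.environ.get("RUBIN_DATA_PATH") or None


def choose_backend(config):
    """Hypothetical: "local" stands in for RubinLocalClient, "tap" for the TAP client."""
    return "local" if resolve_rubin_data_path(config) else "tap"
```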
For a small number of sources (e.g. a cone search or a short object list):
```bash
# Cone search
generate-features-rubin --ra 62.0 --dec -37.0 --radius 60 --doCPU

# From a CSV file with an objectId column
generate-features-rubin --objectid-file my_objects.csv --doCPU
```

Output is written to `generated_features_rubin/gen_features_rubin.parquet` by default.
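The only requirement stated for `--objectid-file` is an `objectId` column. The sketch below writes such a file with the standard `csv` module. The helper name `write_objectid_file` is hypothetical. The test IDs are placeholders, not real DP1 objects.

```python
import csv


def write_objectid_file(path, object_ids):
    """Write a CSV with a header row 'objectId' followed by one integer ID per row."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["objectId"])
        for oid in object_ids:
            writer.writerow([int(oid)])  # store IDs as plain integers
```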
For processing the full DP1 catalog, use the chunked SLURM workflow:
1. Prepare chunks. Scan the local parquet files, keep only objects with enough detections, and split them into chunk CSVs:

   ```bash
   prepare-rubin-chunks \
     --data-path /path/to/dp1_data/ \
     --chunk-size 5000 \
     --min-n-lc-points 50 \
     --output-dir rubin_chunks
   ```

   This writes `rubin_chunks/chunk_000.csv`, `rubin_chunks/chunk_001.csv`, etc., plus a master list `rubin_chunks/all_eligible_objectids.csv`.
2. Generate the SLURM array script:

   ```bash
   generate-features-rubin-slurm \
     --chunk-dir rubin_chunks \
     --output-dir rubin_slurm \
     --venv /path/to/your/.venv \
     --cpus-per-task 8 \
     --top-n-periods 50
   ```

   This writes `rubin_slurm/run_rubin_features.sh`. Edit the script to set the partition, account, memory, and module loads for your cluster.
3. Submit the array job:

   ```bash
   sbatch rubin_slurm/run_rubin_features.sh
   ```

   Each array task processes one chunk and writes `generated_features_rubin/gen_features_rubin_<TASK_ID>.parquet`.
4. Combine the results after all jobs finish:

   ```bash
   combine-rubin-features \
     --input-dir generated_features_rubin \
     --output generated_features_rubin/dp1_features_combined.parquet
   ```

If you don't have a SLURM cluster, you can run the chunks sequentially (or resume after an interruption) with:
```bash
python tools/run_rubin_chunked.py \
  --objectid-file rubin_chunks/all_eligible_objectids.csv \
  --doCPU \
  --chunk-size 5000 \
  --top-n-periods 50
```

Completed chunks are saved to `generated_features_rubin/chunks/` and skipped on restart, so the job is resumable.
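The chunked workflow above rests on three steps: filter objects by number of light-curve points, split the eligible IDs into fixed-size chunks, and skip chunks that already have output. The stdlib sketch below illustrates that pattern. It is not the code of `prepare-rubin-chunks` or `tools/run_rubin_chunked.py`. The function names, the `>=` threshold, and the `chunk_NNN.parquet` output naming are assumptions for illustration.

```python
from pathlib import Path


def eligible_ids(n_points_by_id, min_points):
    """Keep object IDs with at least min_points light-curve points (assumed >=)."""
    return [oid for oid, n in n_points_by_id.items() if n >= min_points]


def split_into_chunks(object_ids, chunk_size):
    """Split object IDs into consecutive chunks of at most chunk_size IDs."""
    return [object_ids[i:i + chunk_size] for i in range(0, len(object_ids), chunk_size)]


def run_resumable(object_ids, chunk_size, out_dir, process_chunk):
    """Process each chunk once, skipping chunks whose output file already exists.

    Hypothetical naming: chunk i is written to out_dir/chunk_{i:03d}.parquet.
    process_chunk(ids, out_path) should write out_path only after it succeeds,
    so an interrupted chunk is retried on the next run.
    Returns the indices of the chunks processed in this run.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    processed = []
    for i, ids in enumerate(split_into_chunks(list(object_ids), chunk_size)):
        out_path = out / f"chunk_{i:03d}.parquet"
        if out_path.exists():
            continue  # finished in an earlier run
        process_chunk(ids, out_path)
        processed.append(i)
    return processed
```

Resuming works only if the chunk boundaries are identical between runs. The input ID list must therefore keep the same order and the same chunk size.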
| Command | Description |
|---|---|
| `get-rubin-ids` | Discover object IDs via cone search or read them from a CSV file |
| `generate-features-rubin` | Generate features for a set of Rubin sources |
| `prepare-rubin-chunks` | Scan local parquet files and split eligible objects into chunk CSVs |
| `generate-features-rubin-slurm` | Generate a SLURM array job script from chunk files |
| `combine-rubin-features` | Merge per-chunk parquet outputs into a single file |
To install SCoPe with periodfind, first complete the periodfind GPU installation with Rust. You may then hit a numpy version conflict between SCoPe and periodfind: scope-ml 0.9.5 requires `numpy>=1.23,<1.24`, but periodfind may have installed a different version. To work around this, install SCoPe without its dependencies, then install the remaining dependencies explicitly, leaving out numpy:
```bash
pip install scope-ml --no-deps
pip install deepdiff gsutil matplotlib questionary scikit-learn wandb h5py astropy fast-histogram healpy "jinja2<=3.1" pandas penquins pyyaml tdtax pyarrow "numba>=0.56.4" cesium xgboost seaborn pydot notebook cython "tables>=3.7,<3.9.2" pyvo
```

We gratefully acknowledge previous and current support from the U.S. National Science Foundation (NSF) Harnessing the Data Revolution (HDR) Institute for Accelerated AI Algorithms for Data-Driven Discovery (A3D3) under Cooperative Agreement No. PHY-2117997.

