Astronomical Clustering Project: Grader Run Guide

Final Result (Key Outcome)

The final model for this project is Spectral Clustering, which achieved an accuracy of 97.9681%. The full pipeline therefore separates the STAR, GALAXY, and QSO classes with high clustering quality.
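This README does not state how cluster IDs are matched to class names before accuracy is computed. A common approach is to pick the one-to-one mapping that maximizes agreement. The sketch below illustrates that idea under this assumption; the function name `clustering_accuracy` and the toy labels are hypothetical and are not taken from the notebooks.

```python
from collections import Counter
from itertools import permutations


def clustering_accuracy(true_labels, cluster_ids):
    """Best accuracy over all one-to-one mappings from cluster IDs to classes."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    if len(clusters) > len(classes):
        raise ValueError("this sketch assumes no more clusters than classes")
    # Count how often each (cluster, class) pair occurs.
    pair_counts = Counter(zip(cluster_ids, true_labels))
    best = 0
    # Brute force is fine for three classes (3! = 6 candidate mappings).
    for assigned in permutations(classes, len(clusters)):
        correct = sum(pair_counts[(c, k)] for c, k in zip(clusters, assigned))
        best = max(best, correct)
    return best / len(true_labels)


# Toy example with made-up labels (not project data).
y_true = ["STAR", "STAR", "GALAXY", "GALAXY", "QSO", "QSO"]
y_pred = [2, 2, 0, 0, 1, 0]
print(f"accuracy = {clustering_accuracy(y_true, y_pred):.4f}")
```

For larger numbers of clusters, an assignment solver would replace the brute-force loop; with three classes the exhaustive search is enough.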

Preworkflow Baseline (V6)

In an earlier exploratory (preworkflow) stage, the first K-Means model, built on purely physics-informed features in extra folder/V6.ipynb, reached 97.84% overall accuracy. That is 0.1281 percentage points below the final Spectral Clustering model, and it confirmed that the physics-based feature direction was already effective before the final model refinement.

This README explains exactly how to open and run this project in a clean environment.

Workflow Scope Clarification (Important for Grading)

code.ipynb is the main, complete workflow notebook. It includes the introduction, physics/background context, EDA, feature engineering/analysis, and clustering model building/evaluation.
model.ipynb is a separated modeling-focused version for convenience, mainly to run and review model-related steps.
For full context and the complete end-to-end work, graders should prioritize reviewing and running code.ipynb.

1. Files to Include for Submission

Required files:

code.ipynb
model.ipynb
star-galaxy-quasar.csv

Strongly recommended files (these let the grader run the modeling notebook immediately, without rerunning the full preprocessing):

star-galaxy-quasar_processed.csv
star-galaxy-quasar_featured.csv

2. Python Environment

Recommended:

Python 3.10-3.13
VS Code with Jupyter extension

Create and activate a virtual environment from the project root:

python -m venv .venv
.\.venv\Scripts\Activate.ps1

If the PowerShell execution policy blocks activation, run this in Command Prompt instead:

.venv\Scripts\activate.bat
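To confirm that the activated interpreter matches the recommendation above, a short check like the following can be run from the terminal or a notebook cell. It is a minimal sketch: the function name `check_environment` is hypothetical, and the version bounds simply mirror the recommended 3.10-3.13 range.

```python
import sys


def check_environment(min_version=(3, 10), max_version=(3, 13)):
    """Return (version_ok, in_venv) for the running interpreter."""
    current = sys.version_info[:2]
    version_ok = min_version <= current <= max_version
    # Inside a venv, sys.prefix points at the venv, while sys.base_prefix
    # points at the interpreter the venv was created from.
    in_venv = sys.prefix != sys.base_prefix
    return version_ok, in_venv


version_ok, in_venv = check_environment()
print(f"Python {sys.version_info.major}.{sys.version_info.minor} in recommended range: {version_ok}")
print(f"Running inside a virtual environment: {in_venv}")
```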

3. Install Dependencies

Install required packages:

pip install pandas numpy matplotlib seaborn scikit-learn umap-learn hdbscan jupyter ipykernel

Note:

hdbscan is optional: the notebook skips the HDBSCAN sections if the package is unavailable. Installing it is still recommended so that the output matches the full run.
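Before opening the notebooks, it can help to verify that the packages are importable from the active environment. The sketch below is one way to do that; the helper `missing_modules` and the split into `REQUIRED` and `OPTIONAL` lists are illustrative assumptions based on the install command and note above.

```python
from importlib.util import find_spec

# Import names differ from pip names for two packages:
# scikit-learn -> sklearn, umap-learn -> umap.
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "sklearn", "umap", "ipykernel"]
OPTIONAL = ["hdbscan"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]


print("Missing required:", missing_modules(REQUIRED) or "none")
print("Missing optional:", missing_modules(OPTIONAL) or "none")
```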

4. Run Order (Important)

Option A: Full pipeline from raw data (recommended for grading)

Open code.ipynb
Set the notebook kernel to the .venv created above
Run all cells from top to bottom
Confirm generated files exist:
- star-galaxy-quasar_processed.csv
- star-galaxy-quasar_featured.csv
Open model.ipynb
Run all cells from top to bottom

Option B: Run modeling only

If star-galaxy-quasar_featured.csv already exists, you can run only:

model.ipynb (Run All)

The notebook tries to load star-galaxy-quasar_featured.csv first and falls back to star-galaxy-quasar_processed.csv if the featured file is not available.
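The load order described above can be sketched as follows. This is not the notebook's actual code; `pick_dataset` and `CANDIDATES` are hypothetical names used only to illustrate the "featured first, processed as fallback" behavior.

```python
from pathlib import Path

# Priority order: featured file first, processed file as fallback.
CANDIDATES = ["star-galaxy-quasar_featured.csv", "star-galaxy-quasar_processed.csv"]


def pick_dataset(folder="."):
    """Return the first candidate CSV that exists in folder, in priority order."""
    for name in CANDIDATES:
        path = Path(folder) / name
        if path.is_file():
            return path
    raise FileNotFoundError(f"None of {CANDIDATES} found in {Path(folder).resolve()}")
```

In a notebook cell, the result would typically be passed to pandas, for example `df = pd.read_csv(pick_dataset())`.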

5. Expected Outputs

After a successful run, the grader should see:

Feature engineering and EDA plots from code.ipynb
Clustering metrics/tables (Silhouette, AMI, ARI, NMI) in model.ipynb
Confusion matrices and post-hoc model comparison visualizations
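For orientation, the four metrics listed above are all available in scikit-learn. The sketch below computes them on synthetic data; it assumes nothing about how model.ipynb organizes its tables, and `clustering_report` is a hypothetical helper name.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    adjusted_mutual_info_score,
    adjusted_rand_score,
    normalized_mutual_info_score,
    silhouette_score,
)


def clustering_report(X, true_labels, cluster_ids):
    """Collect the four metrics named above into one dictionary."""
    return {
        # Silhouette uses only the features and the cluster assignment.
        "Silhouette": silhouette_score(X, cluster_ids),
        # The other three compare the assignment with the known classes.
        "AMI": adjusted_mutual_info_score(true_labels, cluster_ids),
        "ARI": adjusted_rand_score(true_labels, cluster_ids),
        "NMI": normalized_mutual_info_score(true_labels, cluster_ids),
    }


# Synthetic stand-in data (three blobs), not the project dataset.
X, y = make_blobs(n_samples=300, centers=3, random_state=42)
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
for name, value in clustering_report(X, y, labels).items():
    print(f"{name}: {value:.4f}")
```

AMI, ARI, and NMI are invariant to how cluster IDs are numbered, so no cluster-to-class mapping is needed for them.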

6. Common Issues and Fixes

Issue: `FileNotFoundError` for dataset CSV

Fix:

Ensure the notebook's working directory is the project root.
Ensure the required CSV files are in the same folder as the notebooks.
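A quick diagnostic for this error is to print the working directory and check which CSV files are visible from it. The sketch below does that; `report_csvs` is a hypothetical helper, and the file names are the ones listed in this README.

```python
from pathlib import Path

CSV_NAMES = (
    "star-galaxy-quasar.csv",
    "star-galaxy-quasar_processed.csv",
    "star-galaxy-quasar_featured.csv",
)


def report_csvs(folder=".", names=CSV_NAMES):
    """Map each expected CSV name to whether it exists in folder."""
    folder = Path(folder).resolve()
    return {name: (folder / name).is_file() for name in names}


print("Working directory:", Path.cwd())
for name, found in report_csvs().items():
    print(f"{'found  ' if found else 'MISSING'}  {name}")
```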

Issue: `No module named umap` or `No module named hdbscan`

Fix:

pip install umap-learn hdbscan

Issue: Wrong kernel selected

Fix:

In the VS Code notebook toolbar, click Kernel.
Select the Python interpreter from .venv.
Restart kernel and run all cells again.

7. Reproducibility Notes

The notebooks set fixed random seeds (for example, random_state=42) in major modeling steps to keep results stable across runs.
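As an illustration of what a fixed seed buys, the sketch below fits scikit-learn's KMeans twice with the same `random_state` on the same synthetic data and compares the resulting labels. The data and the helper `fit_labels` are assumptions for demonstration only; the notebooks' actual models and parameters may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in features, not the project dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))


def fit_labels(seed):
    """Fit K-Means with a fixed seed and return the cluster assignment."""
    return KMeans(n_clusters=3, n_init=10, random_state=seed).fit_predict(X)


labels_a = fit_labels(42)
labels_b = fit_labels(42)
print("Identical labels across runs:", np.array_equal(labels_a, labels_b))
```

Fixing the seed makes repeated runs in the same environment agree; small differences can still appear across library versions or platforms.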

8. Quick Grader Checklist

Open project folder in VS Code.
Create/activate .venv.
Install packages.
Run code.ipynb (Run All).
Run model.ipynb (Run All).
Verify tables/plots render without errors.

