Our course combines {Data + Chemistry + Engineering}. We’ll explore how machine learning and data science can solve real chemical engineering problems with a mix of:
- Lectures with chemical examples and datasets 📊
- Hands-on sessions 👩‍💻
- Group projects 💡
This repo is where lectures, tutorials, assignments, and project guidelines will live for our course.
Here’s where to find stuff:
- `lectures/` → demo notebooks
- `tutorials/` → in-class hands-on exercises
- `projects/` → group project information
- `data/` → small sample datasets used in tutorials
| Week | Topic | Slides |
|---|---|---|
| Week 01 | Introduction to Machine Learning & Course Overview | Open in Google Slides |
| Week 02 | Data, Representation, and Exploratory Data Analysis | Open in Google Slides |
| Week 03 | Supervised Learning Workflow | Open in Google Slides |
| Week 04 | Modelling well: complexity, regularization and model selection | Open in Google Slides |
| Week 05 | Model Zoo: Different Ways of Learning from Data | Open in Google Slides |
| Week 06 | Logistic Regression & Classification | Open in Google Slides |
| Week 07 | Unsupervised Learning | Open in Google Slides |
| Week | Tutorial | Colab Link |
|---|---|---|
| W01 | 1. Python Refresher | |
| W01 | 2. Linear Algebra | |
| W02 | 3. RDKit and EDA | |
| W03–06 | 4. Supervised Learning — Regression | |
| W07 | 5. Supervised Learning — Classification | |
To reproduce the Python environment:

    conda env create -f environment.yml
    conda activate che1147

Tell me what to improve, or send any other requests, using this totally anonymous form:
Or open a GitHub issue if you found a bug, typo, or broken link:
Found this useful? Please consider starring the repo 🌟 — it helps others discover the project and shows your support!
We welcome:
- 🐛 Bug reports (broken notebook cells, path issues, typos)
- 📚 Content improvements (clearer explanations, new examples)
- 🧪 New exercises/tutorials/content (small, focused PRs work best)
This course is being created by the AI4ChemS team and TAs:
A shout-out as well to our friends at the Chemical Cognition Lab 👋.
They run CHE1148, which builds on this course: CHE1147 is the foundation, and CHE1148 takes it further into neural nets and representation learning. We’ve been inspired by each other’s ideas along the way.
The content, examples, figures, and ideas are inspired by many textbooks and other open courses, which we will reference properly. The main references include:
- Christopher Bishop’s Pattern Recognition and Machine Learning (Springer, 2006)
- Simon Prince’s Understanding Deep Learning (Cambridge University Press, 2023)
- Kevin M. Jablonka for ML-MolSim