
juraskov/FM-tutorials


🧬 Foundation Models for Science Workshop: Protein Machine Learning Tutorials

November 3-5, 2025 | University of Toronto

This repository contains the hands-on materials for the Foundation Models for Science Workshop. The tutorials cover the complete workflow for working with protein language models, from data preparation to advanced analysis techniques.




🚀 Installation (Post-workshop)

1. Clone the Repository

git clone https://github.com/ai-for-science-org/private_tutorials.git
cd private_tutorials

2. Create a Virtual Environment (Recommended)

Using conda:

conda create -n fomo-workshop python=3.9
conda activate fomo-workshop

Using venv:

python -m venv fomo-env
source fomo-env/bin/activate  # On macOS/Linux
# fomo-env\Scripts\activate   # On Windows

3. Install Dependencies

pip install -r requirements.txt

4. Launch Jupyter

jupyter lab

📖 Tutorials Overview

🧬 Tutorial 1: Data Extraction & Cleaning

📂 Location: Tutorial_1_Data_Cleaning/Data_Extraction.ipynb

What You'll Learn:

  • Download and extract datasets from ProteinGym
  • Clean and standardize tabular data with pandas
  • Handle missing values and duplicates
  • Normalize DMS (Deep Mutational Scanning) scores for machine learning
  • Visualize dataset characteristics

Key Skills: Data wrangling, pandas operations, data quality assessment
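
The cleaning steps listed above can be sketched in a few lines of pandas. This is a minimal illustration, not the notebook's code: the column names `mutated_sequence` and `DMS_score` are assumptions about the ProteinGym file layout, the toy table is made up, and z-scoring is only one possible way to normalize scores.

```python
import pandas as pd

def clean_dms(df, seq_col="mutated_sequence", score_col="DMS_score"):
    """Drop incomplete and duplicate rows, then z-score the DMS scores.

    Column names are assumed defaults; adjust them to the actual dataset.
    """
    # Remove rows with a missing sequence or a missing score.
    out = df.dropna(subset=[seq_col, score_col])
    # Keep the first occurrence of each sequence.
    out = out.drop_duplicates(subset=seq_col, keep="first").reset_index(drop=True)
    # Standardize scores to zero mean and unit variance (population std).
    mean = out[score_col].mean()
    std = out[score_col].std(ddof=0)
    return out.assign(score_z=(out[score_col] - mean) / std)

# Toy data (hypothetical): one duplicate and two incomplete rows.
raw = pd.DataFrame({
    "mutated_sequence": ["MKT", "MKT", "MRT", None, "MKA"],
    "DMS_score": [1.0, 1.0, 3.0, 2.0, None],
})
clean = clean_dms(raw)
print(clean)
```

If all remaining scores are identical the standard deviation is zero, so a real pipeline would need to guard against that case before dividing.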


🔬 Tutorial 2: Model Fine-Tuning

📂 Location: Tutorial_2_Fine_Tuning/ (Notebook TBD)

What You'll Learn:

  • Load and use pre-trained protein language models (ESM-2)
  • Generate protein embeddings for downstream tasks
  • Perform zero-shot similarity search
  • Fine-tune models with LoRA (Low-Rank Adaptation)
  • Predict DMS stability scores using adapted models

Key Skills: Transfer learning, model adaptation, embedding generation
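
To make the LoRA idea concrete, the following NumPy sketch shows the low-rank update applied to a single frozen linear layer. It is an illustration under stated assumptions, not the tutorial's implementation (the notebook is still TBD): the class name `LoRALinear`, the layer size, the rank `r`, and the scaling `alpha` are all hypothetical choices, and no training loop is included.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, r=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                        # frozen pre-trained weight
        self.A = rng.normal(0.0, 0.01, (r, d_in))   # trainable, small random init
        self.B = np.zeros((d_out, r))               # trainable, zero init
        self.scale = alpha / r

    def __call__(self, x):
        # x has shape (batch, d_in); the result has shape (batch, d_out).
        base = x @ self.weight.T
        update = (x @ self.A.T) @ self.B.T
        return base + self.scale * update

    def trainable_params(self):
        # Only A and B would receive gradients during fine-tuning.
        return self.A.size + self.B.size

rng = np.random.default_rng(1)
W = rng.normal(size=(320, 320))   # illustrative layer size (assumption)
layer = LoRALinear(W, r=4)
x = rng.normal(size=(2, 320))
y = layer(x)
print(f"trainable: {layer.trainable_params()} of {W.size} weights")
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and only the two small matrices need to be trained. In practice a library such as Hugging Face PEFT is typically used to attach such adapters to a transformer rather than writing them by hand.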


🎯 Tutorial 3: Uncertainty Quantification

📂 Location: Tutorial_3_Uncertainty_Quant/UQ_tutorial.ipynb

What You'll Learn:

  • Assess and improve model calibration using temperature scaling
  • Implement heteroscedastic models to capture prediction uncertainty
  • Use MC dropout to estimate epistemic uncertainty
  • Apply conformal prediction for distribution-free uncertainty intervals
  • Distinguish between different types of uncertainty in your predictions

Key Skills: Confidence estimation, calibration methods, probabilistic prediction
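
As one example of the methods above, split conformal prediction for regression can be sketched with NumPy alone. This is a minimal sketch, assuming absolute residuals as the nonconformity score and a held-out calibration set; the function name `conformal_halfwidth` and the synthetic data are hypothetical and do not come from the notebook.

```python
import numpy as np

def conformal_halfwidth(cal_true, cal_pred, alpha=0.1):
    """Half-width q so that [pred - q, pred + q] targets (1 - alpha) coverage."""
    residuals = np.abs(np.asarray(cal_true, dtype=float)
                       - np.asarray(cal_pred, dtype=float))
    scores = np.sort(residuals)
    n = scores.size
    # Finite-sample corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return float("inf")
    return float(scores[k - 1])

# Synthetic stand-in for model predictions and measured scores.
rng = np.random.default_rng(0)
cal_pred = rng.normal(size=500)
cal_true = cal_pred + rng.normal(scale=0.5, size=500)
test_pred = rng.normal(size=2000)
test_true = test_pred + rng.normal(scale=0.5, size=2000)

q = conformal_halfwidth(cal_true, cal_pred, alpha=0.1)
inside = (test_true >= test_pred - q) & (test_true <= test_pred + q)
coverage = float(np.mean(inside))
print(f"half-width={q:.3f}, empirical coverage={coverage:.3f}")
```

The coverage guarantee is marginal and relies on the calibration and test data being exchangeable; it makes no assumption about the form of the noise distribution.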


🧠 Tutorial 4: Latent Space Analysis

📂 Location: Tutorial_4_Latent_Space/Latent_Space_Analysis.ipynb

What You'll Learn:

  • Extract and manipulate protein embeddings from pre-trained ESM-2 models
  • Reduce high-dimensional embeddings to 2D for visualization using UMAP
  • Quantify clustering quality using mutual information metrics
  • Optimize dimensionality reduction hyperparameters automatically with Optuna
  • Analyze how features change across different layers of a transformer model
  • Interpret latent space structure in relation to protein function (EC classes)

Key Skills: Embedding analysis, dimensionality reduction, hyperparameter optimization, interpretability
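
One way to quantify how well clusters in the latent space line up with EC classes is normalized mutual information (NMI). The sketch below computes it directly with NumPy; it is an illustration only, since the notebook may use a different metric or a library implementation. The cluster assignments and EC labels are made-up toy values, and normalizing by the arithmetic mean of the two entropies is an assumed convention.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats of a probability vector, ignoring zero entries."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def normalized_mutual_info(labels_a, labels_b):
    """NMI between two labelings, normalized by the mean of their entropies."""
    _, a = np.unique(np.asarray(labels_a).ravel(), return_inverse=True)
    _, b = np.unique(np.asarray(labels_b).ravel(), return_inverse=True)
    # Contingency table of co-occurring label pairs.
    counts = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(counts, (a, b), 1)
    joint = counts / counts.sum()
    pa, pb = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    mi = float(np.sum(joint[nz] * np.log(joint[nz] / np.outer(pa, pb)[nz])))
    denom = 0.5 * (entropy(pa) + entropy(pb))
    # Convention: two constant labelings are treated as identical.
    return mi / denom if denom > 0 else 1.0

# Toy example: cluster ids from some clustering step vs. annotated EC classes.
clusters = [0, 0, 1, 1, 2, 2, 2]
ec_class = ["EC1", "EC1", "EC2", "EC2", "EC3", "EC3", "EC1"]
print(f"NMI = {normalized_mutual_info(clusters, ec_class):.3f}")
```

A score near 1 means the clusters recover the functional classes almost perfectly, while a score near 0 means they are close to independent, which makes NMI a possible objective for a hyperparameter search.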


📦 Requirements

Core Dependencies

  • Python: 3.8 or higher
  • PyTorch: For deep learning models
  • Transformers: HuggingFace library for ESM models
  • Pandas: Data manipulation
  • NumPy: Numerical computing
  • Matplotlib/Seaborn: Visualization

Specialized Libraries

  • UMAP: Dimensionality reduction (Tutorial 4)
  • Optuna: Hyperparameter optimization (Tutorial 4)
  • scikit-learn: Machine learning utilities
  • tqdm: Progress bars
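
After installing, a quick check that the packages listed above are importable can save time before opening the notebooks. The script below is a small sketch using only the standard library; the import names are assumptions based on the usual PyPI packages (for example, `umap` is provided by `umap-learn` and `sklearn` by `scikit-learn`), and `requirements.txt` remains the authoritative list.

```python
import importlib.util
import sys

# Assumed import names for the dependencies listed above.
REQUIRED = [
    "torch", "transformers", "pandas", "numpy", "matplotlib",
    "seaborn", "umap", "optuna", "sklearn", "tqdm",
]

def missing_packages(names=REQUIRED):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

def python_ok(minimum=(3, 8)):
    """Check the interpreter against the minimum Python version."""
    return sys.version_info >= minimum

if __name__ == "__main__":
    print("Python >= 3.8:", python_ok())
    print("Missing packages:", missing_packages() or "none")
```

Any name reported as missing can usually be fixed by re-running `pip install -r requirements.txt` inside the activated environment.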

Hardware Recommendations

  • GPU: Recommended for Tutorial 2 (fine-tuning) and Tutorial 4 (embeddings)
  • RAM: 16GB minimum, 32GB recommended
  • Storage: ~10GB for datasets and models

📝 License

See LICENSE file for details.


🤝 Contributing

This is a private workshop repository. For questions or issues, please contact the workshop organizers.


📧 Support

For technical support or questions about the tutorials, please reach out to the FOMO4Sci Workshop team.


Happy Learning! 🚀
