bennycwong/marimo-pokemon

Professional ML Engineering Onboarding Project

Pokemon Card Machine Learning - Complete ML Engineering Course

A comprehensive, production-focused ML course built on the 80/20 rule: it teaches the small set of skills that matter most for ML engineers in industry.


🎯 What This Is

A hands-on learning system that teaches you to build production ML systems from scratch, covering both technical skills AND the business/collaboration skills that companies actually need. By the end, you'll be ready to contribute meaningfully to an ML team on day one.

  • Themes: Pokemon card type classification + price prediction
  • Format: Interactive Marimo notebooks with exercises
  • Duration: 20-24 hours over 3-4 weeks
  • Outcome: Production-ready ML engineering skills + business acumen + team collaboration skills


📚 Status: ✅ COMPLETE - All 8 Modules + Capstone Ready!

✅ All Modules Completed

Module 0: ML in Business Context (1-2 hours)

  • When to use ML (and when not to)
  • ROI calculation and business metrics
  • Stakeholder communication
  • Setting realistic expectations
  • Status: ✅ Complete with 5 real-world scenarios
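The ROI bullet above can be made concrete with a small calculation. This is a minimal sketch, not the course's own formula: the function name `simple_roi` and the dollar figures are hypothetical, and ROI is assumed to mean (benefit - cost) / cost.

```python
def simple_roi(annual_benefit, annual_cost):
    """Return ROI as a fraction: (benefit - cost) / cost."""
    if annual_cost <= 0:
        raise ValueError("annual_cost must be positive")
    return (annual_benefit - annual_cost) / annual_cost


# Hypothetical numbers, for illustration only.
roi = simple_roi(annual_benefit=150_000, annual_cost=100_000)
print(f"ROI: {roi:.0%}")  # prints "ROI: 50%"
```

A negative result means the project costs more than it returns, which is one signal that ML may not be the right tool.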

Module 1: Data Engineering Foundations (2-3 hours)

  • Data loading, validation, and quality analysis
  • Pandera schema validation
  • Data cleaning pipelines
  • Pandas vs Polars performance comparison
  • Status: ✅ Complete with exercises
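The module uses Pandera for schema validation. To show the underlying idea without extra dependencies, here is a rough standard-library sketch. The `ColumnRule` class, the `SCHEMA` dictionary, and `validate_row` are hypothetical names; only the `type` and `price_usd` columns are mentioned in this README.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnRule:
    """One validation rule: expected Python type plus an optional lower bound."""
    dtype: type
    min_value: Optional[float] = None


# Hypothetical schema, not the course's actual Pandera schema.
SCHEMA = {
    "type": ColumnRule(str),
    "price_usd": ColumnRule(float, min_value=0.0),
}


def validate_row(row):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for column, rule in SCHEMA.items():
        if column not in row:
            errors.append(f"missing column: {column}")
            continue
        value = row[column]
        if not isinstance(value, rule.dtype):
            errors.append(f"{column}: expected {rule.dtype.__name__}")
        elif rule.min_value is not None and value < rule.min_value:
            errors.append(f"{column}: {value} is below {rule.min_value}")
    return errors


print(validate_row({"type": "Fire", "price_usd": 12.5}))
print(validate_row({"type": "Fire", "price_usd": -1.0}))
```

Collecting every problem instead of failing on the first one makes data-quality reports easier to act on.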

Module 2: EDA & Feature Engineering (3-4 hours)

  • Comprehensive exploratory data analysis
  • Feature engineering with domain knowledge
  • Data leakage prevention
  • Scikit-learn preprocessing pipelines
  • Status: ✅ Complete with exercises
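One common form of data leakage is computing preprocessing statistics on the full dataset. The sketch below illustrates the usual remedy, assuming a simple standardization step: statistics are learned on the training split and only applied to held-out data. The `hp` values and function names are hypothetical, and the module itself uses scikit-learn pipelines for this.

```python
from statistics import mean, pstdev


def fit_scaler(train_values):
    """Learn mean and standard deviation from the training split only."""
    return mean(train_values), pstdev(train_values)


def transform(values, mu, sigma):
    """Apply statistics learned on the training split to any split."""
    return [(v - mu) / sigma for v in values]


train_hp = [60, 80, 100, 120]  # hypothetical training values
test_hp = [90, 200]            # hypothetical held-out values

mu, sigma = fit_scaler(train_hp)              # statistics come from train only
test_scaled = transform(test_hp, mu, sigma)   # test data never influences mu or sigma
print(mu, round(sigma, 3), [round(v, 3) for v in test_scaled])
```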

Module 3: Model Training & Experimentation (3-4 hours)

  • Baseline models and model zoo
  • Cross-validation and hyperparameter tuning
  • Experiment tracking
  • Systematic model selection
  • Status: ✅ Complete with exercises
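To make the cross-validation bullet concrete, here is a minimal sketch of how k-fold index splitting can work. `k_fold_indices` is a hypothetical helper written for illustration; in practice a library splitter would normally be used, and the data would usually be shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


for train_idx, val_idx in k_fold_indices(n_samples=6, k=3):
    print(train_idx, val_idx)
# [2, 3, 4, 5] [0, 1]
# [0, 1, 4, 5] [2, 3]
# [0, 1, 2, 3] [4, 5]
```

Every sample appears in exactly one validation fold, so each model score is computed on data that model did not see during training.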

Module 4: Model Evaluation & Validation (2-3 hours)

  • Metrics beyond accuracy
  • Confusion matrices and error analysis
  • Confidence calibration
  • Model cards and documentation
  • Status: ✅ Complete with exercises
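As an illustration of "metrics beyond accuracy", the following sketch computes precision, recall, and F1 for one class from raw predictions. The labels and the helper name `precision_recall_f1` are made up for this example and are not taken from the course notebooks.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall, and F1 for a single class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical labels for illustration.
y_true = ["Fire", "Water", "Fire", "Grass", "Fire"]
y_pred = ["Fire", "Fire", "Water", "Grass", "Fire"]

precision, recall, f1 = precision_recall_f1(y_true, y_pred, positive="Fire")
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Here overall accuracy is 3/5, but the per-class view shows that one non-Fire card was labelled Fire and one Fire card was missed, which accuracy alone hides.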

Module 5: Deployment & Inference (2-3 hours)

  • Model serialization and versioning
  • Production inference API
  • Input validation and error handling
  • Model serving patterns
  • Status: ✅ Complete with interactive UI
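Model serialization and versioning can be sketched with the standard library alone. The example below is one possible approach under stated assumptions, not the course's implementation: `save_model`, `load_model`, and the `model_v<version>` file naming are hypothetical, and a plain dict stands in for a trained model.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path


def save_model(model, directory, version):
    """Pickle the model and write a metadata file with a checksum beside it."""
    directory = Path(directory)
    payload = pickle.dumps(model)
    (directory / f"model_v{version}.pkl").write_bytes(payload)
    metadata = {"version": version, "sha256": hashlib.sha256(payload).hexdigest()}
    (directory / f"model_v{version}.json").write_text(json.dumps(metadata))
    return metadata


def load_model(directory, version):
    """Load a model by version, refusing files whose checksum does not match."""
    directory = Path(directory)
    payload = (directory / f"model_v{version}.pkl").read_bytes()
    metadata = json.loads((directory / f"model_v{version}.json").read_text())
    if hashlib.sha256(payload).hexdigest() != metadata["sha256"]:
        raise ValueError("model file does not match its recorded checksum")
    return pickle.loads(payload)


# A plain dict stands in for a trained model in this sketch.
with tempfile.TemporaryDirectory() as tmp:
    save_model({"weights": [0.1, 0.2]}, tmp, version="1.0.0")
    restored = load_model(tmp, version="1.0.0")
    print(restored)
```

Pickle files can execute arbitrary code when loaded, so only load artifacts from sources you trust.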

Module 6: Production ML & Monitoring (2-3 hours)

  • Production debugging runbook
  • Data drift detection techniques
  • Incident response procedures
  • Monitoring strategies and alerts
  • Status: ✅ Complete with 3 incident scenarios
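One widely used drift-detection technique is the Population Stability Index (PSI), which compares the binned distribution of a feature at training time with its distribution in production. Whether the module covers PSI specifically is an assumption; the sketch below uses hypothetical price values and bin edges, and any alert threshold is left as a project-specific choice.

```python
import bisect
import math


def bin_proportions(values, edges, eps):
    """Return the share of values in each bin, floored at eps to avoid log(0)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = bisect.bisect_right(edges, v) - 1
        idx = min(max(idx, 0), len(counts) - 1)  # clamp out-of-range values into the end bins
        counts[idx] += 1
    return [max(c / len(values), eps) for c in counts]


def population_stability_index(expected, actual, edges, eps=1e-6):
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    e = bin_proportions(expected, edges, eps)
    a = bin_proportions(actual, edges, eps)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical price samples for illustration.
baseline = [5, 12, 18, 25, 33, 41, 47, 52]
recent = [22, 28, 35, 44, 49, 51, 55, 58]
edges = [0, 20, 40, 60]

print(population_stability_index(baseline, baseline, edges))  # 0.0 (identical distributions)
print(population_stability_index(baseline, recent, edges) > 0)  # True (distribution shifted)
```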

Module 7: Team Collaboration & Code Reviews (2-3 hours)

  • Git workflows for ML projects
  • ML code review best practices
  • Writing effective PR descriptions
  • Working with existing ML codebases
  • Status: ✅ Complete with 3 sample PR reviews

Module 8: Capstone Project (4-6 hours)

  • End-to-end Pokemon card price prediction
  • Integrates all concepts from Modules 0-7
  • 9 phases: business → data → features → training → evaluation → deployment → monitoring → docs → reflection
  • Self-assessment rubric (100 points)
  • Status: ✅ Complete with detailed rubric

📦 Additional Resources

  • ✅ ML Cheatsheet - Quick reference for all key concepts
  • ✅ Progress Tracker - Track your learning journey
  • ✅ CAPSTONE_RUBRIC.md - 100-point self-assessment rubric
  • ✅ 16,000 Pokemon Cards Dataset - Comprehensive dataset with price_usd for regression tasks
  • ✅ USING_PRESPLIT_DATA.md - Guide for using pre-split train/val/test datasets

🚀 Quick Start

Prerequisites

  • uv - Fast Python package installer (install here)
  • Python 3.13+ (uv will handle this if needed)
  • That's it! uvx will handle everything else

Get Started (Recommended: 3 Steps)

# 1. Navigate to the project directory
cd marimo-pokemon

# 2. Install dependencies with uv
uv sync

# 3. Start learning with Workspace Mode (RECOMMENDED!)
uvx marimo edit ./

🌟 Why Workspace Mode (uvx marimo edit ./)?

  • Opens the entire project in one window
  • File browser to easily switch between modules
  • No need to remember file names
  • See your progress across all modules
  • This is the recommended way to use the course!

Alternative: Run Specific Modules

If you prefer to open individual notebooks:

# Start with Module 0 (business context)
uvx marimo edit 00_ml_in_business.py

# Or jump to a specific module
uvx marimo edit 03_model_training.py

# View-only mode (read without editing)
uvx marimo run 00_ml_in_business.py

Generate the Dataset

# Generate comprehensive Pokemon card dataset (16,000 cards)
uv run python data/generate_comprehensive_dataset.py

# This creates:
#   - data/pokemon_cards.csv (16,000 cards)
#   - data/pokemon_cards_sample_1000.csv (1,000 card sample)

Create Train/Validation/Test Splits (Optional but Recommended)

# Create professional 70/15/15 splits (stratified by type)
uv run python data/create_splits.py

# This creates data/splits/:
#   - pokemon_train.csv (11,200 cards)
#   - pokemon_validation.csv (2,400 cards)
#   - pokemon_test.csv (2,400 cards)

# See USING_PRESPLIT_DATA.md for usage guide

Why use pre-split data?

  • ✅ No data leakage (splits done before any EDA)
  • ✅ Consistent across all modules
  • ✅ Stratified by type (each split keeps the same class proportions)
  • ✅ Follows a common industry convention (70/15/15)
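The idea behind a stratified 70/15/15 split can be sketched in a few lines. This is an illustrative stand-in, not the contents of data/create_splits.py: the `stratified_split` helper, the seed, and the toy card list are all assumptions.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, fractions=(0.70, 0.15, 0.15), seed=42):
    """Split rows into train/val/test while keeping class proportions per split."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[key]].append(row)

    train, val, test = [], [], []
    for group in by_class.values():
        rng.shuffle(group)
        n_train = round(len(group) * fractions[0])
        n_val = round(len(group) * fractions[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # remainder goes to the test split
    return train, val, test


# Toy data: 100 Fire cards and 60 Water cards.
cards = [{"type": t, "id": i} for i, t in enumerate(["Fire"] * 100 + ["Water"] * 60)]
train, val, test = stratified_split(cards, key="type")
print(len(train), len(val), len(test))  # 112 24 24
```

Because each class is split separately, the Fire-to-Water ratio is the same in all three splits.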

Validate Your Setup

# Run the setup test to verify everything works
uv run python test_setup.py

# This checks:
# - All dependencies are installed
# - Dataset exists
# - All 8 modules + 7 exercises are present
# - Libraries can be imported
# - Basic operations work

If test_setup.py passes: You're ready to start! 🎉

If it fails: See TROUBLESHOOTING.md for solutions.


📌 Key uv/uvx Commands

| Command | What it does | When to use |
|---|---|---|
| uv sync | Install dependencies | Once at start, or when dependencies change |
| uvx marimo edit ./ | Open entire project | Recommended way to use the course |
| uvx marimo edit <file> | Open specific notebook | When you know which module you want |
| uvx marimo run <file> | View notebook (read-only) | Just reading, not editing |
| uv run python <file> | Run Python script | For dataset generation, utilities |

💡 Pro Tip: Always use uvx marimo edit ./ from the project root to get the best experience!

Why uvx is awesome:

  • No need to install marimo globally
  • Automatically uses the right version from your environment
  • Works from any directory
  • Perfect for ephemeral environments (containers, Codespaces)
  • No virtual environment activation needed

📖 Course Structure

Complete Learning Path (20-24 hours)

Phase 1: Business & Technical Foundations (8-10 hours)

  1. Module 0: ML in Business Context → uvx marimo edit 00_ml_in_business.py
  2. Exercises 0: Business case studies → uvx marimo edit exercises_00.py
  3. Module 1: Data Engineering → uvx marimo edit 01_data_engineering.py
  4. Exercises 1: Practice validation and pipelines → uvx marimo edit exercises_01.py
  5. Module 2: EDA & Features → uvx marimo edit 02_eda_and_features.py
  6. Exercises 2: Feature engineering competition → uvx marimo edit exercises_02.py

Phase 2: Model Development (8-10 hours)

  7. Module 3: Model Training → uvx marimo edit 03_model_training.py
  8. Exercises 3: CV and tuning → uvx marimo edit exercises_03.py
  9. Module 4: Model Evaluation → uvx marimo edit 04_model_evaluation.py
  10. Exercises 4: Metrics and error analysis → uvx marimo edit exercises_04.py
  11. Module 5: Deployment → uvx marimo edit 05_inference_service.py

Phase 3: Production & Collaboration (6-8 hours)

  12. Module 6: Production ML & Monitoring → uvx marimo edit 06_production_monitoring.py
  13. Exercises 6: Incident response scenarios → uvx marimo edit exercises_06.py
  14. Module 7: Team Collaboration → uvx marimo edit 07_collaboration.py
  15. Exercises 7: Code review practice → uvx marimo edit exercises_07.py

Phase 4: Capstone (4-6 hours)

  16. Module 8: End-to-End Project → uvx marimo edit 08_capstone.py
  17. Reference: Capstone Rubric → CAPSTONE_RUBRIC.md
  18. Reference: ML Cheatsheet → ml_cheatsheet.md

Pro Tip: Workspace Mode

Open all notebooks at once with:

uvx marimo edit ./

This gives you a file browser and lets you switch between modules easily!


🎯 What You'll Learn

Technical Skills (Core ML Engineering)

  • ✅ Build end-to-end ML systems from data to deployment
  • ✅ Write production-quality ML code with proper validation
  • ✅ Engineer features using domain knowledge without data leakage
  • ✅ Train and evaluate models systematically
  • ✅ Deploy models with monitoring and error handling
  • ✅ Debug production ML issues using runbooks
  • ✅ Detect and respond to data drift

Business Skills (What Companies Actually Need)

  • ✅ Frame ML problems and calculate ROI
  • ✅ Communicate with non-technical stakeholders
  • ✅ Set realistic expectations and success metrics
  • ✅ Know when NOT to use ML
  • ✅ Translate business metrics to model metrics

Collaboration Skills (Working in Teams)

  • ✅ Use Git workflows for ML projects
  • ✅ Review ML code effectively
  • ✅ Write clear PR descriptions
  • ✅ Document models with model cards
  • ✅ Onboard to existing ML codebases

You'll be ready to contribute meaningfully to an ML team on day one!


📁 Project Structure

marimo-pokemon/
├── README.md                          # This file
├── CAPSTONE_RUBRIC.md                 # 100-point self-assessment rubric
├── prompt.md                          # Complete course specification
├── progress_tracker.md                # Track your learning
├── ml_cheatsheet.md                   # Quick reference guide
│
├── data/
│   ├── generate_dataset.py            # Dataset generator
│   ├── pokemon_cards.csv              # 16,000 Pokemon cards (with price_usd)
│   └── clean/                         # Cleaned data (generated)
│
├── 00_ml_in_business.py               # Module 0: Business context
├── exercises_00.py                    # Module 0 exercises
├── 01_data_engineering.py             # Module 1: Data engineering
├── exercises_01.py                    # Module 1 exercises
├── 02_eda_and_features.py             # Module 2: EDA & features
├── exercises_02.py                    # Module 2 exercises
├── 03_model_training.py               # Module 3: Model training
├── exercises_03.py                    # Module 3 exercises
├── 04_model_evaluation.py             # Module 4: Evaluation
├── exercises_04.py                    # Module 4 exercises
├── 05_inference_service.py            # Module 5: Deployment
├── 06_production_monitoring.py        # Module 6: Production & monitoring
├── exercises_06.py                    # Module 6 exercises
├── 07_collaboration.py                # Module 7: Team collaboration
├── exercises_07.py                    # Module 7 exercises
├── 08_capstone.py                     # Module 8: End-to-end capstone
│
└── models/                            # Saved models (generated)

💡 Key Features

Production-Focused

  • Code written to production standards
  • Type hints, docstrings, error handling
  • Industry best practices throughout

Hands-On Learning

  • 15+ exercises across all modules
  • Interactive UIs in Marimo notebooks
  • Real-world scenarios and challenges

Complete Coverage

  • Full ML lifecycle from data to deployment
  • Both classification and regression examples
  • Tools: pandas, polars, scikit-learn, XGBoost

Professional Content

  • Industry context in every module
  • Real company examples (Netflix, Google, etc.)
  • Common pitfalls and how to avoid them

🎓 Learning Approach

Recommended Path

  1. Week 1: Modules 0-2 + exercises (8-10 hours)
    • Business context, data engineering, feature engineering
  2. Week 2: Modules 3-5 + exercises (8-10 hours)
    • Model training, evaluation, deployment
  3. Week 3: Modules 6-7 + exercises (6-8 hours)
    • Production monitoring, team collaboration
  4. Week 4: Module 8 Capstone (4-6 hours)
    • End-to-end project applying all skills

Study Tips

  • Start with Module 0 - business context is critical!
  • Run every code cell and experiment
  • Answer the Socratic questions before moving on
  • Complete ALL exercises (they're where real learning happens!)
  • Use progress_tracker.md to track completion
  • Refer to ml_cheatsheet.md when stuck
  • For capstone, use CAPSTONE_RUBRIC.md to self-assess

🏆 Success Metrics

You've completed the course when you can:

  • Frame ML problems with business context and ROI
  • Load, validate, and clean data like a pro
  • Engineer features without data leakage
  • Train multiple models and pick the best systematically
  • Evaluate models with appropriate metrics
  • Deploy a model to production with monitoring
  • Debug production ML systems using runbooks
  • Collaborate effectively with ML teams
  • Score 80+ on the capstone rubric

Ready for your first ML engineer role!

What Companies Are Looking For

This course applies the 80/20 rule to ML engineering, focusing on the skills that matter most:

  • Technical Core (60%): Data → Features → Models → Deployment
  • Business Acumen (20%): ROI, stakeholder communication, when NOT to use ML
  • Collaboration (20%): Git workflows, code reviews, documentation

Most courses only teach the technical core. This course teaches all three.


📚 Additional Resources

  • Testing Results: TESTING_RESULTS.md - ✅ Complete test results (all tests passed!)
  • Testing Guide: TESTING_GUIDE.md - How to test each module
  • Review Summary: REVIEW_SUMMARY.md - Assessment criteria
  • Module 1 Evaluation: MODULE_1_EVALUATION.md - Quality metrics

🛠️ Tech Stack

  • Python 3.13+
  • Marimo - Reactive notebooks stored as plain Python files (easier to version and review than Jupyter's JSON format)
  • pandas - Data manipulation
  • polars - High-performance data processing
  • scikit-learn - ML algorithms
  • XGBoost - Gradient boosting
  • Pandera - Data validation
  • Matplotlib/Seaborn - Visualization

🚀 Getting Started

Ready to become an ML engineer?

Recommended: Workspace Mode

cd marimo-pokemon
uv sync
uvx marimo edit ./

This opens the entire project in one window with a file browser. Start with 00_ml_in_business.py and work through modules 0-8 in order.

Alternative Options

Option 1: Start with Module 0

uvx marimo edit 00_ml_in_business.py

Start here to understand the business context before diving into code!

Option 2: Jump to specific modules

uvx marimo edit 03_model_training.py   # Jump to specific module
uvx marimo edit 08_capstone.py         # Jump to capstone (after 0-7)

Option 3: View-only mode

uvx marimo run 00_ml_in_business.py    # View without editing

Happy learning! 🎓

P.S. This course focuses on the skills companies need most in ML engineers. Use uvx marimo edit ./ for the best experience, and start with Module 0. Don't skip the business context!
