> [!INFO] Course Information
> - Instructor: Christopher Seaman
> - EAs: Samantha Chan & Eric Yang
> - Dates: January 15th - March 19th, 2026 (11 class meetings)
> - Lecture: Wednesday, 1:00 PM - 3:00 PM, Mission Hall 1407
> - Lab: Wednesday, 3:00 PM - 4:30 PM, Mission Hall 1407
> - GitHub: https://github.com/christopherseaman/datasci_223
> - Course Website: https://christopherseaman.github.io/datasci_223/
> - CLE: DATASCI 223: Applied Data Science with Python (Winter 2026)
This repository contains the course materials for UCSF DataSci 223: Applied Data Science with Python.
- Setup + Debugging - Notebook hygiene, defensive programming, VS Code debugger
- SQL for Data Analysis - SELECT, JOIN, GROUP BY, window functions, pandas integration
- Larger-than-Memory Data - Polars lazy evaluation, out-of-core processing, Parquet
- NLP Foundations - Text preprocessing, embeddings, sentiment, clinical text applications
- Classification - Train/test splits, evaluation metrics, Random Forest, XGBoost
- Neural Networks - MLP, CNN, RNN/LSTM, PyTorch training loop
- Transformers & Deep Learning - Attention mechanism, Hugging Face, tokenization
- LLMs: DIY & Understanding - nanoGPT walkthrough, embeddings, fine-tuning concepts
- LLMs: API, Agentic & Workflows - OpenAI/Anthropic APIs, prompt engineering, tool use
- Time Series & Forecasting - Time-based splits, lag features, ARIMA basics
- TBD (Student Vote) - Options: Computer Vision, Visualization & Dashboards, A/B Testing, Distributed Computing, or End-to-End Project
- Data cleaning, munging, and wrangling
- Jobs, technical interviews, & impostor syndrome
- Generative AI with Images
- Feature Engineering and Selection
- Algorithms and complexity notation
- Local setup with the "Modern Data Stack"
- Deploying a basic model/app to the web
- Ethics in Data Science