

christopherseaman/datasci_223


Applied Data Science with Python

Course Information

This repository contains the course materials for UCSF DataSci 223: Applied Data Science with Python.

Course Topics (Winter 2026 - 11 Lectures)

Foundational (L01-L04)

  1. Setup + Debugging - Notebook hygiene, defensive programming, VS Code debugger
  2. SQL for Data Analysis - SELECT, JOIN, GROUP BY, window functions, pandas integration
  3. Larger-than-Memory Data - Polars lazy evaluation, out-of-core processing, parquet
  4. NLP Foundations - Text preprocessing, embeddings, sentiment, clinical text applications
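The snippet below is a small illustration of the SQL topics in lecture 2. It is not taken from the course materials. It uses Python's built-in `sqlite3` module and a made-up `visits` table to contrast a `GROUP BY` aggregate, which collapses rows, with a window function, which keeps every row. It assumes the bundled SQLite is version 3.25 or newer, the release that added window-function support.

```python
import sqlite3

# Hypothetical toy table: one row per clinic visit
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (patient_id INTEGER, visit_day INTEGER, cost REAL)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [(1, 1, 100.0), (1, 5, 50.0), (2, 2, 80.0), (2, 3, 20.0), (2, 9, 40.0)],
)

# GROUP BY: collapses to one summary row per patient
totals = conn.execute(
    "SELECT patient_id, COUNT(*), SUM(cost) FROM visits "
    "GROUP BY patient_id ORDER BY patient_id"
).fetchall()

# Window function: running cost within each patient, keeping every visit row
running = conn.execute(
    "SELECT patient_id, visit_day, "
    "SUM(cost) OVER (PARTITION BY patient_id ORDER BY visit_day) AS running_cost "
    "FROM visits ORDER BY patient_id, visit_day"
).fetchall()

print(totals)   # [(1, 2, 150.0), (2, 3, 140.0)]
print(running)
```

If pandas is installed, `pandas.read_sql_query(query, conn)` runs the same SQL and returns the result as a DataFrame. This is one common way to connect SQL with pandas.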

ML/AI Progression (L05-L09)

  5. Classification - Train/test splits, evaluation metrics, Random Forest, XGBoost
  6. Neural Networks - MLP, CNN, RNN/LSTM, PyTorch training loop
  7. Transformers & Deep Learning - Attention mechanism, Hugging Face, tokenization
  8. LLMs - DIY & Understanding - nanoGPT walkthrough, embeddings, fine-tuning concepts
  9. LLMs - API, Agentic & Workflows - OpenAI/Anthropic APIs, prompt engineering, tool use
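The sketch below shows what the classification evaluation metrics measure. It computes precision, recall, and F1 by hand from the confusion-matrix counts, using made-up binary labels. This is not course code. In practice the same numbers usually come from `sklearn.metrics` (for example `precision_score` and `recall_score`). The function names here are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives/negatives for a binary task (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn

def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = their harmonic mean."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up labels for illustration only
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)            # (tp, fp, fn, tn)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(counts, precision, recall, f1)
```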

Applied / Student Choice (L10-L11)

  10. Time Series & Forecasting - Time-based splits, lag features, ARIMA basics
  11. TBD - Student Vote - Options: Computer Vision, Visualization & Dashboards, A/B Testing, Distributed Computing, or End-to-End Project
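The next sketch illustrates the time-series ideas of lag features and time-based splits. It is not course code. It uses a made-up daily series and a hypothetical helper, `make_lag_rows`. Each target value is paired with the values that came before it. The rows are then split by position instead of being shuffled, so every test row falls after every training row. This avoids leaking future information into training.

```python
# Hypothetical daily counts, already sorted in time order
series = [12, 15, 14, 18, 21, 19, 24, 26]

def make_lag_rows(values, n_lags):
    """Pair each value with its n_lags preceding values (oldest first)."""
    rows = []
    for t in range(n_lags, len(values)):
        features = values[t - n_lags:t]   # lag features: only past observations
        rows.append((features, values[t]))
    return rows

rows = make_lag_rows(series, n_lags=2)

# Time-based split: first 75% of rows for training, the rest for testing (no shuffling)
cutoff = int(len(rows) * 0.75)
train, test = rows[:cutoff], rows[cutoff:]

print(len(train), test)
```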

Additional Topics (from previous years)

  • Data cleaning, munging, and wrangling
  • Jobs, technical interviews, & impostor syndrome
  • Generative AI with Images
  • Feature Engineering and Selection
  • Algorithms and complexity notation
  • Local setup with the "Modern Data Stack"
  • Deploying a basic model/app to the web
  • Ethics in Data Science

About

Lecture notes and assignments for an applied Python course for health data science master's students.
