This repository contains my complete implementation for CMU 10-714: Deep Learning Systems.
The course provides a full-stack understanding of modern deep learning systems — from high-level framework design and automatic differentiation, down to low-level hardware acceleration and production deployment.
Throughout the course, I built Needle, a deep learning framework written entirely from scratch.
Needle supports autograd, GPU acceleration, loss functions, data loaders, and optimizers, and can train a variety of modern neural network architectures including CNNs, RNNs, LSTMs, and Transformers.
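
To give a flavor of what this looks like in practice, here is a small, hedged sketch of computing gradients with a Needle-style API. The module paths and method names (`needle.Tensor`, `needle.ops`, `.backward()`, `.grad`) are assumptions about the framework's interface rather than a verbatim excerpt from this repository.

```python
import needle as ndl

# Two small tensors; constructors and op names here are assumptions
# about the Needle API and may differ from the actual code.
x = ndl.Tensor([[1.0, 2.0], [3.0, 4.0]])
w = ndl.Tensor([[0.5], [0.5]])

# The forward pass builds a dynamic computation graph node by node.
y = ndl.ops.relu(x @ w)
loss = ndl.ops.summation(y)

# Reverse-mode autodiff walks the graph backwards from `loss`.
loss.backward()
print(w.grad)  # gradient of the summed output with respect to w
```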
10-714: Deep Learning Systems (CMU)
This course covers the fundamental building blocks of modern deep learning frameworks.
Students design and implement all components step-by-step, bridging the gap between theory and production-level systems.
Topics include:
- Computational graphs and reverse-mode automatic differentiation (see the sketch after this list)
- Tensor operations and broadcasting
- GPU acceleration using CUDA kernels
- Neural network layers and modules (Linear, Conv2D, BatchNorm, etc.)
- Loss functions and optimization algorithms
- Sequence models (RNN, LSTM)
- Transformer architectures
- Dataset loaders, training loops, and optimization pipelines
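
To make the first bullet concrete, below is a minimal, framework-free sketch of reverse-mode automatic differentiation over a dynamic computation graph. It illustrates the general technique covered in the course and is not code taken from Needle.

```python
class Value:
    """A scalar node in a dynamic computation graph."""

    def __init__(self, data: float, parents: tuple = (), backward_fn=None):
        self.data = data
        self.grad = 0.0
        self.parents = parents          # nodes this value was computed from
        self.backward_fn = backward_fn  # propagates self.grad to the parents

    def __add__(self, other: "Value") -> "Value":
        out = Value(self.data + other.data, (self, other))

        def backward_fn():
            self.grad += out.grad
            other.grad += out.grad
        out.backward_fn = backward_fn
        return out

    def __mul__(self, other: "Value") -> "Value":
        out = Value(self.data * other.data, (self, other))

        def backward_fn():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out.backward_fn = backward_fn
        return out

    def backward(self) -> None:
        # Topologically sort the graph so each node is processed
        # only after everything that depends on it.
        order, visited = [], set()

        def visit(node: "Value") -> None:
            if node not in visited:
                visited.add(node)
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # d(output)/d(output)
        for node in reversed(order):
            if node.backward_fn is not None:
                node.backward_fn()


# For loss = x * y + x, d(loss)/dx = y + 1 = 4.0 and d(loss)/dy = x = 2.0
x, y = Value(2.0), Value(3.0)
loss = x * y + x
loss.backward()
print(x.grad, y.grad)  # 4.0 2.0
```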
Needle is a minimalist deep learning framework that demonstrates:
- Automatic differentiation via a dynamic computation graph
- CPU and GPU backends for tensor operations
- Training utilities (optimizers, data loaders, model serialization)
- Neural network modules implemented on top of the autograd engine
- End-to-end models (see the training-loop sketch after this list), including:
  - Convolutional Neural Networks (CNNs)
  - Recurrent Neural Networks (RNNs)
  - Long Short-Term Memory (LSTM) networks
  - Transformers for sequence modeling
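
The sketch below shows roughly how these pieces fit together in an end-to-end training loop. The class and function names (`nn.Sequential`, `nn.SoftmaxLoss`, `optim.SGD`, `DataLoader`, `opt.reset_grad()`) follow common framework conventions and are assumptions about Needle's exact API; dataset paths and hyperparameters are illustrative only.

```python
import needle as ndl
from needle import nn, optim
from needle.data import MNISTDataset, DataLoader

# A small MLP classifier built from hypothetical Needle modules.
model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)
loss_fn = nn.SoftmaxLoss()
opt = optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)

# Dataset file names and constructor arguments are placeholders.
train_loader = DataLoader(
    MNISTDataset("data/train-images.gz", "data/train-labels.gz"),
    batch_size=64,
    shuffle=True,
)

for epoch in range(3):
    for images, labels in train_loader:
        opt.reset_grad()                # clear gradients from the previous step
        logits = model(images.reshape((images.shape[0], 784)))
        loss = loss_fn(logits, labels)
        loss.backward()                 # reverse-mode autodiff through the graph
        opt.step()                      # apply the SGD update
    print(f"epoch {epoch}: last batch loss {loss.numpy():.4f}")
```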