Introduction to PyTorch
CS236 Section, Autumn 2024
Honglin Chen
What is PyTorch?
A machine learning framework that accelerates the path from research prototyping to production deployment.
Machine learning framework: deep learning primitives such as data loading, NN layer types, activations, loss functions, and optimizers
Hardware acceleration on NVIDIA GPUs
Libraries for vision, NLP, and audio applications
Research prototyping: models are Python code, automatic differentiation, eager mode
Production deployment: TorchScript, TorchServe, quantization
Motivations
Python
NumPy
Building Blocks
Tensors
Operations
Overview
Modules
Examples
MNIST
Beyond PyTorch
Tools
High Level Libraries
Domain Specific Libraries
Motivations
Python vs. NumPy
# Pure Python
X = [1] * 10000
Y = [0.5] * 10000
Z = [None] * 10000
for i in range(10000):
    Z[i] = X[i] * Y[i]
# 2.772092819213867 ms
# Interpreter overhead, 64-bit boxed Python objects

# NumPy
X = np.full((10000,), 1)
Y = np.full((10000,), 0.5)
Z = X * Y
# 0.08273124694824219 ms
# Low-level implementation, vectorization
Motivations
NumPy vs. PyTorch
# NumPy
X = np.full((10000,), 1)
Y = np.full((10000,), 0.5)
Z = X * Y
# 0.08273124694824219 ms
# Low-level implementation, vectorization

# PyTorch
X = torch.full((10000,), 1., device="cuda", requires_grad=True)
Y = torch.full((10000,), 0.5, device="cuda")
Z = X * Y
# 0.3185272216796875 ms
# GPU acceleration

Z.sum().backward()
dX = X.grad
# Automatic differentiation
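These timings are illustrative and depend on hardware. A rough sketch of how they could be measured (assuming a CUDA device is available; GPU timing needs an explicit synchronize):

import time
import numpy as np
import torch

# NumPy
X_np = np.full((10000,), 1.0)
Y_np = np.full((10000,), 0.5)
t0 = time.perf_counter()
Z_np = X_np * Y_np
print((time.perf_counter() - t0) * 1000, "ms (NumPy)")

# PyTorch on GPU
X = torch.full((10000,), 1., device="cuda")
Y = torch.full((10000,), 0.5, device="cuda")
torch.cuda.synchronize()                 # let pending GPU work finish before timing
t0 = time.perf_counter()
Z = X * Y
torch.cuda.synchronize()                 # wait for the multiply kernel to complete
print((time.perf_counter() - t0) * 1000, "ms (PyTorch, GPU)")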
Building Blocks
TENSORS
Building Blocks
Tensors / Initialization
torch.tensor([5., 3.])
tensor([5., 3.])  # defaults to torch.float32

torch.from_numpy(np.array([5., 3.]))
tensor([5., 3.], dtype=torch.float64)  # because NumPy defaults to 64-bit floats

torch.tensor([5., 3.]).numpy()
array([5., 3.], dtype=float32)
Building Blocks
Tensors / Initialization
torch.ones(5, 3)
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])
Building Blocks
Tensors / Initialization
torch.randn(5, 3)
tensor([[ 0.2349, -0.0427, -0.5053],
        [ 0.6455,  0.1199,  0.4239],
        [ 0.1279,  0.1105,  1.4637],
        [ 0.4259, -0.0763, -0.9671],
        [ 0.6856,  0.5047,  0.4250]])
Building Blocks
Tensors / Initialization
torch.ones_like(tensor)
Input:  tensor([[ 0.2349, -0.0427, -0.5053],
                [ 0.6455,  0.1199,  0.4239]])
Output: tensor([[1., 1., 1.],
                [1., 1., 1.]])
Building Blocks
Tensors / Initialization
torch.empty(5, 3)
tensor([[ 0.0000e+00, 2.5244e-29, 0.0000e+00],
        [ 2.5244e-29, 1.4569e-19, 2.7517e+12],
        [ 7.5338e+28, 3.0313e+32, 6.3828e+28],
        [ 1.4603e-19, 1.0899e+27, 6.8943e+34],
        [ 1.1835e+22, 7.0976e+22, 1.8515e+28]])
# The values are not initialized
Building Blocks
Tensors / Indexing & Reshaping
torch.tensor([[5., 3.]])[0, :]
tensor([5., 3.])

torch.tensor([[5., 3.]]).view(-1)  # -1 infers the dimension size
torch.tensor([[5., 3.]]).view(2)
tensor([5., 3.])

torch.tensor([[5., 3.]]).size()
torch.Size([1, 2])
Building Blocks
Tensors / Broadcasting
X = torch.ones((3, 3, 3))
Y = torch.ones((1, 1, 3))
Z = X * Y
Z.size()
torch.Size([3, 3, 3])
# https://pytorch.org/docs/stable/notes/broadcasting.html
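A common use of broadcasting is applying a per-channel scale or bias without explicit tiling; a small illustrative sketch (shapes chosen arbitrarily):

import torch

imgs = torch.randn(64, 3, 32, 32)                              # batch of RGB images
channel_scale = torch.tensor([0.5, 1.0, 2.0]).view(1, 3, 1, 1)
scaled = imgs * channel_scale                                  # (64, 3, 32, 32) * (1, 3, 1, 1)
print(scaled.shape)                                            # torch.Size([64, 3, 32, 32])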
Building Blocks
Tensors / Devices
if torch.cuda.is_available():
    device = torch.device("cuda")      # a CUDA device object
    x = torch.ones(2, device=device)   # directly create a tensor on GPU
    y = torch.ones(2).to(device)       # or just use strings: .to("cuda")
    z = x + y
    print(z)                           # z is on GPU
    print(z.to("cpu", torch.double))   # .to("cpu") moves the tensor back to the CPU
# `x.cuda()` and `x.cpu()` also work
Building Blocks
Operations / Primitives
torch.tensor([5., 3.]) + torch.tensor([3., 5.])
tensor([8., 8.])

z = torch.add(x, y)
torch.add(x, y, out=z)
y.add_(x)            # in-place: y += x
torch.tanh(y)
torch.stack([x, y])

# https://pytorch.org/docs/stable/torch.html
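A minimal, self-contained version of these primitives (x and y here are just the example tensors from above):

import torch

x = torch.tensor([5., 3.])
y = torch.tensor([3., 5.])

z = torch.add(x, y)          # same as x + y -> tensor([8., 8.])
torch.add(x, y, out=z)       # writes the result into an existing tensor
y.add_(x)                    # in-place: y is now tensor([8., 8.])
print(torch.tanh(y))         # elementwise tanh, roughly tensor([1.0000, 1.0000])
print(torch.stack([x, y]))   # stacks along a new leading dim -> shape (2, 2)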
Building Blocks
Operations / Functional
import torch.nn.functional as F
X = torch.randn((64, 3, 256, 256))
W = torch.randn((8, 3, 3, 3))
out = F.conv2d(X, W, stride=1, padding=1)
# Like SciPy
# https://pytorch.org/docs/stable/nn.functional.html
Building Blocks
Operations / Automatic Differentiation
Computation as a graph built at runtime
[Figure: computation graph; leaf nodes x and 2 feed an add node that produces y]

x = torch.ones(2, 2, requires_grad=True)
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

y = x + 2
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
Building Blocks
Operations / Automatic Differentiation
[Figure: the graph extended; y and 3 feed a multiply node, whose output feeds a mean node that produces out]

z = y * 3
out = z.mean()
tensor(9., grad_fn=<MeanBackward1>)

out.backward()     # must be called on a scalar
print(x.grad)      # only leaf nodes have .grad

The gradient w.r.t. the input tensors is computed step by step, in reverse from the loss back through the graph.
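The result can be checked by hand: out = mean(3 * (x + 2)), so d(out)/d(x_ij) = 3/4 = 0.75 for a 2x2 input. A self-contained sketch with the same values as the slide:

import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = y * 3
out = z.mean()      # mean(3 * (x + 2)) = 9.0 when x is all ones
out.backward()
print(x.grad)       # tensor([[0.7500, 0.7500],
                    #         [0.7500, 0.7500]])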
Building Blocks
Operations / Automatic Differentiation
x.requires_grad # True
(x ** 2).requires_grad # True
# Keeping track of activations is expensive
with torch.no_grad():
    (x ** 2).requires_grad       # False

(x.detach() ** 2).requires_grad  # False
Building Blocks
Operations / nn
# Module API
import torch.nn as nn

X = torch.ones((64, 3, 256, 256))
conv = nn.Conv2d(in_channels=3, out_channels=8,
                 kernel_size=3, stride=1, padding=1)
out = conv(X)
# Inherits from nn.Module, implemented using the functional API,
# stores internal state (the weights)

# Functional API
import torch.nn.functional as F

X = torch.randn((64, 3, 256, 256))
W = torch.randn((8, 3, 3, 3))
out = F.conv2d(X, W, stride=1, padding=1)
Building Blocks
Operations / Module
import torch.nn as nn

X = torch.ones((64, 3, 256, 256))
conv = nn.Conv2d(in_channels=3, out_channels=8,
                 kernel_size=3, stride=1, padding=1)

# Move the module to the GPU
conv.cuda()

# All state (weights and buffers), e.g. for checkpointing
conv.state_dict()

# Trainable parameters, e.g. to hand to an optimizer
conv.parameters()

# Recursively visit child modules, e.g. for weight initialization
conv.apply(weight_init)
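The state dict is what gets saved and restored; a minimal checkpointing sketch (the file name is arbitrary):

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=3, stride=1, padding=1)

torch.save(conv.state_dict(), "conv_checkpoint.pt")       # save weights and buffers
conv.load_state_dict(torch.load("conv_checkpoint.pt"))    # restore into a matching module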
Examples
MNIST
Preprocessing
Dataloader
Example Network
Optimizer
Training
Examples
MNIST / Preprocessing
import torchvision.transforms as transforms
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
# Convert to Torch Tensor and perform normalization
# https://pytorch.org/vision/stable/transforms.html
# e.g. ColorJitter, FiveCrop
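To see what the pipeline does, it can be applied to a single dummy image (a random uint8 array standing in for a real PIL image; `transform` is the Compose pipeline above):

import numpy as np

img = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)  # HWC uint8 image
x = transform(img)   # ToTensor -> CHW float in [0, 1], Normalize -> roughly [-1, 1]
print(x.shape)       # torch.Size([3, 32, 32])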
Examples
MNIST / Dataloader
import torch
import torchvision
trainset = torchvision.datasets.CIFAR10(
    root='./data', train=True,
    download=True, transform=transform)

# Dataloaders are Python iterators
trainloader = torch.utils.data.DataLoader(
    trainset, batch_size=8,
    shuffle=True, num_workers=2)
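Since dataloaders are iterators, a single batch can be pulled out to inspect shapes (these match the CIFAR-10 setup above):

inputs, labels = next(iter(trainloader))
print(inputs.shape)   # torch.Size([8, 3, 32, 32]) -- batch of 8 RGB 32x32 images
print(labels.shape)   # torch.Size([8])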
Examples
MNIST / Network
import torch.nn as nn
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
Examples
MNIST / Network
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self):
        ...

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = torch.flatten(self.pool(F.relu(self.conv2(x))), 1)  # flatten all dims except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)
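A quick shape check, assuming CIFAR-10-sized inputs (3x32x32), which is what makes the 16 * 5 * 5 input size of fc1 work out:

net = Net()
x = torch.randn(8, 3, 32, 32)   # batch of 8 RGB 32x32 images
# conv1 (5x5) -> 28x28, pool -> 14x14, conv2 (5x5) -> 10x10, pool -> 5x5
print(net(x).shape)             # torch.Size([8, 10]) -- one logit per class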
Examples
MNIST / Optimizer
import torch.optim as optim
# Instantiate nn.Module (Use default weights)
net = Net().to("cuda")
# Define loss function
criterion = nn.CrossEntropyLoss()
# Create optimizer: https://pytorch.org/docs/stable/optim.html
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
Examples
MNIST / Training
net.train() # Set to training mode (there is also `net.eval()`)
for epoch in range(2):
    for inputs, labels in trainloader:
        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs.to("cuda"))
        loss = criterion(outputs, labels.to("cuda"))
        loss.backward()
        optimizer.step()
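After training, a matching evaluation loop might look like the sketch below (`testloader` is a hypothetical DataLoader built like trainloader but with train=False):

net.eval()                                 # switch to evaluation mode
correct, total = 0, 0
with torch.no_grad():                      # no gradients needed for evaluation
    for inputs, labels in testloader:
        outputs = net(inputs.to("cuda"))
        preds = outputs.argmax(dim=1)      # predicted class per example
        correct += (preds == labels.to("cuda")).sum().item()
        total += labels.size(0)
print(f"Accuracy: {correct / total:.3f}")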
Examples
MNIST / Recap
... transforms.Compose( ... # Define preprocessing transforms
... torch.utils.data.DataLoader( ... # Create Dataloader
... class Net(nn.Module): ... # Define Network
... criterion = nn.CrossEntropyLoss() ... # Define loss function
... optim.SGD(net.parameters(), ... # Create Optimizer
... for inputs, labels in trainloader: ... # Iterate over Dataloader
... outputs = net(inputs) # Forward Pass
... criterion(outputs, labels) ... # Compute Loss
... optimizer.zero_grad() ... # Zero out gradients
... loss.backward() ... # Back Propagate
... optimizer.step() ... # Update weights
Beyond PyTorch
Tools / Keep Track of experiments, artifacts
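As one widely used example of experiment tracking, PyTorch ships a TensorBoard writer (requires the tensorboard package); a minimal logging sketch, where the loss values and log directory are stand-ins:

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="./runs/example")       # hypothetical log directory
for step, loss_value in enumerate([0.9, 0.7, 0.5]):    # stand-ins for real training losses
    writer.add_scalar("train/loss", loss_value, step)
writer.close()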
Beyond PyTorch
High Level Libraries / Distributed & Mixed Precision Training
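For mixed precision specifically, core PyTorch already provides the underlying tooling (torch.cuda.amp), which the high-level libraries wrap. A hedged sketch reusing net, criterion, optimizer, and trainloader from the MNIST example:

from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()                            # scales the loss to avoid fp16 gradient underflow
for inputs, labels in trainloader:
    optimizer.zero_grad()
    with autocast():                             # run the forward pass in mixed precision
        outputs = net(inputs.to("cuda"))
        loss = criterion(outputs, labels.to("cuda"))
    scaler.scale(loss).backward()                # backward on the scaled loss
    scaler.step(optimizer)                       # unscales gradients, then calls optimizer.step()
    scaler.update()                              # adjusts the scale factor for the next iteration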
Beyond PyTorch
Domain Specific Libraries / Graph, RL, Probabilistic Programming