PyTorch Fundamentals: Building Blocks of Deep Learning
Part of the Naver Boostcamp AI Tech series - Week 2
Introduction
PyTorch has become one of the most popular deep learning frameworks, favored by researchers and practitioners alike for its dynamic computation graphs and Pythonic design. In this post, I’ll share what I learned during Week 2 of Naver Boostcamp AI Tech, focusing on PyTorch fundamentals.
Why PyTorch?
Before diving into the technical details, let’s understand what makes PyTorch special:
- Dynamic Computation Graphs: Unlike static-graph frameworks, PyTorch builds the graph on the fly as your code runs, which makes debugging intuitive (see the sketch after this list)
- Pythonic Design: Feels natural to Python developers
- Strong GPU Acceleration: Seamless CUDA integration
- Rich Ecosystem: Extensive libraries for computer vision, NLP, and more
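To make the first point concrete, here is a minimal sketch of a module whose forward pass uses ordinary Python control flow. The module and its branching condition are invented for illustration, not taken from any library; the point is the pattern, which dynamic graphs allow:

import torch
import torch.nn as nn

class DynamicNet(nn.Module):
    # Hypothetical module: the graph is rebuilt on every forward call
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)

    def forward(self, x):
        # Plain Python if/for works because execution is eager:
        # the number of layers applied can depend on the data itself
        depth = 1 if x.mean() > 0 else 2
        for _ in range(depth):
            x = torch.relu(self.fc(x))
        return x

out = DynamicNet()(torch.randn(3, 4))  # no compilation step needed

Because each call traces a fresh graph, you can set breakpoints or print intermediate tensors anywhere inside forward and inspect live values.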
Core Concepts
1. Tensors: The Foundation
Tensors are the fundamental data structure in PyTorch, similar to NumPy arrays but with GPU acceleration capabilities.
import torch
# Creating tensors
x = torch.tensor([[1, 2], [3, 4]])
y = torch.zeros(2, 3)
z = torch.randn(2, 3) # Samples from a standard normal distribution
# Moving to GPU
if torch.cuda.is_available():
    x = x.cuda()
2. Autograd: Automatic Differentiation
PyTorch’s autograd system automatically computes gradients, which is essential for training neural networks.
# Enable gradient computation
x = torch.tensor([2.0], requires_grad=True)
y = x ** 2
y.backward() # Compute gradients
print(x.grad) # dy/dx = 2x = 4.0
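One behavior worth internalizing early: gradients accumulate across backward() calls instead of being overwritten. A minimal sketch:

x = torch.tensor([2.0], requires_grad=True)
(x ** 2).backward()
print(x.grad)     # tensor([4.])
(x ** 2).backward()
print(x.grad)     # tensor([8.]) -- the second gradient was added, not assigned
x.grad.zero_()    # manual reset; optimizer.zero_grad() does this during training

This is exactly why every training loop below calls optimizer.zero_grad() once per iteration.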
3. Building Neural Networks
PyTorch provides nn.Module as the base class for all neural network modules.
import torch.nn as nn
import torch.nn.functional as F
class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
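A quick sanity check is to instantiate the network and push a dummy batch through it (the batch size of 4 here is arbitrary):

model = SimpleNet()
dummy = torch.randn(4, 784)   # four fake flattened 28x28 images
out = model(dummy)
print(out.shape)              # torch.Size([4, 10]) -- one row of log-probabilities per image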
Training Loop Pattern
A typical PyTorch training loop follows this pattern:
model = SimpleNet()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
# SimpleNet already returns log-probabilities (log_softmax), so pair it with
# NLLLoss; CrossEntropyLoss expects raw logits and would apply log_softmax twice.
criterion = nn.NLLLoss()
num_epochs = 5  # example value

for epoch in range(num_epochs):
    for batch_idx, (data, target) in enumerate(train_loader):
        # Forward pass
        output = model(data)
        loss = criterion(output, target)

        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
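The mirror image of this pattern is an evaluation loop: switch the model to eval mode and disable gradient tracking. A minimal sketch, assuming a val_loader built the same way as train_loader:

model.eval()                      # disables dropout, freezes batch-norm statistics
correct = total = 0
with torch.no_grad():             # no autograd graph is built, saving memory
    for data, target in val_loader:
        pred = model(data).argmax(dim=1)
        correct += (pred == target).sum().item()
        total += target.size(0)
print(f'Validation accuracy: {correct / total:.2%}')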
Key Takeaways from Boostcamp
1. Device Agnostic Code
Always write code that can run on both CPU and GPU:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)  # for modules, .to() moves parameters in place; reassignment is conventional
data = data.to(device)    # for tensors, .to() returns a new tensor, so reassignment is required
2. DataLoader is Your Friend
Use DataLoader for efficient batch processing:
from torch.utils.data import DataLoader, TensorDataset

# X_train, y_train: feature and label tensors prepared beforehand
dataset = TensorDataset(X_train, y_train)
loader = DataLoader(dataset, batch_size=32, shuffle=True)
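Each iteration over the loader yields a (features, labels) batch, which is exactly what the training loop above consumes:

xb, yb = next(iter(loader))
print(xb.shape, yb.shape)   # e.g. torch.Size([32, ...]) torch.Size([32])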
3. Model Checkpointing
Save your model regularly during training:
# Save
torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': loss,
}, 'checkpoint.pth')

# Load -- restore everything you saved, not just the model weights
checkpoint = torch.load('checkpoint.pth')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch'] + 1  # resume from the next epoch
Common Pitfalls and Solutions
1. Unintended Gradient Accumulation
PyTorch accumulates gradients across backward passes by default (as shown in the autograd section), so clear them at the start of every iteration:
optimizer.zero_grad() # Don't forget this -- otherwise stale gradients from previous batches leak in
2. Tensor Shape Mismatches
Always verify tensor shapes during development:
print(f"Input shape: {x.shape}")
print(f"Output shape: {model(x).shape}")
3. Memory Management
Clear the GPU cache when working with large models:
torch.cuda.empty_cache()
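Note that empty_cache() only returns blocks the caching allocator is no longer using; it cannot free tensors that are still referenced. A sketch of the usual pattern (the tensor name is illustrative, and this requires a CUDA device):

activations = torch.randn(4096, 4096, device='cuda')  # some large intermediate
# ... use it ...
del activations            # drop the last Python reference first
torch.cuda.empty_cache()   # then return the freed blocks to the GPU driver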
Practical Project: MNIST Classifier
Here’s a complete example bringing everything together:
import torch
import torch.nn as nn
import torch.nn.functional as F  # used for relu, max_pool2d, and nll_loss below
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
# Define model
class MNISTNet(nn.Module):
    def __init__(self):
        super(MNISTNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout2d(0.25)  # channel-wise dropout on 4-D feature maps
        self.dropout2 = nn.Dropout(0.5)     # plain dropout: its input is already flattened
        self.fc1 = nn.Linear(9216, 128)     # 64 channels * 12 * 12 after pooling
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
# Training configuration
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MNISTNet().to(device)
optimizer = optim.Adam(model.parameters(), lr=0.001)
# Data loading
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))  # MNIST mean and std
])
train_dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
# Training loop
model.train()
for epoch in range(5):
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 100 == 0:
            print(f'Epoch: {epoch}, Batch: {batch_idx}, Loss: {loss.item():.4f}')
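To verify the classifier actually learned something, a test pass reuses the evaluation pattern from earlier. This sketch assumes the model, device, and transform defined above:

test_dataset = datasets.MNIST('./data', train=False, download=True, transform=transform)
test_loader = DataLoader(test_dataset, batch_size=1000)

model.eval()
correct = 0
with torch.no_grad():
    for data, target in test_loader:
        data, target = data.to(device), target.to(device)
        correct += (model(data).argmax(dim=1) == target).sum().item()
print(f'Test accuracy: {correct / len(test_dataset):.2%}')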
What’s Next?
In the next post, I’ll dive into more advanced PyTorch topics:
- Custom datasets and data augmentation
- Transfer learning with pretrained models
- Mixed precision training
- Distributed training strategies
Resources
- PyTorch Official Tutorials
- PyTorch Documentation
- Naver Boostcamp AI Tech
- Deep Learning with PyTorch (Book)
Conclusion
PyTorch’s flexibility and intuitive design make it an excellent choice for both research and production. The key is understanding its core concepts: tensors, autograd, and the module system. With these fundamentals, you can build anything from simple classifiers to state-of-the-art models.
Stay tuned for the next post in the Naver Boostcamp AI Tech series, where we’ll explore more advanced PyTorch techniques!
This post is part of my learning journey at Naver Boostcamp AI Tech. Feel free to reach out if you have questions or want to discuss deep learning!