Note | PyTorch
Contents
- 1. QUICK START WITH PYTORCH
- 2. DATA LOADING AND PROCESSING TUTORIAL
- 3. LEARNING PYTORCH WITH EXAMPLES
Reference for the whole text: the official PyTorch tutorials
1. QUICK START WITH PYTORCH
This tutorial (this chapter) is a classic, so most of it is kept in the original English.
1.1. What is PyTorch
It's a Python-based scientific computing package targeted at two sets of audiences:
- A replacement for NumPy to use the power of GPUs.
- A deep learning research platform that provides maximum flexibility and speed.
1.1.1. Basic concepts
- Tensors: similar to NumPy’s ndarrays, with the addition being that Tensors can also be used on a GPU to accelerate computing.
import torch
x = torch.zeros(3, 2, dtype=torch.long) # torch.int64
print(x)
y = torch.randn_like(x, dtype=torch.double) # result has the same size, but the dtype is overridden
print(y)
z = x.new_ones(3, 3) # result has the same dtype
print(z)
print(z.size()) # torch.Size is in fact a tuple, so it supports all tuple operations.
tensor([[0, 0],
[0, 0],
[0, 0]])
tensor([[0.1171, 2.2741],
[0.8569, 0.7953],
[1.4362, 0.4094]], dtype=torch.float64)
tensor([[1, 1, 1],
[1, 1, 1],
[1, 1, 1]])
torch.Size([3, 3])
- Addition: torch.add(x, y), x + y, torch.add(x, y, out=result), y.add_(x)
x = torch.randn(3, 2, dtype=torch.double) # float64
y = torch.randn(3, 2, dtype=torch.double)
print(x + y) # also a tensor
print(torch.add(x, y))
result = torch.randn_like(x)
torch.add(x, y, out=result)
print(result)
y.add_(x) # adds x to y
print(y)
tensor([[ 0.2623, 0.3829],
[ 2.5567, 1.3920],
[ 1.3003, -0.5964]], dtype=torch.float64)
tensor([[ 0.2623, 0.3829],
[ 2.5567, 1.3920],
[ 1.3003, -0.5964]], dtype=torch.float64)
tensor([[ 0.2623, 0.3829],
[ 2.5567, 1.3920],
[ 1.3003, -0.5964]], dtype=torch.float64)
tensor([[ 0.2623, 0.3829],
[ 2.5567, 1.3920],
[ 1.3003, -0.5964]], dtype=torch.float64)
- Indexing: we can use standard NumPy-like indexing.
print(y[:,1])
tensor([ 0.3829, 1.3920, -0.5964], dtype=torch.float64)
- Resizing: torch.Tensor.view
x = torch.randn(2, 3)
print(x)
print(x.view(-1, 6)) # the size -1 is inferred from other dimensions
tensor([[-1.2632, -0.2648, -1.0473],
[ 1.8173, 0.0445, -1.4210]])
tensor([[-1.2632, -0.2648, -1.0473, 1.8173, 0.0445, -1.4210]])
- Get item: if you have a one-element tensor, use .item() to get the value as a Python number.
x = torch.randn(1)
print(x)
print(x.item())
tensor([0.8341])
0.834109365940094
1.1.2. Working with NumPy
Convert a Torch Tensor to a NumPy array and vice versa.
Note:
- The Torch Tensor and NumPy array will share their underlying memory locations.
- All the Tensors on a CPU except a CharTensor support converting to NumPy and back.
- Torch Tensor -> NumPy Array
a = torch.randn(1)
print(a)
b = a.numpy()
print(b)
a.add_(1)
print(a)
print(b)
tensor([1.5351])
[1.5350896]
tensor([2.5351])
[2.5350895]
- NumPy Array -> Torch Tensor
import numpy as np
a = np.random.randn(1)
print(a)
b = torch.from_numpy(a)
a += 1
print(a)
print(b)
[-0.51711662]
[0.48288338]
tensor([0.4829], dtype=torch.float64)
- CUDA Tensors
x = torch.randn(1)
print(x)
device = torch.device("cuda:0") # a CUDA device object
x = x.to(device) # move it to GPU
print(x)
y = torch.randn_like(x, device=device) # directly create a tensor on GPU
print(y)
z = x + y
print(z)
print(z.to('cpu', torch.int32)) # move to CPU, and change its dtype together.
tensor([0.8053])
tensor([0.8053], device='cuda:0')
tensor([-1.4201], device='cuda:0')
tensor([-0.6148], device='cuda:0')
tensor([0], dtype=torch.int32)
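The CUDA example above assumes a GPU is present. A minimal device-agnostic sketch (the fallback logic is an illustration added here, not part of the original notes) picks the GPU only when one is available, so the same code also runs on a CPU-only machine:

```python
import torch

# Fall back to the CPU when no CUDA device is available.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

x = torch.randn(2, 3).to(device)  # .to() returns a tensor on the target device
y = torch.randn_like(x)           # randn_like inherits the device (and dtype) of x
z = (x + y).to("cpu")             # bring the result back to the CPU
print(device, z.shape)
```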
1.2. Autograd: Automatic Differentiation
The autograd package provides automatic differentiation for all operations on Tensors.
1.2.1. Tensor
If you set a torch.Tensor's attribute .requires_grad to True (the default is False), it starts to track all operations on it.
When you finish your computation, you can call .backward() and have all the gradients computed automatically.
The gradient for this tensor will be accumulated into its .grad attribute.
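Note that a Tensor does not track operations by default.
To stop a tensor from tracking history, you can call .detach() to detach it from the computation history.
You can also wrap the code block in with torch.no_grad():.
This is particularly helpful when evaluating a model, because the model may have trainable parameters with requires_grad=True whose gradients we do not need. It also saves memory.
Note: tracking should be disabled during testing and when updating parameters manually, because these steps are not part of the forward pass that needs to be differentiated.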
Each tensor has a .grad_fn attribute that references the Function that created the Tensor (except for Tensors created by the user, whose grad_fn is None).
If you want to compute the derivatives, you can call .backward() on a Tensor. If the Tensor is not a scalar, you need to pass backward() a gradient argument that is a tensor of matching shape.
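To explain:
If the result is a scalar, we can call backward directly; this is actually .backward(torch.tensor(1.)).
The 1.0 is actually \(\frac{\partial{Loss}}{\partial{Loss}}=1.0\).
If the result is higher-dimensional, there are multiple derivative terms for each parameter (one per output element), so a gradient argument is needed to weight them.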
import torch
x = torch.ones(2, 2, requires_grad=True)
y = x + 2
print(y)
z = (y * y * 3).mean()
print(z)
tensor([[3., 3.],
[3., 3.]], grad_fn=<AddBackward0>)
tensor(27., grad_fn=<MeanBackward1>)
1.2.2. Gradients
torch.autograd is an engine for computing vector-Jacobian products. That is, for a vector-valued function \(\vec{y} = f(\vec{x})\) with Jacobian \(J\), whose entries are \(J_{ij} = \frac{\partial y_i}{\partial x_j}\), and any vector \(v = (v_1 \; v_2 \; \cdots \; v_m)^T\), it computes:
\[ J^T \cdot v = \left[ \begin{matrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1}\\ \vdots & \ddots & \vdots \\ \frac{\partial y_1}{\partial x_m} & \cdots & \frac{\partial y_m}{\partial x_m} \\ \end{matrix} \right] \cdot \left[ \begin{matrix} v_1 \\ \vdots \\ v_m \\ \end{matrix} \right] \]
Why should we do that?
Because we usually compute a scalar loss value \(l\) at the end. Suppose \(v\) is the gradient of the scalar function \(l = g(\vec{y})\) with respect to \(\vec{y}\):
\[ v = (\frac{\partial{l}}{\partial{y_1}} \cdots \frac{\partial{l}}{\partial{y_m}})^T \]
Then, by the chain rule, we have:
\[ J^T \cdot v = \left[ \begin{matrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1}\\ \vdots & \ddots & \vdots \\ \frac{\partial y_1}{\partial x_m} & \cdots & \frac{\partial y_m}{\partial x_m} \\ \end{matrix} \right] \cdot \left[ \begin{matrix} \frac{\partial{l}}{\partial{y_1}} \\ \vdots \\ \frac{\partial{l}}{\partial{y_m}} \\ \end{matrix} \right] = \left[ \begin{matrix} \frac{\partial{l}}{\partial{x_1}} \\ \vdots \\ \frac{\partial{l}}{\partial{x_m}} \\ \end{matrix} \right] \]
To make it easy to feed external gradients into a model that has a non-scalar output, PyTorch exposes this vector-Jacobian product through autograd.
z.backward() # Because `z` contains a single scalar, it's equivalent to `z.backward(torch.tensor(1.))`
print(x.grad) # \partial{z}/\partial{x_i} = 1.5(x+2) = 4.5
tensor([[4.5000, 4.5000],
[4.5000, 4.5000]])
If output is not a scalar, a vector \(v\) is needed:
x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = y * y * 3
print(z)
v = torch.tensor([[0.1,1],[10,100]],dtype=torch.float32) # shape matching!
z.backward(v)
print(x.grad)
tensor([[27., 27.],
[27., 27.]], grad_fn=<MulBackward0>)
tensor([[ 1.8000, 18.0000],
[ 180.0000, 1800.0000]])
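The result above can be cross-checked without autograd. The sketch below is an illustration in plain Python (the helper names f, jacobian and vjp are made up for this note): it builds the Jacobian of z = 3(x+2)^2 by central differences on the flattened 2x2 input and multiplies its transpose by the same v that was passed to backward().

```python
def f(x):
    # Element-wise function from the example: z_i = 3 * (x_i + 2)^2
    return [3.0 * (xi + 2.0) ** 2 for xi in x]

def jacobian(func, x, eps=1e-6):
    # J[i][j] = d func(x)[i] / d x[j], estimated by central differences
    m = len(func(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = func(x_plus), func(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return J

def vjp(J, v):
    # (J^T v)[j] = sum_i J[i][j] * v[i]
    return [sum(J[i][j] * v[i] for i in range(len(v))) for j in range(len(J[0]))]

x = [1.0, 1.0, 1.0, 1.0]      # the 2x2 tensor of ones, flattened
v = [0.1, 1.0, 10.0, 100.0]   # the tensor passed to backward(), flattened
grad = vjp(jacobian(f, x), v)
print([round(g, 3) for g in grad])
```

Because the function is element-wise, the Jacobian is diagonal with entries 6(x_i + 2) = 18, so each gradient entry is simply 18 * v_i, which matches the x.grad printed above up to finite-difference error.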
We can stop tracking history:
x = torch.ones(2, 2, requires_grad=True)
y = x + 2
print(x.requires_grad)
with torch.no_grad():
    z = y * y * 3
print(z.requires_grad)
True
False
1.3. Neural Networks
Neural networks can be constructed using the torch.nn
package.
1.3.1. Define the network
import torch
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module): # `nn.Module` contains layers
    def __init__(self):
        super(Net, self).__init__() # run the initializer of the superclass `nn.Module` before adding layers
        self.conv1 = nn.Conv2d(1, 6, 5) # 1 input channel, 6 output channels, 5x5 kernel
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
    def forward(self, x): # method `forward(input)` that returns the output.
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = F.relu(self.fc1(x.view(-1, self.num_flat_features(x))))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
    def num_flat_features(self, x):
        size = x.size()[1:] # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features
net = Net()
print(net)
Net(
(conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
(conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
(fc1): Linear(in_features=400, out_features=120, bias=True)
(fc2): Linear(in_features=120, out_features=84, bias=True)
(fc3): Linear(in_features=84, out_features=10, bias=True)
)
You just have to define the forward function, and the backward function (where gradients are computed) is automatically defined using autograd.
The learnable parameters of a model are returned by net.parameters():
params = list(net.parameters())
print(len(params))
10
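The 16 * 5 * 5 input size of fc1 and the 10 parameter tensors (a weight and a bias for each of the 5 layers) can be checked by hand. The following sketch is plain arithmetic, assuming a 32x32 input, unpadded stride-1 convolutions and 2x2 pooling as in the network above; the helper names are illustrative only.

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Spatial output size of a convolution along one dimension
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel):
    # Max pooling with stride equal to the kernel size and no padding
    return (size - kernel) // kernel + 1

size = 32
size = pool_out(conv_out(size, 5), 2)  # conv1 + pool: 32 -> 28 -> 14
size = pool_out(conv_out(size, 5), 2)  # conv2 + pool: 14 -> 10 -> 5
flat_features = 16 * size * size       # 16 channels of 5x5 feature maps

def conv_params(c_in, c_out, k):
    return c_out * c_in * k * k + c_out  # weights + biases

def linear_params(n_in, n_out):
    return n_in * n_out + n_out          # weights + biases

total = (conv_params(1, 6, 5) + conv_params(6, 16, 5)
         + linear_params(flat_features, 120)
         + linear_params(120, 84) + linear_params(84, 10))
print(size, flat_features, total)  # prints: 5 400 61706
```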
1.3.2. Process inputs and call backward
Let's try a random \(32 \times 32\) input image:
input = torch.randn(1,1,32,32)
out = net(input)
print(out)
tensor([[ 0.1177, 0.0199, -0.0774, 0.0580, 0.0407, 0.0384, 0.0380, -0.1090,
0.0345, -0.0498]], grad_fn=<AddmmBackward>)
We can also zero the gradient buffers of all parameters and backpropagate with random gradients:
net.zero_grad()
out.backward(torch.randn(1, 10))
Note: torch.nn only supports mini-batches, not a single sample. For example, nn.Conv2d takes a 4D Tensor of nSamples x nChannels x Height x Width.
If you have a single sample, use input.unsqueeze(0) to add a fake batch dimension.
1.3.3. Loss function
There are several different loss functions under the nn package, e.g. nn.MSELoss:
output = net(input)
target = torch.randn(10)
target = target.view(1, -1)
criterion = nn.MSELoss()
loss = criterion(output, target)
print(loss)
tensor(0.7902, grad_fn=<MseLossBackward>)
Now, if we follow loss in the backward direction using its .grad_fn attribute, we can see a graph of computations:
input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
-> view -> linear -> relu -> linear -> relu -> linear
-> MSELoss
-> loss
print(loss.grad_fn)
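print(loss.grad_fn.next_functions[0][0]) # Linear (the last layer, shown as AddmmBackward)
print(loss.grad_fn.next_functions[0][0].next_functions[0][0]) # first input of addmm: a leaf parameter (AccumulateGrad), not ReLU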
<MseLossBackward object at 0x000001C6AB532780>
<AddmmBackward object at 0x000001C6AB9CD6D8>
<AccumulateGrad object at 0x000001C6AB41A4A8>
1.3.4. Backprop
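To backpropagate the error, all we have to do is call loss.backward().
You need to clear the existing gradients first, though, or else the new gradients will be accumulated into the existing ones.
Note: when processing batch by batch, the gradients must be cleared once per batch. Otherwise they are not replaced but accumulated.
For example, if we run backward on 2 batches before each parameter update and gradient reset, the effect is similar to doubling the batch size while using less memory.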
net.zero_grad() # zeroes the gradient buffers of all parameters
print(net.conv1.bias.grad) # gradients before backprop
loss.backward()
print(net.conv1.bias.grad) # gradients after backprop
tensor([0., 0., 0., 0., 0., 0.])
tensor([-0.0074, -0.0043, 0.0082, 0.0022, -0.0055, -0.0047])
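The accumulation behaviour can also be used on purpose. The sketch below is an illustration with a toy linear model and random data (both are assumptions made for this note, not part of the tutorial): the gradient of one batch of 8 samples is compared with the gradient accumulated over two micro-batches of 4 samples each.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
loss_fn = torch.nn.MSELoss()  # mean over all elements
x = torch.randn(8, 4)
t = torch.randn(8, 1)

# Reference: gradient of the loss on the full batch of 8 samples.
model.zero_grad()
loss_fn(model(x), t).backward()
full_grad = model.weight.grad.clone()

# Accumulation: two micro-batches of 4 samples, no zero_grad() in between.
model.zero_grad()
for xb, tb in zip(x.chunk(2), t.chunk(2)):
    # Dividing by the number of micro-batches makes the sum of the two
    # mean losses equal to the mean loss over all 8 samples.
    (loss_fn(model(xb), tb) / 2).backward()
accumulated_grad = model.weight.grad.clone()

grads_match = torch.allclose(full_grad, accumulated_grad, atol=1e-6)
print(grads_match)
```

In a training loop the optimizer step and the gradient reset would then run once per group of micro-batches rather than once per batch.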
1.3.5. Update the weights
The simple implementation is:
learning_rate = 0.1
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)
However, there are various different update rules such as SGD, Adam, RMSProp, etc.
To enable this, we can use the torch.optim package:
import torch.optim as optim
optimizer = optim.SGD(net.parameters(), lr=0.01)
# in your training loop
optimizer.zero_grad() # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step() # update
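To make the idea of an update rule concrete, here is a plain-Python sketch of SGD with momentum in the form buf = momentum * buf + grad, param -= lr * buf. The function name and the toy objective are illustrative assumptions, and the sketch ignores options such as dampening, weight decay and Nesterov momentum.

```python
def sgd_momentum_step(params, grads, bufs, lr=0.01, momentum=0.9):
    # Update every parameter in place using its momentum buffer.
    for i in range(len(params)):
        bufs[i] = momentum * bufs[i] + grads[i]
        params[i] -= lr * bufs[i]

# Toy problem: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = [0.0]
buf = [0.0]
for _ in range(200):
    grad = [2.0 * (w[0] - 3.0)]
    sgd_momentum_step(w, grad, buf, lr=0.05, momentum=0.9)
print(round(w[0], 4))
```

With momentum set to 0 the rule reduces to the simple update shown earlier, param -= lr * grad.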
1.4. Example: Training a Classifier
1.4.1. Load data
Specifically for vision, we can use torchvision, which has data loaders for common datasets such as ImageNet, CIFAR10, MNIST, etc. and data transformers for images, viz., torchvision.datasets and torch.utils.data.DataLoader.
For this tutorial, we will use the CIFAR10 dataset. It has the classes: 'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'. The images in CIFAR10 are of size 3x32x32.
1.4.2. Training an image classifier
Load CIFAR10 and normalize its range from [0,1] to [-1,1]:
import torch
import torchvision
import torchvision.transforms as transforms
# Compose several transforms together: convert to tensor, then normalize each of the 3 channels with mean 0.5 and std 0.5 (nominal values).
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
# Use a forward slash in the path: "\d" inside a normal string is an invalid escape sequence.
trainset = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
testset = torchvision.datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)
# shuffle: set to `True` to have the data reshuffled at every epoch (default: `False`).
# num_workers: how many subprocesses to use for data loading. `0` means that the data will be loaded in the main process. (default: `0`)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)
testloader = torch.utils.data.DataLoader(testset, batch_size=4, shuffle=False, num_workers=2)
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to .\data\cifar-10-python.tar.gz
100.0%
Files already downloaded and verified
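The mapping from [0,1] to [-1,1] is just (pixel - mean) / std with mean = std = 0.5. A small plain-Python sketch (the function names are illustrative) shows the forward mapping and the inverse, value / 2 + 0.5, that is used when displaying images:

```python
def normalize(pixel, mean=0.5, std=0.5):
    # What Normalize does to each channel value
    return (pixel - mean) / std

def unnormalize(value):
    # Inverse mapping, used before plotting
    return value / 2 + 0.5

for pixel in (0.0, 0.5, 1.0):
    normalized = normalize(pixel)
    print(pixel, normalized, unnormalize(normalized))
```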
Show some training images:
import matplotlib.pyplot as plt
import numpy as np
def imshow(img):
    img = img / 2 + 0.5 # unnormalize
    npimg = img.numpy() # Tensor -> numpy array
    plt.imshow(np.transpose(npimg, (1, 2, 0))) # channel x height x width -> height x width x channel
    plt.show()
dataiter = iter(trainloader)
images, labels = next(dataiter) # the built-in next() works across PyTorch versions; dataiter.next() was removed in newer releases
imshow(images[0])
print(labels[0],classes[labels[0]])
tensor(9) truck
Let's define a CNN:
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
net = Net()
We can move it to GPU:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") # "cuda:0" is the first GPU; use another index such as "cuda:1" only if that device exists
net.to(device)
Net(
(conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
(pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
(fc1): Linear(in_features=400, out_features=120, bias=True)
(fc2): Linear(in_features=120, out_features=84, bias=True)
(fc3): Linear(in_features=84, out_features=10, bias=True)
)
Define a loss function and optimizer:
import torch.optim as optim
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
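For a single sample, cross-entropy on raw scores (logits) is the negative log of the softmax probability assigned to the true class. The plain-Python sketch below is an illustration of that formula, not the library implementation; the function name is made up.

```python
import math

def cross_entropy(logits, target):
    # -log(softmax(logits)[target]), computed with the log-sum-exp trick
    # (subtracting the max) for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

uniform = cross_entropy([0.0] * 10, 3)             # no preference among 10 classes
confident = cross_entropy([0.0] * 9 + [10.0], 9)   # strong score on the true class
print(round(uniform, 4), round(confident, 6))
```

With 10 classes and no preference the loss equals ln(10), about 2.30, which is a useful reference point when reading the loss of a classifier that has not learned anything yet.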
Training:
for epoch in range(3):
    sum_loss = 0.0
    max_show = 3000
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data
        # send to GPU
        inputs, labels = inputs.to(device), labels.to(device)
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print statistics
        sum_loss += loss.item()
        if (i+1) % max_show == 0: # print every 3000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  ((epoch+1), (i+1), (sum_loss/max_show)))
            sum_loss = 0.0
print('Finished Training')
[1, 3000] loss: 0.774
[1, 6000] loss: 0.807
[1, 9000] loss: 0.832
[1, 12000] loss: 0.844
[2, 3000] loss: 0.722
[2, 6000] loss: 0.791
[2, 9000] loss: 0.804
[2, 12000] loss: 0.820
[3, 3000] loss: 0.711
[3, 6000] loss: 0.761
[3, 9000] loss: 0.776
[3, 12000] loss: 0.786
Finished Training
Test our trained model on test data:
sum_correct = 0
sum_test = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        images, labels = images.to(device), labels.to(device)
        outputs = net(images) # 4x10
        _, predicted = torch.max(outputs.data, 1) # (max_value, index)
        sum_correct += (predicted==labels).sum().item()
        sum_test += labels.size(0)
print("Accuracy on 10000 test images: %.3f %%" % (100*sum_correct/sum_test))
Accuracy on 10000 test images: 63.140 %
1.5. Data Parallelism
We will learn how to use multiple GPUs using DataParallel.
DataParallel splits your data automatically and sends job orders to multiple models on several GPUs. After each model finishes its job, DataParallel collects and merges the results before returning them to you.
Please note that just calling my_tensor.to(device) returns a new copy of my_tensor on the GPU instead of rewriting my_tensor. You need to assign it to a new tensor and use that tensor on the GPU.
It is easy to make your model run in parallel using DataParallel:
model = nn.DataParallel(model)
Let's see an example.
### Imports and parameters
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
input_size = 5
output_size = 2
batch_size = 30
data_size = 100
device = torch.device("cuda:0")
### Dummy dataset
class RandomDataset(Dataset):
    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)
    def __getitem__(self, index):
        return self.data[index]
    def __len__(self):
        return self.len
### Simple model
class Model(nn.Module):
    def __init__(self, input_size, output_size):
        super(Model, self).__init__()
        self.fc = nn.Linear(input_size, output_size)
    def forward(self, input):
        output = self.fc(input)
        print("\tInside the model: input size",input.size(),"output size",output.size())
        return output
### Create model and dataparallel
model = Model(input_size, output_size)
if torch.cuda.device_count() > 1:
    print(torch.cuda.device_count(),"GPUs are found!")
    model = nn.DataParallel(model)
model.to(device)
### Run the model
rand_loader = DataLoader(dataset=RandomDataset(input_size, data_size),
                         batch_size=batch_size, shuffle=True)
for data in rand_loader:
    input = data.to(device)
    output = model(input)
    print("Total: Input size",input.size(),"output size",output.size())
2 GPUs are found!
Inside the model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Inside the model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Total: Input size torch.Size([30, 5]) output size torch.Size([30, 2])
Inside the model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Inside the model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Total: Input size torch.Size([30, 5]) output size torch.Size([30, 2])
Inside the model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Inside the model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Total: Input size torch.Size([30, 5]) output size torch.Size([30, 2])
Inside the model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
Inside the model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
Total: Input size torch.Size([10, 5]) output size torch.Size([10, 2])
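The per-GPU sizes in this log follow from splitting each batch along its first dimension. The sketch below reproduces the arithmetic in plain Python, under the assumption that the split behaves like torch.chunk (chunks of ceil(batch / n_devices) samples, with a smaller last chunk); the function name is illustrative.

```python
import math

def chunk_sizes(batch, n_devices):
    # Sizes of the pieces when a batch is split into at most n_devices chunks
    size = math.ceil(batch / n_devices)
    sizes = []
    remaining = batch
    while remaining > 0:
        sizes.append(min(size, remaining))
        remaining -= sizes[-1]
    return sizes

print(chunk_sizes(30, 2))  # full batch on 2 GPUs
print(chunk_sizes(10, 2))  # last batch: 100 samples = 3 * 30 + 10
print(chunk_sizes(10, 3))  # an uneven split on a hypothetical 3-GPU machine
```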
2. DATA LOADING AND PROCESSING TUTORIAL
PyTorch provides many tools to make data loading easy and hopefully, to make your code more readable.
For this tutorial, we should install two packages:
- scikit-image: image I/O and transforms
- pandas: easier CSV parsing
We have prepared a pose estimation database in ./data/faces. It contains some human faces, with their landmark points stored in a .csv file.
Let's read the CSV and get the annotations in an (N,2) array:
import pandas as pd
from skimage import io
import matplotlib.pyplot as plt
landmarks_list = pd.read_csv('data/faces/face_landmarks.csv')
'''
image_name,part_0_x,part_0_y,part_1_x,part_1_y,part_2_x, ... ,part_67_x,part_67_y
0805personali01.jpg,27,83,27,98, ... 84,134
1084239450_e76e00b7e7.jpg,70,236,71,257, ... ,128,312
'''
n = 50
img_name = landmarks_list.iloc[n, 0]
landmarks = landmarks_list.iloc[n, 1:].values.astype('float').reshape(-1,2) # DataFrame row -> (N, 2) NumPy array
def show_landmarks(image, landmarks):
    'Show image with landmarks.'
    plt.imshow(image)
    plt.scatter(landmarks[:,0],landmarks[:,1], s=10, marker=".", c="r")
    plt.pause(0.001) # pause a bit so that plots are updated
plt.figure()
img_path = "./data/faces/"+img_name
show_landmarks(io.imread(img_path), landmarks)
plt.show()
2.1. Dataset Class
torch.utils.data.Dataset is an abstract class representing a dataset. Our custom dataset should inherit Dataset and override the following methods:
- __len__: so that len(dataset) returns the size of the dataset.
- __getitem__: so that dataset[i] can be used for indexing.
Demo:
from torch.utils.data import Dataset
import os
class FaceLandmarksDataset(Dataset):
    'Face landmarks dataset.'
    def __init__(self, CsvFile_path, dir_img, transform=None):
        self.landmarks_list = pd.read_csv(CsvFile_path)
        self.dir_img = dir_img
        self.transform = transform
    def __len__(self):
        return len(self.landmarks_list)
    def __getitem__(self, idx):
        img_path = os.path.join(self.dir_img,
                                self.landmarks_list.iloc[idx, 0])
        image = io.imread(img_path)
        landmarks = self.landmarks_list.iloc[idx, 1:].values.astype("float").reshape(-1,2)
        sample = {'image':image, 'landmarks': landmarks}
        if self.transform:
            sample = self.transform(sample)
        return sample
### Instantiate this class and show four images.
face_landmarks = FaceLandmarksDataset(CsvFile_path='./data/faces/face_landmarks.csv', dir_img='./data/faces/')
fig = plt.figure()
for i in range(len(face_landmarks)):
    sample = face_landmarks[i]
    print(i, sample['image'].shape, sample['landmarks'].shape)
    ax = plt.subplot(1,4,i+1)
    ax.set_title('Sample #{}'.format(i))
    ax.axis('off')
    ax.imshow(sample['image'])
    ax.scatter(sample['landmarks'][:,0],sample['landmarks'][:,1], s=10, marker=".", c="r")
    #show_landmarks(**sample)
    if i == 3:
        plt.tight_layout()
        plt.show()
        break
0 (324, 215, 3) (68, 2)
1 (500, 333, 3) (68, 2)
2 (250, 258, 3) (68, 2)
3 (434, 290, 3) (68, 2)
2.2. Transforms
We want to:
- randomly crop samples.
- rescale images.
- convert the numpy images to torch images (notice: swap axes).
We also want to write them as callable classes instead of simple functions:
from skimage import transform
import numpy as np
class Rescale():
    '''
    Rescale the image in a sample to a given size.
    Args:
        output_size (tuple or int): Desired output size. If int, the smaller image edge is matched to it
            and the aspect ratio remains the same.
    '''
    def __init__(self, output_size):
        assert isinstance(output_size, (int, tuple)) # ensure that output_size is an int or a tuple.
        self.output_size = output_size
    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']
        h, w = image.shape[:2]
        if isinstance(self.output_size, int): # int: the length of the smaller edge
            if h > w:
                new_h, new_w = self.output_size * h / w, self.output_size
            else:
                new_h, new_w = self.output_size, self.output_size * w / h
        else:
            new_h, new_w = self.output_size
        new_h, new_w = int(new_h), int(new_w)
        image = transform.resize(image, (new_h, new_w))
        landmarks = landmarks * [new_w/w, new_h/h] # landmarks are (x, y), so scale x by width and y by height
        return {'image':image, 'landmarks':landmarks}
class RandomCrop():
    '''
    Crop the image in a sample randomly.
    Args:
        output_size (tuple or int). If int, square crop is made.
    '''
    def __init__(self, output_size):
        assert isinstance(output_size, (int, tuple)) # ensure that output_size is an int or a tuple.
        self.output_size = output_size
    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']
        h, w = image.shape[:2]
        if isinstance(self.output_size, int):
            new_h, new_w = self.output_size, self.output_size
        else:
            new_h, new_w = self.output_size
        # The upper bound of randint is exclusive; +1 also allows a crop as large as the image.
        start_h_idx = np.random.randint(0, h - new_h + 1)
        start_w_idx = np.random.randint(0, w - new_w + 1)
        image = image[start_h_idx: (start_h_idx+new_h),
                      start_w_idx: (start_w_idx+new_w)]
        landmarks = landmarks - [start_w_idx, start_h_idx]
        return {'image':image, 'landmarks':landmarks}
class ToTensor():
    '''
    Convert the ndarray image in a sample to a Tensor.
    Notice: swap color axis because:
        numpy image: H x W x C
        torch image: C x H x W
    '''
    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']
        image = image.transpose((2, 0, 1))
        return {'image': torch.from_numpy(image),
                'landmarks': torch.from_numpy(landmarks)}
We now apply our transforms to a sample:
from torchvision import transforms
scale = Rescale(256) # the length of the smaller side is 256
crop = RandomCrop(210) # crop a random 210x210 patch
composed_trans = transforms.Compose([scale, crop])
fig = plt.figure()
plt.tight_layout()
sample = face_landmarks[65]
transformed_sample = composed_trans(sample)
show_landmarks(**sample)
show_landmarks(**transformed_sample)
plt.show()
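The size arithmetic inside Rescale can be followed on a concrete shape. The sketch below repeats the same rule in plain Python for the 324 x 215 image printed as sample 0 earlier; the function names and the example landmark (100, 200) are made up for illustration.

```python
def rescaled_size(h, w, output_size):
    # Same rule as Rescale: an int sets the length of the smaller edge
    # and keeps the aspect ratio; a tuple is used as (new_h, new_w).
    if isinstance(output_size, int):
        if h > w:
            new_h, new_w = output_size * h / w, output_size
        else:
            new_h, new_w = output_size, output_size * w / h
    else:
        new_h, new_w = output_size
    return int(new_h), int(new_w)

def rescale_landmark(x, y, h, w, new_h, new_w):
    # Landmarks are (x, y): x scales with the width, y with the height.
    return x * new_w / w, y * new_h / h

h, w = 324, 215
new_h, new_w = rescaled_size(h, w, 256)
print(new_h, new_w)  # prints: 385 256
print(rescale_landmark(100, 200, h, w, new_h, new_w))
```

Since both edges are at least 256 after rescaling, a 210 x 210 RandomCrop always fits inside the rescaled image.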
2.3. Iterating through the Dataset
With the dataset in place, we need to keep fetching data from it for training or testing.
import torch
transformed_dataset = FaceLandmarksDataset(CsvFile_path='./data/faces/face_landmarks.csv',
dir_img='./data/faces/',
transform=transforms.Compose([
Rescale(256),
RandomCrop(210),
ToTensor()
]))
for i in range(len(transformed_dataset)):
    sample = transformed_dataset[i]
    print(i, sample['image'].size(), sample['landmarks'].size())
    if i == 4:
        break
0 torch.Size([3, 210, 210]) torch.Size([68, 2])
1 torch.Size([3, 210, 210]) torch.Size([68, 2])
2 torch.Size([3, 210, 210]) torch.Size([68, 2])
3 torch.Size([3, 210, 210]) torch.Size([68, 2])
4 torch.Size([3, 210, 210]) torch.Size([68, 2])
However, we also want to:
- batch the data.
- shuffle the data.
- Load the data in parallel.
torch.utils.data.DataLoader is an iterator which provides all these features.
from torch.utils.data import DataLoader
from torchvision import utils
dataloader = DataLoader(transformed_dataset, batch_size=4,
shuffle=True, num_workers=0) # Windows may error when num_workers > 0
def show_landmarks_batch(sample_batch):
    'Show images with landmarks for a batch of samples.'
    image_batch, landmarks_batch = sample_batch['image'], sample_batch['landmarks']
    batch_size = len(image_batch)
    im_size = image_batch.size(2)
    grid_border_size = 2 # make_grid pads each image by 2 pixels by default
    grid = utils.make_grid(image_batch)
    plt.imshow(grid.numpy().transpose((1,2,0))) # Tensors -> ndarrays -> CxHxW to HxWxC
    for i in range(batch_size):
        # shift the landmarks of image i by its offset inside the grid, including the padding
        plt.scatter(landmarks_batch[i,:,0].numpy() + i*im_size + (i+1)*grid_border_size,
                    landmarks_batch[i,:,1].numpy() + grid_border_size,
                    s=10, marker='.', c='r')
    plt.title('A batch from dataloader')
ite_batch = 3
for ite,sample_batch in enumerate(dataloader):
    print(sample_batch['image'].size(),
          sample_batch['landmarks'].size())
    if ite == ite_batch:
        plt.figure()
        show_landmarks_batch(sample_batch)
        plt.axis('off')
        plt.show()
        break
torch.Size([4, 3, 210, 210]) torch.Size([4, 68, 2])
torch.Size([4, 3, 210, 210]) torch.Size([4, 68, 2])
torch.Size([4, 3, 210, 210]) torch.Size([4, 68, 2])
torch.Size([4, 3, 210, 210]) torch.Size([4, 68, 2])
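Conceptually, batching and shuffling amount to permuting the indices and slicing them into groups. The plain-Python sketch below illustrates only that idea; the function name and toy dataset are assumptions, and the real DataLoader additionally collates samples into tensors and can load them in worker processes.

```python
import random

def iterate_batches(dataset, batch_size, shuffle=True, seed=None):
    # Visit every index exactly once, in random order if requested,
    # and group the samples into lists of at most batch_size items.
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]

toy_dataset = list(range(10))  # anything with __len__ and __getitem__ works
batches = list(iterate_batches(toy_dataset, batch_size=4, seed=0))
print([len(batch) for batch in batches])  # prints: [4, 4, 2]
```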
2.4. Torchvision
The torchvision package provides some common datasets and transforms.
We might not even have to write custom classes. One of the more generic datasets available in torchvision is ImageFolder. It assumes that images are organized in the following way:
root/ants/xxx.png
root/ants/xxy.jpeg
root/ants/xxz.png
.
.
.
root/bees/123.jpg
root/bees/nsdf3.png
root/bees/asd932_.png
where ants and bees are class labels.
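ImageFolder derives the integer labels from the sub-directory names, sorted alphabetically. The sketch below imitates that mapping with the standard library only; find_classes is a helper written for this note, and the temporary files are empty placeholders rather than real images.

```python
import os
import tempfile

def find_classes(root):
    # Each sub-directory of root is a class; indices follow alphabetical order.
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return classes, {name: idx for idx, name in enumerate(classes)}

with tempfile.TemporaryDirectory() as root:
    for rel_path in ("bees/123.jpg", "ants/xxx.png", "ants/xxy.jpeg"):
        path = os.path.join(root, *rel_path.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()  # empty placeholder file
    classes, class_to_idx = find_classes(root)

print(classes, class_to_idx)  # prints: ['ants', 'bees'] {'ants': 0, 'bees': 1}
```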
Besides, generic transforms that operate on PIL.Image, such as RandomHorizontalFlip and Resize (formerly Scale), are also available.
import torch
from torchvision import transforms, datasets
data_transform = transforms.Compose([
transforms.RandomResizedCrop(224), # RandomSizedCrop is the old, deprecated name of this transform
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize(mean=[0.485, 0.456, 0.406],
std = [0.229, 0.224, 0.225])
])
hymenoptera_dataset = datasets.ImageFolder(root='hymenoptera_data/train',
transform=data_transform)
dataset_loader = torch.utils.data.DataLoader(hymenoptera_dataset,
batch_size=4,shuffle=True,num_workers=0)
3. LEARNING PYTORCH WITH EXAMPLES
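As mentioned earlier, the core functionality of PyTorch comes down to two things:
- A replacement for NumPy to use the power of GPUs.
- A deep learning research platform that provides maximum flexibility and speed.
That is:
- Run tensor computations on GPUs.
- Provide the other functionality that deep learning needs.
To put it more precisely, PyTorch provides:
- An n-dimensional Tensor, similar to a NumPy array but able to run on GPUs.
- Automatic differentiation for building and training neural networks.
The reasons are:
- GPUs can provide speedups of 50x or more.
- Deep learning today still relies on backpropagation, so computing gradients is indispensable, and automatic differentiation is the widely used technique for it.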
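3.1. Basic Concepts: Tensors and Autograd
A Tensor is conceptually the same as a NumPy array, but it does more:
- A Tensor carries computational graph and gradient information and can keep tracking the operations applied to it; the nodes of the graph are Tensors and the edges are functions.
- A Tensor can use a GPU for numeric computation.
Let's look at an example of a two-layer fully connected network: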
import torch
### Settings
dtype = torch.float32
device = torch.device("cuda:0")
learning_rate = 1e-6
total_ite = 500
N_batch, D_in, H, D_out = 64, 1000, 100, 10
### Create random data set (Tensors)
x = torch.randn(N_batch, D_in, device=device, dtype=dtype)
y = torch.randn(N_batch, D_out, device=device, dtype=dtype)
### Initialize weight Tensors randomly
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)
### Iterations
for ite in range(1,total_ite+1):
    y_pred = x.mm(w1).clamp(min=0).mm(w2) # clamp acts as relu function
    loss = (y_pred - y).pow(2).sum()
    if ite % 100 == 0:
        print(ite, loss.item())
    loss.backward()
    # Manually update weights
    # Weights have requires_grad=True, but we don't need tracking.
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        # Manually zero the gradients after updating weights
        w1.grad.zero_()
        w2.grad.zero_()
100 616.6424560546875
200 5.097920894622803
300 0.06861867755651474
400 0.0014389019925147295
500 0.0001344898482784629
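The biggest difference between TensorFlow and PyTorch is:
- TensorFlow's computational graphs (in TensorFlow 1.x-style graph mode) are static: once a graph is defined, we run the same graph many times, and only the input data can differ.
- PyTorch's computational graphs are dynamic: each forward pass can build a brand-new graph.
A static graph can be optimized further up front, so it is more efficient; but in some cases, such as recurrent networks, a dynamic graph is simpler to work with.
An example is given in Section 3.3.
Another difference: in TensorFlow the parameter update is part of the computational graph, whereas in PyTorch it is not. So in PyTorch we should disable gradient tracking while updating the parameters.
3.2. Simplifying Things: the nn Module
Clearly, the manual forward pass and parameter updates above are tedious. In particular, when the network is large and complex, it is hard to list the parameters explicitly.
PyTorch provides several modules to solve these problems.
The first is simpler network definition. In short, nn contains many commonly used neural network building blocks as well as some common loss functions.
Once a network is defined, its parameters are automatically added to the list of learnable parameters.
Back to the previous example: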
import torch
### Settings
dtype = torch.float32
device = torch.device("cuda:0")
learning_rate = 1e-4 # bigger!
total_ite = 500
N_batch, D_in, H, D_out = 64, 1000, 100, 10
### Create random data set (Tensors)
x = torch.randn(N_batch, D_in, device=device, dtype=dtype)
y = torch.randn(N_batch, D_out, device=device, dtype=dtype)
### Define network model by nn package
model = torch.nn.Sequential(
torch.nn.Linear(D_in, H),
torch.nn.ReLU(),
torch.nn.Linear(H, D_out)
)
model = model.to(device)
### Define loss function by nn package
loss_fn = torch.nn.MSELoss(reduction='sum')
for ite in range(1, total_ite+1):
    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    if ite % 100 == 0:
        print(ite, loss.item())
    model.zero_grad()
    loss.backward()
    # Manually update weights
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad
100 2.5298514366149902
200 0.04136687144637108
300 0.0011623052414506674
400 4.448959225555882e-05
500 2.1180185285629705e-06
The second is simpler optimization. PyTorch provides the optim package, which supports more sophisticated optimization methods.
import torch
### Settings
dtype = torch.float32
device = torch.device("cuda:0")
learning_rate = 1e-4 # bigger!
total_ite = 500
N_batch, D_in, H, D_out = 64, 1000, 100, 10
### Create random data set (Tensors)
x = torch.randn(N_batch, D_in, device=device, dtype=dtype)
y = torch.randn(N_batch, D_out, device=device, dtype=dtype)
### Define network model by nn package
model = torch.nn.Sequential(
torch.nn.Linear(D_in, H),
torch.nn.ReLU(),
torch.nn.Linear(H, D_out)
)
model = model.to(device)
### Define loss function by nn package
loss_fn = torch.nn.MSELoss(reduction='sum')
### Define optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for ite in range(1, total_ite+1):
    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    if ite % 100 == 0:
        print(ite, loss.item())
    optimizer.zero_grad()
    loss.backward()
    # Update parameters
    optimizer.step()
100 65.00260925292969
200 1.0924508571624756
300 0.006899723317474127
400 5.2772647904930636e-05
500 1.6419755866081687e-07
The building blocks in the nn package are quite basic. If our network is more complex, we can also define a custom network:
import torch
class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)
    def forward(self, x):
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred
### Settings
dtype = torch.float32
device = torch.device("cuda:0")
learning_rate = 1e-4 # bigger!
total_ite = 500
N_batch, D_in, H, D_out = 64, 1000, 100, 10
### Create random data set (Tensors)
x = torch.randn(N_batch, D_in, device=device, dtype=dtype)
y = torch.randn(N_batch, D_out, device=device, dtype=dtype)
### Define network model by nn package
model = TwoLayerNet(D_in, H, D_out)
model = model.to(device)
### Define loss function by nn package
loss_fn = torch.nn.MSELoss(reduction='sum')
### Define optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for ite in range(1, total_ite+1):
    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    if ite % 100 == 0:
        print(ite, loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
100 71.93133544921875
200 1.759734869003296
300 0.012220818549394608
400 0.0002741872740443796
500 1.9429817257332616e-05
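3.3. The Dynamic Advantage: Control Flow + Weight Sharing in PyTorch
In this section we use an example to show the advantage of PyTorch's dynamic graphs.
First, we build a fully connected network with the following properties:
- On each forward pass the number of hidden layers is random: 1, 2, 3 or 4;
- The hidden layers share their parameters.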
import torch
import random
class DynamicNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(DynamicNet, self).__init__()
        self.input_linear = torch.nn.Linear(D_in, H)
        self.middle_linear = torch.nn.Linear(H, H)
        self.output_linear = torch.nn.Linear(H, D_out)
    def forward(self, x):
        h_relu = self.input_linear(x).clamp(min=0)
        rand_num = random.randint(0, 3) # both ends inclusive: 0, 1, 2 or 3
        for _ in range(rand_num): # reuse middle_linear 0 to 3 times, giving 1 to 4 hidden layers in total
            h_relu = self.middle_linear(h_relu).clamp(min=0)
        y_pred = self.output_linear(h_relu)
        return y_pred, rand_num
### Settings
dtype = torch.float32
device = torch.device("cuda:0")
learning_rate = 1e-4
momentum = 0.9
total_ite = 500
N_batch, D_in, H, D_out = 64, 1000, 100, 10
### Create random data set (Tensors)
x = torch.randn(N_batch, D_in, device=device, dtype=dtype)
y = torch.randn(N_batch, D_out, device=device, dtype=dtype)
### Define network model by nn package
model = DynamicNet(D_in, H, D_out)
model = model.to(device)
### Define loss function by nn package
loss_fn = torch.nn.MSELoss(reduction='sum')
### Define optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=momentum)
for ite in range(1, total_ite+1):
    y_pred, rand_num = model(x)
    loss = loss_fn(y_pred, y)
    if ite % 100 == 0:
        print(ite, loss.item(), rand_num)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
100 13.738285064697266 0
200 4.243963718414307 2
300 0.7942993640899658 1
400 0.43234169483184814 3
500 0.42137715220451355 2
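One consequence of weight sharing is that the number of learnable parameters does not depend on how many times the middle layer is applied. The plain-Python arithmetic below illustrates this for the sizes used above; the unshared variant is a hypothetical comparison, not something the example builds.

```python
def linear_params(n_in, n_out):
    return n_in * n_out + n_out  # weights + biases of one Linear layer

D_in, H, D_out = 1000, 100, 10

# DynamicNet: one input layer, ONE middle layer reused 0 to 3 times, one output layer.
shared_total = linear_params(D_in, H) + linear_params(H, H) + linear_params(H, D_out)

def unshared_total(n_middle):
    # Hypothetical network with a separate Linear(H, H) for every repetition
    return (linear_params(D_in, H) + n_middle * linear_params(H, H)
            + linear_params(H, D_out))

print(shared_total)                               # prints: 111210
print([unshared_total(k) for k in range(4)])      # grows by 10100 per extra layer
```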
