Contents

Reference throughout: the official PyTorch tutorials.


1. GETTING STARTED WITH PYTORCH

This tutorial (this chapter) is a classic, so I won't translate much of it.

1.1. What is PyTorch?

It's a Python-based scientific computing package targeted at two sets of audiences:

  • A replacement for NumPy to use the power of GPUs.
  • A deep learning research platform that provides maximum flexibility and speed.

1.1.1. Basic concepts

  • Tensors: similar to NumPy’s ndarrays, with the addition being that Tensors can also be used on a GPU to accelerate computing.
import torch

x = torch.zeros(3, 2, dtype=torch.long) # torch.int64
print(x)

y = torch.randn_like(x, dtype=torch.double) # result has the same size, but the dtype is overridden!
print(y)

z = x.new_ones(3, 3) # result has the same dtype
print(z)

print(z.size()) # torch.Size is in fact a tuple, so it supports all tuple operations.
tensor([[0, 0],
        [0, 0],
        [0, 0]])
tensor([[0.1171, 2.2741],
        [0.8569, 0.7953],
        [1.4362, 0.4094]], dtype=torch.float64)
tensor([[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]])
torch.Size([3, 3])
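
Since torch.Size behaves like a tuple, the usual tuple operations apply; a quick sketch:

h, w = z.size()    # tuple unpacking
print(h, w)        # 3 3
print(z.size()[0]) # indexing works as well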
  • Addition: x + y, torch.add(x, y), torch.add(x, y, out=result), or in-place y.add_(x).
x = torch.randn(3, 2, dtype=torch.double) # float64
y = torch.randn(3, 2, dtype=torch.double)

print(x + y) # also a tensor

print(torch.add(x, y))

result = torch.randn_like(x)
torch.add(x, y, out=result)
print(result)

y.add_(x) # adds x to y
print(y)
tensor([[ 0.2623,  0.3829],
        [ 2.5567,  1.3920],
        [ 1.3003, -0.5964]], dtype=torch.float64)
tensor([[ 0.2623,  0.3829],
        [ 2.5567,  1.3920],
        [ 1.3003, -0.5964]], dtype=torch.float64)
tensor([[ 0.2623,  0.3829],
        [ 2.5567,  1.3920],
        [ 1.3003, -0.5964]], dtype=torch.float64)
tensor([[ 0.2623,  0.3829],
        [ 2.5567,  1.3920],
        [ 1.3003, -0.5964]], dtype=torch.float64)
  • Indexing: we can use standard NumPy-like indexing!
print(y[:,1])
tensor([ 0.3829,  1.3920, -0.5964], dtype=torch.float64)
  • Resizing: torch.Tensor.view
x = torch.randn(2, 3)
print(x)
print(x.view(-1, 6)) # the size -1 is inferred from other dimensions
tensor([[-1.2632, -0.2648, -1.0473],
        [ 1.8173,  0.0445, -1.4210]])
tensor([[-1.2632, -0.2648, -1.0473,  1.8173,  0.0445, -1.4210]])
  • Get item: if you have a one-element tensor, use .item() to get the value as a Python number.
x = torch.randn(1)
print(x)
print(x.item())
tensor([0.8341])
0.834109365940094

1.1.2. NumPy Bridge

Convert a Torch Tensor to a NumPy array and vice versa.

Note:

  1. The Torch Tensor and NumPy array will share their underlying memory locations.
  2. All the Tensors on a CPU except a CharTensor support converting to NumPy and back.
  • Torch Tensor -> NumPy Array
a = torch.randn(1)
print(a)

b = a.numpy()
print(b)

a.add_(1)
print(a)
print(b)
tensor([1.5351])
[1.5350896]
tensor([2.5351])
[2.5350895]
  • NumPy Array -> Torch Tensor
import numpy as np
a = np.random.randn(1)
print(a)

b = torch.from_numpy(a)

a += 1
print(a)
print(b)
[-0.51711662]
[0.48288338]
tensor([0.4829], dtype=torch.float64)
  • CUDA Tensors
x = torch.randn(1)
print(x)

device = torch.device("cuda:0") # a CUDA device object
x = x.to(device) # move it to GPU
print(x)

y = torch.randn_like(x, device=device) # directly create a tensor on GPU
print(y)

z = x + y

print(z)
print(z.to('cpu', torch.int32)) # move to CPU and change the dtype at the same time
tensor([0.8053])
tensor([0.8053], device='cuda:0')
tensor([-1.4201], device='cuda:0')
tensor([-0.6148], device='cuda:0')
tensor([0], dtype=torch.int32)

1.2. Autograd: Automatic Differentiation

The autograd package provides automatic differentiation for all operations on Tensors.

1.2.1. Tensor

If you set a torch.Tensor's attribute .requires_grad to True (the default is False), it starts to track all operations on it.
When you finish your computation, you can call .backward() and have all the gradients computed automatically.
The gradient for this tensor will be accumulated into .grad attribute.

Note: tracking is turned off by default for Tensors.

To stop a tensor from tracking history, you can call .detach() to detach it from the computation history.
You can also wrap the code block in with torch.no_grad():.
It is particularly helpful when evaluating a model, because the model may have trainable parameters with requires_grad=True even though we don't need their gradients. This also saves memory.

Note: during evaluation, or when updating parameters by hand, tracking should be disabled, because these steps are not part of the forward computation.
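
A minimal sketch of .detach() (assuming torch is already imported):

a = torch.ones(2, 2, requires_grad=True)
b = (a * 3).detach()   # b shares the same data but is cut off from the graph
print(b.requires_grad) # False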

Each tensor has a .grad_fn attribute that references a Function that has created the Tensor (except for Tensors created by the user - their grad_fn is None).

If you want to compute the derivatives, you can call .backward() on a Tensor. If the Tensor is not a scalar, you need to pass backward() a gradient argument: a tensor of matching shape.

To spell this out:
If the result is a scalar, calling backward() with no argument is equivalent to .backward(torch.tensor(1.)),
where the 1.0 is \(\frac{\partial{Loss}}{\partial{Loss}}=1.0\).
If the result is higher-dimensional, there is one derivative per output element, so a matching gradient tensor must be supplied.

import torch

x = torch.ones(2, 2, requires_grad=True)

y = x + 2
print(y)

z = (y * y * 3).mean()
print(z)
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
tensor(27., grad_fn=<MeanBackward1>)

1.2.2. Gradients

torch.autograd is an engine for computing vector-Jacobian products. That is, given any vector \(v = (v_1 \; v_2 \; \cdots \; v_m)^T\), it computes:
\[ J^T \cdot v = \left[ \begin{matrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1}\\ \vdots & \ddots & \vdots \\ \frac{\partial y_1}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n} \\ \end{matrix} \right] \cdot \left[ \begin{matrix} v_1 \\ \vdots \\ v_m \\ \end{matrix} \right] \]

Why should we do that?
Because we usually end up with a scalar loss \(l = g(\vec{y})\). Taking \(v\) to be the gradient of \(l\) with respect to \(\vec{y}\):
\[ v = \left( \frac{\partial{l}}{\partial{y_1}} \cdots \frac{\partial{l}}{\partial{y_m}} \right)^T \]
we then have:
\[ J^T \cdot v = \left[ \begin{matrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1}\\ \vdots & \ddots & \vdots \\ \frac{\partial y_1}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n} \\ \end{matrix} \right] \cdot \left[ \begin{matrix} \frac{\partial{l}}{\partial{y_1}} \\ \vdots \\ \frac{\partial{l}}{\partial{y_m}} \\ \end{matrix} \right] = \left[ \begin{matrix} \frac{\partial{l}}{\partial{x_1}} \\ \vdots \\ \frac{\partial{l}}{\partial{x_n}} \\ \end{matrix} \right] \]

This is why, to feed external gradients into a model with non-scalar output, PyTorch exposes the vector-Jacobian product through autograd.

z.backward() # Because `z` contains a single scalar, it's equivalent to `z.backward(torch.tensor(1.))`

print(x.grad) # dz/dx_i = 6*(x_i+2)/4 = 1.5*(x_i+2) = 4.5
tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])

If the output is not a scalar, a vector \(v\) is needed:

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = y * y * 3
print(z)

v = torch.tensor([[0.1,1],[10,100]],dtype=torch.float32) # shape matching!
z.backward(v)

print(x.grad)
tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>)
tensor([[   1.8000,   18.0000],
        [ 180.0000, 1800.0000]])

We can stop tracking history:

x = torch.ones(2, 2, requires_grad=True)

y = x + 2
print(x.requires_grad)

with torch.no_grad():
    z = y * y * 3
print(z.requires_grad)
True
False

1.3. Neural Networks

Neural networks can be constructed using the torch.nn package.

1.3.1. Define the network

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module): # `nn.Module` contains layers
    
    def __init__(self):
        super(Net, self).__init__() # allows you to call methods of the superclass `nn.Module` in your subclass `Net`.
        
        self.conv1 = nn.Conv2d(1, 6, 5) # 1 input channel, 6 output channel, 5x5 kernel
        self.conv2 = nn.Conv2d(6, 16, 5)
        
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        
    def forward(self, x): # method `forward(input)` that returns the output.
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        
        x = F.relu(self.fc1(x.view(-1, self.num_flat_features(x))))
        x = F.relu(self.fc2(x))
        
        x = self.fc3(x)
        return x
    
    def num_flat_features(self, x):
        size = x.size()[1:] # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features
    
net = Net()
print(net)
Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

You just have to define the forward function, and the backward function (where gradients are computed) is automatically defined using autograd.

The learnable parameters of a model are returned by net.parameters():

params = list(net.parameters())
print(len(params))
10
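
We can also inspect individual parameters; for instance, the first entry is conv1's weight:

print(params[0].size()) # conv1's .weight: torch.Size([6, 1, 5, 5])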

1.3.2. Process inputs and call backward

Let's try a random \(32 \times 32\) image as input:

input = torch.randn(1,1,32,32)
out = net(input)
print(out)
tensor([[ 0.1177,  0.0199, -0.0774,  0.0580,  0.0407,  0.0384,  0.0380, -0.1090,
          0.0345, -0.0498]], grad_fn=<AddmmBackward>)

We can even zero the gradient buffers of all parameters and backprop with random gradients:

net.zero_grad()
out.backward(torch.randn(1, 10))

Note: torch.nn only supports mini-batches, not single samples. For example, nn.Conv2d takes a 4D Tensor of nSamples x nChannels x Height x Width.
You can use input.unsqueeze(0) to add a fake batch dimension for a single sample, as sketched below.
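
A minimal sketch of that trick (assuming a single 1 x 32 x 32 image and the net defined above):

single = torch.randn(1, 32, 32) # one sample: channels x height x width
batch = single.unsqueeze(0)     # fake batch dimension -> shape [1, 1, 32, 32]
print(batch.size())             # torch.Size([1, 1, 32, 32])
out = net(batch)                # now acceptable to nn.Conv2d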

1.3.3. Loss function

There are several different loss functions under the nn package, e.g. nn.MSELoss:

output = net(input)
target = torch.randn(10)
target = target.view(1, -1)
criterion = nn.MSELoss()
loss = criterion(output, target)
print(loss)
tensor(0.7902, grad_fn=<MseLossBackward>)

Now, if we follow loss in the backward direction using its .grad_fn attribute, we can see a graph of computations:
input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
-> view -> linear -> relu -> linear -> relu -> linear
-> MSELoss
-> loss

print(loss.grad_fn)
print(loss.grad_fn.next_functions[0][0]) # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0]) # AccumulateGrad for the last Linear layer's bias
<MseLossBackward object at 0x000001C6AB532780>
<AddmmBackward object at 0x000001C6AB9CD6D8>
<AccumulateGrad object at 0x000001C6AB41A4A8>

1.3.4. Backprop

To backpropagate the error, all we have to do is call loss.backward().
You need to clear the existing gradients first, though, or the new gradients will be accumulated into the existing ones.

Note: when training batch by batch, the gradients must be zeroed once per batch; otherwise they are not replaced but accumulated.

For example, zeroing the gradients only every 2 batches (backpropagate each batch, then update the parameters) behaves roughly like doubling the batch size while saving memory; a sketch of this trick follows.
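
A minimal sketch of that idea (a hypothetical loop; it assumes a `dataloader` plus the `net`, `criterion`, and `optimizer` introduced in this chapter):

accum_steps = 2                   # pretend the batch is 2x larger
optimizer.zero_grad()
for i, (inputs, labels) in enumerate(dataloader):
    loss = criterion(net(inputs), labels)
    loss = loss / accum_steps     # optional: mimic averaging over the larger batch
    loss.backward()               # gradients accumulate into .grad
    if (i + 1) % accum_steps == 0:
        optimizer.step()          # update with gradients from 2 batches
        optimizer.zero_grad()     # only now clear the buffers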

net.zero_grad() # zeroes the gradient buffers of all parameters

print(net.conv1.bias.grad) # gradients before backprop

loss.backward()

print(net.conv1.bias.grad) # gradients after backprop
tensor([0., 0., 0., 0., 0., 0.])
tensor([-0.0074, -0.0043,  0.0082,  0.0022, -0.0055, -0.0047])

1.3.5. Update the weights

The simplest update rule (plain SGD) can be implemented by hand:

learning_rate = 0.1
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)

However, there are various different update rules such as SGD, Adam, RMSProp, etc.
To enable this, we can use torch.optim package:

import torch.optim as optim

optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop
optimizer.zero_grad() # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()

optimizer.step() # update

1.4. 举例:Training a Classifier

1.4.1. Load data

Specifically for vision, we can use torchvision, which has data loaders for common datasets such as ImageNet, CIFAR10, MNIST, etc. and data transformers for images, viz., torchvision.datasets and torch.utils.data.DataLoader.

For this tutorial, we will use the CIFAR10 dataset. It has the classes: 'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'. The images in CIFAR10 are of size 3x32x32.

1.4.2. Training an image classifier

Load CIFAR10 and normalize its range from [0,1] to [-1,1]:

import torch
import torchvision
import torchvision.transforms as transforms

# Compose several transforms: convert to tensor, then normalize each of the 3 channels with mean 0.5 and std 0.5.
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root=".\data", train=True, download=True, transform=transform)
testset = torchvision.datasets.CIFAR10(root=".\data", train=False, download=True, transform=transform)

# shuffle: set to `True` to have the data reshuffled at every epoch (default: `False`).
# num_workers: how many subprocesses to use for data loading. `0` means that the data will be loaded in the main process. (default: `0`)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)
testloader = torch.utils.data.DataLoader(testset, batch_size=4, shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to .\data\cifar-10-python.tar.gz
100.0%
Files already downloaded and verified

Show some training images:

import matplotlib.pyplot as plt
import numpy as np

def imshow(img):
    img = img / 2 + 0.5 # unnormalize
    npimg = img.numpy() # Tensor -> numpy array
    
    plt.imshow(np.transpose(npimg, (1, 2, 0))) # channel x height x width -> height x width x channel
    plt.show()
    
dataiter = iter(trainloader)
images, labels = next(dataiter) # `dataiter.next()` in older PyTorch versions

imshow(images[0])
print(labels[0],classes[labels[0]])

[Figure: the first training image of the batch, shown with imshow]

tensor(9) truck

Let's define a CNN:

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()

We can move it to GPU:

device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")
net.to(device)
Net(
  (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

Define a loss function and optimizer:

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

Training:

for epoch in range(3):

    sum_loss = 0.0
    max_show = 3000
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data
        
        # send to GPU
        inputs, labels = inputs.to(device), labels.to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        sum_loss += loss.item()
        if (i+1) % max_show == 0:    # print every 3000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  ((epoch+1), (i+1), (sum_loss/max_show)))
            sum_loss = 0.0

print('Finished Training')
[1,  3000] loss: 0.774
[1,  6000] loss: 0.807
[1,  9000] loss: 0.832
[1, 12000] loss: 0.844
[2,  3000] loss: 0.722
[2,  6000] loss: 0.791
[2,  9000] loss: 0.804
[2, 12000] loss: 0.820
[3,  3000] loss: 0.711
[3,  6000] loss: 0.761
[3,  9000] loss: 0.776
[3, 12000] loss: 0.786
Finished Training

Test our trained model on test data:

sum_correct = 0
sum_test = 0

with torch.no_grad():
    for data in testloader:
        images, labels = data
        images, labels = images.to(device), labels.to(device)

        outputs = net(images) # 4x10
        _, predicted = torch.max(outputs.data, 1) # (max_value, index)

        sum_correct += (predicted==labels).sum().item()
        sum_test += labels.size(0)
    
    print("Accuracy on 10000 test images: %.3f %%" % (100*sum_correct/sum_test))
Accuracy on 10000 test images: 63.140 %

1.5. Data Parallelism

We will learn how to use multiple GPUs using DataParallel.
DataParallel splits your data automatically and sends job orders to multiple models on several GPUs. After each model finishes their job, DataParallel collects and merges the results before returning it to you.

Note that just calling my_tensor.to(device) returns a new copy of my_tensor on the GPU instead of rewriting my_tensor; you need to assign the result to a (new) tensor and use that tensor on the GPU.
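
A minimal sketch of the difference (assuming `device` points at a GPU):

t = torch.randn(2)
t.to(device)     # returns a GPU copy; `t` itself is still on the CPU
t = t.to(device) # reassign to actually work with the GPU tensor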

It is easy to make your model run in parallel using DataParallel:

model = nn.DataParallel(model)

Let's see an example.

### Imports and parameters
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

input_size = 5
output_size = 2

batch_size = 30
data_size = 100

device = torch.device("cuda:0")

### Dummy dataset
class RandomDataset(Dataset):
    
    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)
        
    def __getitem__(self, index):
        return self.data[index]
    
    def __len__(self):
        return self.len

### Simple model
class Model(nn.Module):
    
    def __init__(self, input_size, output_size):
        super(Model, self).__init__()
        self.fc = nn.Linear(input_size, output_size)
        
    def forward(self, input):
        output = self.fc(input)
        print("\tInside the model: input size",input.size(),"output size",output.size())
        
        return output
    
### Create model and dataparallel
model = Model(input_size, output_size)
if torch.cuda.device_count() > 1:
    print(torch.cuda.device_count(),"GPUs are found!")
model = nn.DataParallel(model)
model.to(device)

### Run the model
rand_loader = DataLoader(dataset=RandomDataset(input_size, data_size),
                        batch_size=batch_size, shuffle=True)
for data in rand_loader:
    input = data.to(device)
    output = model(input)
    print("Total: Input size",input.size(),"output size",output.size())
2 GPUs are found!
    Inside the model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
    Inside the model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Total: Input size torch.Size([30, 5]) output size torch.Size([30, 2])
    Inside the model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
    Inside the model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Total: Input size torch.Size([30, 5]) output size torch.Size([30, 2])
    Inside the model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
    Inside the model: input size torch.Size([15, 5]) output size torch.Size([15, 2])
Total: Input size torch.Size([30, 5]) output size torch.Size([30, 2])
    Inside the model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
    Inside the model: input size torch.Size([5, 5]) output size torch.Size([5, 2])
Total: Input size torch.Size([10, 5]) output size torch.Size([10, 2])

2. DATA LOADING AND PROCESSING TUTORIAL

PyTorch provides many tools to make data loading easy and hopefully, to make your code more readable.

For this tutorial, we should install two packages:

  • scikit-image: Image io and transforms
  • pandas: Easier csv parsing

We have prepared a facial pose estimation dataset in ./data/faces: a set of face images whose landmark points are stored in a .csv file.
Let's read the CSV and get the annotations as an (N, 2) array:

import pandas as pd
from skimage import io
import matplotlib.pyplot as plt

landmarks_list = pd.read_csv('data/faces/face_landmarks.csv')
'''
image_name,part_0_x,part_0_y,part_1_x,part_1_y,part_2_x, ... ,part_67_x,part_67_y
0805personali01.jpg,27,83,27,98, ... 84,134
1084239450_e76e00b7e7.jpg,70,236,71,257, ... ,128,312
'''

n = 50
img_name = landmarks_list.iloc[n, 0]
landmarks = landmarks_list.iloc[n, 1:].values.astype('float').reshape(-1,2) # pandas Series -> NumPy array

def show_landmarks(image, landmarks):
    'Show image with landmarks.'
    plt.imshow(image)
    plt.scatter(landmarks[:,0],landmarks[:,1], s=10, marker=".", c="r")
    plt.pause(0.001) # pause a bit so that plots are updated
    
plt.figure()
img_path = "./data/faces/"+img_name
show_landmarks(io.imread(img_path), landmarks)
plt.show()

[Figure: the face image with its landmark points overlaid]

2.1. Dataset Class

torch.utils.data.Dataset is an abstract class representing a dataset. Our custom dataset should inherit Dataset and override the following methods:

  • __len__: so that len(dataset) returns the size of the dataset.
  • __getitem__: so that dataset[i] can be used for indexing.

Demo:

from torch.utils.data import Dataset
import os

class FaceLandmarksDataset(Dataset):
    'Face landmarks dataset.'
    def __init__(self, CsvFile_path, dir_img, transform=None):
        self.landmarks_list = pd.read_csv(CsvFile_path)
        self.dir_img = dir_img
        self.transform = transform
    
    def __len__(self):
        return len(self.landmarks_list)
    
    def __getitem__(self, idx):
        img_path = os.path.join(self.dir_img,
                                self.landmarks_list.iloc[idx, 0])
        image = io.imread(img_path)
        landmarks = self.landmarks_list.iloc[idx, 1:].values.astype("float").reshape(-1,2)
        sample = {'image':image, 'landmarks': landmarks}
        
        if self.transform:
            sample = self.transform(sample)
            
        return sample
    
### Instantiate this class and show four images.
face_landmarks = FaceLandmarksDataset(CsvFile_path='./data/faces/face_landmarks.csv', dir_img='./data/faces/')


fig = plt.figure()
for i in range(len(face_landmarks)):
    sample = face_landmarks[i]
    print(i, sample['image'].shape, sample['landmarks'].shape)
    
    ax = plt.subplot(1,4,i+1)
    ax.set_title('Sample #{}'.format(i))
    ax.axis('off')
    
    ax.imshow(sample['image'])
    ax.scatter(sample['landmarks'][:,0],sample['landmarks'][:,1], s=10, marker=".", c="r")
    #show_landmarks(**sample)
    
    if i == 3:
        plt.tight_layout()
        plt.show()
        break
0 (324, 215, 3) (68, 2)
1 (500, 333, 3) (68, 2)
2 (250, 258, 3) (68, 2)
3 (434, 290, 3) (68, 2)

[Figure: the first four dataset samples with their landmarks]

2.2. Transforms

We want to:

  • randomly crop samples.
  • rescale images.
  • convert the numpy images to torch images (notice: swap axes).

We also want to write them as callable classes instead of simple functions:

from skimage import transform
import numpy as np

class Rescale():
    '''
    Rescale the image in a sample to a given size.
    
    Args:
        output_size (tuple or int): Desired output size. If int, the smaller image edge is matched to it 
            and the aspect ratio remains the same.
    '''
    def __init__(self, output_size):
        assert isinstance(output_size, (int, tuple)) # ensure that output_size is an int or a tuple.
        self.output_size = output_size
        
    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']
        
        h, w = image.shape[:2]
        if isinstance(self.output_size, int): # int: the length of the smaller edge
            if h > w:
                new_h, new_w = self.output_size * h / w, self.output_size
            else:
                new_h, new_w = self.output_size, self.output_size * w / h
        else:
            new_h, new_w = self.output_size
        new_h, new_w = int(new_h), int(new_w)
            
        image = transform.resize(image, (new_h, new_w))
        landmarks = landmarks * [new_w/w, new_h/h]
        
        return {'image':image, 'landmarks':landmarks}

class RandomCrop():
    '''
    Crop the image in a sample randomly.
    
    Args:
        output_size (tuple or int). If int, square crop is made.
    '''
    def __init__(self, output_size):
        assert isinstance(output_size, (int, tuple)) # ensure that output_size is an int or a tuple.
        self.output_size = output_size
    
    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']
        
        h, w = image.shape[:2]
        if isinstance(self.output_size, int):
            new_h, new_w = self.output_size, self.output_size
        else:
            new_h, new_w = self.output_size
        
        start_h_idx = np.random.randint(0, h - new_h)
        start_w_idx = np.random.randint(0, w - new_w)
        
        image = image[start_h_idx: (start_h_idx+new_h),
                      start_w_idx: (start_w_idx+new_w)]
        landmarks = landmarks - [start_w_idx, start_h_idx]
        
        return {'image':image, 'landmarks':landmarks}
    
class ToTensor():
    '''
    Convert the ndarray image in a sample to a Tensor.
    Notice: swap color axis because:
        numpy image: H x W x C
        torch image: C X H X W
    '''
    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']
        image = image.transpose((2, 0, 1))
        return {'image': torch.from_numpy(image), 
                'landmarks': torch.from_numpy(landmarks)}

We now apply our transforms to a sample:

from torchvision import transforms

scale = Rescale(256) # the length of the smaller side is 256
crop = RandomCrop(210) # crop a 210x210 patch
composed_trans = transforms.Compose([scale, crop])

fig = plt.figure()
plt.tight_layout()

sample = face_landmarks[65]
transformed_sample = composed_trans(sample)

show_landmarks(**sample)
show_landmarks(**transformed_sample)

plt.show()

[Figure: the original sample with landmarks]

[Figure: the rescaled and cropped sample with landmarks]

2.3. Iterating through the Dataset

With the dataset in place, we need a way to repeatedly draw samples from it for training or testing.

import torch

transformed_dataset = FaceLandmarksDataset(CsvFile_path='./data/faces/face_landmarks.csv',
                                           dir_img='./data/faces/',
                                           transform=transforms.Compose([
                                               Rescale(256),
                                               RandomCrop(210),
                                               ToTensor()
                                           ]))

for i in range(len(transformed_dataset)):
    sample = transformed_dataset[i]
    print(i, sample['image'].size(), sample['landmarks'].size())
    
    if i == 4:
        break
0 torch.Size([3, 210, 210]) torch.Size([68, 2])
1 torch.Size([3, 210, 210]) torch.Size([68, 2])
2 torch.Size([3, 210, 210]) torch.Size([68, 2])
3 torch.Size([3, 210, 210]) torch.Size([68, 2])
4 torch.Size([3, 210, 210]) torch.Size([68, 2])

However, we also want to:

  • batch the data.
  • shuffle the data.
  • load the data in parallel.

torch.utils.data.DataLoader is an iterator which provides all of these features.

from torch.utils.data import DataLoader
from torchvision import utils

dataloader = DataLoader(transformed_dataset, batch_size=4,
                    shuffle=True, num_workers=0) # Windows may error when num_workers > 0

def show_landmarks_batch(sample_batch):
    'Show images with landmarks for a batch of samples.'
    image_batch, landmarks_batch = sample_batch['image'], sample_batch['landmarks']

    batch_size = len(image_batch)
    im_size = image_batch.size(2)

    grid = utils.make_grid(image_batch)
    plt.imshow(grid.numpy().transpose((1,2,0))) # Tensors -> ndarrays -> CxHxW to HxWxC

    for i in range(batch_size):
        plt.scatter(landmarks_batch[i,:,0].numpy()+im_size*i,
                    landmarks_batch[i,:,1].numpy(),
                    s=10, marker='.', c='r')
        plt.title('A batch from dataloader')

ite_batch = 3
for ite,sample_batch in enumerate(dataloader):
    print(sample_batch['image'].size(),
          sample_batch['landmarks'].size())
    if ite == ite_batch:
        plt.figure()
        show_landmarks_batch(sample_batch)
        plt.axis('off')
        plt.show()
        break
torch.Size([4, 3, 210, 210]) torch.Size([4, 68, 2])
torch.Size([4, 3, 210, 210]) torch.Size([4, 68, 2])
torch.Size([4, 3, 210, 210]) torch.Size([4, 68, 2])
torch.Size([4, 3, 210, 210]) torch.Size([4, 68, 2])

[Figure: a batch of four images from the DataLoader with landmarks]

2.4. Torchvision

The torchvision package provides some common datasets and transforms.

We might not even have to write custom classes. One of the more generic datasets available in torchvision is ImageFolder. It assumes that images are organized in the following way:

root/ants/xxx.png
root/ants/xxy.jpeg
root/ants/xxz.png
.
.
.
root/bees/123.jpg
root/bees/nsdf3.png
root/bees/asd932_.png

where ants and bees are class labels.

Besides, generic transforms that operate on PIL.Image, such as RandomHorizontalFlip and Resize (formerly Scale), are also available.

import torch
from torchvision import transforms, datasets

data_transform = transforms.Compose([
    transforms.RandomResizedCrop(224), # `RandomSizedCrop` in older torchvision versions
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                        std = [0.229, 0.224, 0.225])
])

hymenoptera_dataset = datasets.ImageFolder(root='hymenoptera_data/train',
                                          transform=data_transform)

dataset_loader = torch.utils.data.DataLoader(hymenoptera_dataset,
                                            batch_size=4,shuffle=True,num_workers=0)
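
As a quick check (a sketch based on the folder layout above), ImageFolder infers the class names from the subfolder names:

print(hymenoptera_dataset.classes)      # ['ants', 'bees']
print(hymenoptera_dataset.class_to_idx) # {'ants': 0, 'bees': 1}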

3. LEARNING PYTORCH WITH EXAMPLES

As mentioned before, PyTorch's core features boil down to these two:

  • A replacement for NumPy to use the power of GPUs.
  • A deep learning research platform that provides maximum flexibility and speed.

In other words:

  1. Tensor computation backed by GPUs.
  2. The remaining facilities needed for deep learning.

To put it more concretely, PyTorch provides:

  • An n-dimensional Tensor, similar to numpy but can run on GPUs.
  • Automatic differentiation for building and training neural networks.

The reasons:

  1. GPUs can provide a 50x or greater speedup for numerical computation.
  2. Today's deep learning methods still rely on backpropagation, so computing gradients is indispensable, and automatic differentiation is the widely used way to do it.

3.1. Basics: Tensors and Autograd

Conceptually, a Tensor is essentially the same as a NumPy array, but Tensors can do more:

  1. A Tensor carries a computational graph and gradient information, and can keep tracking the operations applied to it; the nodes of the graph are Tensors and the edges are functions.
  2. Tensors can run their numerical computation on the GPU.

Let's look at a two-layer fully connected network as an example:

import torch

### Settings
dtype = torch.float32
device = torch.device("cuda:0")
learning_rate = 1e-6
total_ite = 500
N_batch, D_in, H, D_out = 64, 1000, 100, 10

### Create a random dataset (Tensors)
x = torch.randn(N_batch, D_in, device=device, dtype=dtype)
y = torch.randn(N_batch, D_out, device=device, dtype=dtype)

### Initialize weight Tensors randomly
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

### Iterations
for ite in range(1,total_ite+1):
    
    y_pred = x.mm(w1).clamp(min=0).mm(w2) # clamp(min=0) acts as the ReLU
    loss = (y_pred - y).pow(2).sum()
    if ite % 100 == 0:
        print(ite, loss.item())
    
    loss.backward()
    
    # Manually update weights
    # Weights have requires_grad=True, but we don't need tracking.
    with torch.no_grad():
        
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
    
        # Manually zero the gradients after updating weights
        w1.grad.zero_()
        w2.grad.zero_()
100 616.6424560546875
200 5.097920894622803
300 0.06861867755651474
400 0.0014389019925147295
500 0.0001344898482784629

The biggest difference between TensorFlow and PyTorch:

  • TensorFlow's computational graphs are static: once the graph is defined, the same graph is executed over and over; only the input data changes.
  • PyTorch's computational graphs are dynamic: every forward pass may build a brand-new graph.

Static graphs can be optimized up front and therefore tend to be efficient, but in some settings, such as recurrent networks, dynamic graphs make the code much simpler.
We give an example of this two sections below.

One more difference: in TensorFlow the parameter update is part of the computational graph, whereas in PyTorch it is not. That is why, in PyTorch, we turn off gradient tracking while updating the weights.

3.2. Simplifying things: the nn module

Clearly, the manual forward pass and parameter updates above are tedious. In particular, once the network becomes large and complex, it is impractical to enumerate the parameters explicitly.
PyTorch provides several packages to address this.

First, simplifying the network definition: in short, nn provides many common neural-network building blocks as well as commonly used loss functions.
Once the network is defined, its parameters are automatically registered in the list of learnable parameters.

Back to the earlier example:

import torch

### Settings
dtype = torch.float32
device = torch.device("cuda:0")
learning_rate = 1e-4 # bigger!
total_ite = 500
N_batch, D_in, H, D_out = 64, 1000, 100, 10

### Create a random dataset (Tensors)
x = torch.randn(N_batch, D_in, device=device, dtype=dtype)
y = torch.randn(N_batch, D_out, device=device, dtype=dtype)

### Define network model by nn package
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out)
)
model = model.to(device)

### Define loss function by nn package
loss_fn = torch.nn.MSELoss(reduction='sum')

for ite in range(1, total_ite+1):
    
    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    if ite % 100 == 0:
        print(ite, loss.item())
    
    model.zero_grad()
    loss.backward()
    
    # Manually update weights
    with torch.no_grad():
        
        for param in model.parameters():
            param -= learning_rate * param.grad
100 2.5298514366149902
200 0.04136687144637108
300 0.0011623052414506674
400 4.448959225555882e-05
500 2.1180185285629705e-06

Second, simplifying the optimization step: PyTorch provides the optim package, which supports more sophisticated update rules:

import torch

### Settings
dtype = torch.float32
device = torch.device("cuda:0")
learning_rate = 1e-4 # bigger!
total_ite = 500
N_batch, D_in, H, D_out = 64, 1000, 100, 10

### Create a random dataset (Tensors)
x = torch.randn(N_batch, D_in, device=device, dtype=dtype)
y = torch.randn(N_batch, D_out, device=device, dtype=dtype)

### Define network model by nn package
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out)
)
model = model.to(device)

### Define loss function by nn package
loss_fn = torch.nn.MSELoss(reduction='sum')

### Define optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for ite in range(1, total_ite+1):
    
    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    if ite % 100 == 0:
        print(ite, loss.item())
    
    optimizer.zero_grad()
    loss.backward()
    
    # Update parameters
    optimizer.step()
100 65.00260925292969
200 1.0924508571624756
300 0.006899723317474127
400 5.2772647904930636e-05
500 1.6419755866081687e-07

The building blocks provided by the nn package are fairly basic. If our network is more complex, we can also define custom modules:

import torch

class TwoLayerNet(torch.nn.Module):

    def __init__(self, D_in, H, D_out):
        
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)
        
    def forward(self, x):
        
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred
    
### Settings
dtype = torch.float32
device = torch.device("cuda:0")
learning_rate = 1e-4 # bigger!
total_ite = 500
N_batch, D_in, H, D_out = 64, 1000, 100, 10

### Create a random dataset (Tensors)
x = torch.randn(N_batch, D_in, device=device, dtype=dtype)
y = torch.randn(N_batch, D_out, device=device, dtype=dtype)

### Define network model by nn package
model = TwoLayerNet(D_in, H, D_out)
model = model.to(device)

### Define loss function by nn package
loss_fn = torch.nn.MSELoss(reduction='sum')

### Define optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for ite in range(1, total_ite+1):
    
    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    if ite % 100 == 0:
        print(ite, loss.item())
    
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
100 71.93133544921875
200 1.759734869003296
300 0.012220818549394608
400 0.0002741872740443796
500 1.9429817257332616e-05

3.3. The dynamic advantage: Control Flow + Weight Sharing in PyTorch

In this section we illustrate the advantage of PyTorch's dynamic graphs with an example.

We build a fully connected network with two distinguishing features:

  • on each forward pass, the number of hidden layers is random: 1, 2, 3, or 4;
  • the hidden layers share the same parameters.
import torch
import random

class DynamicNet(torch.nn.Module):
    
    def __init__(self, D_in, H, D_out):
        
        super(DynamicNet, self).__init__()
        self.input_linear = torch.nn.Linear(D_in, H)
        self.middle_linear = torch.nn.Linear(H, H)
        self.output_linear = torch.nn.Linear(H, D_out)
        
    def forward(self, x):
        
        h_relu = self.input_linear(x).clamp(min=0)
        rand_num = random.randint(0, 3)
        for _ in range(rand_num): # 1 layer, 2 layers, 3 layers or 4 layers
            h_relu = self.middle_linear(h_relu).clamp(min=0)
        y_pred = self.output_linear(h_relu)
        return y_pred, rand_num
    
### Settings
dtype = torch.float32
device = torch.device("cuda:0")
learning_rate = 1e-4
momentum = 0.9
total_ite = 500
N_batch, D_in, H, D_out = 64, 1000, 100, 10

### Create a random dataset (Tensors)
x = torch.randn(N_batch, D_in, device=device, dtype=dtype)
y = torch.randn(N_batch, D_out, device=device, dtype=dtype)

### Define network model by nn package
model = DynamicNet(D_in, H, D_out)
model = model.to(device)

### Define loss function by nn package
loss_fn = torch.nn.MSELoss(reduction='sum')

### Define optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=momentum)

for ite in range(1, total_ite+1):
    
    y_pred, rand_num = model(x)
    loss = loss_fn(y_pred, y)
    if ite % 100 == 0:
        print(ite, loss.item(), rand_num)
    
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
100 13.738285064697266 0
200 4.243963718414307 2
300 0.7942993640899658 1
400 0.43234169483184814 3
500 0.42137715220451355 2