PyTorch Tutorial
09. Softmax Classifier
Lecturer : Hongpu Liu    Lecture 9-1          PyTorch Tutorial @ SLAM Research Group
Revision: Diabetes dataset
  [Figure: a fully connected network for the Diabetes dataset. The eight inputs 𝑥1, …, 𝑥8 pass through three Linear + Sigmoid layers with 6, 4, and 1 outputs, producing the prediction ŷ.]
Revision: MNIST Dataset
  There are 10 labels in the MNIST dataset.
  How should we design the neural network?
Design 10 outputs using Sigmoid?
  [Figure: a network built from an Input Layer, Linear Layers, and Sigmoid Layers, whose ten outputs 𝑜1, …, 𝑜10 are mapped independently to ŷ1, …, ŷ10.]
  What is wrong?
  We hope the outputs are competitive: a higher score for one class should mean lower scores for the others.
  What we actually want is for the neural network to output a distribution.
Output a Distribution of prediction with Softmax
  [Figure: a network built from an Input Layer, Linear Layers, Sigmoid Layers, and a final Softmax Layer, whose ten outputs 𝑜1, …, 𝑜10 are mapped to P(y = 0), …, P(y = 9).]
  such that
      $P(y = i) \ge 0$
      $\sum_{i=0}^{9} P(y = i) = 1$
Softmax Layer
   Suppose $Z^{l} \in \mathbb{R}^{K}$ is the output of the last linear layer. The Softmax function is:
      $P(y = i) = \frac{e^{z_i}}{\sum_{j=0}^{K-1} e^{z_j}}, \quad i \in \{0, \dots, K-1\}$
Softmax Layer - Example
  The Softmax layer applies three steps to the outputs z of the last linear layer: Exponent, Sum, Divide.

      z         Exponent e^z      Divide e^z / Sum
      0.2       1.22              0.38
      0.1       1.11              0.34
     −0.1       0.90              0.28
                Sum = 3.23
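The Exponent, Sum, Divide steps can be sketched in a few lines of plain Python. This is a minimal illustration, not how PyTorch implements it. The function name `softmax` and the max-subtraction step (a common guard against overflow that leaves the result unchanged) are assumptions added here and do not appear on the slides.

```python
import math

def softmax(z):
    """Map a list of real scores to a probability distribution."""
    # Subtracting the max does not change the result but avoids overflow
    # in exp() for large scores (an addition not shown on the slide).
    m = max(z)
    exps = [math.exp(v - m) for v in z]   # Exponent
    total = sum(exps)                     # Sum
    return [e / total for e in exps]      # Divide

probs = softmax([0.2, 0.1, -0.1])
print([round(p, 2) for p in probs])   # [0.38, 0.34, 0.28]
```

The printed values match the worked example, and the entries sum to 1 as a distribution must.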
Loss function - Cross Entropy
  [Figure: the linear outputs (0.2, 0.1, −0.1) pass through Softmax to give Ŷ = (0.38, 0.34, 0.28), then through Log. NLLLoss (Negative Log Likelihood Loss) combines log Ŷ with the one-hot label Y = (1, 0, 0) as −Y log Ŷ to give the Loss.]
      $Loss(\hat{Y}, Y) = -Y \log \hat{Y}$
Cross Entropy in Numpy
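  [Figure: Softmax, Log, and −Y log Ŷ applied to the linear outputs (0.2, 0.1, −0.1) with one-hot label Y = (1, 0, 0).]
 import numpy as np

 y = np.array([1, 0, 0])                 # one-hot label: the true class is 0
 z = np.array([0.2, 0.1, -0.1])          # outputs of the last linear layer
 y_pred = np.exp(z) / np.exp(z).sum()    # softmax
 loss = (- y * np.log(y_pred)).sum()     # cross entropy, summed over classes
 print(loss)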
Cross Entropy in PyTorch
  [Figure: torch.nn.CrossEntropyLoss() covers the Softmax, Log, and −Y log Ŷ steps, so it takes the raw linear outputs as input.]
   import torch
   y = torch.LongTensor([0])                  # class index of the label, not one-hot
   z = torch.Tensor([[0.2, 0.1, -0.1]])       # raw linear outputs, shape (1, 3)
   criterion = torch.nn.CrossEntropyLoss()
   loss = criterion(z, y)
   print(loss)
Mini-Batch: batch_size=3
      import torch
      criterion = torch.nn.CrossEntropyLoss()
      Y = torch.LongTensor([2, 0, 1])
      Y_pred1 = torch.Tensor([[0.1,   0.2,   0.9],
                              [1.1,   0.1,   0.2],
                              [0.2,   2.1,   0.1]])
      Y_pred2 = torch.Tensor([[0.8,   0.2,   0.3],
                              [0.2,   0.3,   0.5],
                              [0.2,   0.2,   0.5]])
      l1 = criterion(Y_pred1, Y)
      l2 = criterion(Y_pred2, Y)
      print("Batch Loss1 = ", l1.data, "\nBatch Loss2=", l2.data)

  Output:
      Batch Loss1 = tensor(0.4966)
      Batch Loss2 = tensor(1.2389)
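By default `torch.nn.CrossEntropyLoss` averages the per-sample losses over the batch (`reduction='mean'`). The sketch below reproduces the two batch losses in plain Python, as a check on the arithmetic rather than a replacement for the PyTorch call. The helper names `cross_entropy` and `batch_loss` are hypothetical and introduced only for illustration.

```python
import math

def cross_entropy(logits, target):
    """Loss for one sample: -log(softmax(logits)[target])."""
    log_sum_exp = math.log(sum(math.exp(v) for v in logits))
    return log_sum_exp - logits[target]

def batch_loss(batch_logits, targets):
    """Mean of the per-sample losses, matching reduction='mean'."""
    losses = [cross_entropy(l, t) for l, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)

Y = [2, 0, 1]
Y_pred1 = [[0.1, 0.2, 0.9], [1.1, 0.1, 0.2], [0.2, 2.1, 0.1]]
Y_pred2 = [[0.8, 0.2, 0.3], [0.2, 0.3, 0.5], [0.2, 0.2, 0.5]]

l1 = batch_loss(Y_pred1, Y)
l2 = batch_loss(Y_pred2, Y)
print(round(l1, 4), round(l2, 4))   # 0.4966 1.2389
```

The first prediction puts the largest score on the correct class in every row, which is why its loss is lower.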
Exercise 9-1: CrossEntropyLoss vs NLLLoss
  • What are the differences?
  • Read the documentation:
      • https://pytorch.org/docs/stable/nn.html#crossentropyloss
      • https://pytorch.org/docs/stable/nn.html#nllloss
  • Try to understand why:
      • CrossEntropyLoss <==> LogSoftmax + NLLLoss
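One way to start on the exercise is to check the equivalence numerically before reasoning about it. The sketch below assumes a single sample with the scores used earlier in the lecture; the variable names are illustrative.

```python
import torch

z = torch.tensor([[0.2, 0.1, -0.1]])   # raw outputs of the last linear layer
y = torch.tensor([0])                  # class index of the true label

# One step: CrossEntropyLoss takes raw scores.
loss_ce = torch.nn.CrossEntropyLoss()(z, y)

# Two steps: LogSoftmax over the class dimension, then NLLLoss.
log_probs = torch.nn.LogSoftmax(dim=1)(z)
loss_nll = torch.nn.NLLLoss()(log_probs, y)

print(loss_ce.item(), loss_nll.item())
print(torch.allclose(loss_ce, loss_nll))   # True
```

Note that NLLLoss expects log-probabilities as input, so passing raw scores or plain softmax outputs to it would give a different number.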
Back to MNIST Dataset
  There are 10 labels in the MNIST dataset.
  How should we design the neural network?
MNIST Dataset
                                           0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                           0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                           0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                           0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                           0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                           0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.1 0.1 0.5 0.5 0.7 0.1 0.7 1.0 1.0 0.5 0.0 0.0 0.0 0.0
                                           0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.1 0.4 0.6 0.7 1.0 1.0 1.0 1.0 1.0 0.9 0.7 1.0 1.0 0.8 0.3 0.0 0.0 0.0 0.0
                                           0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.2 0.9 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.4 0.3 0.3 0.2 0.2 0.0 0.0 0.0 0.0 0.0
                                           0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.9 1.0 1.0 1.0 1.0 1.0 0.8 0.7 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                           0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0.6 0.4 1.0 1.0 0.8 0.0 0.0 0.2 0.6 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                           0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.6 1.0 0.4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                           0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.6 1.0 0.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                           0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.8 1.0 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                           0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 1.0 0.9 0.6 0.4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                           0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0.9 1.0 1.0 0.5 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                           0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.2 0.7 1.0 1.0 0.6 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                           0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.4 1.0 1.0 0.7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                           0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 1.0 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                           0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.2 0.5 0.7 1.0 1.0 0.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                           0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.2 0.6 0.9 1.0 1.0 1.0 1.0 0.7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                           0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.5 0.9 1.0 1.0 1.0 1.0 0.8 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                           0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.3 0.8 1.0 1.0 1.0 1.0 0.8 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                           0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.7 0.9 1.0 1.0 1.0 1.0 0.8 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                           0.0 0.0 0.0 0.0 0.2 0.7 0.9 1.0 1.0 1.0 1.0 1.0 0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                           0.0 0.0 0.0 0.0 0.5 1.0 1.0 1.0 0.8 0.5 0.5 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                           0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                           0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
                                           0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
  Each image has 28 × 28 = 784 pixels.
Implementing a classifier for the MNIST dataset
  1. Prepare dataset: Dataset and DataLoader
  2. Design model using Class: inherit from nn.Module
  3. Construct loss and optimizer: using PyTorch API
  4. Training cycle + Test: forward, backward, update
Implementation – 0. Import Package
     import torch
     # For constructing DataLoader
     from torchvision import transforms
     from torchvision import datasets
     from torch.utils.data import DataLoader
     # For using function relu()
     import torch.nn.functional as F
     # For constructing Optimizer
     import torch.optim as optim
Implementation – 1. Prepare Dataset
  batch_size = 64
  transform = transforms.Compose([
      # Convert the PIL Image (ℤ^{28×28}, pixel ∈ {0, …, 255})
      # to a PyTorch Tensor (ℝ^{1×28×28}, pixel ∈ [0, 1]).
      transforms.ToTensor(),
      # The parameters are the mean and std respectively.
      transforms.Normalize((0.1307, ), (0.3081, ))
  ])
  train_dataset = datasets.MNIST(root='../dataset/mnist/',
                                 train=True,
                                 download=True,
                                 transform=transform)
  train_loader = DataLoader(train_dataset,
                            shuffle=True,
                            batch_size=batch_size)
  test_dataset = datasets.MNIST(root='../dataset/mnist/',
                                train=False,
                                download=True,
                                transform=transform)
  test_loader = DataLoader(test_dataset,
                           shuffle=False,
                           batch_size=batch_size)
  Normalize uses the formula below:
      $Pixel_{norm} = \frac{Pixel_{origin} - mean}{std}$
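The combined effect of `ToTensor` and `Normalize` on a single pixel can be checked by hand. The sketch below is plain Python rather than torchvision. The helper name `normalize_pixel` is hypothetical, and it assumes the input is a raw 8-bit pixel value.

```python
MEAN, STD = 0.1307, 0.3081   # the values passed to transforms.Normalize

def normalize_pixel(raw):
    """Apply the ToTensor scaling, then the Normalize formula, to one pixel."""
    scaled = raw / 255.0             # ToTensor: {0, ..., 255} -> [0, 1]
    return (scaled - MEAN) / STD     # Normalize: (pixel - mean) / std

print(round(normalize_pixel(0), 4))     # -0.4242
print(round(normalize_pixel(255), 4))   # 2.8215
```

So a black background pixel maps to about −0.42 and a fully white pixel to about 2.82 after normalization.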
Implementation – 2. Design Model
  Tensor shapes through the network (Input Layer, Linear Layers, ReLU Layers, Output Layer):
      (N, 1, 28, 28)   input
      (N, 784)         x = x.view(-1, 784)
      (N, 512)         x = F.relu(self.l1(x)), with self.l1 = torch.nn.Linear(784, 512)
      (N, 256)         x = F.relu(self.l2(x)), with self.l2 = torch.nn.Linear(512, 256)
      (N, 128)         x = F.relu(self.l3(x)), with self.l3 = torch.nn.Linear(256, 128)
      (N, 64)          x = F.relu(self.l4(x)), with self.l4 = torch.nn.Linear(128, 64)
      (N, 10)          self.l5(x), with self.l5 = torch.nn.Linear(64, 10)

  class Net(torch.nn.Module):
      def __init__(self):
          super(Net, self).__init__()
          self.l1 = torch.nn.Linear(784, 512)
          self.l2 = torch.nn.Linear(512, 256)
          self.l3 = torch.nn.Linear(256, 128)
          self.l4 = torch.nn.Linear(128, 64)
          self.l5 = torch.nn.Linear(64, 10)

      def forward(self, x):
          x = x.view(-1, 784)        # flatten (N, 1, 28, 28) to (N, 784)
          x = F.relu(self.l1(x))
          x = F.relu(self.l2(x))
          x = F.relu(self.l3(x))
          x = F.relu(self.l4(x))
          return self.l5(x)          # no activation: CrossEntropyLoss takes raw scores

  model = Net()
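The shape annotations can be confirmed by pushing a random batch through layers of the same sizes. This is a sketch for tracing shapes only; it builds the layers in a list instead of the `Net` class, and the batch size of 4 is an arbitrary choice.

```python
import torch

# Same layer sizes as the model above, written as a list for tracing.
sizes = [784, 512, 256, 128, 64, 10]
layers = [torch.nn.Linear(i, o) for i, o in zip(sizes[:-1], sizes[1:])]

x = torch.randn(4, 1, 28, 28)      # a fake batch with N = 4
x = x.view(-1, 784)                # flatten to (N, 784)
shapes = [tuple(x.shape)]
for idx, layer in enumerate(layers):
    x = layer(x)
    if idx < len(layers) - 1:      # ReLU on all but the last layer
        x = torch.relu(x)
    shapes.append(tuple(x.shape))

print(shapes)
# [(4, 784), (4, 512), (4, 256), (4, 128), (4, 64), (4, 10)]
```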
Implementation – 3. Construct Loss and Optimizer
                    criterion = torch.nn.CrossEntropyLoss()
                    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
  [Figure: torch.nn.CrossEntropyLoss() covers the Softmax, Log, and −Y log Ŷ steps applied to the model outputs.]
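To make the `lr` and `momentum` arguments concrete, here is a scalar sketch of the update rule that the PyTorch documentation gives for SGD with momentum (default dampening, no weight decay, no Nesterov). The function `sgd_momentum_step` and the toy objective f(w) = w² are assumptions for illustration only, not part of the lecture code.

```python
def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.5):
    """One update: v <- momentum * v + grad, then w <- w - lr * v."""
    velocity = momentum * velocity + grad
    w = w - lr * velocity
    return w, velocity

# Toy objective f(w) = w**2 with gradient 2 * w (hypothetical example).
w, v = 1.0, 0.0
for _ in range(3):
    w, v = sgd_momentum_step(w, 2 * w, v)
    print(round(w, 6))
```

The velocity carries over a fraction of the previous step, so consecutive gradients pointing the same way produce progressively larger updates.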
Implementation – 4. Train and Test
     def train(epoch):
         running_loss = 0.0
         for batch_idx, data in enumerate(train_loader, 0):
             inputs, target = data
             optimizer.zero_grad()
             # forward + backward + update
             outputs = model(inputs)
             loss = criterion(outputs, target)
             loss.backward()
             optimizer.step()
             running_loss += loss.item()
             if batch_idx % 300 == 299:
                 print('[%d, %5d] loss: %.3f' % (epoch + 1, batch_idx + 1, running_loss / 300))
                 running_loss = 0.0
Implementation – 4. Train and Test
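     def test():
         correct = 0
         total = 0
         # Evaluation needs no gradients, so skip building the graph.
         with torch.no_grad():
             for data in test_loader:
                 images, labels = data
                 outputs = model(images)
                 # Along dim=1 (the class dimension), torch.max returns
                 # (max values, indices); the index is the predicted class.
                 _, predicted = torch.max(outputs, dim=1)
                 total += labels.size(0)
                 correct += (predicted == labels).sum().item()
         print('Accuracy on test set: %d %%' % (100 * correct / total))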
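The prediction step in test() can be tried in isolation. The following is a minimal sketch on a made-up batch (the logits and labels are illustrative values, not MNIST data) showing what torch.max returns along dim=1 and how the correct count is accumulated:

```python
import torch

# Hypothetical logits for a batch of 3 samples and 4 classes (made-up values).
outputs = torch.tensor([[0.1, 2.0, -1.0, 0.3],
                        [1.5, 0.2, 0.0, -0.4],
                        [0.0, 0.1, 0.2, 3.0]])
labels = torch.tensor([1, 2, 3])  # made-up ground-truth class indices

# torch.max along dim=1 returns (max values, indices of the max) for each row.
values, predicted = torch.max(outputs, dim=1)
print(predicted)  # tensor([1, 0, 3])

# Element-wise comparison gives a boolean tensor; sum() counts the matches.
correct = (predicted == labels).sum().item()
total = labels.size(0)
print('Accuracy: %d %%' % (100 * correct / total))  # Accuracy: 66 %
```

Two of the three predictions match, and %d truncates 66.67 to 66, which is also why the accuracies printed by test() appear as whole numbers.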
Implementation – 4. Train and Test
     if __name__ == '__main__':
         for epoch in range(10):
             train(epoch)
             test()

Output:
     [1,   300] loss: 0.335
     [1,   600] loss: 0.154
     [1,   900] loss: 0.067
     Accuracy on test set: 90 %
     [2,   300] loss: 0.048
     [2,   600] loss: 0.040
     [2,   900] loss: 0.035
     Accuracy on test set: 93 %
     ………………………………
     [9,   300] loss: 0.005
     [9,   600] loss: 0.006
     [9,   900] loss: 0.007
     Accuracy on test set: 97 %
     [10,   300] loss: 0.005
     [10,   600] loss: 0.005
     [10,   900] loss: 0.005
     Accuracy on test set: 97 %
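Softmax and CrossEntropyLoss
[Figure: the final linear outputs (0.2, 0.1, −0.1) pass through Softmax to give the predicted distribution Ŷ = (0.38, 0.34, 0.28), then through Log. The loss is −Y log Ŷ, where Y = (1, 0, 0) is the one-hot label. torch.nn.CrossEntropyLoss() covers the Softmax, Log, and loss steps together.]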
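The numbers on this slide can be checked directly. The sketch below takes the slide's values (0.2, 0.1, −0.1) as raw outputs, assumes class index 0 as the target to match the one-hot label (1, 0, 0), and compares a step-by-step computation with torch.nn.CrossEntropyLoss. The variable names are illustrative.

```python
import torch
import torch.nn.functional as F

# Raw outputs (logits) from the slide, as a batch of one sample.
logits = torch.tensor([[0.2, 0.1, -0.1]])
# CrossEntropyLoss takes the class index, not a one-hot vector.
target = torch.tensor([0])

# Step by step: Softmax, then Log, then pick the target class and negate.
probs = F.softmax(logits, dim=1)
print(probs)  # approximately [[0.38, 0.34, 0.28]]
manual = -torch.log(probs[0, target[0]])

# Built in: the same three steps applied to the raw logits.
builtin = torch.nn.CrossEntropyLoss()(logits, target)

print(manual.item(), builtin.item())  # both about 0.973
```

Because CrossEntropyLoss applies the softmax itself, the last layer of the model should return raw linear outputs with no activation on top.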
Exercise 9-2: Classifier Implementation
  • Try to implement a classifier for:
      • Otto Group Product Classification Challenge
      • Dataset: https://www.kaggle.com/c/otto-group-product-classification-challenge/data