Introduction to pyTorch #3

Image classification with CNN

March 7, 2018 nschaetti

Deep-Learning has gone from a breakthrough but mysterious field to a well-known and widely applied technology. In recent years (or months), several frameworks based mainly on Python were created to simplify Deep-Learning and to make it available to the general public of software engineers. In this battle to become the future framework of reference, some stand out, such as Theano, Keras and especially Google’s TensorFlow and Facebook’s pyTorch. This article is the third of a series of tutorials on pyTorch that starts with the basic gradient descent algorithm and moves on to very advanced concepts and complex models. The goal of this article is to show you how to classify images with a convolutional neural network (CNN).

Do not miss the two previous articles :

  1. Introduction to pyTorch #1 : The stochastic gradient algorithm;
  2. Introduction to pyTorch #2 : The linear regression;

The FashionMNIST dataset

The Fashion-MNIST dataset contains Zalando’s article images with 60,000 images in the training set and 10,000 in the test set. Each sample is a 28×28 grayscale image with a label from 10 classes. This dataset is designed to be used as a drop-in replacement for the original MNIST dataset.

The Code

As usual, the first step is to import some packages. Here we obviously need pyTorch, but also TorchVision, which provides tools and datasets for computer vision, as well as Matplotlib and Numpy for visualisation and numerical tools.

# Imports
import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable

So that you get the same results as me, we initialise the random number generators of Numpy and pyTorch.

# Random seed
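The seeding calls themselves are not shown here. A minimal sketch, assuming an arbitrary seed value of 1 (the value used for the results in this article is not given), could look like this:

```python
import numpy as np
import torch

# Arbitrary seed chosen for illustration (an assumption, not the author's value)
seed = 1

# Seed the Numpy and pyTorch random number generators
np.random.seed(seed)
torch.manual_seed(seed)

# Drawing again after re-seeding reproduces the same numbers
first_draw = torch.rand(3)
torch.manual_seed(seed)
second_draw = torch.rand(3)
print(torch.equal(first_draw, second_draw))
```

Any fixed integer works; what matters is that the same value is used from one run to the next.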

We will work with 4 samples per batch (do the first two tutorials if you don’t know what the batch size is).

# Batch size
batch_size = 4

We will normalise the images contained in the dataset. To do this, torchvision gives us special objects from the transforms module. Here we use the Compose object to apply two transformations: first, we transform the image to a tensor, then we normalise the values.

# Transformation to tensor and normalization
# (FashionMNIST is grayscale, so a single mean and standard deviation are given)
transform = transforms.Compose(
    [transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))]
)
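To make the effect of the normalisation concrete, here is a small sketch with plain Numpy instead of torchvision. It assumes, as the ToTensor and Normalize transforms do, that pixel values first lie in [0, 1] and that each value is then transformed as (x - mean) / std. The pixel values are made up for illustration.

```python
import numpy as np

# Hypothetical pixel values as produced by ToTensor(): floats in [0, 1]
pixels = np.array([0.0, 0.25, 0.5, 1.0])

# Normalize(mean, std) computes (x - mean) / std for each channel
mean, std = 0.5, 0.5
normalized = (pixels - mean) / std
print(normalized)  # the values now lie in [-1, 1]

# Undo the normalisation, e.g. before displaying an image
restored = normalized * std + mean
print(np.allclose(restored, pixels))
```

The last step is the same operation as the `img / 2 + 0.5` used in the imshow() function of this article.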

Torchvision gives access to some datasets, and especially to the one we are interested in : FashionMNIST. From the submodule datasets, just create an instance of the FashionMNIST object with the root directory (where you want to put the data), train set to True (do you want the training or the test set), and the transformation to apply.

# Download the training set
trainset = torchvision.datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)

To handle the samples, just give the dataset object created above to a pyTorch DataLoader object. The first argument is the dataset to load, the second one sets the batch size, and the third tells the data loader whether we want to shuffle the samples first. The DataLoader object will allow us to access the dataset’s samples batch by batch.

# Training set loader
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=False, num_workers=2)
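To see what a data loader returns without downloading anything, here is a minimal sketch that uses a hypothetical stand-in dataset of 10 random "images" (the names fake_images, fake_labels and fake_set are made up for this example). With a batch size of 4, the loader yields two full batches and a last, smaller one.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the dataset: 10 random "images" of shape 1x28x28
fake_images = torch.randn(10, 1, 28, 28)
fake_labels = torch.randint(0, 10, (10,))
fake_set = TensorDataset(fake_images, fake_labels)

# Same batch size as in this article; no shuffling, no worker processes
loader = DataLoader(fake_set, batch_size=4, shuffle=False, num_workers=0)

# Each iteration returns one batch of images and the matching labels
batch_shapes = [tuple(images.shape) for images, labels in loader]
print(len(loader))
print(batch_shapes)
```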

We just have to do the same with the test set.

# Test set
testset = torchvision.datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)
# Test set loader
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=False, num_workers=2)

Let’s just create a function to display an image sample from the dataset, just to see what it looks like.

# Function to show an image
def imshow(img):
    img = img / 2 + 0.5
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
# end imshow

Each image belongs to one of the 10 possible classes : t-shirt/top, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag and ankle boot.

# Classes
classes = ('T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot')

We can iterate through the training set with the trainloader object.

# Dataset as iterator
dataiter = iter(trainloader)

We get the first batch of four images and labels.

# Get next batch
images, labels = next(dataiter)

Let’s print the labels and display the images.

# Show images
imshow(torchvision.utils.make_grid(images))
plt.show()
n_batches = len(dataiter)
print(u"First 4 labels {}".format([classes[labels[j]] for j in range(4)]))

And here is the result.

Nice. Now, let’s create our first convolutional neural network in a separate file in a submodule named “modules”. In this new file, we import pyTorch’s submodule “nn” and its functional submodule, which provide neural network layers and functional tools respectively.

from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F

To create a neural net, we just have to create a class in our new file which inherits from the class nn.Module.

# Neural net
class Net(nn.Module):
    """Neural net"""

To create a basic neural network, we first create the different layer objects in the constructor and we link these layers in the forward() function. We thus create the following layers :

  • A first convolutional layer with 6 filters of size 5;
  • A max pool layer of size 2 (we will use it two times);
  • A second convolutional layer with 16 filters of size 5;
  • A first linear layer that maps the 16 × 4 × 4 flattened outputs of these filters to 120 units;
  • A second linear layer of size 10 (one output for each class).

    # Constructor
    def __init__(self):
        super(Net, self).__init__()
        self.conv_layer1 = nn.Conv2d(1, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv_layer2 = nn.Conv2d(6, 16, 5)
        self.linear_layer1 = nn.Linear(16 * 4 * 4, 120)
        self.linear_layer2 = nn.Linear(120, 10)
    # end __init__
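The 16 * 4 * 4 input size of the first linear layer follows from the image size and the layers before it. The sketch below redoes this arithmetic, assuming convolutions with stride 1 and no padding (the defaults of nn.Conv2d) and non-overlapping pooling. The two helper functions are made up for this example.

```python
def conv_output_size(size, kernel_size):
    """Output width of a convolution with stride 1 and no padding."""
    return size - kernel_size + 1


def pool_output_size(size, pool_size):
    """Output width of a non-overlapping max pooling (stride == pool size)."""
    return size // pool_size


size = 28                           # input image width
size = conv_output_size(size, 5)    # after the first convolution: 24
size = pool_output_size(size, 2)    # after the first max pooling: 12
size = conv_output_size(size, 5)    # after the second convolution: 8
size = pool_output_size(size, 2)    # after the second max pooling: 4

# 16 filters, each producing a size x size feature map
n_features = 16 * size * size
print(size, n_features)
```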

The forward() function is important in the Module object. This function computes the output of our model when we call the object with inputs at each iteration. The first and only argument $x$ is the given inputs. In this function we call each layer object and function to compute the model’s output; these linked functions and arguments will be used in the backward pass to compute the gradients. We proceed in the same order as in the constructor.

  • We compute the output of the first convolution layer with the object we created in the constructor and we pass its output to a ReLU nonlinear function;
  • We pass the output of the previous layer to the max pooling layer;
  • We proceed in a similar way with the second convolution layer as with the first (conv layer + ReLU);
  • We apply the max pooling layer a second time;
  • We use view to flatten the feature maps into a one-dimensional vector per sample;
  • We compute the outputs of the two linear layers with a ReLU nonlinear function at each layer.

    # Forward pass
    def forward(self, x):
        """
        Forward pass
        :param x: input batch of images
        """
        x = self.conv_layer1(x)
        x = F.relu(x)
        x = self.pool(x)
        x = self.conv_layer2(x)
        x = F.relu(x)
        x = self.pool(x)
        x = x.view(-1, 16 * 4 * 4)
        x = F.relu(self.linear_layer1(x))
        x = F.relu(self.linear_layer2(x))
        return x
    # end forward
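A quick way to check that the layers fit together is to push a dummy batch through them and look at the shape after each stage. The sketch below recreates the same layer shapes as standalone objects (rather than importing the Net class) and uses a batch of zeros, so only the shapes are meaningful.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Same layer shapes as in the constructor above, created standalone
conv1 = nn.Conv2d(1, 6, 5)
conv2 = nn.Conv2d(6, 16, 5)
pool = nn.MaxPool2d(2, 2)
linear1 = nn.Linear(16 * 4 * 4, 120)
linear2 = nn.Linear(120, 10)

# Dummy batch of 4 grayscale 28x28 images
x = torch.zeros(4, 1, 28, 28)
shapes = []

x = pool(F.relu(conv1(x)))                 # conv + ReLU + max pooling
shapes.append(tuple(x.shape))
x = pool(F.relu(conv2(x)))                 # conv + ReLU + max pooling
shapes.append(tuple(x.shape))
x = x.view(-1, 16 * 4 * 4)                 # flatten the feature maps
shapes.append(tuple(x.shape))
x = F.relu(linear2(F.relu(linear1(x))))    # two linear layers with ReLU
shapes.append(tuple(x.shape))

for shape in shapes:
    print(shape)
```

The last shape has one row per image and one column per class.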

Now that our neural net is finished, we can import it in the main file.

from modules.Net import Net

We create an instance of our neural net created in the other file and we transfer it to the GPU (remove the second line if you don’t have any GPU).

# Our neural net
net = Net()
net.cuda()

The optimisation algorithm will not use the mean square error as objective function but the cross-entropy. From Wikipedia,

In information theory, the cross entropy between two probability distributions $p$ and $q$ over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set, if a coding scheme is used that is optimized for an “unnatural” probability distribution $q$, rather than the “true” distribution $p$.

# Objective function is cross-entropy
criterion = nn.CrossEntropyLoss()
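To connect this definition to classification, here is a sketch of the computation for a single image, written with Numpy. It assumes what nn.CrossEntropyLoss does internally: a softmax turns the raw outputs into a distribution $q$, and the true distribution $p$ puts all its mass on the target class. The scores are made up, and the natural logarithm is used, so the value is in nats rather than bits.

```python
import numpy as np

# Hypothetical raw outputs (scores) of a network for one image, 3 classes
scores = np.array([2.0, 1.0, 0.1])
target = 0  # index of the true class

# Softmax turns the scores into a probability distribution q
exp_scores = np.exp(scores - scores.max())
q = exp_scores / exp_scores.sum()

# p is 1 for the target class and 0 elsewhere, so the cross-entropy
# -sum(p * log(q)) reduces to -log(q[target])
loss = -np.log(q[target])
print(q, loss)
```

The loss gets smaller as the probability given to the true class gets closer to 1.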

We set the learning rate to 0.001 (you can try different values as an exercise).

# Learning rate
learning_rate = 0.001

We use stochastic gradient descent (SGD) with the model parameters, the learning rate and a momentum of 0.9 (we will see momentum in an article dedicated to advanced gradient algorithms).

# Stochastic Gradient Descent
optimizer = optim.SGD(net.parameters(), lr=learning_rate, momentum=0.9)

We will do 30 iterations.

# Nb iterations
n_iterations = 30

To see how the model evolves, we save the accuracies obtained at each iteration.

# List of training and test accuracies
train_accuracies = np.zeros(n_iterations)
test_accuracies = np.zeros(n_iterations)

We do a loop for each iteration, and at each iteration we set the loss and the counters (success and total samples) to zero.

# Training !
for epoch in range(n_iterations):
    # Average loss during training
    average_loss = 0.0

    # Data to compute accuracy
    total = 0
    success = 0

We iterate over the batches.

    # Iterate over batches
    for i, data in enumerate(trainloader, 0):
        # Get the inputs and labels
        inputs, labels = data

For each batch, we get the inputs (images) and labels and transform them to Variable (remove cuda() if you don’t have any GPUs).

        # To variable
        inputs, labels = Variable(inputs.cuda()), Variable(labels.cuda())
        # inputs, labels = Variable(inputs), Variable(labels)

We will compute the gradients after the forward pass so we reset them.

        # Put grad to zero
        optimizer.zero_grad()

We call our model to do the forward pass and compute the loss (cross-entropy between outputs and target class labels).

        # Forward
        outputs = net(inputs)
        loss = criterion(outputs, labels)

We call the backward() function to compute the gradients with respect to each of the model’s parameters.

        # Backward
        loss.backward()

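And we change the parameter values in the direction opposite to the gradients, which reduces the loss.

        # Optimize
        optimizer.step()

        # Accumulate the loss to report the average at the end of the epoch
        average_loss += loss.item()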
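The three steps described above (resetting the gradients, the backward pass and the parameter update) can be tried in isolation. Here is a minimal sketch on a hypothetical toy model with random data: a single linear layer instead of our CNN, a larger learning rate, and no momentum, only to show one training step lowering the loss.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical toy model and data, only to illustrate one training step
model = nn.Linear(3, 2)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
inputs = torch.randn(4, 3)
labels = torch.tensor([0, 1, 0, 1])

loss_before = criterion(model(inputs), labels).item()

optimizer.zero_grad()                      # reset the gradients
loss = criterion(model(inputs), labels)    # forward pass
loss.backward()                            # compute the gradients
optimizer.step()                           # update the parameters

loss_after = criterion(model(inputs), labels).item()
print(loss_before, loss_after)
```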

We take the predicted class as the one with the highest output value.

        # Take the max as predicted
        _, predicted = torch.max(outputs.data, 1)

We add the number of images (4) to the total counter, and we add the number of correctly predicted samples to the success counter.

        # Add to total
        total += labels.size(0)

        # Add correctly classified images
        success += (predicted == labels.data).sum().item()

The train accuracy is the number of successes over the total number of samples in the training set, multiplied by 100 (percentage).

    train_accuracy = 100.0 * success / total
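The counting logic can be checked by hand on a tiny example. The sketch below uses Numpy and made-up outputs for a batch of 4 images and 3 classes: the predicted class is the index of the largest output, and the accuracy is the share of predictions that match the labels.

```python
import numpy as np

# Hypothetical network outputs for a batch of 4 images and 3 classes
outputs = np.array([
    [0.1, 2.0, 0.3],
    [1.5, 0.2, 0.1],
    [0.0, 0.4, 0.9],
    [0.7, 0.6, 0.2],
])
labels = np.array([1, 0, 1, 0])

# Predicted class = index of the largest output in each row
predicted = outputs.argmax(axis=1)

# Count the correct predictions and convert to a percentage
success = int((predicted == labels).sum())
total = labels.shape[0]
accuracy = 100.0 * success / total
print(predicted, accuracy)
```

Here the third image is misclassified, so three predictions out of four are correct.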

Now, we try to predict the class of images in the test set to evaluate our model performance. We set the counters (success and total) to zero, we start iterating over the test set and we transform the inputs and labels to Variable.

    # Test model on test set
    success = 0
    total = 0
    for (inputs, labels) in testloader:
        # To variable
        inputs, labels = Variable(inputs.cuda()), Variable(labels.cuda())

With our trained model (net), we use the inputs and get the outputs.

        # Neural net's output
        outputs = net(inputs)

As before, the predicted class is the one with the maximum output value.

        # Take the max as predicted
        _, predicted = torch.max(outputs.data, 1)

We add the number of samples in the batch to the counter “total” and the number of correctly predicted samples to the counter “success”.

        # Add to total
        total += labels.size(0)

        # Add correctly classified images
        success += (predicted == labels.data).sum().item()
    # end for

And we display the iteration, the average train loss, the train accuracy and the test accuracy.

    # Print average loss
    print(u"Epoch {}, average loss {}, train accuracy {}, test accuracy {}".format(
        epoch, average_loss / n_batches,
        train_accuracy,
        100.0 * success / total
    ))

We save this result in our arrays.

    # Save
    train_accuracies[epoch] = train_accuracy
    test_accuracies[epoch] = 100.0 * success / total
# end for

And we plot the result with matplotlib.

plt.plot(np.arange(1, n_iterations+1), train_accuracies)
plt.plot(np.arange(1, n_iterations+1), test_accuracies)
plt.show()

The output in the terminal.

First 4 labels ['Ankle boot', 'T-shirt/top', 'T-shirt/top', 'Dress']
Epoch 0, average loss 0.962579420497, train accuracy 65.3616666667, test accuracy 75.61
Epoch 1, average loss 0.596013127398, train accuracy 77.26, test accuracy 77.78
Epoch 2, average loss 0.548232187277, train accuracy 78.9166666667, test accuracy 78.85
Epoch 3, average loss 0.519503551461, train accuracy 79.8466666667, test accuracy 79.28
Epoch 4, average loss 0.499694178671, train accuracy 80.5633333333, test accuracy 79.68
Epoch 5, average loss 0.483394132044, train accuracy 81.0533333333, test accuracy 79.66
Epoch 6, average loss 0.338950649367, train accuracy 87.1033333333, test accuracy 89.03
Epoch 7, average loss 0.2333326693, train accuracy 91.4266666667, test accuracy 88.82
Epoch 8, average loss 0.221399005592, train accuracy 91.83, test accuracy 88.55
Epoch 9, average loss 0.211384769694, train accuracy 92.2516666667, test accuracy 89.02
Epoch 10, average loss 0.202250013586, train accuracy 92.395, test accuracy 88.99
Epoch 11, average loss 0.193305731225, train accuracy 92.7716666667, test accuracy 88.84
Epoch 12, average loss 0.187061455429, train accuracy 93.0466666667, test accuracy 88.64
Epoch 13, average loss 0.182053107389, train accuracy 93.1566666667, test accuracy 88.67
Epoch 14, average loss 0.176176647568, train accuracy 93.335, test accuracy 88.61
Epoch 15, average loss 0.171825858625, train accuracy 93.56, test accuracy 88.33
Epoch 16, average loss 0.164413874177, train accuracy 93.7916666667, test accuracy 88.6
Epoch 17, average loss 0.164977300862, train accuracy 93.685, test accuracy 88.43
Epoch 18, average loss 0.160332164462, train accuracy 93.9966666667, test accuracy 88.17
Epoch 19, average loss 0.152791212412, train accuracy 94.1583333333, test accuracy 88.53
Epoch 20, average loss 0.153746423076, train accuracy 94.26, test accuracy 88.49
Epoch 21, average loss 0.151990672322, train accuracy 94.1433333333, test accuracy 87.92
Epoch 22, average loss 0.148130840091, train accuracy 94.2883333333, test accuracy 88.45
Epoch 23, average loss 0.140715094622, train accuracy 94.6433333333, test accuracy 88.32
Epoch 24, average loss 0.143673503029, train accuracy 94.4816666667, test accuracy 87.92
Epoch 25, average loss 0.14204166914, train accuracy 94.5866666667, test accuracy 87.77
Epoch 26, average loss 0.148164064189, train accuracy 94.3416666667, test accuracy 88.3
Epoch 27, average loss 0.142364215469, train accuracy 94.6133333333, test accuracy 88.64
Epoch 28, average loss 0.13410825446, train accuracy 94.9633333333, test accuracy 88.01
Epoch 29, average loss 0.134372444657, train accuracy 94.895, test accuracy 88.08

The plot shows us the evolution of the train and test accuracies in blue and orange respectively.

As you can see, the training accuracy rises quickly and continues to rise until the end. The test accuracy rises until iteration 6 and then gently drops. This phenomenon, where the test accuracy declines while the training accuracy continues to rise, is called over-fitting.
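The turning point can be read directly from the numbers. The sketch below copies the test accuracies from the terminal output above and looks for the iteration where they peak. One common use of this value, which is not done in this article, is to keep the model from that iteration (often called early stopping).

```python
# Test accuracies copied from the terminal output above (epochs 0 to 29)
test_accuracies = [
    75.61, 77.78, 78.85, 79.28, 79.68, 79.66, 89.03, 88.82, 88.55, 89.02,
    88.99, 88.84, 88.64, 88.67, 88.61, 88.33, 88.60, 88.43, 88.17, 88.53,
    88.49, 87.92, 88.45, 88.32, 87.92, 87.77, 88.30, 88.64, 88.01, 88.08,
]

# Epoch at which the test accuracy peaks
best_epoch = max(range(len(test_accuracies)), key=lambda e: test_accuracies[e])
best_accuracy = test_accuracies[best_epoch]

# Drop between the peak and the last epoch, in percentage points
drop = best_accuracy - test_accuracies[-1]
print(best_epoch, best_accuracy, round(drop, 2))
```

The peak is at epoch 6, and the test accuracy ends about one percentage point lower while the training accuracy keeps rising.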

You can find the code on GitHub :

Nils Schaetti is a doctoral researcher in Switzerland specialised in machine learning and artificial intelligence.

