Introduction to pyTorch #2 : The linear regression
March 5, 2018 nschaetti

Introduction to pyTorch

Deep Learning has gone from a mysterious breakthrough field to a well-known and widely applied technology. In recent years (or months), several frameworks based mainly on Python were created to simplify Deep Learning and to make it available to the general community of software engineers. In this battle to become the reference framework, some stand out, such as Theano, Keras and especially Google's TensorFlow and Facebook's pyTorch. This article is the second of a series of tutorials on pyTorch that goes from the basic gradient descent algorithm to very advanced concepts and complex models. The goal of this article is to show how to use the gradient descent algorithm, which underlies all Deep Learning frameworks, to fit a simple linear regression model.

Do not miss the previous and the next articles :

  1. Introduction to pyTorch #1 : The gradient descent algorithm;
  2. Introduction to pyTorch #3 : Image classification with CNN;

The Linear Regression

A linear regression model is a regression model which seeks to establish a linear relation between one variable and one or more other variables. Given n samples \{y_i, x_{i1},\dots,x_{ip}\}_{i=1,\dots,n}, a linear regression model assumes that the relationship between the dependent variable y_i and the p predictors x_{i1},\dots,x_{ip} is linear (with the convention x_{i0} = 1, so that \beta_0 below plays the role of a constant term). This relationship is modelled with an additional unobserved variable that adds noise; the model is thus defined by

(1)   \begin{equation*} y_i = \beta_0x_{i0} + \beta_1x_{i1} + \cdots + \beta_px_{ip} + \epsilon_i \end{equation*}
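
These n equations are sometimes stacked together and written in matrix form. Writing \mathbf{y} = (y_1,\dots,y_n)^T, \boldsymbol{\beta} = (\beta_0,\dots,\beta_p)^T, \boldsymbol{\epsilon} = (\epsilon_1,\dots,\epsilon_n)^T and X for the n \times (p+1) matrix whose i-th row is (x_{i0},\dots,x_{ip}) (with x_{i0} = 1 for the constant term), the model becomes

\begin{equation*} \mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon} \end{equation*}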

The model's parameters are the \beta variables, written as a (p+1)-dimensional parameter vector, where \beta_0 is the constant term. In the rest of the article we will use pyTorch to find these parameters with Stochastic Gradient Descent (SGD). In most cases a direct analytic method would be enough, but we use SGD here as an example. We will use the following model,

(2)   \begin{equation*} y_i = ax_i + c + \epsilon_i \end{equation*}

and the following parameters :

  • a, the slope, i.e. the coefficient of the single predictor x;
  • c, the constant term (or bias);

First, we will set true values for these two parameters and generate samples using a random number generator for \epsilon. Then, we will use the generated samples to find an approximation of the true parameters with a numerical optimisation algorithm. We start by importing some useful packages.

# Imports
import argparse
import matplotlib.pyplot as plt
import math
import numpy as np
from matplotlib import cm
import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.optim as optim

We import packages like matplotlib for visualisation, numpy for numerical tools and pyTorch to define our model and optimisation method. We then use torch's manual_seed() function and numpy's seed() function to initialise the random number generators, so that we always get the same results.

# Random seed
torch.manual_seed(1)
np.random.seed(1)

We define the parameter values with which we will generate the samples of our dataset.

# True parameter values
a = 4
c = 2

The variable v sets the noise magnitude.

# Noise parameter
v = 8

And n_samples sets the number of samples we are going to generate.

# Number of samples
n_samples = 50

We will put the x and y values in two arrays, X and Y. The values of x will be in [0, 10]. For each sample, we generate the x value with the rand() function (which returns a float in [0, 1)) and plug it into the linear equation. We use the rand() function again for the noise \epsilon, rescaled to the interval [-v, v].

# Generate samples
X = np.zeros(n_samples)
Y = np.zeros(n_samples)
for i in range(n_samples):
    x = np.random.rand()*10.0
    y = a*x + c + v*(2*np.random.rand()-1.0)
    X[i] = x
    Y[i] = y
# end for
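
As mentioned above, for a model this simple the parameters can also be obtained with a direct analytic method. As a point of comparison, here is a small optional sketch (not part of the original script; ols_a and ols_c are illustrative names) that computes the ordinary least-squares fit with NumPy:

# Optional sanity check: closed-form ordinary least-squares fit with NumPy
# np.polyfit with degree 1 returns the slope and the intercept
ols_a, ols_c = np.polyfit(X, Y, 1)
print(u"OLS a : {}".format(ols_a))
print(u"OLS c : {}".format(ols_c))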

First, we create a linear model. The first parameter of the nn.Linear() object is the input size (the number of predictors) and the second is the number of dependent variables (here a single y). We set bias=True as the bias corresponds to our c parameter. The cuda() call below moves the layer to the GPU; remove it if you do not have one.

# Linear layer
linear = nn.Linear(1, 1, bias=True)
linear.cuda()
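
Note that nn.Linear stores its parameters in its weight and bias attributes; right after construction they are randomly initialised, and the optimisation below will drive them towards a and c. A small optional snippet to inspect them:

# Inspect the randomly initialised parameters (optional)
# linear.weight has shape (1, 1) and will converge towards a,
# linear.bias has shape (1,) and will converge towards c
print(linear.weight)
print(linear.bias)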

We now need an objective function which measures the difference between the current model's output \hat{y} and the true output y. Here we use the Mean Squared Error (MSE), which measures the error as the mean of the squared differences between \hat{y}_i and y_i.

(3)   \begin{equation*} MSE = \frac{1}{n}\sum^n_{i=1}(y_i - \hat{y}_i)^2 \end{equation*}

To use MSE with pyTorch, there is the object nn.MSELoss().

# Objective function is Mean Squared Error
criterion = nn.MSELoss()
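
To make formula (3) concrete, nn.MSELoss (with its default settings) simply averages the squared differences. Here is a small illustrative check, not part of the training code, with toy values chosen for the example:

# Illustrative check: nn.MSELoss is just the mean of the squared differences
y_hat = torch.Tensor([1.0, 2.0, 3.0])    # toy predictions
y_true = torch.Tensor([1.5, 2.0, 2.0])   # toy targets
manual_mse = ((y_true - y_hat) ** 2).mean()
# criterion(Variable(y_hat), Variable(y_true)) returns the same value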

We set the learning rate to 0.01.

# Learning rate
learning_rate = 0.01

The optim package has an SGD object implementing the stochastic gradient descent (SGD) algorithm. The first argument is the list of parameters we want to optimise and the second is the learning rate.

# Optimizer
optimizer = optim.SGD(linear.parameters(), lr=learning_rate)
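
Under the hood, plain SGD (without momentum) moves every parameter against its gradient, scaled by the learning rate: \theta \leftarrow \theta - lr \cdot \frac{\partial Loss}{\partial \theta}. The following toy sketch (w and toy_loss are illustrative names, not part of the script) shows the update rule that optimizer.step() applies to each parameter:

# Illustrative sketch of the update rule applied by optimizer.step() for plain SGD
w = Variable(torch.Tensor([0.0]), requires_grad=True)  # toy parameter
toy_loss = (w - 3.0) ** 2                               # toy loss, minimum at w = 3
toy_loss.backward()                                     # computes w.grad
w.data -= learning_rate * w.grad.data                   # w <- w - lr * dLoss/dw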

We will do 500 iterations.

# Loop over the data set
for epoch in range(500):

We take our generated samples and add a dimension at the end, since nn.Linear expects tensors of shape (n_samples, in_features); here both the feature vector and the output vector (y) are one-dimensional.

    # Inputs and outputs (n_samples * in_features)
    inputs, outputs = torch.Tensor(X).unsqueeze(1), torch.Tensor(Y).unsqueeze(1)

We wrap the sample tensors in Variable objects. You can remove the cuda() calls if you do not have a GPU in your computer.

    # To variable
    inputs, outputs = Variable(inputs.cuda()), Variable(outputs.cuda())

Then we set the gradient of each parameter to zero.

    # Zero param gradients
    optimizer.zero_grad()

Then we can run a forward pass, feeding the inputs into our linear layer, and compute the Mean Squared Error between the model's output (linear_outputs) and the target (outputs). The backward() function computes the gradient of the loss with respect to each parameter, and finally the step() function updates the parameters using the computed gradients.

    # Forward + Backward + optimize
    linear_outputs = linear(inputs)
    loss = criterion(linear_outputs, outputs)
    loss.backward()
    optimizer.step()

Every 10 iterations, we display the MSE.

    # Print result
    if epoch % 10 == 0:
        print(u"Loss {} : {}".format(epoch, loss.data[0]))
    # end if
# end for
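
Once the loop has finished, the trained layer can be queried directly for new values of x. Here is a small optional check (not part of the original script; x_new is just an illustrative name), keeping the same Variable/cuda conventions as above:

# Predict y for a new value of x with the trained model (remove .cuda() if you have no GPU)
x_new = Variable(torch.Tensor([[5.0]]).cuda())
y_pred = linear(x_new)
print(u"Prediction for x=5 : {}".format(y_pred.data[0][0]))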

At the end of the iterations, we get our two approximated parameters a and c, and display them.

# Get and print the learned parameters (weight -> a, bias -> c)
model_a = float(list(linear.parameters())[0])
model_c = float(linear.bias)
print(u"Found a : {}".format(model_a))
print(u"Found c : {}".format(model_c))

And finally, we display the generated samples (in red), the true model (in blue) and the fitted model (in green).

# Show points and line
plt.scatter(X, Y, c='r', marker='o', s=1)
plt.plot([0, 10], [c, a * 10 + c], c='b')
plt.plot([0, 10], [model_c, model_a * 10 + model_c], c='g')
plt.show()

And here is the final output:

Loss 0 : 456.620697021
Loss 10 : 8.18101978302
Loss 20 : 8.01882457733
Loss 30 : 7.87698125839
Loss 40 : 7.75292825699
Loss 50 : 7.64443922043
Loss 60 : 7.54955673218
Loss 70 : 7.46657657623
Loss 80 : 7.39400577545
Loss 90 : 7.33053779602
Loss 100 : 7.27503061295
Loss 110 : 7.22648668289
Loss 120 : 7.18403053284
Loss 130 : 7.14690303802
Loss 140 : 7.11443138123
Loss 150 : 7.08603286743
Loss 160 : 7.0611948967
Loss 170 : 7.0394744873
Loss 180 : 7.02047872543
Loss 190 : 7.003865242
Loss 200 : 6.98933458328
Loss 210 : 6.97662782669
Loss 220 : 6.96551513672
Loss 230 : 6.95579576492
Loss 240 : 6.94729471207
Loss 250 : 6.93986082077
Loss 260 : 6.93336200714
Loss 270 : 6.92767572403
Loss 280 : 6.92270326614
Loss 290 : 6.91835308075
Loss 300 : 6.91455078125
Loss 310 : 6.91122436523
Loss 320 : 6.90831565857
Loss 330 : 6.90577077866
Loss 340 : 6.90354633331
Loss 350 : 6.90160083771
Loss 360 : 6.89989852905
Loss 370 : 6.89840984344
Loss 380 : 6.89710950851
Loss 390 : 6.89597034454
Loss 400 : 6.89497566223
Loss 410 : 6.89410400391
Loss 420 : 6.8933429718
Loss 430 : 6.89267683029
Loss 440 : 6.8920955658
Loss 450 : 6.89158439636
Loss 460 : 6.89113903046
Loss 470 : 6.89074993134
Loss 480 : 6.89040851593
Loss 490 : 6.89011240005
Found a : 4.03545856476
Found c : 2.14601278305

Nils Schaetti is a doctoral researcher in Switzerland specialised in machine learning and artificial intelligence.
