# Introduction to pyTorch #1: The gradient descent algorithm

January 25, 2018


The purpose of the gradient descent algorithm is to find the minimum of a function. As you will see, all the training algorithms in machine learning consist of finding the minimum of a function which represents the difference between what we have (the output of a mathematical model) and what we want (the target output to be learned). We can start with a simple function of two variables and try to find its minimum with an iterative algorithm. The main idea behind GDA is to evaluate the gradient of the function at a certain point and to move in the opposite direction of the gradient, just like a stone rolling down a slope. More complex algorithms are more efficient, but they are all based on this principle.

We will then start with a simple quadratic equation with two variables. We want to find the global minimum of this equation, global meaning here the smallest overall value of the function over its entire range. Of course, this is not always possible as some functions are non-convex. In mathematics, a function is called convex if the line segment between any two points on the graph of the function lies above or on the graph. Non-convex functions have local minima, also called relative minima, which are minima within some neighbourhood and need not be global minima. You can see below an example of a convex and a non-convex function. We want here to find the minimum of the following function:

f(x, y) = 0.5x^2 + x + 0.25y^2 - 2    (1)

To find a minimum, the gradient descent algorithm works as follows:

1. We take an initial point (x_0, y_0);
2. We compute the gradient at the current position: ∇f(x_n, y_n);
3. We compute the next point: (x_{n+1}, y_{n+1}) = (x_n, y_n) - η ∇f(x_n, y_n), where η is the learning rate;
4. Repeat 2 and 3 for a specified number of iterations or until a stop condition is fulfilled.
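The four steps above can be sketched in plain Python; `grad_f` here is a placeholder for any function returning the gradient at a point:

```python
def gradient_descent(grad_f, x0, y0, lr=0.1, iterations=100):
    """Generic 2D gradient descent following the four steps above."""
    x, y = x0, y0                          # 1. initial point
    for _ in range(iterations):
        gx, gy = grad_f(x, y)              # 2. gradient at the current position
        x, y = x - lr * gx, y - lr * gy    # 3. step against the gradient
    return x, y                            # 4. stop after a fixed number of iterations

# Example on f(x, y) = x^2 + y^2, whose gradient is (2x, 2y); minimum at (0, 0)
xm, ym = gradient_descent(lambda a, b: (2 * a, 2 * b), 2.0, -1.0)
print(xm, ym)  # both close to 0
```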

We start by importing some packages for argument parsing, plots, math and obviously numpy.

import argparse
import matplotlib.pyplot as plt
import math
import numpy as np
from mpl_toolkits.mplot3d import axes3d, Axes3D
from matplotlib import cm


We implement the function we want to minimise.

# Function
def func(xv, yv):
    """
    Function
    :param xv: X
    :param yv: Y
    :return: Z
    """
    return 0.5 * math.pow(xv, 2) + xv + 0.25 * math.pow(yv, 2) - 2
# end func
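As a quick sanity check (this computation is mine, not from the article), setting both partial derivatives to zero shows that the minimum of this function sits at (-1, 0), where it evaluates to -2.5:

```python
import math

# Same function as in the article
def func(xv, yv):
    return 0.5 * math.pow(xv, 2) + xv + 0.25 * math.pow(yv, 2) - 2

# The minimum is at x = -1, y = 0; any nearby point is higher
z_min = func(-1.0, 0.0)
print(z_min)  # -2.5
```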


We add some arguments to our program:

• The learning rate;
• The number of iterations;
• The x position of the starting point;
• The y position of the starting point;

# Arguments
parser = argparse.ArgumentParser(description="Gradient descent")
parser.add_argument("--lr", type=float, default=0.1)
parser.add_argument("--iterations", type=int, default=10)
parser.add_argument("--x", type=float, default=2.0)
parser.add_argument("--y", type=float, default=2.0)
args = parser.parse_args()

# Create two variables
x = args.x
y = args.y


The learning rate sets the size of the step at each iteration: with a learning rate of 0.001, we move against the gradient by the gradient multiplied by this learning rate. If the learning rate is too small, convergence to the minimum will be too slow; on the contrary, if the learning rate is too big, an iteration can overshoot the minimum and end up at a point higher than before, leading the algorithm to diverge from the minimum.
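To make this concrete, here is a small experiment (mine, not from the article) running the update rule on our function with three learning rates; the gradient (x + 1, 0.5y) is computed analytically from func:

```python
def descend(lr, iterations=10, x0=2.0, y0=2.0):
    """Gradient descent on f(x, y) = 0.5*x^2 + x + 0.25*y^2 - 2, minimum at (-1, 0)."""
    x, y = x0, y0
    for _ in range(iterations):
        x -= lr * (x + 1.0)   # partial derivative with respect to x
        y -= lr * (0.5 * y)   # partial derivative with respect to y
    return x, y

for lr in (0.1, 1.0, 2.0):
    print(lr, descend(lr))
# lr=0.1 is still far from (-1, 0) after 10 steps, lr=1.0 lands on it,
# and lr=2.0 overshoots and oscillates around it without ever converging.
```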

We create three lists to store the results to display at the end.


# List of positions
x_values = list([x])
y_values = list([y])
z_values = list([func(x, y)])


The main loop will iterate the number of times we specified in our command-line parameters.

# Do the iterations
for i in range(args.iterations):


To compute the gradient we must find the partial derivative with respect to each variable. If you do not know how to compute a derivative, find a course on the internet, as it is mandatory to understand the underlying concepts of deep learning and machine learning. To compute the partial derivative with respect to one variable, we consider the other variable as a constant. For the variable x, we compute the derivative of the parts containing x and consider y as a constant; we then get the following result:

∂f/∂x = x + 1    (2)

We do the same thing for the variable y, this time considering the variable x as a constant. We have our second partial derivative:

∂f/∂y = 0.5y    (3)

With this, we can compute the gradient of each variable in our Python code.

    # Compute gradient
    x_grad = x + 1.0
    y_grad = 0.5 * y


The main idea of the gradient descent algorithm is to update the position in the opposite direction of the gradient. The learning rate modulates the size of this step.

    # Update each parameter
    x -= args.lr * x_grad
    y -= args.lr * y_grad

We add the new position to our list and the corresponding z value of the function at this point.

    # Save positions and value
    x_values.append(x)
    y_values.append(y)
    z_values.append(func(x, y))


That’s all for the iteration loop. When the iterations are finished, we create a matplotlib figure and add a 3D subplot to it.

# Plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')


We want to display the function surface and the path followed by the gradient descent algorithm. To create a 3D surface with matplotlib, we need each point on the X, Y and Z axes. We will set the range as -2.5 to 2.5 with a step of 0.1. We can use NumPy’s arange function for the X positions.

# X position
X = np.zeros((51, 51))
X[:, :] = np.arange(-2.5, 2.6, 0.1)


We now have an X matrix of size 51 by 51, with values going from -2.5 to 2.5 along one axis. We now want a Y matrix of the same size with values varying along the other axis.

# Y position
Y = np.zeros((51, 51))
for i in range(51):
    Y[i, :] = X[0, i]
# end for


We then compute the corresponding Z value for each position.


# Compute Z
Z = np.zeros((51, 51))
for j in range(51):
    for i in range(51):
        x_pos = X[j, i]
        y_pos = Y[j, i]
        Z[j, i] = func(x_pos, y_pos)
    # end for
# end for
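The grid construction and the double loop above can be condensed into a few vectorized NumPy lines: `np.meshgrid` builds the same X and Y grids in one call, and the formula of `func` applies element-wise to whole arrays (an equivalent rewrite, not the article’s code):

```python
import numpy as np

# Same grid: values from -2.5 to 2.5 with a 0.1 step
ticks = np.arange(-2.5, 2.6, 0.1)
X, Y = np.meshgrid(ticks, ticks)   # X varies along columns, Y along rows

# Vectorized equivalent of calling func at every grid point
Z = 0.5 * X ** 2 + X + 0.25 * Y ** 2 - 2
print(X.shape, Y.shape, Z.shape)
```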


We add labels to the figure with axis names.

# Labels
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')


We can now plot the 3D surface with our three matrices X, Y and Z and the plot_surface() function.

# Plot a basic surface.
ax.plot_surface(X, Y, Z, cmap=cm.hot, linewidth=0, antialiased=True, alpha=1.0)


And we use the plot() function with the x, y and z values saved during the iterations.

# Scatter points
ax.plot(x_values, y_values, z_values, label='Learning curve', color='lightblue')


And we simply display the resulting figure with show(). The following plot shows the path followed by the gradient descent algorithm with a learning rate of 0.1 and 10 iterations. We cannot reach the minimum point, so let’s try with a learning rate of 1. This time, the algorithm jumps directly near the minimum point and then slowly reaches it. Let’s see what happens with a learning rate of 2. At the first step, the algorithm goes way too far, misses the minimum point, and ends up higher than the preceding point.

Nils Schaetti is a doctoral researcher in Switzerland specialised in machine learning and artificial intelligence.