Automatic Differentiation for Neural Networks

Automatic differentiation, derivatives, backpropagation

Posted on November 21, 2016  

Moving forward from the last post, I have implemented a toy library that lets us write neural networks using reverse-mode automatic differentiation. To show how to use it, I am working through the minimal neural network example from Andrej Karpathy's CS231n class. If you have already read Karpathy's notes, the following code should be straightforward to understand.

In [1]:
using ToyAD
using PyPlot

N = 100 # number of points per class
D = 2 # dimensionality
K = 3 # number of classes
X = AD(zeros(N*K, D)) # data matrix (each row = single example)
y = zeros(N*K, 1) # class labels
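# generate the spiral toy dataset: K classes with N points each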
for j in 1:K
    idx = (1+N*(j-1)):(N*j); # indices into X and y for class j
    r = linspace(0.0,1,N); # radius
    t = linspace((j-1)*4,(j)*4,N) + randn(N)*0.2 
    X.value[idx, :] = [r.*sin(t) r.*cos(t)]
    y[idx,1] = j;
end
In [3]:
# let's visualize the data:
scatter(X.value[:, 1], X.value[:, 2],  s=40, c=y[:,1], alpha=0.5)
Out[3]:
(scatter plot of the spiral dataset, colored by class)

We operate on the same AD type as in the last post: we build the computational graph and then apply the chain rule in reverse-mode automatic differentiation to calculate the gradients. The function that does this is called 'backprop'. There are a few other functions, such as relu and softmaxLoss, which I implemented in the library as and when they were needed to finish this example.
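To make the idea concrete, here is a minimal sketch of the bookkeeping a reverse-mode node has to do. This is not ToyAD's actual implementation; the Node type, its fields, and the backward! function below are hypothetical, written in the same Julia-0.5-era style as the rest of this post, and they only cover a matrix product.

# Sketch only: a hypothetical reverse-mode node, not ToyAD's real AD type.
import Base: *

type Node
    value::Matrix{Float64}
    grad::Matrix{Float64}
    parents::Vector{Node}
    backrule::Function       # pushes this node's gradient to its parents (local chain rule)
end

Node(v) = Node(v, zeros(size(v)), Node[], () -> nothing)

function *(a::Node, b::Node)
    out = Node(a.value * b.value)
    out.parents = [a, b]
    out.backrule = function ()
        a.grad += out.grad * b.value'    # dL/da = dL/dout * b'
        b.grad += a.value' * out.grad    # dL/db = a' * dL/dout
    end
    return out
end

# Reverse sweep: seed the output gradient with ones, then apply each node's
# local rule from the output back towards the inputs. (For simplicity this
# assumes every node feeds into a single consumer; a real implementation
# processes nodes in reverse topological order.)
function backward!(out::Node)
    out.grad = ones(size(out.value))
    stack = [out]
    while !isempty(stack)
        node = pop!(stack)
        node.backrule()
        append!(stack, node.parents)
    end
end

Calling backward! on the final node fills in the grad fields of everything it depends on, which is essentially what backprop(output) does for the network in the training loop below.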

In [4]:
# initialize parameters randomly
h = 100 # size of hidden layer
W1 = AD(0.01 * randn(D,h))
b1 = AD(zeros(1,h))
W2 = AD(0.01 * randn(h,K))
b2 = AD(zeros(1,K))

# some hyperparameters
step_size = 1e-0
reg = 1e-3 # regularization strength

# gradient descent loop
num_examples = size(X.value,1)
numIterations = 10000
computeLoss = true

for i in 1:numIterations
    # prepare the graph
    hidden1 = relu(X*W1 .+ b1)
    hidden2 = hidden1*W2 .+ b2
    output = softmaxLoss(y, hidden2, 2, computeLoss, 0.5*reg*sum(W1.value .^2) + 0.5*reg*sum(W2.value .^2))
    
    # print loss
    if (i==1 || i % 1000 == 0) && computeLoss
        println("loss: ", output.loss)
    end
    
    # backpropagate the gradient to the parameters
    backprop(output)
    
    # add regularization gradient contribution
    W2.grad += reg * W2.value
    W1.grad += reg * W1.value

    # update the weights and biases
    W1.value += -step_size * W1.grad
    b1.value += -step_size * b1.grad
    W2.value += -step_size * W2.grad
    b2.value += -step_size * b2.grad
    
    # reset gradient
    resetGrad(output)
end
loss: 1.0986446193493518
loss: 0.32813360881163
loss: 0.27978924092212776
loss: 0.2643202018008652
loss: 0.25724035611478624
loss: 0.2542016882067151
loss: 0.25253935566388686
loss: 0.251988711823047
loss: 0.251658646047599
loss: 0.2514123466915379
loss: 0.2512032706191144

There is also a function to visualize the computational graph, but it still needs a lot more work. The graph shows up in the IJulia notebook but not on my blog, so I am just pasting the code below. Once you have the plot, you can get more information by hovering over any of the nodes.

hidden1 = relu(X*W1 .+ b1)
hidden2 = hidden1*W2 .+ b2
output = softmaxLoss(y, hidden2, 2, computeLoss, 0.5*reg*sum(W1.value .^2) + 0.5*reg*sum(W2.value .^2))
ToyAD.plot(output)
In [5]:
# evaluate training set accuracy
hidden_layer = max(0, X.value*W1.value .+ b1.value)
scores = hidden_layer*W2.value .+ b2.value
predicted_class = zeros(size(scores,1))
for i in 1:size(scores,1)
    predicted_class[i] = indmax(scores[i,:])
end
#println(predicted_class)
correct = 0;
for i in 1:length(y)
    if y[i,1] == predicted_class[i]
        correct = correct + 1;
    end
end
println("training accuracy: ", correct/size(y,1))
training accuracy: 0.99
In [6]:
# plot the resulting classifier
h = 0.02; # grid step size (note: this reuses the name h, which was the hidden layer size earlier)
x_min = minimum(X.value[:, 1]) - 1;
x_max = maximum(X.value[:, 1]) + 1;
y_min = minimum(X.value[:, 2]) - 1;
y_max = maximum(X.value[:, 2]) + 1;
numX = convert(Int, floor((x_max - x_min)/h));
xx = zeros(numX);
xx[1] = x_min;
yy = zeros(numX);
yy[1] = y_min;
# fill in the grid coordinates (the same number of steps is used for x and y,
# which works here because the two ranges are nearly equal)
for i in 2:numX
    xx[i] = xx[i-1] + h;
    yy[i] = yy[i-1] + h;
end
grid_x = [i for i in xx, j in yy];
grid_y = [j for i in xx, j in yy];
xy = [grid_x[:] grid_y[:]];
z0 = xy*W1.value .+ b1.value # hidden-layer pre-activations for every grid point
z0[z0 .< 0] = 0              # apply the ReLU
z = z0*W2.value .+ b2.value  # class scores for every grid point
zz = zeros(size(z,1));
for i in 1:size(z,1)
    zz[i] = indmax(z[i,:])
end
zz = reshape(zz, size(grid_x));
In [7]:
contourf(grid_x, grid_y, zz, cmap=get_cmap("Spectral"), alpha=0.8) 
scatter(X.value[:, 1], X.value[:, 2], c=y[:,1], s=40)
Out[7]:
(decision regions of the trained classifier with the training points overlaid)

There is a lot missing from the library but, again, I implemented it purely for educational purposes (it's a toy library ;)). The performance can be improved and more functions can be added. But there are far better libraries already available, and now that I understand the basics behind such libraries I am tempted to use one of them for my next post. Let's see; till then, keep it constructive and have fun!