Automatic Differentiation for Neural Networks

Following up on the last post, I implemented a toy library that lets us write neural networks using reverse-mode automatic differentiation. To show how to use the library, I am working through the minimal neural network example from Andrej Karpathy's CS231n class. If you have already read Karpathy's notes, the following code should be straightforward to follow.

In [1]:
using ToyAD
using PyPlot

N = 100 # number of points per class
D = 2   # dimensionality
K = 3   # number of classes
X = AD(zeros(N*K, D)) # data matrix (each row = single example)
y = zeros(N*K, 1)     # class labels
for j in range(1,K)
    idx = range(1+N*(j-1), N); # index for X and y
    r = linspace(0.0, 1, N);   # radius
    t = linspace((j-1)*4, (j)*4, N) + randn(N)*0.2 # theta
    X.value[idx, :] = [r.*sin(t) r.*cos(t)]
    y[idx, 1] = j;
end

In [3]:
# let's visualize the data:
scatter(X.value[:, 1], X.value[:, 2],  s=40, c=y[:,1], alpha=0.5)
Out[3]:
[Figure: scatter plot of the spiral dataset, with the three classes in different colors]

We operate on the same AD type as in the last post. We build the computational graph and apply reverse-mode automatic differentiation, i.e. the chain rule, to calculate the gradients; the function that does this is called 'backprop'. There are a few other functions, such as relu and softmaxLoss, which I implemented in the library as needed to finish this example.
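
To give a rough idea of what 'backprop' does under the hood, here is a minimal scalar sketch of a reverse-mode AD node, written in the same pre-1.0 Julia as the rest of this post. This is only an illustration, not ToyAD's actual implementation: the Node type and the toposort and backprop_sketch functions are made-up names for this example. Each node stores its value, its accumulated gradient, its parents, and a small closure that pushes its gradient back to those parents; the backward sweep seeds the output gradient with 1 and applies the chain rule in reverse topological order.

import Base: +, *

# a node in the computational graph (made-up sketch, not ToyAD's type):
# value, accumulated gradient, parents, and a closure that pushes the
# gradient back to those parents
type Node
    value::Float64
    grad::Float64
    parents::Vector{Node}
    backward::Function
end

Node(v) = Node(v, 0.0, Node[], () -> nothing)

function *(a::Node, b::Node)
    out = Node(a.value * b.value)
    out.parents = [a, b]
    out.backward = () -> begin
        a.grad += b.value * out.grad   # d(a*b)/da = b
        b.grad += a.value * out.grad   # d(a*b)/db = a
    end
    out
end

function +(a::Node, b::Node)
    out = Node(a.value + b.value)
    out.parents = [a, b]
    out.backward = () -> begin
        a.grad += out.grad             # d(a+b)/da = 1
        b.grad += out.grad             # d(a+b)/db = 1
    end
    out
end

# visit the graph so that a node's gradient is fully accumulated
# before it is pushed further back
function toposort(node::Node, visited = Set{Node}(), order = Node[])
    if !(node in visited)
        push!(visited, node)
        for p in node.parents
            toposort(p, visited, order)
        end
        push!(order, node)
    end
    order
end

# reverse-mode sweep: seed d(out)/d(out) = 1, then apply the chain rule
# from the output back towards the inputs
function backprop_sketch(out::Node)
    out.grad = 1.0
    for node in reverse(toposort(out))
        node.backward()
    end
end

p, q, c = Node(3.0), Node(2.0), Node(1.0)
f = p*q + c            # forward pass: f.value == 7.0
backprop_sketch(f)
println(q.grad)        # df/dq = p.value = 3.0

ToyAD applies the same idea to whole matrices instead of scalars, which is why X, W1, b1, W2 and b2 carry .value and .grad arrays; relu and softmaxLoss then act as further operations on such nodes.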

In [4]:
# initialize parameters randomly
h = 100 # size of hidden layer
W1 = AD(0.01 * randn(D,h))
b1 = AD(zeros(1,h))
W2 = AD(0.01 * randn(h,K))
b2 = AD(zeros(1,K))

# some hyperparameters
step_size = 1e-0
reg = 1e-3 # regularization strength

# gradient descent loop
num_examples = size(X.value,1)
numIterations = 10000
computeLoss = true

for i in 1:numIterations
    # prepare the graph
    hidden1 = relu(X*W1 .+ b1)
    hidden2 = hidden1*W2 .+ b2
    output = softmaxLoss(y, hidden2, 2, computeLoss, 0.5*reg*sum(W1.value .^2) + 0.5*reg*sum(W2.value .^2))

    # print loss
    if (i==1 || i % 1000 == 0) && computeLoss
        println("loss: ", output.loss)
    end

    # backpropagate the gradient to the parameters
    backprop(output)

    # add regularization gradient contribution
    W2.grad += reg * W2.value
    W1.grad += reg * W1.value

    # update the weights and biases
    W1.value += -step_size * W1.grad
    b1.value += -step_size * b1.grad
    W2.value += -step_size * W2.grad
    b2.value += -step_size * b2.grad

    # reset gradient
    resetGrad(output)
end

loss: 1.0986446193493518
loss: 0.32813360881163
loss: 0.27978924092212776
loss: 0.2643202018008652
loss: 0.25724035611478624
loss: 0.2542016882067151
loss: 0.25253935566388686
loss: 0.251988711823047
loss: 0.251658646047599
loss: 0.2514123466915379
loss: 0.2512032706191144
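
For completeness, softmaxLoss above takes the labels, the class scores, and a regularization term, and stores the resulting loss in output.loss. Below is a sketch of the standard softmax cross-entropy math (the formulation from Karpathy's notes) that such a node has to compute in its forward and backward passes; it is written in the same pre-1.0 Julia as the post, and softmax_loss_sketch is a made-up name rather than ToyAD's actual function.

# sketch of the softmax cross-entropy forward/backward math (CS231n style),
# not ToyAD's actual implementation
# scores: num_examples x K matrix, labels: class indices in 1..K
function softmax_loss_sketch(scores, labels, reg_loss)
    n = size(scores, 1)

    # numerically stable softmax: subtract the row-wise maximum first
    shifted = scores .- maximum(scores, 2)
    expscores = exp(shifted)
    probs = expscores ./ sum(expscores, 2)

    # average negative log-likelihood of the correct classes, plus regularization
    data_loss = 0.0
    for i in 1:n
        data_loss -= log(probs[i, round(Int, labels[i])])
    end
    loss = data_loss/n + reg_loss

    # gradient w.r.t. the scores: probabilities minus 1 at the correct class
    dscores = copy(probs)
    for i in 1:n
        dscores[i, round(Int, labels[i])] -= 1.0
    end
    dscores /= n

    return loss, dscores
end

The dscores gradient is what backprop then pushes back through hidden2 and the relu node into the weights and biases. Note that with small random initial weights the predicted probabilities start out roughly uniform over the K = 3 classes, which is why the first reported loss is close to -log(1/3) ≈ 1.0986.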

There is also a function for visualizing the graph, but it still needs a lot more work. The graph shows up in the IJulia notebook but not on my blog, so I am just pasting the code below. Once you have the plot, you can get more information by hovering over any of the nodes.

hidden1 = relu(X*W1 .+ b1)
hidden2 = hidden1*W2 .+ b2
output = softmaxLoss(y, hidden2, 2, computeLoss, 0.5*reg*sum(W1.value .^2) + 0.5*reg*sum(W2.value .^2))
ToyAD.plot(output)
In [5]:
# evaluate training set accuracy
hidden_layer = max(0, X.value*W1.value .+ b1.value)
scores = hidden_layer*W2.value .+ b2.value
predicted_class = zeros(size(scores,1))
for i in 1:size(scores,1)
    predicted_class[i] = indmax(scores[i,:])
end
#println(predicted_class)
correct = 0;
for i in 1:length(y)
    if y[i,1] == predicted_class[i]
        correct = correct + 1;
    end
end
println("training accuracy: ", correct/size(y,1))
training accuracy: 0.99
In [6]:
# plot the resulting classifier
h = 0.02;
x_min = minimum(X.value[:, 1]) - 1;
x_max = maximum(X.value[:, 1]) + 1;
y_min = minimum(X.value[:, 2]) - 1;
y_max = maximum(X.value[:, 2]) + 1;
numX = convert(Int, floor((x_max - x_min)/h));
xx = zeros(numX);
xx[1] = x_min;
yy = zeros(numX);
yy[1] = y_min;
for i in 2:numX
    xx[i] = xx[i-1] + h;
    yy[i] = yy[i-1] + h;
end
# build a mesh grid covering the data range
grid_x = [i for i in xx, j in yy];
grid_y = [j for i in xx, j in yy];
xy = [grid_x[:] grid_y[:]];
# forward pass of the trained network over every grid point
z0 = xy*W1.value .+ b1.value
z0[z0 .< 0] = 0  # relu
z = z0*W2.value .+ b2.value
# predicted class for each grid point
zz = zeros(size(z,1));
for i in 1:size(z,1)
    zz[i] = indmax(z[i,:])
end
zz = reshape(zz, size(grid_x));
In [7]:
contourf(grid_x, grid_y, zz, cmap=get_cmap("Spectral"), alpha=0.8) 
scatter(X.value[:, 1], X.value[:, 2], c=y, s=40)
Out[7]:
[Figure: contour plot of the learned decision regions with the training data overlaid]

There is a lot missing from the library but, again, I implemented it purely for educational purposes (it's a toy library ;)). The performance can be improved and more functions can be added, but there are far better libraries already available, and after understanding the basics behind such libraries I am tempted to use one of them for my next post. Let's see; till then, keep it constructive and have fun!