Alright, in the last post we looked at the very basic building block of a neural network: a neuron. But what could a single neuron possibly be good for? Well, as I mentioned in my last post, it can be used to learn very simple models. Let us try to solve a linear regression problem using a neuron.
Linear regression is the simplest form of regression. We model our system as a linear combination of input features that produces a single output.
The Problem
I’ll use the problem from Andrew Ng’s machine learning course. The dataset is located here. We will try to predict the profit of a franchise based on the population of its city, training a model on the past data we have. So let us first understand the data.
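Here is a minimal sketch for loading and plotting the data with NumPy and matplotlib. The file name `ex1data1.txt` and the two-column comma-separated layout (population, then profit) are assumptions based on the course exercise, so adjust them to match your copy of the dataset:

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed layout: two comma-separated columns, population of the
# city (in 10,000s) and profit of the franchise (in $10,000s).
data = np.loadtxt("ex1data1.txt", delimiter=",")  # assumed file name
population, profit = data[:, 0], data[:, 1]

plt.scatter(population, profit, marker="x", color="b")
plt.xlabel("Population of city (10,000s)")
plt.ylabel("Profit ($10,000s)")
plt.show()
```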
Looking at the data, we can say that we don’t need a complex model; linear regression is good enough for our purpose.
Training a Model
Our neuron will receive two values as input. One of them is the actual value from the data and the other is a constant input of 1 for the bias. We usually include this bias column along with the input feature matrix x.
b is the bias, a term that does not depend on any input value and shifts the fitted line away from the origin (in a classifier, the same term shifts the decision boundary).
Since we want to fit the data linearly, we’ll use the linear activation function. When our neuron receives the inputs, we’ll calculate the weighted sum and take that as the output of the neuron:
$$\hat{y}^{(i)} = \sum_{j} w_j x_j^{(i)} = w^\top x^{(i)}$$
where
- $x^{(i)}$ represents a row of the matrix $X$ (the $i$-th training example)
- $x_j^{(i)}$ represents an element of the matrix (the $j$-th feature of the $i$-th example)
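In code, this forward pass is just a matrix-vector product. The sketch below reuses the `population` array from the loading snippet above; the weight initialization is an assumption for illustration:

```python
m = population.shape[0]  # number of training examples

# Prepend a column of ones so the bias weight w[0] is handled
# uniformly with the feature weight w[1].
X = np.column_stack([np.ones(m), population])

# Randomly initialized weights: w[0] is the bias, w[1] the slope.
rng = np.random.default_rng(0)
w = rng.normal(size=2)

# Linear activation: the output is just the weighted sum.
y_hat = X @ w  # shape (m,)
```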
The other way to look at our setup is that we are trying to fit a line to the data, represented as
$$\hat{y} = w_1 x + w_0$$
where the weight $w_0$ plays the role of the bias $b$.
We then try to figure out how close our neuron output, or prediction, is to the actual answer, i.e. we’ll apply a loss function, also known as a cost function, over our dataset. A commonly used one is the least square error:
$$J(w) = \frac{1}{2} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)^2$$
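Computing this cost is a one-liner; here is a sketch that reuses `X`, `w`, and `profit` from the snippets above:

```python
def cost(X, y, w):
    """Least square error: half the sum of squared residuals."""
    residuals = X @ w - y
    return 0.5 * np.sum(residuals ** 2)

print(cost(X, profit, w))  # cost with the random initial weights
```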
The idea is to use this value to modify our randomly initialized weight matrix until we stop observing a decrease in the cost function value. The method we’ll use to modify the weight matrix is known as Gradient Descent:
$$w_{jk} := w_{jk} - \frac{\alpha}{m} \frac{\partial J}{\partial w_{jk}}$$
here
- $w$ is the weight matrix
- $\alpha$ is the learning rate
- $m$ is the size of our data, acting as a normalizing factor
- $\frac{\partial J}{\partial w_{jk}}$ is the gradient of the cost function with respect to the weight under consideration, say the weight $w_{jk}$ for the connection between a neuron $j$ and a neuron $k$
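A minimal batch gradient descent loop implementing this update might look like the sketch below, again reusing `X`, `w`, and `profit`; the learning rate and iteration count are assumed values, not tuned ones:

```python
alpha = 0.01       # learning rate (assumed)
iterations = 1500  # number of update steps (assumed)

for _ in range(iterations):
    residuals = X @ w - profit   # (y_hat - y) for every example
    gradient = X.T @ residuals   # dJ/dw summed over the dataset
    w -= (alpha / m) * gradient  # normalized update step

print("learned weights:", w)
```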
So let us train the model and see how it behaves by plotting the line produced by the above equation, computed with the learned weight matrix $w$, in red over the x-axis values of the data.
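A sketch of that plot, overlaying the learned line in red on the blue scatter of the training data:

```python
plt.scatter(population, profit, marker="x", color="b", label="training data")
plt.plot(population, X @ w, color="r", label="linear fit")
plt.xlabel("Population of city (10,000s)")
plt.ylabel("Profit ($10,000s)")
plt.legend()
plt.show()
```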
Making a Prediction
To make a prediction we just need to take the modified weight matrix obtained after the gradient descent steps, combine it with the new input values, and apply the same function we used above:
$$\hat{y} = w^\top x_{\text{new}}$$
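For instance, predicting the profit for a city of 70,000 people (the population figure is purely illustrative):

```python
# A new input needs the same bias entry as the training data.
x_new = np.array([1.0, 7.0])  # [bias term, population in 10,000s]
prediction = x_new @ w
print(f"predicted profit: ${prediction * 10000:.2f}")
```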