Neural Networks Breakdown Part II
You may want to read <a href="/post/33">this</a> first, if you haven't already.
This is where the "tricky" math comes in, though it is actually not that tough. The main concept you need to be comfortable with is the derivative.
A derivative measures how much one value changes in response to a change in another value. More specifically, it is the rate at which one variable changes when another variable changes by the slightest amount possible, also called an infinitesimally small amount. Just to get a grip on it, you can check out this great Khan Academy video explaining derivatives.
<iframe style="width:100%; height: 400px;" src="https://www.youtube.com/embed/rAof9Ld5sOg" frameborder="0" allowfullscreen></iframe>
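To make the idea concrete, here is a minimal sketch that approximates a derivative numerically by nudging the input by a tiny amount and measuring how much the output changes. The names `numerical_derivative` and `square` and the step size `h` are assumptions chosen for illustration, not something from this series.

```python
def numerical_derivative(f, x, h=1e-6):
    """Approximate the derivative of f at x using a central difference.

    h is a small (but not truly infinitesimal) step, chosen for illustration.
    """
    return (f(x + h) - f(x - h)) / (2 * h)


def square(x):
    """A simple function whose exact derivative is known: d(x^2)/dx = 2x."""
    return x * x


# At x = 3 the exact derivative of x^2 is 2 * 3 = 6.
slope = numerical_derivative(square, 3.0)
print(round(slope, 4))
```

The printed value should be very close to 6, which is how fast `x * x` is changing at `x = 3`.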
Calculating derivatives is what helps the neural network learn. But derivatives of what? How do they help in learning? How does the network know whether it is learning? These are the questions that are likely to arise, and they are what we will get into.
Training traverses the neural network twice: once from the front and once from the back. These two passes are called forward propagation and backward propagation. Forward propagation calculates the output of the entire network, while backward propagation tells us how much the weights (the connections between layers) should be adjusted to get an output that is closer to the expected output.
The backpropagation step is where the real learning happens. When we get an output, we can tell whether it is right or wrong by comparing it with the expected output, that is, the label assigned to the data (0 or 1). The artificial neural network also needs to know how incorrect the output is with respect to the labelled data.
For this purpose, we use a loss function, which measures how far the outputs are from the labels when the network uses its currently assigned weights. There are many types of loss functions, such as the cross-entropy error function, the mean squared error function, the Gaussian log likelihood function, etc.
For starters, we can use the mean squared error function to calculate the loss incurred by a neural network. The mean squared error looks like:
<center><font size="+3"><b>loss = (1/n) × Σ<sub>i</sub> (d<sub>i</sub> - y<sub>i</sub>)<sup>2</sup></b></font></center>
Here, n = number of examples used for training, d<sub>i</sub> = the label (0 or 1 for the dog example) for the i<sup>th</sup> example, y<sub>i</sub> = output of the i<sup>th</sup> example from the neural network.
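As a rough sketch, the formula can be written in a few lines of plain Python. The function name `mean_squared_error` and the sample labels and outputs are made up for illustration.

```python
def mean_squared_error(labels, outputs):
    """Return (1/n) * sum((d_i - y_i)^2) over all n examples."""
    if not labels or len(labels) != len(outputs):
        raise ValueError("labels and outputs must be non-empty and the same length")
    n = len(labels)
    return sum((d - y) ** 2 for d, y in zip(labels, outputs)) / n


# Hypothetical data: labels d_i (1 = dog, 0 = not a dog) and network outputs y_i.
labels = [1, 0, 1, 0]
outputs = [0.9, 0.2, 0.6, 0.1]

# Squared errors are about 0.01, 0.04, 0.16 and 0.01, so the mean is about 0.055.
loss = mean_squared_error(labels, outputs)
print(round(loss, 3))
```

The closer the outputs are to their labels, the closer the loss gets to 0.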
The objective of the neural network during backpropagation is to reduce the value of the loss function with each iteration. How will it do that? By adjusting the weights.
The neural network adjusts the weights so that the loss function is reduced to the point where it can't be reduced any further.
The adjustment is the actual <b>"learning"</b> the neural network does.
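To see why changing a weight changes the loss, here is a small sketch with a made-up "network" that has just one weight. The sigmoid activation, the toy inputs and labels, and the candidate weight values are all assumptions chosen for illustration. The sketch only tries a few weights by hand; it does not show how a real network picks the next weight.

```python
import math


def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def loss_for_weight(w, inputs, labels):
    """Forward pass for a one-weight model, followed by the mean squared error."""
    outputs = [sigmoid(w * x) for x in inputs]
    n = len(labels)
    return sum((d - y) ** 2 for d, y in zip(labels, outputs)) / n


# Toy data: negative inputs are labelled 0, positive inputs are labelled 1.
inputs = [-2.0, -1.0, 1.0, 2.0]
labels = [0, 0, 1, 1]

# Try a few candidate weights and compare the loss each one produces.
candidate_weights = [-1.0, 0.0, 1.0, 3.0]
losses = [loss_for_weight(w, inputs, labels) for w in candidate_weights]

for w, current_loss in zip(candidate_weights, losses):
    print(w, round(current_loss, 4))
```

For this toy data, each larger candidate weight gives a smaller loss, so moving the weight in that direction is what "learning" would look like here.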
How does the neural network adjust the weights so that the error is reduced? Review the derivatives part above, then click <a href="/post/36">here</a>.
- Shubham Anuraj, 11:56PM, 16 July, 2018