Neural Networks Breakdown Part IV
Continuing from where we left off: we have covered the activation function and how the calculations within each node of the neural network produce the final output. We will now continue with the update step.
<h2>Adjustment/Update Step Continued</h2>
In this step, the error is propagated backwards through the neural network to adjust the weights so that the loss (or cost) is reduced as much as possible.
Now this is the part where knowledge of differentiation and how it helps in minimizing or maximizing a function comes into play. In this case we have to minimize the cost function with respect to all the parameters we have in the neural network, which are the weights and bias values.
We haven't discussed bias values yet. A bias value is added to the weighted sum before it is sent into the activation function, so the input to the activation function is calculated as shown.
<center><font size="+2"><b>z<sub>11</sub>=∑w<sub>ji</sub>x<sub>i</sub> + b</b></font></center>
The "b" is the bias value. In this series it is the same for the entire layer, meaning the same b is added to every weighted sum in that layer (many implementations instead give each node its own bias). It is also a parameter that has to be adjusted, since it influences the output of the neural network. The reason for having a bias value is to shift the graph away from the origin. Take the equation of a line through the origin, y=mx. If y is the output of the node, x is the input to the node and m is the weight, the graph always passes through the origin. If we wanted an output of zero for an input of 2, the only option would be to set the weight to zero, and the node would then ignore its input completely, which defeats the purpose of the neural network. Hence, we need some value c to shift the graph, which turns the equation into y=mx+c. The bias plays the same role as c in the line equation.
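As a small illustration of the line analogy, the sketch below compares a node with and without a bias for an input of 2 and a desired output of 0. The function name `node_output` and all the numbers are made up for this example, and the activation function is left out so that the node is exactly y=mx+c.

```python
def node_output(x, m, c=0.0):
    """Output of a single node without an activation: y = m*x + c."""
    return m * x + c

x = 2.0

# Without a bias, the only way to get 0 from an input of 2 is m = 0,
# which makes the node give 0 for every other input as well.
print(node_output(x, m=0.0))
print(node_output(5.0, m=0.0))

# With a bias, the weight can stay non-zero: the line is simply shifted.
print(node_output(x, m=1.5, c=-3.0))    # 1.5 * 2 - 3 = 0.0
print(node_output(5.0, m=1.5, c=-3.0))  # 1.5 * 5 - 3 = 4.5
```

With the bias in place, the node still reacts to its input (5 gives 4.5) while giving the desired 0 for an input of 2.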
To adjust the parameters, we start by differentiating the cost function with respect to the weights and the biases. These are the calculations we need to do first.
<center><img src="/static/differentials.png" alt="nn4"></img></center>
The first one is the derivative of the cost with respect to the weights, and the second one is the derivative with respect to the bias. The cost is 0.5 × (Y<sub>output</sub>-Y<sub>desired</sub>)<sup>2</sup>. To know how much to adjust W3, the weights of the final (output) layer, we calculate this.
<center><img src="/static/dw3.png" alt="dw3"></img></center>
A2 represents the output values of the second layer of the neural network.
To know how much to adjust the W2 weight layer, we calculate d(cost)/dW2.
<center><img src="/static/dw2.png" alt="dw2"></img></center>
The derivatives for the hidden weight layers (all the weight layers except the final one) are calculated with a general formula.
<center><img src="/static/dgeneral.png" alt="dgeneral"></img></center>
The same is done for the bias values, where we calculate d(cost)/db3, d(cost)/db2, etc.
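To make these derivatives a little more concrete, here is a minimal sketch for a single output node. It assumes a sigmoid activation and one input value `a` coming from the previous layer; these are assumptions made for this example and not necessarily the exact setup behind the formulas above. The chain-rule result is compared against a numerical derivative as a sanity check.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(w, b, a, y_desired):
    """0.5 * (y_output - y_desired)^2 for one node with input a."""
    y_output = sigmoid(w * a + b)
    return 0.5 * (y_output - y_desired) ** 2

def gradients(w, b, a, y_desired):
    """Chain rule: d(cost)/dw and d(cost)/db for the same node."""
    y_output = sigmoid(w * a + b)
    # d(cost)/dy * dy/dz, using sigmoid'(z) = y * (1 - y)
    delta = (y_output - y_desired) * y_output * (1.0 - y_output)
    return delta * a, delta  # dz/dw = a, dz/db = 1

# Hypothetical values, chosen only for illustration.
w, b, a, y_desired = 0.4, 0.1, 0.7, 1.0
dw, db = gradients(w, b, a, y_desired)

# Cross-check against a numerical (central difference) derivative.
eps = 1e-6
dw_numeric = (cost(w + eps, b, a, y_desired) - cost(w - eps, b, a, y_desired)) / (2 * eps)
db_numeric = (cost(w, b + eps, a, y_desired) - cost(w, b - eps, a, y_desired)) / (2 * eps)

print(dw, dw_numeric)
print(db, db_numeric)
```

The two numbers on each printed line should agree to several decimal places, which is a handy way to check a hand-derived gradient.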
What do we do with these values once we have calculated them? First we have to understand what they indicate. Take dy/dx as an example. If the derivative is negative, y decreases a little when x is increased a little. If the derivative is positive, y increases a little when x is increased a little.
What is the aim again? Yes, to reduce the cost. So the weights should be adjusted by subtracting the derivative value from them.
<center><b><font size="+2">W = W - d(cost)/dW</font></b></center>
<center><b><font size="+2">b = b - d(cost)/db</font></b></center>
Now, why is it subtracted? If the derivative is negative, the cost reduces when the weights increase, so we want to increase the weight values further. W = W - (negative value) does exactly that (subtracting a negative gives a positive!). If the derivative is positive, the cost increases when the weights increase, so the weights have to be reduced, and W = W - (positive value) decreases the value of W.
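The sign argument can be checked with a toy example. The cost below is a hypothetical one-weight function with its minimum at w = 3, picked only because its derivative is easy to read; it is not the cost of a real network.

```python
def cost(w):
    # Hypothetical one-weight cost with its minimum at w = 3.
    return 0.5 * (w - 3.0) ** 2

def d_cost_dw(w):
    # Derivative of the cost above with respect to w.
    return w - 3.0

for w in (1.0, 5.0):
    grad = d_cost_dw(w)
    w_new = w - grad  # W = W - d(cost)/dW
    print(f"w={w}, derivative={grad}, updated w={w_new}, "
          f"cost {cost(w)} -> {cost(w_new)}")
```

Starting at w = 1 the derivative is negative and the update increases w; starting at w = 5 the derivative is positive and the update decreases w. In both cases the cost goes down.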
This is the concept behind the weight and bias adjustment. We also add another factor called the learning rate, which scales the size of each adjustment to stabilize the learning of the neural network. It is set through trial and error, by picking the value at which the learning is most stable. The final equation of the adjustment looks like this:
<center><b><font size="+2">W = W - ((learning rate) × d(cost)/dW)</font></b></center>
This is done for the bias adjustment as well.
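To see why the learning rate matters for stability, the sketch below repeats the update on a hypothetical one-weight cost, (w - 3)<sup>2</sup>, with three different learning rates. The cost function, the helper `run_updates` and the chosen rates are all assumptions made for this example.

```python
def d_cost_dw(w):
    # Derivative of the hypothetical cost (w - 3)^2, whose minimum is at w = 3.
    return 2.0 * (w - 3.0)

def run_updates(w, learning_rate, steps=20):
    """Apply W = W - (learning rate) * d(cost)/dW a fixed number of times."""
    for _ in range(steps):
        w = w - learning_rate * d_cost_dw(w)
    return w

print(run_updates(1.0, learning_rate=1.0))  # keeps jumping between 1 and 5
print(run_updates(1.0, learning_rate=1.5))  # every step lands further from 3
print(run_updates(1.0, learning_rate=0.1))  # moves steadily towards 3
```

Only the smallest of the three rates settles near the minimum here, which is the kind of behaviour the trial and error is looking for.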
This is how backpropagation works in a neural network. When this step is completed, the neural network is run again and the whole process is repeated: forward propagation, then backpropagation. We do this many times, until the cost can't be reduced any further.
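Put together, the repeated cycle can be sketched for the smallest possible "network": one node with one weight and one bias. The sigmoid activation, the training example, the starting values and the fixed number of repetitions are all assumptions made for this example; a real network repeats the same three steps over every layer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical training example and starting parameters.
x, y_desired = 2.0, 0.0
w, b = 0.5, 0.0
learning_rate = 0.5

costs = []
for step in range(1000):
    # Forward propagation
    y_output = sigmoid(w * x + b)
    costs.append(0.5 * (y_output - y_desired) ** 2)

    # Backpropagation (chain rule, using sigmoid'(z) = y * (1 - y))
    delta = (y_output - y_desired) * y_output * (1.0 - y_output)
    dw, db = delta * x, delta

    # Update step
    w = w - learning_rate * dw
    b = b - learning_rate * db

print(costs[0], costs[-1])
```

The first printed cost is the one before any adjustment and the second is the cost at the last repetition, which should be much smaller. The loop here simply runs a fixed number of times; stopping once the cost no longer improves is another common choice.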
<a href="/Neural-Networks-Breakdown-Conclusion">Click here</a> to read the conclusion to this series.
- Shubham Anuraj, 02:03, 18 February, 2019