
  • Backpropagation: the most intuitive lesson yet, but the hardest to grasp in mathematical terms.

  • We will first explore the intuition behind it, then look into the mathematics. And yes, the math is optional, but we encourage you to look at it to gain a better understanding.

  • You shouldn't skip this lesson, though, as it's the fun part of backpropagation.

  • First, I'd like to recap what we know.

  • So far, we've seen and understood the logic of how layers are stacked.

  • We've also explored a few activation functions and spent extra time showing they're central to the concept of stacking layers.

  • Moreover, by now we have said 100 times that the training process consists of updating the parameters through gradient descent in order to optimize the objective function in supervised learning.

  • The process of optimization consisted of minimizing the loss.

  • Our updates were directly related to the partial derivatives of the loss and indirectly related to the errors, or deltas, as we called them.

  • Let me remind you that the deltas were the differences between the targets and the outputs.

  • All right. As we will see later, deltas for the hidden layers are trickier to define; still, they have a similar meaning.

  • The procedure for calculating them is called backpropagation of errors.

  • Having these deltas allows us to vary the parameters using the familiar update rule, sketched below.
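
As a quick refresher, here is a minimal sketch of that update rule in NumPy, assuming the simple linear model and L2-norm loss from the earlier lessons (the shapes, values, and learning rate here are illustrative assumptions, not the course's exact code):

```python
import numpy as np

# Illustrative setup: 1000 observations, 2 inputs, 1 output.
inputs = np.random.uniform(-10, 10, (1000, 2))
targets = 2 * inputs[:, [0]] - 3 * inputs[:, [1]] + 5
weights = np.random.uniform(-0.1, 0.1, (2, 1))
biases = np.random.uniform(-0.1, 0.1, 1)
eta = 0.02                                      # learning rate

outputs = np.dot(inputs, weights) + biases
deltas = outputs - targets                      # the errors, or deltas (outputs minus targets)
loss = np.sum(deltas ** 2) / 2 / len(inputs)    # L2-norm loss (scaled)

# The familiar update rule: parameter <- parameter - eta * (partial derivative of the loss)
weights = weights - eta * np.dot(inputs.T, deltas) / len(inputs)
biases = biases - eta * np.sum(deltas) / len(inputs)
```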

  • Let's start from the other side of the coin: forward propagation.

  • Forward propagation is the process of pushing inputs through the net.
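
As an illustration of "pushing inputs through the net", here is a short sketch (the layer widths, the ReLU activation, and NumPy are all assumptions made just for this example) that sends a batch of inputs through a stack of layers:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# Illustrative deep net: widths chosen only for this example.
layer_sizes = [10, 8, 8, 5, 2]
weights = [np.random.randn(m, n) * 0.1
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

a = np.random.uniform(-1, 1, (100, 10))   # a batch of 100 observations
for W in weights[:-1]:
    a = relu(np.dot(a, W))                # hidden layers: linear combination + activation
outputs = np.dot(a, weights[-1])          # linear output layer

targets = np.random.uniform(-1, 1, (100, 2))
errors = outputs - targets                # outputs compared to the targets form the errors
```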

  • At the end of each epoch, the obtained outputs are compared to the targets to form the errors.

  • Then we backpropagate through the partial derivatives and change each parameter, so that the errors at the next epoch are minimized.

  • For the minimal example, the backpropagation consisted of a single step: adjusting the weights, given the errors we obtained.

  • Here's where it gets a little tricky.

  • When we have a deep net, we must update all the weights related to the input layer and the hidden layers.

  • For example, in this famous picture, we have 270 weights.

  • And yes, this means we had to manually draw all 270 arrows you see here. So, updating all 270 weights is a big deal.

  • But wait.

  • We also introduced activation functions.

  • This means we have to update the weights accordingly, considering the used nonlinearities and their derivatives.

  • Finally, to update the weights, we must compare the outputs to the targets.

  • This is done for each layer.

  • But we have no targets for the hidden units.

  • We don't know the errors.

  • So how do we update the weights?

  • That's what backpropagation is all about.

  • We must derive the appropriate updates as if we had targets.

  • Now, the way academics solved this issue is through the errors.

  • The main point is that we can trace the contribution of each unit, hidden or not, to the error of the output.

  • Okay, great.

  • Let's look at the schematic illustration of back propagation shown here.

  • Our net is quite simple.

  • It has a single hidden layer.

  • Each node is labeled: we have inputs x1 and x2; hidden layer units h1, h2, and h3; output layer units y1 and y2; and finally, the targets t1 and t2.

  • The weights are w11, w12, w13, w21, w22, and w23 for the first part of the net.

  • For the second part, we named them u11, u12, u21, u22, u31, and u32, so we can differentiate between the two types of weights.

  • That's very important.

  • We know the errors associated with y1 and y2, as they depend on the known targets.

  • So let's call the two errors e1 and e2. Based on them, we can adjust the weights labeled with u.
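
To make the notation concrete, here is the same 2-3-2 net as a code sketch (the input and target values are made up, and the units are kept linear for now; the activation functions come back later):

```python
import numpy as np

x = np.array([0.5, -1.0])            # inputs x1, x2 (illustrative values)
t = np.array([1.0, 0.0])             # targets t1, t2 (illustrative values)
W = np.random.randn(2, 3) * 0.1      # w11, w12, w13 / w21, w22, w23
U = np.random.randn(3, 2) * 0.1      # u11, u12 / u21, u22 / u31, u32
eta = 0.02                           # learning rate

h = np.dot(x, W)                     # hidden units h1, h2, h3
y = np.dot(h, U)                     # outputs y1, y2
e = y - t                            # the two errors e1 and e2
```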

  • Each u contributes to a single error.

  • For example, u11 contributes to e1.

  • Then we find its derivative and update the coefficient.
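
Continuing the sketch above, and assuming a squared-error loss with linear output units, u11 only ever touches e1, so its derivative and update look like this:

```python
# y1 = h1*u11 + h2*u21 + h3*u31, so with L = (e1**2 + e2**2) / 2:
dL_du11 = e[0] * h[0]                # dL/du11 = e1 * dy1/du11 = e1 * h1
U[0, 0] = U[0, 0] - eta * dL_du11    # the familiar update rule again
```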

  • Nothing new here.

  • Now let's examine w11: it helped us predict h1.

  • But then we needed h1 to calculate y1 and y2; thus, it played a role in determining both errors, e1 and e2.

  • So, while u11 contributes to a single error, w11 contributes to both errors.

  • Therefore, its adjustment rule must be different.
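
Still within the same sketch (linear units, squared-error loss), w11's derivative gathers contributions from both errors, traced back through u11 and u12:

```python
# w11 links x1 to h1, and h1 feeds both y1 (through u11) and y2 (through u12),
# so both errors show up in w11's derivative:
dL_dw11 = (e[0] * U[0, 0] + e[1] * U[0, 1]) * x[0]
W[0, 0] = W[0, 0] - eta * dL_dw11
```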

  • The solution to this problem is to take the errors and backpropagate them through the net, using the weights.

  • Knowing the u weights, we can measure the contribution of each hidden unit to the respective errors.

  • Then, once we've found the contribution of each hidden unit to the respective errors, we can update the w weights. So, essentially, through backpropagation, the algorithm identifies which weights lead to which errors.

  • Then it adjusts the weights that have a bigger contribution to the errors by more than it adjusts the weights with a smaller contribution.

  • A big problem arises when we must also consider the activation functions.

  • They introduce additional complexity to this process.

  • Linear contributions are easy, but nonlinear ones are tougher.
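
To illustrate, here is how the backward pass of the same sketch changes once a nonlinearity is placed on the hidden layer (a sigmoid is assumed purely for this example); its derivative now multiplies the back-propagated errors:

```python
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Forward pass with a sigmoid on the hidden layer.
z_h = np.dot(x, W)                   # hidden layer before the activation
a_h = sigmoid(z_h)                   # hidden layer after the activation
y = np.dot(a_h, U)
e = y - t

# Backward pass: the sigmoid's derivative a*(1-a) enters the hidden deltas.
delta_out = e                                       # output deltas (linear outputs)
delta_hid = np.dot(delta_out, U.T) * a_h * (1 - a_h)

U = U - eta * np.outer(a_h, delta_out)              # update the u weights
W = W - eta * np.outer(x, delta_hid)                # update the w weights
```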

  • Imagine backpropagating in our introductory net.

  • Once you understand it, it seems very simple.

  • While pictorially straightforward, mathematically it is rough, to say the least.

  • That is why backpropagation is one of the biggest challenges for the speed of an algorithm. For more videos like this one, please subscribe.
