## Subtitles section

• Neural networks are good for learning lots of different types of patterns. To give an example of how this would work:

• Imagine you had a four-pixel camera, so not four megapixels, but just four pixels.

• And it was only black and white, and you wanted to go around and take pictures of things and determine automatically whether these pictures were of a solid all-white or all-dark image, a vertical line, a diagonal line, or a horizontal line.

• This is tricky because you can't do this with simple rules about the brightness of the pixels.

• Both of these are horizontal lines.

• But if you try to make a rule about which pixel was bright and which was dark, you wouldn't be able to do it.

• So to do this with a neural network, you start by taking all of your inputs, in this case our four pixels, and you break them out into input neurons, and you assign a number to each of these, depending on the brightness or darkness of the pixel.

• Plus one is all the way white.

• Minus one is all the way black, and then gray is zero, right in the middle.

• So these values, once you have them broken out and listed like this on the input neurons, are also called the input vector, or array.

• It's just a list of numbers that represents your inputs.
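
• As a minimal sketch of that encoding (the pixel order, the helper name `to_input_vector`, and the 0-to-1 brightness scale are assumptions made for illustration, not something the talk specifies):

```python
def to_input_vector(pixels):
    """Map brightness in [0, 1] (0 = black, 1 = white) to values in [-1, 1]."""
    return [2.0 * p - 1.0 for p in pixels]

# Assumed pixel order: upper-left, upper-right, lower-left, lower-right.
# Dark pixels on top, white pixels on the bottom.
image = [0.0, 0.0, 1.0, 1.0]
input_vector = to_input_vector(image)
print(input_vector)  # [-1.0, -1.0, 1.0, 1.0]
```

• A mid-gray pixel with brightness 0.5 would map to zero under the same rule.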

• Right now it's a useful notion to think about the receptive field of a neuron.

• All this means is what set of inputs makes the value of this neuron as high as it can possibly be.

• For input neurons, this is pretty easy.

• Each one is associated with just one pixel.

• And when that pixel is all the way white, the value of that input neuron is as high as it could go.

• The black and white checkered areas show pixels that an input neuron doesn't care about.

• If they're all the way white or all the way black, it still doesn't affect the value of that input neuron at all.

• Now, to build a neural network, we create a neuron.

• The first thing this does is it adds up all of the values of the input neurons.

• So in this case, if we add up all of those values, we get 0.5.

• Now, to complicate things just a little bit, each of the connections is weighted, meaning it's multiplied by a number. That number could be one or minus one or anything in between.

• So, for instance, if something has a weight of minus one, it's multiplied and you get the negative of it, and that's added in.

• If something has a weight of zero, then it's effectively ignored.

• So here's what those weighted connections might look like.

• You'll notice that after the values of the input neurons are weighted and added, the final value is completely different.

• Graphically, it's convenient to represent these weights as white links being positive weights, black links being negative weights, and the thickness of the line is roughly proportional to the magnitude of the weight.
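
• A rough sketch of the weighted sum described here. The input values and weights below are hypothetical, chosen only so that the plain sum comes out to 0.5; they are not the numbers shown on the slide:

```python
def weighted_sum(inputs, weights):
    """Multiply each input by the weight on its connection, then add everything up."""
    return sum(x * w for x, w in zip(inputs, weights))

inputs = [0.5, 0.0, 0.75, -0.75]   # hypothetical input neuron values
weights = [1.0, 0.0, -1.0, 0.5]    # hypothetical weights between -1 and 1

plain = weighted_sum(inputs, [1.0, 1.0, 1.0, 1.0])  # all weights 1: a plain sum
total = weighted_sum(inputs, weights)
print(plain)  # 0.5
print(total)  # -0.625
```

• The weight of zero drops the second input entirely, and the weight of minus one flips the sign of the third, which is why the weighted total differs so much from the plain sum.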

• Then, after you add the weighted input neurons, they get squashed, and I'll show you what that means.

• You have a sigmoid squashing function.

• Sigmoid just means S-shaped.

• And what this does is you put a value in, let's say 0.5, and you run a vertical line up to your sigmoid and then a horizontal line over from where it crosses.

• And then where that hits the y axis, that's the output of your function.

• So in this case, slightly less than 0.5. It's pretty close.

• As your input number gets larger, your output number also gets larger, but more slowly, and eventually, no matter how big the number you put in, the answer is always less than one.

• Similarly, when you go negative, the answer is always greater than negative one.

• So this ensures that that neuron's value never gets outside of the range of plus one to minus one, which is helpful for keeping the computations in the neural network bounded and stable.
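
• One possible squashing function with this shape is the hyperbolic tangent. The talk only says "sigmoid", so using `math.tanh` here is an assumption, picked because its output stays between minus one and plus one:

```python
import math

def squash(x):
    """S-shaped squashing function; tanh is an assumed choice, not named in the talk."""
    return math.tanh(x)

# Larger inputs give larger outputs, but the curve flattens out near +1 and -1.
# (In floating point, extremely large inputs eventually round to exactly 1.0.)
for value in [0.5, 2.0, 10.0, -10.0]:
    print(value, squash(value))
```

• With this choice, an input of 0.5 comes out at roughly 0.46, slightly less than 0.5, as in the description above.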

• So after you sum the weighted values of neurons and squash the result, you get the output, in this case 0.746.

• That is a neuron.

• So we can collapse all that down, and this is a neuron that does a weighted sum and squashes the result.

• And now, instead of just one of those, assume you have a whole bunch.

• There are four shown here, but there could be 400 or four million.

• Now, to keep our picture clear, we'll assume for now that the weights are either plus one (white lines), minus one (black lines), or zero, in which case they're missing entirely.

• But in actuality, all of these neurons that we created are each attached to all of the input neurons, and they all have some weight between minus one and plus one.

• When we create this first layer of our neural network, the receptive fields get more complex.

• For instance, here each of those ends up combining two of our input neurons.

• And so the receptive fields, the pixel values that make that first-layer neuron as large as it could possibly be, now look like pairs of pixels, either all white or a mixture of white and black, depending on the weights.

• So, for instance, this neuron here is attached to this input pixel, which is upper left, and this input pixel, which is lower left.

• And both of those weights are positive, so it combines the two of those.

• And that's its receptive field.

• The receptive field of this one plus the receptive field of this one.

• However, if we look at this neuron, it combines this pixel, upper right, and this pixel, lower right.

• It has a weight of minus one for the lower right pixel, so that means it's most active when this pixel is black.

• So here is its receptive field.

• Now, because we were careful about how we created that first layer, its values look a lot like input values, and we can turn right around and create another layer on top of it the exact same way, with the output of one layer being the input to the next layer.
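
• Stacking layers can be sketched like this. The wiring below is hypothetical (weights limited to plus one, minus one, or zero, as in the simplified picture), the pixel order is assumed to be upper-left, upper-right, lower-left, lower-right, and `tanh` again stands in for the squashing function:

```python
import math

def layer(inputs, weight_rows, activation=math.tanh):
    """Each row of weights defines one neuron: weighted sum, then squash."""
    return [activation(sum(x * w for x, w in zip(inputs, row)))
            for row in weight_rows]

# Hypothetical wiring, not the exact network drawn in the talk.
first_layer = [
    [1, 0, 1, 0],    # upper-left plus lower-left
    [1, 0, -1, 0],   # upper-left minus lower-left
    [0, 1, 0, 1],    # upper-right plus lower-right
    [0, 1, 0, -1],   # upper-right minus lower-right
]
second_layer = [
    [1, 0, 1, 0],
    [1, 0, -1, 0],
    [0, 1, 0, 1],
    [0, 1, 0, -1],
]

x = [-1.0, -1.0, 1.0, 1.0]     # dark pixels on top, white on the bottom
h1 = layer(x, first_layer)     # the output of the first layer...
h2 = layer(h1, second_layer)   # ...is the input to the next layer
print(h1)
print(h2)
```

• Because every layer takes a list of numbers in and hands a list of numbers out, the same `layer` function can be applied as many times as there are layers.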

• And we can repeat this three times or seven times or 700 times for additional layers.

• Each time, the receptive fields get even more complex. So you can see here, using the same logic, now they cover all of the pixels, with more special arrangements of which are black and which are white.

• We can create another layer again.

• All of these neurons in one layer are connected to all of the neurons in the previous layer.

• But we're assuming here that most of those weights are zero and not shown. It's not generally the case.

• So just to mix things up, we'll create a new layer. But if you notice, our squashing function isn't there anymore.

• We have something new called a rectified linear unit.

• This is another popular neuron type. So you do your weighted sum of all your inputs.

• And instead of squashing, you do rectified linear units.

• You rectify it.

• So if it is negative, you make the value zero.

• If it's positive, you keep the value.

• This is obviously very easy to compute, and it turns out to have very nice stability properties for neural networks as well in practice.
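
• A minimal sketch of a rectified linear unit, with a hypothetical incoming value to show the effect of a positive versus a negative weight:

```python
def relu(x):
    """Rectified linear unit: negative values become zero, positive values are kept."""
    return max(0.0, x)

incoming = -0.75  # hypothetical negative value from the previous layer
print(relu(1.0 * incoming))    # 0.0  (positive weight: stays negative, gets zeroed)
print(relu(-1.0 * incoming))   # 0.75 (negative weight: flips to positive, is kept)
```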

• So after we do this, because some of our weights are positive and some are negative connecting to those rectified linear units, we get receptive fields and their opposites. You look at the patterns there.

• And then finally, when we've created as many layers with as many neurons as we want, we create an output layer.

• Here we have four outputs that we're interested in: is the image solid, vertical, diagonal, or horizontal?

• So to walk through an example here of how this would work, let's say we start with this input image shown on the left: dark pixels on top, white on the bottom.

• As we propagate that to our input layer, this is what those values would look like: the top pixels, the bottom pixels.

• As we move that to our first layer, we can see the combination of a dark pixel and a light pixel summed together gets us zero, gray.

• Whereas down here, we have the combination of a dark pixel plus a light pixel with a negative weight.

• So that gets us a value of negative one there.

• Which makes sense, because if we look at the receptive field here upper left pixel white, lower left, pixel black, it's the exact opposite of the input that we're getting.

• And so we would expect its value to be as low as possible minus one.

• As we move to the next layer, we see the same types of things: combining zeros to get zeros, and combining a negative and a negative with a negative weight, which makes a positive, to get a zero.

• And here we have combining two negatives to get a negative.

• So again you'll notice the receptive field of this is exactly the inverse of our input.

• So it makes sense that its value would be negative.

• And we move to the next layer.

• All of these, of course, the zeros, propagate forward.

• Here, this one has a negative value and it has a positive weight, so it just moves straight forward.

• Because we have a rectified linear unit, negative values become zero, so now it is zero again, too.

• But this one gets rectified and becomes positive.

• Negative times negative is positive.

• And so when we finally get to the output, we can see they're all zero, except for this horizontal, which is positive.

• And that's the answer. Our neural network said this is an image of a horizontal line.

• Now, neural networks usually aren't that good, not that clean.

• So there's a notion of, with an input, what is truth.

• In this case, the truth is this has a zero for all of these values, but a one for horizontal.

• It's not solid.

• It's not vertical.

• It's not diagonal.

• Yes, it is horizontal.

• An arbitrary neural network will give answers that are not exactly truth.

• It might be off by a little or a lot.

• And then the error is the magnitude of the difference between the truth and the answer given and you can add all these up to get the total error for the neural network.
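
• As a small sketch of that error calculation (the `answer` values are hypothetical outputs of an untrained network, made up for illustration):

```python
labels = ["solid", "vertical", "diagonal", "horizontal"]
truth = [0.0, 0.0, 0.0, 1.0]          # it is horizontal and nothing else
answer = [0.5, 0.75, -0.25, 0.75]     # hypothetical network outputs

# Error per output: the magnitude of the difference between truth and answer.
errors = [abs(t - a) for t, a in zip(truth, answer)]
total_error = sum(errors)

for label, err in zip(labels, errors):
    print(label, err)
print("total", total_error)  # total 1.75
```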

• So the whole idea with learning and training is to adjust the weights to make the error as low as possible.

• So the way this is done is we put an image in.

• We calculate the error at the end, then we look for how to adjust those weights higher or lower to either make that error go up or down.

• And we, of course, adjust the weights in the way that makes the error go down.

• Now the problem with doing this is each time we go back and calculate the error, we have to multiply all of those weights by all of the neuron values at each layer.

• And we have to do that again and again, once for each weight.

• This takes forever in computing terms, on a computing scale.

• And so it's not a practical way to train a big neural network.

• You can imagine, instead of just rolling down to the bottom of a simple valley, we have a very high-dimensional valley, and we have to find our way down.

• And because there are so many dimensions, one for each of these weights, the computation just becomes prohibitively expensive.
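
• The brute-force approach described here can be sketched as follows. The function `network_error` is a hypothetical toy stand-in (a single linear neuron) for a full pass through a real network, and the starting weights and step sizes are made up:

```python
def network_error(weights, inputs, truth):
    """Toy stand-in for a full forward pass: one linear neuron, absolute error."""
    answer = sum(x * w for x, w in zip(inputs, weights))
    return abs(truth - answer)

def nudge_slopes(weights, inputs, truth, step=0.5):
    """Estimate each weight's slope by nudging it and recomputing the error."""
    base = network_error(weights, inputs, truth)
    slopes = []
    for i in range(len(weights)):
        nudged = list(weights)
        nudged[i] += step
        # One extra pass through the network per weight: the expensive part.
        slopes.append((network_error(nudged, inputs, truth) - base) / step)
    return slopes

inputs = [-1.0, -1.0, 1.0, 1.0]
weights = [0.5, 0.0, 0.5, 0.0]    # hypothetical starting weights
truth = 1.0

slopes = nudge_slopes(weights, inputs, truth)
rate = 0.25                        # hypothetical adjustment size
updated = [w - rate * s for w, s in zip(weights, slopes)]
print(slopes)                                   # [1.0, 1.0, -1.0, -1.0]
print(network_error(weights, inputs, truth))    # 1.0
print(network_error(updated, inputs, truth))    # 0.0
```

• With four weights this is cheap, but the loop runs the whole network once per weight, which is what stops it from scaling to millions of weights.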

• Luckily, there was an insight that lets us do this in a very reasonable time, and that's that if we're careful about how we design our neural network, we can calculate the slope directly, the gradient.

• We can figure out the direction that we need to adjust the weight without going all the way back through our neural network and recalculating.

• So just to review:

• The slope that we're talking about is when we make a change in weight, the error will change a little bit.

• And that relation of the change in weight to the change in error is this slope.

• Mathematically, there are several ways to write this.

• We'll favor the one on the bottom.

• It's technically most correct.

• We'll call it dE/dw for shorthand. Every time you see it,

• just think: the change in error when I change a weight, or the change in the thing on the top when I change the thing on the bottom.

• This does get into a little bit of calculus.

• We do take derivatives.

• It's how we calculate slope.

• If it's new to you, I strongly recommend a good semester of calculus, just because the concepts are so universal, and a lot of them have very nice physical interpretations, which I find very appealing.

• But don't worry.

• Otherwise, just gloss over this and pay attention to the rest, and you'll get a general sense for how this works.

• So in this case, if we change the weight by plus one, the error changes by minus two, which gives us a slope of minus two.

• That tells us the direction that we should adjust our weight and how much we should adjust it to bring the error down.

• Now, to do this, you have to know what your error function is.

• So assume we had an error function that was the square of the weight.

• And you can see that our weight is right at minus one.

• So the first thing we do is we take the derivative, change in error divided by change in weight, dE/dw. The derivative of weight squared is two times the weight.

• And so we plug in our weight of minus one and we get a slope dE/dw of minus two.
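
• The same calculation as a short sketch. The error function and starting weight come from the example above; the numerical check and the `learning_rate` value are illustrative additions:

```python
def error(w):
    """Error function from the example: the square of the weight."""
    return w ** 2

def d_error_dw(w):
    """Derivative of weight squared: two times the weight."""
    return 2 * w

w = -1.0
slope = d_error_dw(w)
print(slope)  # -2.0

# Sanity check: a small nudge in either direction gives nearly the same slope.
h = 1e-6
numeric = (error(w + h) - error(w - h)) / (2 * h)

# A negative slope means increasing the weight lowers the error.
learning_rate = 0.25  # hypothetical step size
w_new = w - learning_rate * slope
print(w_new, error(w_new))  # -0.5 0.25
```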

• Now, the other trick that lets us do this with deep neural networks is chaining.

• And to show you how this works, imagine a very simple, trivial neural network with just one hidden layer, one input layer, one output layer, and one weight connecting each of them.

• So it's obvious to see that the value y is just the value x times the weight connecting them, w1.