  • And you thought we were done with the ML5 neural network

  • tutorials.

  • But no.

  • There is one more because I am leading to something.

  • You will soon see, in this playlist,

  • a section on convolutional neural networks.

  • But before I get to convolutional neural networks,

  • I want to look at the reasons for a convolutional layer.

  • I have to answer the question: what is a convolution?

  • I've got to get to that.

  • But before I get to that, I want to just see why

  • they exist in the first place.

  • So I want to start with another scenario

  • for training your own neural network.

  • That scenario is an image classifier.

  • Now you might rightfully be sitting

  • there saying to yourself, you've done videos

  • on image classifiers before.

  • And in fact, I have.

  • The very beginning of this whole series

  • was about using a pre-trained model for an image classifier.

  • And guess what?

  • That pre-trained model had convolutional layers in it.

  • So now I want to take the time to unpack what that means

  • and look at how you could train your own convolutional neural

  • network.

  • Again, first though, let's just think

  • about how we would make an image classifier

  • with what we have so far.

  • We have an image.

  • And that image is being sent into an ML5 neural network.

  • And out of that neural network comes either a classification

  • or a regression.

  • And in fact, we could do an image regression.

  • And I would love to do that.

  • But let me start with a classifier

  • because I think it's a lot simpler to think about

  • and consider.

  • So maybe it comes out with one of two things,

  • either cat or dog, along with some type of confidence score.

  • I previously zoomed in on the ML5 neural network

  • and looked at what's inside, right?

  • We have this hidden layer with some number

  • of units and an output layer, which, in this case,

  • would have just two units if there are two classes.

  • Everything is connected, and then there are the inputs.
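To make "everything is connected" concrete, here is a minimal sketch of a forward pass through one hidden layer and a two-unit output layer. The weights are made-up numbers, and the activation choices (sigmoid for the hidden layer, softmax for the output) are assumptions for illustration, not a claim about what ML5 uses internally. `denseLayer` and `softmax` are hypothetical helper names.

```javascript
// A dense ("fully connected") layer: each unit takes a weighted sum of
// every input, adds a bias, and applies an activation function.
function denseLayer(inputs, weights, biases, activation) {
  // weights[j][i] is the weight from input i to unit j
  return weights.map((row, j) => {
    let sum = biases[j];
    for (let i = 0; i < inputs.length; i++) {
      sum += row[i] * inputs[i];
    }
    return activation(sum);
  });
}

const sigmoid = (x) => 1 / (1 + Math.exp(-x));
const identity = (x) => x;

// Softmax turns raw output scores into confidence scores that sum to 1.
function softmax(scores) {
  const max = Math.max(...scores); // subtract the max for numerical stability
  const exps = scores.map((v) => Math.exp(v - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Made-up example: 3 inputs, 2 hidden units, 2 output classes.
const inputs = [0.2, 0.8, 0.5];
const hidden = denseLayer(inputs, [[0.1, -0.4, 0.3], [0.7, 0.2, -0.5]], [0, 0.1], sigmoid);
const scores = denseLayer(hidden, [[1.2, -0.3], [-0.8, 0.9]], [0, 0], identity);
const confidences = softmax(scores);
const labels = ["cat", "dog"];
console.log(labels.map((label, k) => `${label}: ${confidences[k].toFixed(3)}`));
```

With real image data the only change is the size of `inputs` and of each weight row; the structure stays the same.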

  • With PoseNet, you might recall, there were 34 inputs

  • because there were 17 points on my body,

  • each with an x and y position.

  • What are the inputs for an image?

  • Let's just say, for the sake of argument,

  • that this image is 10 by 10 pixels.

  • So I could consider every single pixel

  • to be an individual input into this ML5 neural network.

  • But each pixel has three channels:

  • R, G, and B. So that would make 100 times three inputs,

  • or 300 inputs.
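The input counts for the two scenarios come straight from the arithmetic above. This small sketch just writes it down; the variable names are illustrative only.

```javascript
// PoseNet: 17 keypoints on the body, each with an x and a y value.
const poseInputs = 17 * 2;

// Raw pixels: a 10 by 10 image, each pixel with R, G, and B channels.
const imageWidth = 10;
const imageHeight = 10;
const channels = 3;
const pixelInputs = imageWidth * imageHeight * channels;

console.log(poseInputs, pixelInputs); // 34 300
```

A grayscale version would set `channels` to 1, giving 100 inputs instead of 300.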

  • That's reasonable.

  • So this is actually what I want to implement.

  • Take the idea of a two-layer neural network

  • to perform classification, the same thing I've

  • done in previous videos, but this time use

  • the actual raw pixels as the input.

  • Can we get meaningful results from just doing that?

  • After we do that, I want to come back here

  • and talk about, not why this is inadequate (I'm not going

  • to say inadequate), but how this can be improved on

  • by adding another layer.

  • So this layer won't--

  • sorry.

  • The inputs will still be there.

  • We're always going to have the inputs.

  • The hidden layer will still be there.

  • And the output layer will still be there.

  • But I want to insert right in here

  • something called a convolutional layer.

  • And I want to do a two-dimensional convolutional

  • layer.

  • So I will come back.

  • If you want to just skip to that next video,

  • if and when it exists, that's when I

  • will start talking about that.

  • But let's just get this working as a frame of reference.

  • I'm going to start with some prewritten code.

  • It's a simple p5.js sketch

  • that opens a connection to the webcam,

  • resizes the video to 10 by 10 pixels, and then

  • draws a rectangle on the canvas for each and every pixel.

  • This might be unfamiliar to you.

  • How do you look at an image in JavaScript with p5

  • and address every single pixel individually?

  • If that's unfamiliar to you, I would refer you

  • to my video on that topic,

  • which is appearing next to me right now.

  • Go take a look at that and then come back here.

  • But really, this is just looking at every x and y position,

  • getting the R, G, B values, filling a rectangle,

  • and drawing it.
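The drawing loop can be sketched without p5 as a plain function, assuming the image is a flat RGBA array laid out row by row with four values per pixel (the layout of p5's `pixels` array after `loadPixels()`). `pixelsToRects` is a hypothetical helper, not the code from the video; in the sketch itself each entry would be drawn with `fill()` and `rect()`.

```javascript
// Mirror of the drawing loop: one scaled-up rectangle per pixel.
function pixelsToRects(pixels, w, h, scale) {
  const rects = [];
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      const index = (x + y * w) * 4; // 4 values per pixel: R, G, B, A
      rects.push({
        x: x * scale,
        y: y * scale,
        size: scale,
        r: pixels[index],
        g: pixels[index + 1],
        b: pixels[index + 2],
      });
    }
  }
  return rects;
}

// Made-up 2x2 RGBA image: red, green (top row), blue, white (bottom row).
const demo = [255, 0, 0, 255, 0, 255, 0, 255, 0, 0, 255, 255, 255, 255, 255, 255];
console.log(pixelsToRects(demo, 2, 2, 40));
// In p5 each entry would become roughly: fill(r, g, b); rect(x, y, size, size);
```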

  • So what I want to do next is think about,

  • how do I configure this ML5 neural network,

  • which expects that 10 by 10 image as its input?

  • I'm going to make a variable called pixelBrain.

  • And pixelBrain will be a new ML5 neural network.

  • I should have mentioned: in case you

  • want to code along with me, links to both the code

  • I'm starting with and the finished code

  • are in this video's description.

  • So to create a neural network, I call the ml5.neuralNetwork()

  • function and give it a set of options.

  • One thing I should mention: in all the videos

  • I've done so far, I've said that you

  • need to specify the number of inputs

  • and the number of outputs to configure your neural network.

  • The truth is ML5 is set up to infer

  • the total number of inputs and outputs

  • based on the data you're training it with.

  • But to be really explicit about things

  • and make the tutorial as clear as possible,

  • I'm going to write those into the options.

  • So how many inputs?

  • Think about that for a second.

  • The number of columns times the number of rows times

  • three, for R, G, and B. Maybe I could use a grayscale image,

  • and then I wouldn't need a separate input for R, G, and B. But let's keep all three.

  • Why not?

  • I have the 10 by 10 size in a variable called videoSize.

  • So let's make that videoSize times videoSize times three.

  • For the outputs, let's make a really simple classifier:

  • I'm here or I'm not here.

  • So I'm going to make that two.

  • The task is classification.

  • And I want to see debugging when I train the model.

  • Now I have my pixel brain, my neural network.

  • Oops.

  • That should be three.
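Here is a sketch of the options object described above, written as plain JavaScript so it can run on its own. `videoSize` matches the variable named in the video; `buildOptions` is a hypothetical helper. The option names `inputs`, `outputs`, `task`, and `debug` are the ones ML5's neural network accepts, but check the ML5 documentation for the version you are using.

```javascript
const videoSize = 10;

// Hypothetical helper that builds the configuration for pixelBrain.
function buildOptions(size) {
  return {
    inputs: size * size * 3, // columns x rows x (R, G, B)
    outputs: 2,              // two classes: "here" or "not here"
    task: "classification",
    debug: true,             // show training progress while the model trains
  };
}

const options = buildOptions(videoSize);
console.log(options.inputs); // 300

// In the browser sketch, with ML5 loaded, the object would be passed in as:
// pixelBrain = ml5.neuralNetwork(options);
```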

  • Let's go with my usual, terrible interface,

  • meaning no interface.

  • And I'm just going to train the model based on when

  • I press keys on the keyboard.

  • So I'll add a keyPressed() function.

  • And I'm going to be a little goofy here:

  • when I press a key,

  • I'll call addExample(key).

  • So I need a new function called addExample()

  • that takes a label.

  • Basically, the key that I press becomes the label.

  • So I'm going to press a bunch of keys

  • when I'm standing in front of the camera

  • and then press a different key when I'm not standing

  • in front of the camera.
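The key-as-label idea can be sketched on its own. In p5, `keyPressed()` takes no argument and the pressed key is stored in the global `key`; here the key is passed in directly so the example runs outside the browser. `trainingData` and `getInputs` are hypothetical stand-ins: in the real sketch each example would go to the network through ML5's `addData()`, and the inputs would come from the webcam pixels.

```javascript
// Hypothetical stand-in for the network's training data.
const trainingData = [];

// Placeholder for the pixel-flattening step; returns made-up values.
function getInputs() {
  return new Array(300).fill(0.5);
}

// The key that was pressed becomes the label for the current frame.
function addExample(label) {
  trainingData.push({ inputs: getInputs(), label });
}

// Simulated version of p5's keyPressed(), with the key passed in.
function keyPressed(key) {
  addExample(key);
}

// Press "h" three times while in frame and "n" twice while out of frame.
["h", "h", "h", "n", "n"].forEach((k) => keyPressed(k));
console.log(trainingData.map((d) => d.label).join("")); // hhhnn
```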

  • Now comes the harder work.

  • I need to figure out how to make an array of inputs

  • out of all of the pixels.

  • Luckily for me, this is something

  • that I have done before.

  • And in fact, I actually have some code

  • that I could pull from right in here,

  • which goes through all the pixels

  • to draw them.

  • But here's the thing.

  • I am going to do something to flatten the data.

  • I am not going to keep the data in its original columns

  • and rows orientation.

  • I'm going to take the pixels and flatten them out

  • into one single array.
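A minimal sketch of the flattening step, assuming the image arrives as a flat RGBA array laid out row by row (four values per pixel, as in p5's `pixels` array). The alpha channel is skipped so each pixel contributes exactly three inputs. `flattenPixels` is a hypothetical helper name, not the code from the video.

```javascript
// Flatten a w x h RGBA image into one array of R, G, B values, row by row.
function flattenPixels(pixels, w, h) {
  const inputs = [];
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      const index = (x + y * w) * 4; // skip past R, G, B, A of earlier pixels
      inputs.push(pixels[index], pixels[index + 1], pixels[index + 2]);
    }
  }
  return inputs;
}

// A made-up 10x10 gray image: every channel 128, alpha 255.
const size = 10;
const gray = [];
for (let i = 0; i < size * size; i++) gray.push(128, 128, 128, 255);

const flat = flattenPixels(gray, size, size);
console.log(flat.length); // 300
```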

  • Guess what?

  • This is actually the problem that

  • convolutional neural networks will address.

  • It's bad to flatten the data because its spatial arrangement

  • is meaningful: pixels that sit next to each other in the image are related.
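To see concretely what flattening does to neighbors, here is a small sketch, assuming the pixels are flattened row by row with three values (R, G, B) per pixel in a 10-pixel-wide image. Horizontal neighbors stay close in the array, vertical neighbors end up 30 entries apart, and the end of one row sits right next to the start of the next. A fully connected layer gives every input position its own weight and has no built-in notion of which positions were neighbors, so any such structure has to be learned from the data. `flatIndex` is a hypothetical helper.

```javascript
// Index of a pixel's red value in the flattened RGB array.
function flatIndex(x, y, w) {
  return (x + y * w) * 3;
}

const w = 10;
const here = flatIndex(4, 4, w);  // 132
const right = flatIndex(5, 4, w); // neighbor to the right
const below = flatIndex(4, 5, w); // neighbor directly below
console.log(right - here); // 3: horizontal neighbors stay close
console.log(below - here); // 30: vertical neighbors end up far apart

// The reverse also happens: the last pixel of one row and the first pixel
// of the next are far apart in the image but adjacent in the array.
const endOfRow = flatIndex(9, 4, w);
const startOfNext = flatIndex(0, 5, w);
console.log(startOfNext - endOfRow); // 3
```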

  • I'll start by creating an empty array called inputs.

  • Then I'll loop through all of the pixels.

  • And to be safe, I should probably