And you thought we were done with the ml5 neural network tutorials. But no. There is one more because I am leading to something. You will soon see in this playlist a section on convolutional neural networks. But before I get to convolutional neural networks, I want to look at the reasons why a convolutional layer exists. I have to answer this question: what is a convolution? I've got to get to that. But before I get to that, I want to just see why they exist in the first place.
So I want to start with another scenario for training your own neural network. That scenario is an image classifier. Now you might rightfully be sitting there saying to yourself, you've done videos on image classifiers before. And in fact, I have. The very beginning of this whole series was about using a pre-trained model for an image classifier. And guess what? That pre-trained model had convolutional layers in it. So I want to now take the time to unpack what that means more and look at how you could train your own convolutional neural network.
Again, first though, let's just think about how we would make an image classifier with what we have so far. We have an image. And that image is being sent into an ml5 neural network. And out of that neural network comes either a classification or a regression. And in fact, we could do an image regression. And I would love to do that. But let me start with a classifier because I think it's a lot simpler to think about and consider. So maybe it comes out with one of two things, either a cat or a dog, and some type of confidence score.
I previously zoomed in on the ml5 neural network and looked at what's inside, right? We have this hidden layer with some number of units and an output layer, which, in this case, would have just two units if there are two classes. Everything is connected, and then there are the inputs.
With PoseNet, you might recall, there were 34 inputs because there were 17 points on my body, each with an x, y position. What are the inputs here? Let's just say, for the sake of argument, that this image is 10 by 10 pixels. So I could consider every single pixel to be an individual input into this ml5 neural network. But each pixel has three channels: R, G, and B. So that would make 100 times three inputs, 300 inputs. That's reasonable.
So this is actually what I want to implement. Take the idea of a two-layer neural network to perform classification, the same thing I've done in previous videos, but, this time, use as the input the actual raw pixels. Can we get meaningful results from just doing that? After we do that, I want to return back to here and talk about why this is inadequate, or, I'm not going to say inadequate, but how this can be improved on by adding another layer. The inputs will still be there. We're always going to have the inputs. The hidden layer will still be there. And the output layer will still be there. But I want to insert right in here something called a convolutional layer. And I want to do a two-dimensional convolutional layer. So I will come back. If you want to just skip to that next video, if and when it exists, that's when I will start talking about that. But let's just get this working as a frame of reference.
I'm going to start with some prewritten code. It's a simple p5.js sketch that opens a connection to the webcam, resizes it to 10 by 10 pixels, and then draws a rectangle in the canvas for each and every pixel. So this could be unfamiliar to you. How do you look at an image in JavaScript in p5 and address every single pixel individually? If that's unfamiliar to you, I would refer you to my video on that topic. That's appearing over next to me right now. You can go take a look at that and then come back here.
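To make the 300-input count and the per-pixel addressing a bit more concrete, here is a minimal sketch in plain JavaScript. It assumes the image data is a flat array with four values per pixel (R, G, B, A), which is how p5.js exposes pixels after loadPixels(). The function name pixelsToInputs and its parameters are hypothetical, made up for illustration, and are not taken from the video's code.

```javascript
// Hypothetical helper (not from the video): walk every x, y position of a
// size-by-size image and collect its R, G, B values into a single array.
// Assumes `pixels` is a flat RGBA array, four values per pixel.
function pixelsToInputs(pixels, size) {
  const inputs = [];
  for (let y = 0; y < size; y++) {
    for (let x = 0; x < size; x++) {
      // Each pixel occupies four slots (R, G, B, A) in the flat array.
      const index = (x + y * size) * 4;
      // Keep the three color channels and skip alpha.
      inputs.push(pixels[index], pixels[index + 1], pixels[index + 2]);
    }
  }
  return inputs;
}

// A blank 10 by 10 image: 100 pixels, 4 values each.
const blank = new Array(10 * 10 * 4).fill(0);
console.log(pixelsToInputs(blank, 10).length); // 300
```

Dropping the alpha channel is a choice made here so that the length matches the 100 times three count; a grayscale version would need only one value per pixel.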
But really, this is just looking at every x and y position, getting the R, G, B values, filling a rectangle, and drawing it. So what I want to do next is think about, how do I configure this ml5 neural network, which expects that 10 by 10 image as its input? I'm going to make a variable called pixelBrain. And pixelBrain will be a new ml5 neural network. I should have mentioned that you can find the link to the code that I'm starting with, in case you want to code along with me. Both the finished code and the code I'm starting with will be in this video's description.
So to create a neural network, I call the neural network function and give it a set of options. One thing I should mention: in all the videos I've done so far, I've said that you need to specify the number of inputs and the number of outputs to configure your neural network. The truth is ml5 is set up to infer the total number of inputs and outputs based on the data you're training it with. But to be really explicit about things and make the tutorial as clear as possible, I'm going to write those into the options.
So how many inputs? Think about that for a second. The number of columns times the number of rows times R, G, B. Maybe I would have a grayscale image. Maybe I could just make it so I don't need a separate input for R, G, and B. But let's do that. Why not? I have the 10 by 10 in a variable called videoSize. So let's make that videoSize times videoSize times three. Let's just make a really simple classifier that's like I'm here or not here. So I'm going to make the number of outputs two. The task is classification. And I want to see debugging when I train the model. Now I have my pixelBrain, my neural network. Oops. That should be three.
Let's go with my usual typical, terrible interface, meaning no interface. And I'm just going to train the model based on when I press keys on the keyboard. So I'll add a key press function.
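The configuration described above could look roughly like the following sketch. The option names inputs, outputs, task, and debug are the ones ml5's neural network accepts. The variable names videoSize and pixelBrain are a camel-case rendering of the spoken names and are an assumption, since the transcript does not show the code itself. The ml5 call is left as a comment so the snippet also runs outside the browser.

```javascript
// The webcam image is resized to 10 by 10 pixels.
const videoSize = 10;

// Options written out explicitly, even though ml5 can infer
// the input and output counts from the training data.
const options = {
  inputs: videoSize * videoSize * 3, // columns * rows * (R, G, B)
  outputs: 2,                        // "I'm here" or "not here"
  task: "classification",
  debug: true,                       // show training progress
};

console.log(options.inputs); // 300

// In the p5.js sketch, with ml5 loaded on the page, the options
// would be passed along these lines (assumed variable name):
// pixelBrain = ml5.neuralNetwork(options);
```

Writing the input count as an expression of videoSize means a different resolution only needs one number changed.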
And then let me just be a little goofy here: I'm just going to say that when I press a key, add an example with that key. So I need a new function called addExample, which takes a label. So basically, I'm going to make the key that I press the label. So I'm going to press a bunch of keys when I'm standing in front of the camera and then press a different key when I'm not standing in front of the camera.
Now comes the harder work. I need to figure out how to make an array of inputs out of all of the pixels. Luckily for me, this is something that I have done before. And in fact, I actually have some code that I could pull from right in here, which is looking at how to go through all the pixels to draw them. But here's the thing. I am going to do something to flatten the data. I am not going to keep the data in its original columns and rows orientation. I'm going to take the pixels and flatten them out into one single array. Guess what? This is actually the problem that convolutional neural networks will address. It's bad to flatten the data because its spatial arrangement is meaningful.
I'll start by creating an empty array called inputs. Then I'll loop through all of the pixels. And to be safe, I should probably