  • Hello and welcome to another beginner's guide

  • to machine learning with ML5.js video on pose estimation

  • and posenet.

  • So this is the third, the last one that I'll do in this series

  • here about posenet.

  • First I looked at just what posenet is and how it works

  • and how you can get the key points of a human skeleton.

  • Then I took the output of the posenet model, all

  • those key points, and fed them into another neural network

  • to do pose classification, to recognize different poses

  • that I made with my body.

  • And in this grand finale pose video,

  • I will do exactly what I did in the previous video

  • with pose classification.

  • But perform a regression.

  • So the final output instead of being a classifier,

  • am I making a Y, M, C, or A pose, I will make a regression.

  • What do I mean by that exactly?

  • So to review, the setup I have is as follows.

  • [MUSIC PLAYING]

  • The system starts with an image.

  • It sends that image into the pre-trained posenet

  • machine learning model.

  • That model performs pose estimation and gives as its

  • output 17 x,y pairs.

  • Wrist, elbow, shoulder, shoulder, elbow, wrist,

  • et cetera, et cetera, et cetera.

  • And then I take all of those and feed them

  • into another neural network, an ML5 neural network, which

  • then classifies those key points as Y, M, C, or A.
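
As a rough sketch of the hand-off between the two models, the 17 x,y pairs can be flattened into a single array of 34 numbers before they go into the second network. The helper name `flattenPose` and the fake pose below are assumptions for illustration; the keypoint shape (`keypoints[i].position.x` and `.y`) is assumed to match what PoseNet returns in ml5.js.

```javascript
// Hypothetical helper: turn one PoseNet-style pose into a flat array of inputs.
// Assumes pose.keypoints is an array of { position: { x, y } } objects.
function flattenPose(pose) {
  const inputs = [];
  for (const keypoint of pose.keypoints) {
    inputs.push(keypoint.position.x);
    inputs.push(keypoint.position.y);
  }
  return inputs;
}

// Fake pose with 17 keypoints, standing in for real PoseNet output.
const fakePose = {
  keypoints: Array.from({ length: 17 }, (_, i) => ({
    position: { x: i * 10, y: i * 5 },
  })),
};

const inputs = flattenPose(fakePose);
console.log(inputs.length); // 34
```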

  • So that's the process that I've built in the first two videos.

  • I want the final output to no longer be categorical.

  • It's not one of four options.

  • The final output is any number.

  • So you could think of it as the final output

  • is going to control a slider.

  • And that slider is going to have some sort of range.

  • So what I did previously in other examples of regression

  • in this full series if you go back,

  • I used a neural network to output a frequency value

  • to play a musical note.

  • So I certainly could do that here.

  • I could train the machine learning model

  • to play the note [SINGING] for this pose

  • and [SINGING] for this pose.

  • And I could actually have something that output like

  • [SINGING].

  • So I could go and do that.

  • And boy wouldn't that be fun to watch?

  • But I want to do something different.

  • That I'll leave as an exercise to you.

  • Make a gesture or pose-based musical instrument.

  • I am going to control color.

  • And this comes from a project that I referenced inspired

  • by a viewer, Darshawn, who made a project that does an output.

  • Because specifically what I want to demonstrate

  • here is that the regression output doesn't

  • have to be a single number.

  • In this case, I want to have three values.

  • And I'm going to think of those values as an R for red,

  • a G for green, and a B for blue.

  • So I can say things like, and the training can be,

  • this pose is this particular color.

  • This pose is this particular color.

  • And then this pose is this other particular color.

  • And then as I move, it will interpolate

  • between those colors by trying to guess the value according

  • to the regression.

  • Now I'm ready to start implementing this in code.

  • So I'm not going to write everything again.

  • I'm going to start from the pose classifier.

  • And the first thing that I need to do

  • is adjust the configuration of the neural network.

  • The difference is, instead of four categorical outputs, Y, M, C,

  • or A, I just need three continuous outputs.

  • So I could actually just change this number to three.

  • Because it's still a number of outputs but the task

  • is now regression.
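
A minimal sketch of what that configuration change might look like, assuming the options object from the classifier is reused. The variable names `options` and `brain` are assumptions; the `ml5.neuralNetwork` call is left as a comment so the snippet runs without the library loaded.

```javascript
// Sketch of the regression configuration:
// 34 inputs (17 x,y pairs) and 3 continuous outputs (r, g, b).
const options = {
  inputs: 34,
  outputs: 3,
  task: 'regression',
  debug: true,
};

// In the browser sketch, with ml5.js loaded, this would be passed along as:
// const brain = ml5.neuralNetwork(options);
console.log(options);
```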

  • The other thing I really need to do

  • is think about during the training process,

  • how am I going to create these target values?

  • And this is going to be really tricky.

  • So maybe this color scenario isn't the best one.

  • I'm only one person here.

  • But I think to demonstrate this idea,

  • the best way would be for me to make these literal sliders.

  • So I'm going to adjust the sliders

  • and make the target outputs based

  • on the position of the sliders.

  • And then when I actually deploy the model,

  • the model will control the sliders themselves

  • and I'll see the color.

  • I think that's going to work.

  • So this target label is no more.

  • I don't have a target label, there's no categorical output.

  • Instead I'm going to have sliders.

  • So let's comment this out and say, three sliders for red,

  • green, and blue.

  • They're all going to have a range between 0 and 255

  • with some default value, in this case 0.

  • And I'll have the sliders start with red at 255 and G

  • and B at 0.

  • So we can see these are the sliders

  • that I'm now going to control.

  • And match their positions with a given pose.

  • Now if you recall, I had this horribly awkward,

  • for a variety of reasons, interface.

  • As in, no interface at all with just key presses

  • to set a label.

  • And then I'd have this, like, callback

  • hell with nested setTimeouts.

  • Let me improve this a little bit for this round.

  • So one thing that I can do to improve this,

  • and I haven't been using this throughout this video series,

  • I've been staying away from it.

  • But I'm going to replace this with something called async

  • and await.

  • These are keywords that operate in JavaScript.

  • They're part of ES8, which is a newer version of JavaScript

  • that allows me to have asynchronous events happen much

  • more sequentially in the code.

  • And I've covered this previously in several videos.

  • If you haven't seen that, you'll want

  • to go watch those or read up about promises and async and

  • await somewhere else.

  • But what I'm actually going to do

  • is I'm just going to go get the code

  • from a very specific video where I wrote this delay function.

  • I'm going to bring that in here.

  • And then I'm going to change key press to use async and await

  • with that delay function.

  • And let me just do that and then explain what I mean.

  • [MUSIC PLAYING]

  • Oh, it is so lovely, look at it.

  • Look at this nice sequential code that's, like,

  • set the target label, console log it, wait 10 seconds,

  • then do this.

  • Then wait 10 more seconds, then do this.

  • Isn't this lovely?

  • It is really worth taking some time

  • to read up and explore async and await so that you can have

  • some much more readable code.

  • This is all still happening asynchronously.

  • JavaScript, everything happens asynchronously.

  • This is just sweet syntactic sugar

  • to make our lives a little bit more joyful today.
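
A minimal sketch of the pattern being described, assuming a promise-based `delay` helper. The `collectSequence` function, the `state` flag, and the `log` array are hypothetical names for illustration, and the wait times are shortened so the example finishes quickly.

```javascript
// Resolve a promise after the given number of milliseconds.
function delay(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical state flag; in the sketch it would gate the data-collection code.
let state = 'waiting';
// Record each state change so the order of events is visible.
const log = [];

// Assumed sequence: wait to get in position, collect, then stop.
async function collectSequence(waitMs, collectMs) {
  await delay(waitMs);
  state = 'collecting';
  log.push(state);
  await delay(collectMs);
  state = 'waiting';
  log.push(state);
  return 'done';
}

// Short times here; the transcript talks about waits of several seconds.
const finished = collectSequence(20, 20);
finished.then(() => console.log(log));
```

Each `await` pauses only this function, not the whole program, which is why the steps read top to bottom while still running asynchronously.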

  • But, ah, that's not really the content of this video.

  • That's not the topic.

  • The topic is, I don't have a target label anymore.

  • What I have is--

  • and actually, let's just change this to if key equals--

  • like, I'm no longer going to be collecting

  • a particular key press.

  • So let's just have the collection

  • moment happen when I do--

  • so D for data.

  • And then I'm going to have a target color.

  • And it's going to be an array with the values of all

  • the sliders.

  • [MUSIC PLAYING]

  • So the idea is that when I press the D key,

  • I'm going to pull the values from the sliders.

  • I'm going to set that to a target color.

  • I'm going to wait 10 seconds so I can get in position.

  • And then start the collecting process,

  • collect for 10 seconds, and then jump out.
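
One way to sketch the "pull the values from the sliders" step is a small helper that reads all three sliders into an array. The name `readTargetColor` is an assumption, and the fake sliders stand in for p5.js slider elements, which are assumed to expose a `value()` method.

```javascript
// Read the current r, g, b values from three slider-like objects.
// Assumes each slider exposes a value() method, as p5.js slider elements do.
function readTargetColor(rSlider, gSlider, bSlider) {
  return [rSlider.value(), gSlider.value(), bSlider.value()];
}

// Stand-ins for real sliders so the example runs without p5.js.
const fakeSlider = (v) => ({ value: () => v });

const targetColor = readTargetColor(fakeSlider(255), fakeSlider(0), fakeSlider(0));
console.log(targetColor); // [255, 0, 0]
```

Passing the three sliders in explicitly makes it harder to read the same slider three times by accident.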

  • Now it would be much better interaction wise

  • if I could manipulate the sliders

  • while I'm making the pose.

  • And if I could just, like, open the magic door

  • and have a volunteer come in and help me with this,

  • that might make more sense.

  • But I guess I didn't think of that in advance

  • so I'll do that another time.

  • I also think that I'm going to be able to get

  • into position a little faster.

  • So let me change this to 3,000.

  • But I haven't done the important part.

  • This target color needs to replace the target label

  • when I collect the data.

  • That's happening right here.

  • So previously I had this target label

  • that was a character that I put into an array.

  • And then passed it into addData.

  • I think I can get rid of this now and just say target color.

  • So this should be good.

  • OK, dare I say that I can collect this data now?

  • Oh, the chat thankfully is pointing out

  • that I missed adjusting these to G

  • and B. Oh, that would have really gotten me later,

  • thank you.

  • So I think also I just want to collect

  • data for, like, 3 seconds.

  • Because I'm going to do things like set the color,

  • set the color.

  • I'm going to move my arms maybe like this.

  • And then just set a lot of different colors

  • with lots of in-between states.

  • That'll really show, I think, the regression more clearly.

  • Let me also console log what color's there just so I see it.

  • I'm going to start with the sliders

  • in their original position.

  • And press D. One, two, three.

  • Collecting.

  • OK, I got some data.

  • Now let me adjust the slider a little bit.

  • Let me add some of this color.

  • I really should pick something where I could see what it is.

  • Oh well, next time.

  • Add, press D.

  • Wait, what happened to my pose?

  • Uh-oh, I have a bug.

  • Bug, bad bug.

  • Bad, bad, bad bug.

  • I re-declared target color.

  • I'm making it a global variable so that I can use it across.

  • I mean, there's ultimately a nicer way to organize the code.

  • But I want it to be a global variable.

  • So I set it here and then when I'm adding it I get it here.

  • That was the problem, OK.
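
A small illustration of this kind of bug, assuming the re-declaration was done with `let` inside a function (the transcript does not show the exact code). The function names are hypothetical.

```javascript
// Global that the data-collection code reads later.
let targetColor;

// Buggy version: "let" creates a new local variable that shadows the global,
// so the global is never updated.
function setColorBuggy(r, g, b) {
  let targetColor = [r, g, b];
  return targetColor;
}

// Fixed version: assign to the existing global instead of re-declaring it.
function setColorFixed(r, g, b) {
  targetColor = [r, g, b];
}

setColorBuggy(255, 0, 0);
const afterBuggy = targetColor; // the global was not touched

setColorFixed(255, 0, 0);
const afterFixed = targetColor; // the global now holds the color

console.log(afterBuggy, afterFixed);
```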

  • Now, let's collect some data.

  • Collecting, OK.

  • Now, let me move the sliders around.

  • I really should visualize the color.

  • But what are you going to do?

  • I'll just add a little green and take away a little bit of red.

  • I don't know.

  • And press D again.

  • And, where was I?

  • I'll go like this.

  • Really make this pretty arbitrary.

  • Oh, it really would be good for me to see what I'm doing.

  • I'll make this pose.

  • Let's do this.

  • So you, following this along, if you're

  • going to try to build the same thing,

  • think about how you might really thoughtfully make

  • a bunch of colors with a defined set of poses

  • that means something to you.

  • I'm doing this somewhat arbitrarily

  • just to see if we get some results.

  • Now I could hit S to save the data.

  • And I have a nice JSON file, this default name

  • that downloaded.

  • Let me change this to color poses.

  • Let's take a look at it in Visual Studio Code

  • just to make sure it makes sense.

  • Looks like it does.

  • It's got a bunch of X's, 34.

  • It's got some Y's.

  • The Y's are the outputs, and it's an R, G, and B value.
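
The same sanity check could be done in code rather than by eye. This sketch assumes each saved record has an `xs` object with 34 values and a `ys` object with 3; the real file layout may differ, and `hasExpectedShape` and the sample record are made up for illustration.

```javascript
// Hypothetical check: does one saved record have 34 inputs and 3 outputs?
// Assumes a record looks like { xs: {...}, ys: {...} }; the real file may differ.
function hasExpectedShape(record) {
  return (
    Object.keys(record.xs).length === 34 &&
    Object.keys(record.ys).length === 3
  );
}

// Made-up record standing in for one entry of the downloaded JSON.
const xs = {};
for (let i = 0; i < 34; i++) {
  xs[i] = i * 1.5;
}
const sampleRecord = { xs, ys: { 0: 255, 1: 0, 2: 0 } };

console.log(hasExpectedShape(sampleRecord)); // true
```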

  • So I could have done the thing where I named the outputs.

  • If I wanted to have names show up in the data

  • I could change this to--

  • [MUSIC PLAYING]

  • So ML5, the neural network is just dealing with numbers.

  • But ML5 will allow you to specify names of the output

  • so that when you get them back later