  • What is going on, everybody?

  • And welcome to part four of the reinforcement learning series. In this video, what we're gonna be doing is building our own Q-learning environment.

  • So the first thing I wanted to do when I learned Q-learning... yes, it was useful to use the OpenAI Gym environment.

  • But the first thing I want to do is make my own environment, and I didn't really intend to make a tutorial out of it, but people were asking.

  • And then it's kind of like, Well, obviously that's what I wanted to do.

  • So of course, that's what other people probably want to do, too.

  • So anyway, here we are.

  • Let's make our own environment.

  • So I'm actually just gonna kind of run through the environment that I made and explain it as I go.

  • So well, we're just gonna do the program linearly.

  • But if you have questions, whatever, as always, comment below, join the Discord at discord.gg/sentdex, and feel free to ask any questions, but it should be pretty self-explanatory. It's not a super complicated environment.

  • And just in case, I guess I'll explain it a little bit before we get in. So, first of all, I love blobs. I use blobs for, like, examples of everything.

  • So if you haven't noticed by now, I'm a blob-ophile.

  • Anyway, so I always wanna make blob things.

  • And so the idea was to make blobs, and in this case, you've got a player blob and a food blob.

  • That's the objective we're trying to achieve.

  • And then just for, uh, you know, complexity, I added an enemy blob as well.

  • So the idea is that you have this player blob and we'll start by having the enemy and the food not moving, just stay stationary and they just kind of initialize in random positions.

  • And then the player blob has to move to get to that position, and then later we can add in movement.

  • It's just that letting the things move is kind of irrelevant to the model learning how to actually move, except in one case: if the enemy is moving and you're moving, you could inadvertently move into each other.

  • If the enemy is not moving in training, you might learn that it's totally fine to get close to the enemy as long as you don't hit the enemy.

  • But if the enemy can also move, they could collide.

  • But anyway, not really worried about that.

  • But you guys can feel free to tinker with that.

  • Anyway, let's get into it.

  • So we're gonna be using OpenCV.

  • So the first thing that I want to do is open up a command prompt, and let's go ahead and pip install python-opencv.

  • Make sure we have that. Hmm, so it might be opencv-python. opencv-python.

  • Okay, I feel like I always get that wrong anyway.

  • Okay, pip install opencv-python.

  • If for whatever reason python-opencv did install for you, you've probably just installed something really bad.

  • So don't use it.

  • Get rid of it moving along.

  • All right, so we are also going to use NumPy, but you should already have that.

  • And then we're going to use the Python Imaging Library, and I want to say that would be a pip install pillow.

  • This is a virtual machine, and it should be clean.

  • So if that's not what I want, Um we'll find out soon enough.

  • But we'll grab Pillow, and then to use Pillow you import capital PIL. So that should be everything we need.

  • Let's go ahead and import numpy as np; we'll use NumPy for the array types of stuff. From PIL we're going to import Image, with a capital I. Import cv2. We're gonna import matplotlib.pyplot as plt.

  • We're going to import pickle, and that's to save and load our Q-table.

  • Then, from matplotlib, we'll import style, just to make our graph pretty.

  • And we'll do style.use("ggplot"), and then finally we'll import time.

  • We're gonna use time purely to set dynamic Q-table file names, so that they have, like, some order.

  • What's your deal? Oh, it's a typo in style, huh?

  • Okay, great.

  • So the first thing I'm gonna do is just try to run this real quick, make sure it actually runs.

  • It does.

  • Okay, so all our imports worked.

  • So now we're gonna have some... whoops.

  • Did I just delete?

  • Totally did.

  • Okay, so now we're gonna start with some of our constants.

  • So we're going to say SIZE, and we'll say this is 10.

  • So we're going to skip the whole "make a huge environment and then boil it down to discrete observation spaces" step.

  • Just skip that step.

  • I'm just gonna make it a grid.

  • So in this case, it's going to be a 10 by 10 grid.

  • So the player, the food and the enemy will be initialized at a random location on a 10 by 10.

  • But we can change this.

  • As time goes on, I'll talk about how some of those changes will impact things.

  • But obviously, as you increase this size, especially depending on the size of your action space, that is going to just exponentially explode the number of, you know, possible combinations in your Q-table.

  • So, anyway, we'll start with a 10 by 10.

  • That should be pretty simple.
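
A rough sketch of how the Q-table grows with the grid size, assuming the relative-position observation described later in this video (one (dx, dy) offset to the food and one to the enemy). The helper name q_table_entries and the action count of 4 are assumptions for illustration; the transcript has not fixed the number of actions at this point.

```python
# Hypothetical helper: counts Q-table entries for a size x size grid,
# assuming the observation is two (dx, dy) offsets (food and enemy).
N_ACTIONS = 4  # assumption, not stated in the transcript so far


def q_table_entries(size, n_actions=N_ACTIONS):
    # Each offset component ranges over -(size - 1) .. (size - 1).
    offsets_per_axis = 2 * size - 1
    # Four components in total: (dx, dy) to the food and (dx, dy) to the enemy.
    n_states = offsets_per_axis ** 4
    return n_states * n_actions


for size in (10, 20, 40):
    print(size, q_table_entries(size))
```

Under these assumptions, doubling the grid side multiplies the table size by roughly sixteen, which is why a small 10 by 10 grid is a comfortable starting point.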

  • How many episodes? HM_EPISODES will be 25,000. We're going to give a MOVE_PENALTY.

  • That's gonna be 1.

  • An ENEMY_PENALTY: this is if we hit the enemy. It's 300, so we'll subtract that penalty, basically.

  • We're going to say a FOOD_REWARD of 25.

  • I haven't really decided where I want this.

  • I don't know if I really want it to be 10, like we had with the mountain car, or 25.

  • I haven't really decided. I don't really know what's the best one to use, so I'm just throwing in 25.

  • I haven't noticed anything, you know, huge telling me one way or the other.

  • We're gonna have epsilon, lower case because it's gonna change over time.

  • We'll start at 0.9.

  • Another thing that we could change over time: EPS_DECAY, so it's epsilon decay.

  • I went with 0.9998, and the way I came to this was I just made a for loop and I literally just decayed.

  • So I just took epsilon and, for i in range of, say, 25,000, multiplied by this, and I just kind of looked and I was like, okay, that looks good.

  • So I literally just pulled that number out of nowhere.
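
The kind of throwaway loop described here could look something like the sketch below. The starting epsilon, the decay factor, and the episode count come from the video; the history list and the sample episodes printed at the end are only illustrative choices.

```python
epsilon = 0.9        # starting exploration rate
EPS_DECAY = 0.9998   # multiplied in once per episode
EPISODES = 25_000

history = []
for i in range(EPISODES):
    history.append(epsilon)   # epsilon in effect during episode i
    epsilon *= EPS_DECAY      # decay for the next episode

# Eyeball a few points along the curve.
for episode in (0, 5_000, 10_000, 20_000, EPISODES - 1):
    print(f"episode {episode:>6}: epsilon = {history[episode]:.4f}")
```

Printing a handful of values like this is enough to judge whether exploration fades too quickly or too slowly for the planned number of episodes.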

  • Moving along.

  • Now we'll say SHOW_EVERY. It's just like before: how often do we want to show?

  • I'm gonna say every 3,000 episodes, if I recall right.

  • My environment is actually a little quicker than the mountain car one.

  • Okay.

  • Show every 3000.

  • Okay, so now we'll have a variable here, start_q_table.

  • And for now, we'll say None.

  • But it could be None or a file name.

  • So if you happen to have an existing Q-table that you want to, like, load in and train from that point, you would throw that right in there.

  • Now, you might want to do this for a variety of reasons.

  • You want to continue training, or... at least we haven't really coded in a good way to, like, decay epsilon to zero and then, you know, train for a little bit and then reintroduce epsilon.

  • We don't really have a good way to do that.

  • So if you really wanted to do that, you could decay epsilon to zero, or train for a little bit with very low or no epsilon, and then load it in, add epsilon, and continue.

  • And I've actually found that that works really well: decay epsilon, let it go for a little bit, and then set epsilon back to 0.9 or something and decay again, and then let it train for a while, then decay again.

  • And keep repeating that. That seems to actually learn pretty well.

  • So anyway, for now, it's None because we don't have a Q-table.

  • But if we had one, we'd just pass the file name.
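
A minimal sketch of what "pass the file name" could mean in practice, using pickle as mentioned with the imports. The function name load_q_table, the demo file name, and the tiny stand-in table are all hypothetical; the video has not shown how the real table is built or saved at this point.

```python
import os
import pickle
import tempfile


def load_q_table(start_q_table):
    """Return a previously pickled Q-table, or None if no file name is given."""
    if start_q_table is None:
        return None  # the caller would build a fresh table in this case
    with open(start_q_table, "rb") as f:
        return pickle.load(f)


# Round trip with a tiny stand-in table (a real table would be far larger).
demo_table = {((1, 0), (-2, 3)): [0.0, -1.5, 2.0, 0.5]}
path = os.path.join(tempfile.mkdtemp(), "qtable-demo.pickle")
with open(path, "wb") as f:
    pickle.dump(demo_table, f)

restored = load_q_table(path)
print(restored == demo_table)
```

Reloading a saved table this way is what makes the "decay, train, reset epsilon, repeat" routine possible without changing the training loop itself.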

  • So now we'll throw in LEARNING_RATE. We're going to say 0.1. DISCOUNT we'll set, again, to 0.95. These two variables I really haven't played with much, so I couldn't really tell you how they're gonna impact things.

  • But now that you know how to do analysis from the previous video, feel free to tinker with them.
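
Collected in one place, the constants dictated so far would look roughly like this. The values are the ones given in the video; the exact capitalization of the names is an assumption based on how they are read out (only epsilon is explicitly said to be lower case).

```python
SIZE = 10              # the grid is SIZE x SIZE
HM_EPISODES = 25_000   # how many episodes to train for
MOVE_PENALTY = 1       # subtracted for an ordinary move
ENEMY_PENALTY = 300    # subtracted when the player hits the enemy
FOOD_REWARD = 25       # given when the player reaches the food

epsilon = 0.9          # lower case because it changes during training
EPS_DECAY = 0.9998     # epsilon is multiplied by this each episode
SHOW_EVERY = 3_000     # how often to display an episode

start_q_table = None   # or the file name of a saved Q-table

LEARNING_RATE = 0.1
DISCOUNT = 0.95
```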

  • Now what I'm gonna do is give PLAYER_N = 1.

  • We're gonna say FOOD_N = 2.

  • And these are just numbers for labels, or keys rather, in a dictionary.

  • That's it.

  • So PLAYER_N, FOOD_N, and ENEMY_N = 3. Again, it's just definitions for what number these things represent in a dictionary.

  • So we're just gonna say d = and then we're gonna have 1: something.

  • I'll just fill it in.

  • So these will be the colors.

  • So (255, 175, 0). Now, I'm actually defining these colors in BGR format, even though I was really trying to use RGB.

  • So side project: if anybody can tell me why it is BGR... I don't know.

  • Maybe I'll figure it out as we go, but that was a problem I haven't solved yet. Anyway, the player, in BGR, is mostly blue, some green, so it's kind of like a light-ish blue.

  • Food is (0, 255, 0), so full green.

  • So the food is green, and then the enemy will be (0, 0, 255).

  • So BGR, so maximum R, so it'll be very red.

  • Cool.
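
Written out, the labels and the color dictionary described above might look like the sketch below. The values are the ones read out in the video; the bgr_to_rgb helper is a hypothetical addition. On the open BGR question, one plausible explanation, offered as an assumption, is that the frame ends up being displayed with OpenCV, whose image functions treat three-channel arrays as blue, green, red.

```python
PLAYER_N = 1   # dictionary key for the player
FOOD_N = 2     # dictionary key for the food
ENEMY_N = 3    # dictionary key for the enemy

# Colors as (blue, green, red) tuples.
d = {
    PLAYER_N: (255, 175, 0),   # mostly blue, some green: a light-ish blue
    FOOD_N: (0, 255, 0),       # full green
    ENEMY_N: (0, 0, 255),      # full red
}


def bgr_to_rgb(color):
    """Hypothetical helper: reorder a BGR tuple into RGB."""
    b, g, r = color
    return (r, g, b)


print(bgr_to_rgb(d[ENEMY_N]))
```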

  • So, um, you don't like that, huh?

  • All right.

  • I'm surprised it's accepting my spacing of the dictionary.

  • I don't think that's PEP 8.

  • Maybe it is.

  • It must be, if it's accepting it.

  • Uh, okay, so now what we need is we need a blob class because, really, all of these blobs are gonna have a lot of the same attributes.

  • At least they're going to need to be able to move.

  • They're going to need a starting location like they need to be initialized randomly.

  • They're gonna need to be able to be moved.

  • And then later, basically, our observation.

  • I didn't really want to use absolute locations, because you would have, like, a huge observation space if you needed to pass the location of every blob.

  • I felt like the problem would be much more complex, I guess, if you passed the physical location of everything.

  • So my plan is instead: the observation space is actually going to be the relative position of the food and then the relative position of the enemy to the player.

  • That's gonna be the observation.
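
A minimal sketch of the idea so far: a blob that starts at a random grid cell, plus an observation built from relative positions. The class layout, the use of subtraction for the offset, the sign convention (player minus other), and the observe helper are all assumptions for illustration; the video has not shown the actual class at this point.

```python
import random

SIZE = 10  # grid side length, as set in the constants


class Blob:
    def __init__(self, size=SIZE):
        # Start at a random cell on the size x size grid.
        self.x = random.randrange(size)
        self.y = random.randrange(size)

    def __sub__(self, other):
        # Relative position of this blob with respect to another blob.
        return (self.x - other.x, self.y - other.y)


def observe(player, food, enemy):
    """Hypothetical helper: offsets to the food and to the enemy."""
    return (player - food, player - enemy)


player, food, enemy = Blob(), Blob(), Blob()
obs = observe(player, food, enemy)
print(obs)
```

Because the observation only holds two offsets rather than three absolute positions, it can be used directly as a key into the Q-table.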