

  • What's going on everybody, and welcome to self-driving cars with Python, Carla, and hopefully some reinforcement learning.

  • We'll see.

  • Where we left off, we actually built the environment code that we're going to use, sort of the environment layer on top of the Carla server/client code.

  • Basically, this is what translates information to our agent, which is what's going to train our actual model.

  • Now what we want to do today is actually code that agent.

  • But doing that brings up a different problem that we have.

  • And that is we want to be able to predict and train at the same time, right?

  • That's pretty inherent to reinforcement learning.

  • The problem is, this is a relatively complex environment that requires a decent amount of processing just to display the environment.

  • And then the actual model that we're gonna wind up with is taking in a large amount of data, because it's taking in this image data here.

  • So the model's gonna be large.

  • Okay, there are a lot of weights and a lot of calculations going on.

  • So we've got that as a problem.

  • The other thing is, do we really want to do this in real time?

  • We could do this in synchronous mode with Carla, so it's not running at full simulation speed, right?

  • It runs and waits for everyone to get their update.

  • And you could say, I want this to run at exactly 60 frames per second, or 42, or whatever rate you wanted, but for what we want, that's not good.

  • So instead, what we have is this challenge where we want to train and predict at the same time on something that isn't going to be super fast.

  • But at the same time, we want this to happen at, ideally, 90 frames a second or 60 frames a second.

  • But we're probably not gonna see anything even remotely close to that.

  • But we want it as good as we can get.

  • So we're gonna end up using threads.

  • Basically, that way we can train and predict at the same time and hopefully not have so much delay in between doing these things.
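
A minimal sketch of that threading idea (the function and loop bodies here are placeholders, not code from this series): training runs in a background daemon thread while the main thread keeps stepping the environment and predicting.

```python
import threading
import time

def train_in_loop():
    """Placeholder for the agent's training loop: in the real agent this would
    repeatedly fit the model on minibatches sampled from replay memory."""
    while True:
        time.sleep(0.01)  # stand-in for one training step

# Train in the background...
trainer_thread = threading.Thread(target=train_in_loop, daemon=True)
trainer_thread.start()

# ...while the main thread keeps interacting with the environment.
for step in range(100):
    pass  # stand-in for: action = argmax(agent.get_qs(state)); env.step(action)
```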

  • So anyway, that's what we're going to be doing today, or at least starting to do.

  • So what I'm gonna do is just come down here, and then we're gonna begin some new code.

  • So there is a class DQNAgent, and let me just bring this over.

  • So if you haven't done that reinforcement learning series, I strongly recommend you pause now and do that series.

  • If you go to pythonprogramming.net, go to Machine Learning, then click on Reinforcement Learning.

  • Um, and then I would honestly just do the whole thing.

  • It's not really a long series.

  • There are only six parts, but you definitely want to do deep Q-learning and get an idea for how that works, because everything we're doing is going to be very similar.

  • Just as it has been even up to this point.

  • So definitely check that out, because you're gonna be pretty confused otherwise. Anyway, here we go.

  • So, def, and we're gonna have our __init__ method with self.

  • And we're gonna start off by saying self.model = self.create_model().

  • So we're obviously gonna have a method that creates the model.

  • For now, we're not gonna worry about that.

  • Then we're gonna say self.target_model, and that's gonna be the exact same thing: self.create_model().

  • Then we're gonna say self.target_model.set_weights(), and we're gonna set the weights to be self.model.get_weights().

  • So again, if this line here is confusing to you, go back to that reinforcement learning series.

  • Okay, so we want this because every now and then, basically, the target model's gonna update to the model's weights.

  • So, basically, with reinforcement learning as a reminder, even if you have seen it, you have two models, right?

  • One model is the one that's actually being constantly trained.

  • The other one's the one that we predict against, and we want the one that we predict against to be relatively stable.

  • If we're always updating that model, we're going to get really volatile results, and it's gonna be very hard for this model to actually get coherent results.

  • So instead, what we do is we kind of hold that model steady to predict against.

  • And then we're constantly training the other model, and then after n number of episodes, or even steps or whatever you want to go with (usually it's episodes), we're going to update the other model, the one that we're predicting against.
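
As a rough sketch, here's roughly what the agent's __init__ amounts to so far, with a tiny placeholder create_model just so it runs (the real model comes later; this assumes standalone Keras, so swap in tensorflow.keras if that's what you're using):

```python
from keras.models import Sequential
from keras.layers import Dense

class DQNAgent:
    def __init__(self):
        # model that gets trained every step
        self.model = self.create_model()

        # model we predict against; starts as a copy of self.model and is only
        # synced back to it every few episodes to keep predictions stable
        self.target_model = self.create_model()
        self.target_model.set_weights(self.model.get_weights())

    def create_model(self):
        # placeholder network just so this sketch runs; the real one comes later
        model = Sequential([Dense(8, input_shape=(4,), activation="relu"),
                            Dense(3, activation="linear")])
        model.compile(loss="mse", optimizer="adam")
        return model
```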

  • So anyway, cool, target model done.

  • Okay, cool.

  • So now we're going to say self.replay_memory is equal to a deque.

  • Deck or day-Q, I don't really know how you pronounce it.

  • And there we're gonna say maxlen equals, oops, REPLAY_MEMORY_SIZE.

  • So again, if you don't know what that is, go back to that course.

  • Basically, this is the memory of previous actions.

  • And again, we pick a random set of those actions to help with the crazy volatility and to actually keep things relatively sane, or at least attempt to.
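
A minimal sketch of that sampling idea (the fake transitions here exist only to make it runnable): training draws a random minibatch from the deque instead of using the most recent steps.

```python
import random
from collections import deque

REPLAY_MEMORY_SIZE = 5_000
MINIBATCH_SIZE = 16

replay_memory = deque(maxlen=REPLAY_MEMORY_SIZE)

# each transition is a (current_state, action, reward, new_state, done) tuple;
# these fake ones are just so the sample below has something to draw from
for step in range(100):
    replay_memory.append((f"state_{step}", 0, -1.0, f"state_{step + 1}", False))

# sampling randomly (rather than taking consecutive frames) breaks up the
# correlation between back-to-back steps and keeps training less volatile
minibatch = random.sample(replay_memory, MINIBATCH_SIZE)
```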

  • So let's go ahead and make these two things, since we're using them.

  • So let's go to the top.

  • Let's say from collections import deque, nice.

  • Got it.

  • And then we're gonna say REPLAY_MEMORY_SIZE.

  • We're gonna set that to be 5_000.

  • I know I've said it before, but yeah, the underscore is like a comma, right?

  • So that means 5000, and it's super useful, like if you wanted to do five million, right?

  • 5_000_000 is so much easier to read than 5000000, right?

  • Without the underscores I can't just quickly glance at it and be like, oh, that's five million.

  • Especially once you start getting into more astronomical numbers, the underscore is super helpful.
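
Just to make the point concrete (a trivial example, not code from the project):

```python
REPLAY_MEMORY_SIZE = 5_000   # underscores are purely visual; 5_000 == 5000
BIG_NUMBER = 5_000_000       # far easier to read at a glance than 5000000

print(REPLAY_MEMORY_SIZE)    # 5000 -> the underscores don't change the value
print(BIG_NUMBER == 5000000) # True
```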

  • Okay, REPLAY_MEMORY_SIZE, that's fine.

  • Let's go ahead and make a few more of these constants while we're here.

  • We're gonna need REPLAY_MEMORY_SIZE.

  • We're also gonna have MIN_REPLAY_MEMORY_SIZE; we'll set that to be 1_000.

  • We're gonna set MINIBATCH_SIZE; we'll say that's 16.

  • We're gonna say PREDICTION_BATCH_SIZE; we'll set that to be 1.

  • TRAINING_BATCH_SIZE will be MINIBATCH_SIZE divided by 4, with no remainder; that's what the double slash (//) is there for.

  • Then UPDATE_TARGET_EVERY: so this is basically after how many episodes we update.

  • So basically, every 5 episodes we'll update that target model.

  • And we're gonna end up using MODEL_NAME at some point.

  • For now, I'm gonna call this Xception; that's the model that I'm gonna use.

  • You can feel free to change it.

  • You can make your own custom model or use a different one.

  • I don't really care.

  • And, throwing in a couple of other things: MEMORY_FRACTION equals, and I'm gonna say, 0.8.

  • This is how much of your GPU you're gonna want to use.

  • So I'm using an RTX Titan card, and for some reason that card, and apparently all the RTX cards, have this weird issue where they attempt to allocate more memory than they have.

  • And the only way I've found to overcome this is to use a memory fraction that is less than 1.0, basically.

  • So don't let the card attempt to allocate all the memory.

  • I don't know why that is.

  • I don't know if that's TensorFlow's fault specifically, or the CUDA toolkit's, or cuDNN's; I don't know whose fault that is, but it's just a thing that's happening right now, so I have to do this.

  • Most people maybe don't even have to do that at all.

  • I don't really know.
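
For reference, a sketch of how a memory fraction is typically applied in the TF 1.x / standalone Keras era this series uses (TF 2.x has a different API, e.g. tf.config.experimental.set_memory_growth); treat this as an assumption about the setup rather than the exact code from the video:

```python
import tensorflow as tf
from keras import backend

MEMORY_FRACTION = 0.8  # cap TensorFlow at roughly 80% of the card's memory

# TF 1.x-style session config: don't let the process grab the whole GPU
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=MEMORY_FRACTION)
backend.set_session(tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)))
```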

  • So, um, let's see.

  • So the other thing, we can say MIN_REWARD is negative 200, for the model saving.

  • Actually, we wouldn't even use MIN_REWARD yet anyway.

  • I think that's all the stuff I'm gonna use for now, but we could also do our discount, epsilon, and all that stuff as well.

  • So let's go ahead and do that too.

  • So we're gonna set DISCOUNT; we'll set that to 0.99. Then we're gonna set, let's do EPISODES.

  • So how many episodes do we want to do?

  • 100 is way too few, but I'm just gonna set it to something reasonable for now.

  • So we've got DISCOUNT.

  • So then epsilon, not in caps, because epsilon is gonna change, hopefully.

  • Then EPSILON_DECAY = 0.95, just to see it decay.

  • Probably later on, we're gonna wind up with something more like 0.9975 or 0.99975, something like that.

  • Honestly, to check that, I usually determine how many episodes I really want to go.

  • It's actually dependent on either episodes or steps, depending on when you're gonna decide to decay.

  • Then I just write a for loop to determine the number.

  • There's got to be some sort of function out there where you could just say, hey, I want to do this many steps, give me a reasonable decay number.
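
That quick check is easy to write; a small sketch (the values here are just examples to play with, not the final settings):

```python
# Where does epsilon end up after EPISODES episodes with a given decay rate?
EPISODES = 100
EPSILON_DECAY = 0.95
MIN_EPSILON = 0.001

epsilon = 1.0
for episode in range(EPISODES):
    if epsilon > MIN_EPSILON:
        epsilon *= EPSILON_DECAY
        epsilon = max(MIN_EPSILON, epsilon)

print(f"epsilon after {EPISODES} episodes: {epsilon:.5f}")
```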

  • But for now, we'll go with that, and then we're gonna say MIN_EPSILON equals 0.001.

  • So 1/10 of a percent.

  • Then we'll have AGGREGATE_STATS_EVERY; we'll say 10 episodes.

  • And then we'll have, actually, do we? We really ought to have SHOW_PREVIEW.
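
Pulling together the constants mentioned in this section into one block, roughly (the values are just the ones used so far, and SHOW_PREVIEW's default here is an assumption):

```python
REPLAY_MEMORY_SIZE = 5_000        # how many recent steps to keep for training
MIN_REPLAY_MEMORY_SIZE = 1_000    # don't start training until we have this many
MINIBATCH_SIZE = 16               # steps sampled from replay memory per training pass
PREDICTION_BATCH_SIZE = 1
TRAINING_BATCH_SIZE = MINIBATCH_SIZE // 4   # // = integer division, no remainder
UPDATE_TARGET_EVERY = 5           # sync the target model every 5 episodes
MODEL_NAME = "Xception"           # base model; swap in your own if you prefer
MEMORY_FRACTION = 0.8             # fraction of GPU memory to allow
MIN_REWARD = -200                 # used later for deciding which models to save

EPISODES = 100                    # way too few for real training; fine for testing
DISCOUNT = 0.99
epsilon = 1                       # lowercase because this one changes as we train
EPSILON_DECAY = 0.95              # later probably more like 0.9975 or 0.99975
MIN_EPSILON = 0.001

AGGREGATE_STATS_EVERY = 10        # episodes
SHOW_PREVIEW = False              # assumed default for the preview toggle
```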

  • Okay, so now that we've added, you know, 6000 lines of constants and starting variables, let's go back to our actual agent code.

  • So, replay memory, cool.

  • The next thing we're gonna do is self.tensorboard.

  • And that's going to be a ModifiedTensorBoard, I believe.

  • Oh, I closed it.

  • I believe this will come up; I'll put this in the code as well, log_dir equals...

  • I think I use this as well in the Q-learning stuff.

  • So if you've done that series, you should have this code somewhere, but I'll put it in the text-based version too, this ModifiedTensorBoard class.

  • At some point I'll show where to get it, but until I have it pasted somewhere reasonable, I'm not gonna show it.

  • So right now, I'm not gonna add this code.

  • But anyway, it just modifies TensorBoard to be a little more reasonable for this task; I'm confident it's for Q-learning.

  • Because the problem is we don't need as many updates to TensorBoard as TensorBoard wants to do.

  • So yeah, because it's basically gonna want to create a custom log file per training session, or training loop, I want to say, or something.

  • Or it might be per .fit() call, I think, that is the problem.

  • I can't remember; it's been too long.

  • But anyway, regardless, we want to modify it so it stops doing that nonsense, both for speed purposes and storage purposes.
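
Until that code is pasted somewhere, here's a sketch of one common way to do it in the TF 1.x / standalone Keras setup this series targets (where the built-in TensorBoard callback exposes a _write_logs helper); treat it as an approximation, not the exact class from the text-based version:

```python
import tensorflow as tf
from keras.callbacks import TensorBoard

class ModifiedTensorBoard(TensorBoard):
    """Reuses a single log writer for the whole run instead of letting Keras
    create a fresh log file for every .fit() call."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.step = 1
        self.writer = tf.summary.FileWriter(self.log_dir)  # one writer, reused

    def set_model(self, model):
        pass  # don't let Keras set up its own writer

    def on_epoch_end(self, epoch, logs=None):
        self.update_stats(**logs)  # log whatever metrics we're handed

    def on_batch_end(self, batch, logs=None):
        pass  # skip per-batch logging entirely

    def on_train_end(self, _):
        pass  # don't close the writer between .fit() calls

    def update_stats(self, **stats):
        self._write_logs(stats, self.step)
```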

  • Anyway, let's at least specify the logger.

  • And it will be the model name; it's an f-string.

  • In case you...
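
The transcript cuts off here, but for illustration, a hypothetical f-string log_dir along these lines is typical (the exact format isn't confirmed above):

```python
import time

MODEL_NAME = "Xception"

# hypothetical example: tag each run's logs with the model name and a timestamp
log_dir = f"logs/{MODEL_NAME}-{int(time.time())}"
print(log_dir)  # e.g. logs/Xception-1700000000
```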