  • What is going on, everybody?

  • And welcome to another CARLA video, possibly the last one.

  • But we will see.

  • So what I wanted to do is show you guys at least the best model that we've had, as well as just what models look like after training for quite some time.

  • So, as you can see, I mean, this model, this is just playing from one of the Twitch streams.

  • Those videos will probably be viewable for a while.

  • I'm not really sure when Twitch removes the VODs.

  • But anyway, go to twitch.tv/sentdex, then go to videos, and you can watch them in all their glory.

  • It's probably basically from the model's beginning of training all the way up to this point; it was all, like, streamed.

  • So anyway, you could definitely check it out.

  • I mean, it's pretty good.

  • It still eventually runs into things.

  • It's decent at staying on a straight road.

  • It can take turns, not the greatest, same as back when I was doing the GTA series, although even here it's a little more impressive.

  • Nobody told this model.

  • Hey, here's how you do things right.

  • It started off purely randomly, just taking random turns and straights, just random actions, and it at least learned something.

  • It is admittedly not very good at driving, but still, it has learned to do some things pretty well.

  • And I've seen it do some pretty cool maneuvers and all that.

  • And so, I mean, it's something, but it's not that great of a driver yet. But I mean, I don't know what we should have expected, to be honest, um, especially considering this is a DQN.

  • So, DQNs, kind of like humans, are really good at learning things, but they're slow at learning things, so yeah.

  • Anyway, let's talk about some of the findings that, uh, we've made: basically the things that we tried, what worked, what didn't work, and kind of, like, some of the theories as to why. So, first of all, the model that you're looking at here, basically, this is the TensorBoard; we trained for over 300,000 steps, and by steps I mean episodes, which translates to well over five days.

  • So it took a long time, and that's not just five days on, like, a decent machine.

  • That's five days on the machine behind me.

  • That is the Lenovo P920 workstation; it's got two RTX 8000s in it.

  • So that's 96 gigabytes of VRAM.

  • It's got, like, 200 gigabytes of regular RAM, two CPUs.

  • So, five days on that, on this machine, and this machine's not weak.

  • Like, my main usual machine has a Titan RTX in it.

  • And it still really just runs, like, one agent; you could maybe stuff in two.

  • But the problem is frames per second.

  • So the other issue is, uh, you want to have... we were kind of hoping for about 15 frames per second.

  • Let me get this to stop updating.

  • We don't need that anymore.

  • And let me just bump this to, like 70.

  • Cool.

  • Ah, you want to have high frames per second, and ideally we'd have even higher than 15.

  • 15 is just... we're just kind of, beggars can't be choosers at some point.

  • So, 15. All right?

  • At least with the self-driving car in Grand Theft Auto, you know, I really wanted more like 30 to 60 or more; the more frames per second you get, the better that agent is simply going to perform, because then making a couple of mistakes here and there is just a wash.

  • So anyway, unfortunately, 15 frames per second was as good as it got, but the model actually learned stuff.

  • So first of all, I'm scrolling through a bunch of stuff because it probably doesn't matter.

  • Well, first of all, accuracy.

  • I mean, obviously, people think, Oh, that must matter.

  • But not quite with the DQN.

  • I mean, you definitely need the neural network to have some degree of accuracy, because you want that model...

  • Um, it needs to sort of learn the Q values, but especially initially, the Q values are relatively worthless, right?

  • So especially initially, the accuracy is gonna be pretty weak.

  • And then over time, um, at least with this one, it's almost like it didn't improve; the model itself was pretty stagnant.

  • And keep in mind there's three options, so this is significantly above random, but it's still not like, super accurate or anything.

  • So anyway, uh, you know, we later decided, Hey, let's try a larger model.

  • So this was a 64 by 3 convolutional neural network, changed from Xception. The main issue with Xception, going back to the text-based version, that is, the tutorial, was the following: this loss.

  • So the loss just continued to explode on us, and in general, if loss is exploding, it's just not... it's super rare that the model is going to recover from a loss explosion.

  • And even if it does, it's just probably gonna happen again.

  • Like, usually a loss explosion signals something is fundamentally wrong.

  • Now, um, Daniel claims to have a model that's pretty good that also exhibits constant loss explosions; he says it works well. We'll see; if we find that we have a model that actually works well despite having loss explosions, I'll be sure to let you know.

  • But for the most part, we wanted to move away from Xception.

  • And, ah, at least the reason why we assumed that was probably the problem is mainly that with Xception, the amount of parameters that you can actually train and tweak in the neural network is basically 23 million, uh, yeah, 23 million pretty much, whereas the 64 by 3 convnet is three million.

  • So it's just a much simpler problem for the agent, for the neural network, to kind of try to figure out.

  • And again, with the DQN, yes, we're using a neural network, but it's kind of like the neural network is more so there to help us generalize Q values, right?

  • That's all it has to do.
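
(For reference, the "generalizing Q values" job described above boils down to fitting the network toward standard DQN targets. The sketch below is a minimal illustration of that idea; DISCOUNT, the batch layout, and the variable names are assumptions, not the project's actual code.)

    # Minimal sketch of the DQN targets the network is asked to generalize.
    # DISCOUNT and the array/variable names are assumptions.
    import numpy as np

    DISCOUNT = 0.99

    def dqn_targets(model, target_model, states, actions, rewards, next_states, dones):
        current_qs = model.predict(states)             # Q(s, a) for all actions
        future_qs = target_model.predict(next_states)  # Q(s', a') from the target network

        for i, (action, reward, done) in enumerate(zip(actions, rewards, dones)):
            if done:
                new_q = reward                          # no future reward after a terminal step
            else:
                new_q = reward + DISCOUNT * np.max(future_qs[i])
            current_qs[i][action] = new_q               # only the taken action is updated

        return current_qs                               # used as the targets for model.fit()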

  • It's not really a super complex task that the neural network has to do, so anyway, it seems like maybe Xception was overkill.

  • But then it seems like a 64 by three was maybe not large enough of a model because our accuracy just never became very good.
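
(As a rough illustration of the kind of model being described, a "64 by 3" convnet for three actions might look like the Keras sketch below. The input shape, kernel sizes, pooling, and optimizer are assumptions, not the exact architecture that was trained.)

    # Rough sketch of a small "64 x 3" convnet with one Q-value output per action.
    # Input shape, kernel sizes, pooling, and optimizer are assumptions.
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv2D, AveragePooling2D, Flatten, Dense

    def create_model(input_shape=(270, 480, 3), n_actions=3):
        model = Sequential([
            Conv2D(64, (3, 3), activation="relu", padding="same", input_shape=input_shape),
            AveragePooling2D(pool_size=(5, 5), strides=(3, 3), padding="same"),
            Conv2D(64, (3, 3), activation="relu", padding="same"),
            AveragePooling2D(pool_size=(5, 5), strides=(3, 3), padding="same"),
            Conv2D(64, (3, 3), activation="relu", padding="same"),
            AveragePooling2D(pool_size=(5, 5), strides=(3, 3), padding="same"),
            Flatten(),
            Dense(n_actions, activation="linear"),  # one Q value per action
        ])
        model.compile(loss="mse", optimizer="adam", metrics=["accuracy"])
        return model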

  • So in general with neural networks, if you're trying to get good accuracy, the goal is often: make the network as small as you can while it still learns.

  • Once you get to a point where you've made the network too small and it doesn't quite learn, make it just slightly bigger again.

  • Now, I did try that.

  • We did try, like, doing something like 64-64-128, and then, like, maybe making four layers of 64; we tried a bunch of different combinations, and honestly, nothing was better than this model. And again, I understand this model doesn't look the greatest, but it does lane-keep.

  • Okay, okay.

  • It has learned, like, straights and sort of how to get back on track and stuff like that.

  • And also, in at least what we're looking at, I don't know, we set the minimum epsilon to be 10%, basically.

  • So, still, 10% of the time these agents do some random action. But anyway, this is, I think, when I was streaming; I'm pretty sure we always just put it to 10%.

  • Just so it was mostly what that agent was actually going to do if it was left to its own devices.
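
(The 10% minimum being described is just a floor on epsilon-greedy action selection. A minimal sketch; the decay schedule and names are assumptions.)

    # Epsilon-greedy action selection with a 10% floor (decay values are assumptions).
    import random
    import numpy as np

    MIN_EPSILON = 0.1        # never go fully greedy; 10% random actions remain
    EPSILON_DECAY = 0.99995
    epsilon = 1.0

    def choose_action(model, state, n_actions=3):
        if random.random() > epsilon:
            qs = model.predict(np.expand_dims(state, axis=0))[0]
            return int(np.argmax(qs))             # exploit: pick the highest Q value
        return random.randint(0, n_actions - 1)   # explore: random action

    # after each step/episode:
    # epsilon = max(MIN_EPSILON, epsilon * EPSILON_DECAY)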

  • So anyways, okay, so, uh, continuing on, let me scroll past this; this is just, like, weather patterns and stuff that we decided to graph as we changed them.

  • Um, episode time.

  • This is a decent metric.

  • Just because the longer the agent is surviving in the episode, that's useful information.

  • Also, that would kind of tell me, like, especially if episode average time was approaching, like, 10 seconds, which was the limit per episode, that might suggest maybe we should boost that time.

  • Give the agent more time.

  • Therefore, they can earn more reward; that would be good.

  • So if it approached 10 seconds, that's what I would do.
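
(The per-episode limit being discussed is typically just a wall-clock check in the environment. A minimal sketch; the class and method names are assumptions.)

    # Ending an episode after a fixed number of seconds (names are assumptions).
    import time

    SECONDS_PER_EPISODE = 10   # later bumped to 12

    class EpisodeTimer:
        def reset(self):
            # called at the start of every episode, right after the car is spawned
            self.episode_start = time.time()

        def out_of_time(self):
            # True once the wall-clock limit for this episode has passed
            return time.time() > self.episode_start + SECONDS_PER_EPISODE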

  • Unfortunately, it basically stayed stagnant around like 6 to 7.

  • So I didn't really see too much reason to raise episode time too much.

  • But eventually I put it to 12, just because I felt like, let's make it at least half. So, continuing on, you can see the minimum time; that really didn't change at all.

  • I think it's probably because the car we get dropped in might be dropped on top of another car, or something like that.

  • Like, just being put in a bad situation.

  • Uh, epsilon over time.

  • I just tried to recycle epsilon, just to see if the model would learn anything new over time; in the most recent changes, I bumped it up just to see, can we get anything else out of the model. And then here you can see loss, too, again.

  • This is, you know, less than 10.

  • In general, as soon as loss starts, like, breaching, let's say, 100, ah, you're in trouble. Probably you're in trouble before that point.

  • But as you can see, like, if we look at the loss here, this is a gigantic loss.

  • It's only gonna get worse.

  • Most likely.

  • So, uh, Okay.

  • So coming back here, and then basically here we're graphing, um, the actual Q values themselves.

  • So the question was, are we maybe favoring one action over the other ones? Because some of the models really would learn to do just one thing, and we would just call them, like, loopy models; like, all they want to do is turn.

  • And they would just go in circles the whole time, and they would just get stuck in that kind of behavior.

  • Um, trying to think if there's really anything else.

  • I mean, yeah.

  • I mean, obviously the most important thing, uh, is the reward average.

  • So, I mean, as you can see here, for the first, like, 50K episodes, yeah, a huge improvement; then it drops back down again.

  • I'm guessing that would line up with... yeah, that's just lining up with epsilon going back up again.

  • So it drops back down again, climbs back up; now we're averaging a positive reward again. Probably an epsilon change again.

  • Then it recovers, comes back up to its highest peak ever, really.

  • And then it's kind of been stagnant, and it's questionable to me whether or not we're going to see any more reward from here; basically, from this point on, we haven't really had much improvement. That's not enough to suggest to me that there will be no more improvement, though.

  • But again, this is five days in on an incredible machine, so training is just taking forever.

  • Plus, we wanted to test other models as well, so we wanted to test... that was my phone.

  • Wow.

  • Okay, can we stop?

  • Uh, we wanted to test other models, too.

  • And as you can see, it can take five days for any specific model that you want to train.

  • So, um, it's very time-consuming to do this.

  • So, anyway, uh, yeah.

  • So let's talk about some of the changes that we did also try, besides just changing the model. So, we also addressed the reward function.

  • So the reward function initially was actually super basic.

  • It was just simply: if you have a collision, minus 200. Uh, if you are just driving around, it's just negative one.

  • Unless you're driving around at greater than 50 kilometers per hour; then it's a plus one.
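
(As described, the original reward function was roughly the following. The speed calculation and the velocity vector are assumptions sketched from that description, not the exact code.)

    # Sketch of the original reward scheme: collision = -200,
    # driving slowly = -1, driving faster than 50 km/h = +1.
    import math

    def reward(collided, velocity):
        # velocity is assumed to be a CARLA-style vector with x/y/z components in m/s
        kmh = 3.6 * math.sqrt(velocity.x ** 2 + velocity.y ** 2 + velocity.z ** 2)

        if collided:
            return -200   # the large penalty that was later removed
        if kmh > 50:
            return 1      # moving at a useful speed
        return -1         # just puttering around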

  • Pretty quickly, we determined that what was likely also contributing to this explosion of loss was really just the huge Q value that was happening when we would collide with cars.

  • So this negative 200 seemed like, um, that's causing trouble. And sure enough, when we got rid of that, that's when we had no more loss explosions. Around that same time is when we removed Xception.

  • I honestly can't remember if we ever tested Xception with, um... without doing the negative 200 penalty.

  • But anyway, regardless, um, I think the negative 200 was causing some trouble.

  • Uh, let's see, what were some of the other changes that we made? Trying to get back to the reward: the best model that we had still weighted by time.

  • So, as we continued along in the episode, the longer we went, the more weighted the rewards were.

  • But there was no longer a negative 200; it was weighted in between zero and one.
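
(A rough sketch of what a time-weighted reward kept inside the 0-to-1 range might look like; the exact weighting used is an assumption.)

    # Sketch of a time-weighted reward kept between 0 and 1 (weighting is an assumption).
    import time

    SECONDS_PER_EPISODE = 12

    def weighted_reward(base_reward, episode_start):
        # base_reward is assumed to already be scaled into [0, 1]
        elapsed = time.time() - episode_start
        weight = min(elapsed / SECONDS_PER_EPISODE, 1.0)  # grows the longer the episode runs
        return base_reward * weight                        # stays within [0, 1]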

  • So again, just in general with neural networks, like, I feel like I have to relearn this every freaking time I do, like, a real big project, but you pretty much always want to keep everything between zero and one, or negative one and one, right?

  • You want to retain that range?

  • You want everything to be scaled, if you can, in that range.

  • And for some reason, I just keep deciding.