  • What is going on, everybody?

  • And welcome to another CARLA video, possibly the last one.

  • But we will see.

  • So what I wanted to do is show you guys at least the best model that we've had, as well as just what models look like after training for quite some time.

  • So, as you can see, I mean, this model, this is just playing from one of the Twitch streams.

  • Those videos will probably be viewable for a while.

  • I'm not really sure when Twitch removes the VODs.

  • But anyway, go to twitch.tv/sentdex, then go to videos, and you can watch them in all their glory.

  • It's probably basically from the model's beginning of training all the way up to this point; it was all, like, streamed.

  • So anyway, you could definitely check it out.

  • I mean, it's pretty good.

  • It still eventually runs into things.

  • It's decent at staying on a straight road.

  • It can take turns, not the greatest, same as back when I was doing the GTA series, although even here it's a little more impressive.

  • Nobody told this model.

  • Hey, here's how you do things right.

  • It started off purely randomly, just taking random turns and straights, just random actions, and it at least learned something.

  • It is admittedly not very good at driving, but still, it has learned to do some things pretty well.

  • And I've seen it do some pretty cool maneuvers and all that.

  • And so, I mean, it's something, but it's not that great of a driver yet. But I mean, I don't know what we should have expected, to be honest, um, especially considering this is a DQN.

  • So, DQNs, kind of like humans, are really good at learning things, but they're slow at learning things, so yeah.

  • Anyway, let's talk about some of the findings that, uh, we've made: basically the things that we tried, what worked, what didn't work, and kind of, like, some of the theories as to why. So, first of all, the model that you're looking at here, basically, this is the TensorBoard; we trained for over 300,000 steps, and by steps I mean episodes, which translates to well over five days.

  • So it took a long time, and that's not just five days on, like, a decent machine.

  • That's five days on the machine behind me.

  • That is the Lenovo P920 workstation; it's got two RTX 8000s in it.

  • So that's 96 gigabytes of VRAM.

  • It's got, like, 200 gigabytes of regular RAM, two CPUs.

  • So, five days on that, on this machine, and this machine's not weak.

  • Like, my main usual machine has a Titan RTX in it.

  • And it still really just runs, like, one agent; you could maybe stuff in two.

  • But the problem is frames per second.

  • So the other issue is, uh, you want to have... we were kind of hoping for about 15 frames per second.

  • Let me get this to stop updating.

  • We don't need that anymore.

  • And let me just bump this to, like 70.

  • Cool.

  • Ah, you want to have high frames per second, and ideally we'd have even higher than 15.

  • 15 is just... we're just kind of, beggars can't be choosers at some point.

  • So, 15. All right?

  • At least with the self-driving car in Grand Theft Auto, you know, I really wanted more like 30 to 60 or more; the more frames per second you get, the better that agent is simply going to perform, because then making a couple of mistakes here and there is just a wash.

  • So anyway, unfortunately, 15 frames per second was as good as it got, but the model actually learned stuff.

  • So first of all, I'm scrolling through a bunch of stuff because it probably doesn't matter.

  • Well, first of all, accuracy.

  • I mean, obviously, people think, Oh, that must matter.

  • But not quite with the DQN.

  • I mean, you definitely need the neural network to have some degree of accuracy, because you want that model...

  • Um, it needs to sort of learn the Q values, but especially initially, the Q values are relatively worthless, right?

  • So especially initially, the accuracy is gonna be pretty weak.

  • And then over time, um, at least with this one, it's almost like it didn't improve; the model itself was pretty stagnant.

  • And keep in mind there's three options, so this is significantly above random, but it's still not like, super accurate or anything.

  • So anyway, uh, you know, we later decided, Hey, let's try a larger model.

  • So this was a 64 by 3 convolutional neural network, changed from Xception. The main issue with Xception, going back to the text-based version, that is, the tutorial, was the following: this loss.

  • So the loss just continued to explode on us, and in general, if loss is exploding, it's just not... it's super rare that the model is going to recover from a loss explosion.

  • And even if it does, it's just probably gonna happen again.

  • Like, usually a loss explosion signals something is fundamentally wrong.

  • Now, um, Daniel claims to have a model that's pretty good that also exhibits constant loss explosions; he says it works well. We'll see; if we find that we have a model that actually works well despite having loss explosions, I'll be sure to let you know.

  • But for the most part, we wanted to move away from Xception.

  • And, ah, at least the reason why we assumed that was probably the problem is mainly that with Xception, the amount of parameters that you can actually train and tweak in the neural network is basically 23 million, uh, yeah, 23 million pretty much, whereas the 64 by 3 convnet is three million.

  • So it's just a much simpler problem for the agent, for the neural network, to kind of try to figure out.

  • And again, with the DQN, yes, we're using a neural network, but it's kind of like the neural network is more so there to help us generalize Q values, right?

  • That's all it has to do.
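
(For reference, the "generalizing Q values" job described above boils down to fitting the network toward standard DQN targets. The sketch below is a minimal illustration of that idea; DISCOUNT, the batch layout, and the variable names are assumptions, not the project's actual code.)

    # Minimal sketch of the DQN targets the network is asked to generalize.
    # DISCOUNT and the array/variable names are assumptions.
    import numpy as np

    DISCOUNT = 0.99

    def dqn_targets(model, target_model, states, actions, rewards, next_states, dones):
        current_qs = model.predict(states)             # Q(s, a) for all actions
        future_qs = target_model.predict(next_states)  # Q(s', a') from the target network

        for i, (action, reward, done) in enumerate(zip(actions, rewards, dones)):
            if done:
                new_q = reward                          # no future reward after a terminal step
            else:
                new_q = reward + DISCOUNT * np.max(future_qs[i])
            current_qs[i][action] = new_q               # only the taken action is updated

        return current_qs                               # used as the targets for model.fit()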

  • It's not really a super complex task that the neural network has to do, so anyway, it seems like maybe Xception was overkill.

  • But then it seems like a 64 by three was maybe not large enough of a model because our accuracy just never became very good.
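
(As a rough illustration of the kind of model being described, a "64 by 3" convnet for three actions might look like the Keras sketch below. The input shape, kernel sizes, pooling, and optimizer are assumptions, not the exact architecture that was trained.)

    # Rough sketch of a small "64 x 3" convnet with one Q-value output per action.
    # Input shape, kernel sizes, pooling, and optimizer are assumptions.
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv2D, AveragePooling2D, Flatten, Dense

    def create_model(input_shape=(270, 480, 3), n_actions=3):
        model = Sequential([
            Conv2D(64, (3, 3), activation="relu", padding="same", input_shape=input_shape),
            AveragePooling2D(pool_size=(5, 5), strides=(3, 3), padding="same"),
            Conv2D(64, (3, 3), activation="relu", padding="same"),
            AveragePooling2D(pool_size=(5, 5), strides=(3, 3), padding="same"),
            Conv2D(64, (3, 3), activation="relu", padding="same"),
            AveragePooling2D(pool_size=(5, 5), strides=(3, 3), padding="same"),
            Flatten(),
            Dense(n_actions, activation="linear"),  # one Q value per action
        ])
        model.compile(loss="mse", optimizer="adam", metrics=["accuracy"])
        return model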

  • So in general with neural networks, if you're trying to get good accuracy, the goal is often: make the network as small as you can while it still learns.

  • Once you get to a point where you've made the network too small and it doesn't quite learn, make it just slightly bigger again.

  • Now, I did try that.

  • We did try, like, doing something like 64-64-128, and then, like, maybe making four layers of 64; we tried a bunch of different combinations, and honestly, nothing was better than this model. And again, I understand this model doesn't look the greatest, but it does lane-keep.

  • Okay, okay.

  • It has learned, like, straights and sort of how to get back on track and stuff like that.

  • And also, in at least what we're looking at, I don't know, we set the minimum epsilon to be 10%, basically.

  • So, still, 10% of the time these agents do some random action. But anyway, this is, I think, when I was streaming; I'm pretty sure we always just put it to 10%.

  • Just so it was mostly what that agent was actually going to do if it was left to its own devices.
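
(The 10% minimum being described is just a floor on epsilon-greedy action selection. A minimal sketch; the decay schedule and names are assumptions.)

    # Epsilon-greedy action selection with a 10% floor (decay values are assumptions).
    import random
    import numpy as np

    MIN_EPSILON = 0.1        # never go fully greedy; 10% random actions remain
    EPSILON_DECAY = 0.99995
    epsilon = 1.0

    def choose_action(model, state, n_actions=3):
        if random.random() > epsilon:
            qs = model.predict(np.expand_dims(state, axis=0))[0]
            return int(np.argmax(qs))             # exploit: pick the highest Q value
        return random.randint(0, n_actions - 1)   # explore: random action

    # after each step/episode:
    # epsilon = max(MIN_EPSILON, epsilon * EPSILON_DECAY)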

  • So anyways, okay, so, uh, continuing on, let me scroll past this; this is just, like, weather patterns and stuff that we decided to graph as we changed them.

  • Um, episode time.

  • This is a decent metric.

  • Just because the longer the agent is surviving in the episode, that's useful information.

  • Also, that would kind of tell me, like, especially if episode average time was approaching, like, 10 seconds, which was the limit per episode, that might suggest maybe we should boost that time.

  • Give the agent more time.

  • Therefore, they can earn more reward; that would be good.

  • So if it approached 10 seconds, that's what I would do.
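
(The per-episode limit being discussed is typically just a wall-clock check in the environment. A minimal sketch; the class and method names are assumptions.)

    # Ending an episode after a fixed number of seconds (names are assumptions).
    import time

    SECONDS_PER_EPISODE = 10   # later bumped to 12

    class EpisodeTimer:
        def reset(self):
            # called at the start of every episode, right after the car is spawned
            self.episode_start = time.time()

        def out_of_time(self):
            # True once the wall-clock limit for this episode has passed
            return time.time() > self.episode_start + SECONDS_PER_EPISODE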

  • Unfortunately, it basically stayed stagnant around like 6 to 7.

  • So I didn't really see too much reason to raise episode time too much.

  • But eventually I put it to 12, just because I felt like, let's make it at least half. So, continuing on, you can see the minimum time; that really didn't change at all.

  • I think it's probably because the car we get dropped in might be dropped on top of another car, or something like that.

  • Like, just being put in a bad situation.

  • Uh, epsilon over time.

  • I just tried to recycle epsilon, just to see if the model would learn anything new over time; in the most recent changes, I bumped it up just to see, can we get anything else out of the model. And then here you can see loss, too, again.

  • This is, you know, less than 10.

  • In general, as soon as loss starts, like, breaching, let's say, 100, ah, you're in trouble. Probably you're in trouble before that point.

  • But as you can see, like, if we look at the loss here, this is a gigantic loss.

  • It's only gonna get worse.

  • Most likely.

  • So, uh, Okay.

  • So coming back here, and then basically here we're graphing, um, the actual Q values themselves.

  • So the question was, are we maybe favoring one action over the other ones? Because some of the models really would learn to do just one thing, and we would just call them, like, loopy models; like, all they want to do is turn.

  • And they would just go in circles the whole time, and they would just get stuck in that kind of behavior.

  • Um, trying to think if there's really anything else.

  • I mean, yeah.

  • I mean, obviously the most important thing, uh, is the reward average.

  • So, I mean, as you can see here, for the first, like, 50K episodes, yeah, a huge improvement; then it drops back down again.

  • I'm guessing that would line up with... yeah, that's just lining up with epsilon going back up again.

  • So it drops back down again, climbs back up; now we're averaging a positive reward again. Probably an epsilon change again.

  • Then it recovers, comes back up to its highest peak ever, really.

  • And then it's kind of been stagnant, and it's questionable to me whether or not we're going to see any more reward from here; basically, from this point on, we haven't really had much improvement. That's not enough to suggest to me that there will be no more improvement, though.

  • But again, this is five days in on an incredible machine, so training is just taking forever.

  • Plus, we wanted to test other models as well, so we wanted to test... that was my phone.

  • Wow.

  • Okay, can we stop?

  • Uh, we wanted to test other models, too.

  • And as you can see, it can take five days for any specific model that you want to train.

  • So, um, it's very time-consuming to do this.

  • So, anyway, uh, yeah.

  • So let's talk about some of the changes that we did also try, besides just changing the model. So, we also addressed the reward function.

  • So the reward function initially was actually super basic.

  • It was just simply: if you have a collision, minus 200. Uh, if you are just driving around, it's just negative one.

  • Unless you're driving around at greater than 50 kilometers per hour; then it's a plus one.
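
(As described, the original reward function was roughly the following. The speed calculation and the velocity vector are assumptions sketched from that description, not the exact code.)

    # Sketch of the original reward scheme: collision = -200,
    # driving slowly = -1, driving faster than 50 km/h = +1.
    import math

    def reward(collided, velocity):
        # velocity is assumed to be a CARLA-style vector with x/y/z components in m/s
        kmh = 3.6 * math.sqrt(velocity.x ** 2 + velocity.y ** 2 + velocity.z ** 2)

        if collided:
            return -200   # the large penalty that was later removed
        if kmh > 50:
            return 1      # moving at a useful speed
        return -1         # just puttering around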

  • Pretty quickly, we determined that what was likely also contributing to this explosion of loss was really just the huge Q value that was happening when we would collide with cars.

  • So this negative 200 seemed like, um, that's causing trouble. And sure enough, when we got rid of that, that's when we had no more loss explosions. Around that same time is when we removed Xception.

  • I honestly can't remember if we ever tested Xception with, um... without doing the negative 200 penalty.

  • But anyway, regardless, um, I think the negative 200 was causing some trouble.

  • Uh, let's see, what were some of the other changes that we made? Trying to get back to the reward: the best model that we had still weighted by time.

  • So, as we continued along in the episode, the longer we went, the more weighted the rewards were.

  • But there was no longer a negative 200; it was weighted in between zero and one.
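
(A rough sketch of what a time-weighted reward kept inside the 0-to-1 range might look like; the exact weighting used is an assumption.)

    # Sketch of a time-weighted reward kept between 0 and 1 (weighting is an assumption).
    import time

    SECONDS_PER_EPISODE = 12

    def weighted_reward(base_reward, episode_start):
        # base_reward is assumed to already be scaled into [0, 1]
        elapsed = time.time() - episode_start
        weight = min(elapsed / SECONDS_PER_EPISODE, 1.0)  # grows the longer the episode runs
        return base_reward * weight                        # stays within [0, 1]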

  • So again, just in general with neural networks, like, I feel like I have to relearn this every freaking time I do, like, a real big project, but you pretty much always want to keep everything between zero and one, or negative one and one, right?

  • You want to retain that range?

  • You want everything to be scaled, if you can, in that range.

  • And for some reason, I just keep deciding.