What's going on, everybody, and welcome to part four of the ML in Halite III tutorials. Where we left off, we were creating, for real, hopefully, this time, some training data. I'm going to be pretty angry if this is not correct, but we checked it last time, and I think we're all set. Anyway, what I've got here is actually over 50,000 files and 50 gigabytes of training data. So I'm going to go ahead and stop all these from running, and we'll close these.

We'll pop into here, come into training data, and let's sort by size. It looks like they're all between 2,004 and 2,006. I want to make sure none of them are just, like, empty. And then we can also sort by the name. It looks like the largest is just under 5,500. But pretty quickly, I mean, we only have a handful of 1,000-halite games, like where the AI somehow collected over 1,000. Which is pretty impressive, because we're only making one ship. So that starts at 5,000, one ship goes down to 4,000, and we set this threshold to save at just basically 4,100. So, did that ship collect any halite?

Now what we want to do is find out where we should draw the line. This is just the data, I don't really care about that. I guess we could take testing grounds. I think we'll just modify testing grounds, because there's a couple of things I want to check. So the first thing would be, let's do import os, and then all_files = os.listdir("training_data"). Don't forget to put this in quotes. And what I'd like to do is import matplotlib.pyplot as plt. Actually, this machine probably doesn't have matplotlib, so let's go ahead and grab that. Let me make sure pip is for the Python that we're using. Hurry up. Okay: pip install matplotlib. So once we grab matplotlib, we should be good.
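The sort-by-size check described above can also be done in code. This is only a minimal sketch, not what is shown in the video: the helper name summarize_files is hypothetical, and the directory name "training_data" is an assumption based on what is said aloud.

```python
import os


def summarize_files(directory):
    """Return (file_count, smallest_size, largest_size, empty_file_names).

    Sizes are in bytes. Hypothetical helper, not part of the original script.
    """
    sizes = {}
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            sizes[name] = os.path.getsize(path)

    # Files of size zero are the "empty" ones we want to catch.
    empty = sorted(name for name, size in sizes.items() if size == 0)
    if not sizes:
        return 0, 0, 0, empty
    return len(sizes), min(sizes.values()), max(sizes.values()), empty


# Example call, assuming the directory is named "training_data":
# print(summarize_files("training_data"))
```

If the last element of the result is an empty list, no zero-byte files were written.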
I just want to make sure there are no errors. What I'd like to do is plot a distribution of the scores, kind of for two reasons. One is I'm just curious right now. But also, what we could do is keep this exact same threshold, and then after we've trained a model, we can see whether the distribution is exactly the same, or, like, what's the average score of the games. You could also do an average, or both. I'm curious to know, after we train a model, how much better our new model is.

So coming back over here: all_files = os.listdir("training_data"). Let's just print len(all_files). Run that really quick. Let's see how quickly we can get through that. Pretty darn fast. So then, for f, because if we use file... I thought file was a keyword. Anyway, I'm going to stick with f: for f in all_files. So that should be the file name. So then we just want to say halite_amount = f.split("-"), we'll split by the dash, and then we'll go with the zeroth element. So if we print halite_amount, and we'll just break after the first one, we can see 4100. So then I'll just do halite_amounts = [] here. Then we will delete this here, and do halite_amounts.append(halite_amount). Let's do print(len(halite_amounts)). Also, just so I can dev this to start, let's just do, like, the first 500. Cool.

Okay, so now we want to plot a histogram. I believe it's plt.hist in matplotlib, and then for a histogram you just have to pass the x's. So we want to pass halite_amounts, and then it's like bins. It might be bins or nbins, we're going to find out in a second. I'm going to say five, and then plt.show(). Run that. Whoo! It worked. Okay, so it's only 4100, because as we generate, I think we're just going in order. So let's do 1,500 first to see. OK, so we can see clearly the most common is 4100, and it kind of goes down. Let's do them all.
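Here is a minimal sketch of the filename loop just described. It assumes file names of the form "<halite>-<something>.npy"; the example names below are made up for illustration, and the helper names are hypothetical. Note that str.split returns strings, so this sketch converts the prefix to an integer before storing it.

```python
def halite_from_filename(filename):
    """Extract the halite amount that prefixes a training file name.

    Assumes names look like "4100-1543000000000.npy" (hypothetical format):
    everything before the first dash is the score.
    """
    return int(filename.split("-")[0])


def collect_halite_amounts(filenames, limit=None):
    """Parse the score out of each name, optionally only the first `limit`."""
    selected = filenames if limit is None else filenames[:limit]
    return [halite_from_filename(f) for f in selected]


# Made-up file names, standing in for os.listdir("training_data").
example_names = [
    "4100-1543000000001.npy",
    "4100-1543000000002.npy",
    "4350-1543000000003.npy",
]
halite_amounts = collect_halite_amounts(example_names, limit=2)
print(len(halite_amounts))
```

Passing limit=500 mirrors the "just do the first 500" shortcut used while developing; leaving it as None processes every file.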
I think that's looking pretty darn, pretty darn... yeah. I'm actually not sure why that was so difficult. Why would that be so challenging for it? Because we said we only wanted so many bins, so why is the display so challenging for it? Well, that's curious. What have I done wrong? I feel like it didn't actually put them in the bins, right? Let me pull this up real quick. I don't know what I've done wrong here. Let's search for matplotlib histogram, pyplot hist. That's correct. See, if we say, you know, five bins, it really should only have, like, five categories. But clearly it wanted to label, like, a bajillion.

Could you please not take five years to load and just, I don't know, load any of them? These are all matplotlib docs. What about this one? This is a matplotlib one. Yeah, there we go. How come their bins look normal? num_bins, x... How come mine is so ugly? Let me do bins=5, just to make sure bins is the keyword. It looks like that passed. It's still going to be a pain, isn't it? Weird. I don't know what I've done. I thought with the bins it would just show, like, the range, almost. But that doesn't appear to be the case. Yeah, we've got, like, a million tick marks. Someone comment below what I've done wrong. It's kind of a bummer. I'll keep the script around.

The other thing we could do is, I think it's statistics that has the mean operation: from statistics import mean. For now, I'm just going to comment this out, I guess, because I don't think that's really working. print(mean(halite_amounts)). What? What was wrong? Oh, dude. Okay. So I bet I know what I've done wrong up here. Let's see. So right now, this needs to be an integer, because right now it's a string. We'll see if that fixed it. Whoa. Wow, man, what a dumb mistake. Okay. So this sort of helps.
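The mistake found here is that the values parsed from the file names were still strings. A small sketch of the symptom and the fix follows; the sample values are made up, and the function names are hypothetical. With strings, statistics.mean raises a TypeError, and matplotlib treats the values as categories, which is consistent with the wall of tick marks seen above.

```python
from statistics import mean


def to_int_scores(raw_scores):
    """Convert string scores such as "4100" to integers."""
    return [int(score) for score in raw_scores]


def plot_score_histogram(scores, bins=5):
    """Plot a histogram of integer scores (requires matplotlib)."""
    # Imported here so the rest of the sketch runs without matplotlib.
    import matplotlib.pyplot as plt

    plt.hist(scores, bins=bins)
    plt.xlabel("halite collected")
    plt.ylabel("number of games")
    plt.show()


# Made-up sample, standing in for the values split out of the file names.
raw_scores = ["4100", "4100", "4250", "5400"]

try:
    mean(raw_scores)
except TypeError as error:
    print("mean() fails on strings:", error)

scores = to_int_scores(raw_scores)
print(mean(scores))
```

Calling plot_score_histogram(scores, bins=5) on the converted list should then give five bars over a numeric axis rather than one tick per distinct string.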
I'd probably cut it off at about the 10,000 mark, just to see what the real distribution is, or even maybe the 2,000 mark or something. But you can see we've got very few of these games, and it tapers off real fast. Okay, cool. So then the other thing we could do now is this, and we get the mean, which is 4,202. So that's the average. So later, after we train a model, we can come back and see what the new mean is.

Okay, so now that we've done that, what I'm going to do is come in here: training data old. This is from before. So over time, I'll probably keep at least the previous training data set. Obviously, it's 50 gigabytes, so at some point we can't be doing that. And soon we're going to have multiple ships, and we're just going to have huge data sets, so we can't keep too many of them. Normally I probably won't have 50,000 games, though. That's a lot of games. So really, we would just raise the threshold, probably. But for now, I do want to see if we can improve the mean game from one training example.

So, testing grounds. Instead of calling that testing grounds, now what I'd like to call it... I'm not sure why I had to save it. Let me just test it real quick here. Cool. What we're going to call this instead... now we're just going to copy and paste. Why do we have another test? Oh, this is the output. Okay. Coming back over here, I'm going to get rid of test.
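The plan of coming back later to compare the new mean against the current one can be sketched as below. This is an illustration only: the function names are hypothetical, and the score lists are made-up placeholders, not results from the video. The comparison is only meaningful if both runs were saved with the same threshold.

```python
from statistics import mean


def summarize_scores(scores):
    """Return the number of games, mean score, and best score for one run."""
    return {"games": len(scores), "mean": mean(scores), "max": max(scores)}


def mean_improvement(old_scores, new_scores):
    """Difference in mean score between two runs saved with the same threshold."""
    return mean(new_scores) - mean(old_scores)


# Placeholder values standing in for the parsed scores of two data sets.
old_scores = [4100, 4150, 4300]
new_scores = [4200, 4350, 4500]

print(summarize_scores(old_scores))
print(summarize_scores(new_scores))
print("mean improvement:", mean_improvement(old_scores, new_scores))
```

A positive difference would suggest the games produced with the trained model score higher on average than the earlier random-movement games.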