  • All right.

  • Hello, world.

  • This is CS50 Live.

  • My name is Colton Ogden.

  • Joined today again by my CS50 buddy, Nick Wong. Nick, what are we talking about today?

  • So today we're continuing a little bit of our subject from last stream.

  • We're taking a step back for a moment from images, and we're hopping into neural networks, kind of the biggest buzzword of ML at the moment.

  • Going a little bit lower level than images, we're gonna use a slightly easier data set. When I built a sample version of this last weekend it took, I think, an hour or so to run. Not great.

  • So we're going to use a very simple data set and figure out just what a neural network is.

  • You know, what is a neuron?

  • Because they're not quite the same as the ones in our heads or in the rest of our bodies, and it's kind of complex.

  • There's a lot of weird terms like backprop and gradient descent and things like that.

  • None of these things I know.

  • Yeah, I didn't know them for quite some time either. For the stream tonight that's a lot of jargon, but we intend to elucidate what it all means.

  • And if anybody's unfamiliar, CS50 Live is a show hosted here at Harvard University. CS50 is Harvard's intro to computer science, taught by David Malan, who may make an appearance in the chat.

  • Um, and this is kind of an offshoot of that, where we build things from scratch and talk about concepts.

  • Nick is a regular here on the stream.

  • We've done not only programming things but also operating system related things.

  • Kali Linux was something you talked about at some point, and your first stream was actually machine learning related.

  • It was building a classifier at a very high level, really.

  • We were gonna use some APIs.

  • We were gonna just kind of skim over a lot of the math.

  • Now we're going to dive a little bit more into the math.

  • Then we're going to actually end up motivating the high level approach again.

  • But this time we're going to explain each of the kind of lower level concepts that we talked about.

  • Actually, I think in that stream, yeah, the first stream used TensorFlow, Google's library, which is even higher level. Today we're gonna be using just NumPy, just matrices, with the goal of eventually continuing, because this is kind of like a new sub-series.

  • Yeah, when we're doing a sort of like building a neural network, talking about machine learning with the eventual goal of being able to generate images from part of streams which we talked about last week.

  • Yep.

  • And we're going towards that.

  • I think that is our next week.

  • That's where we're going to start hitting you with these generative neural networks.

  • We have variational autoencoders.

  • We have kind of like distributional networks.

  • And then we have GANs, which are generative adversarial networks, and those are state of the art.

  • At the moment, they're super cool.

  • They're the things that generate the really cool landscapes.

  • They produce all these like very sharp images.

  • So even deepfakes are kind of a similar concept, where you can't necessarily tell that it's not a real person, like presidential videos where they're dancing a little wacky.

  • The other thing that I think we'll probably explore, and we might even start with it next week because it's a little easier to build from scratch, is style transfer networks.

  • So let's say I give you, like, a Van Gogh painting, and I then hand you a modern-day picture of something, right?

  • I transfer that style onto that, and it's beautiful.

  • I mean, they create these incredible works of art, and it's pretty easy to build

  • once we have our basic understanding of, like, what's a CNN?

  • What is convolution?

  • There's a lot of steps along the way.

  • It's kind of overwhelming, but I think it's cool. It is a lot.

  • So if you're ever confused, luckily this is on video.

  • So if anybody wants to watch on YouTube, there's our first part, where we built the k-means classifier, because we're eventually going to need to classify images, presumably, to be able to generate them, comparing them against those clusters.

  • Exactly.

  • Um, but you can check that out; that's on YouTube.

  • That should be up at the time of this recording.

  • Actually true, Connie says Hi.

  • Oh, "nice job on the podcast, Colton." Thank you very much. We are releasing a podcast very soon, probably today.

  • We just filmed it.

  • It's on YouTube and all the major podcast distribution channels, including Spotify.

  • But that's to be formally announced, probably on Facebook and elsewhere, so definitely check that out.

  • Sweet.

  • Why?

  • Sarah says nice music.

  • Thank you very much.

  • Um, cocoa.

  • Thanks so much.

  • Everybody for tuning in.

  • I say, we have a lot to cover.

  • Why don't we just dive in?

  • Yeah, right.

  • Make studio Magical.

  • People can actually see what I'm doing now.

  • We changed our camera lens.

  • A lot wider.

  • So?

  • So, boom.

  • There we go.

  • See?

  • My favorite, the classic cmatrix screen saver.

  • Cool.

  • So we're gonna hop out of that, and we have an empty file to start programming in.

  • Kind of.

  • Yeah, it looks like that, I guess.

  • You know, no one's really seen those.

  • Yeah, this Sure.

  • Yeah, I guess I got to do this, but it's a little harder to see.

  • Yeah.

  • Yeah.

  • So we're starting out with just kind of a raw file.

  • Literally nothing but line zero.

  • This is kind of to keep ourselves on track.

  • I do have a reference file in case I forget some of the math off the top of my head.

  • A lot of the service.

  • Very.

  • Yeah, there's a lot, A lot, A lot of very, very down in the nitty gritty details.

  • So we're gonna start with my favorite import: import numpy as np.

  • My IDE is like, "Oh, you want to abbreviate that?" It's a little overeager.

  • I start with NumPy.

  • And for those that are unfamiliar, NumPy is a pretty awesome library.

  • It lets you do all sorts of beautiful things mathematically.

  • It also has built-in matrices and matrix operations that make all sorts of math really convenient.

  • We're going to have a simple X set, which is going to be a NumPy array of, literally, we'll just put [0, 0], [0, 1], and so on.

  • And for those that are curious as to what we're doing right now and why we're doing it, it's a great curiosity to have.

  • Essentially, we're just creating a toy data set.

  • The reason it's really just a toy is that the function we're testing is deterministic.

  • I can, given an input, tell you immediately what the output is. It's human-computable.

  • It's not particularly interesting, but it motivates using a neural net because its function is not linear, and it's not separable in any way by a linear function.

  • Now, if you're familiar with, like basis functions, you can actually transform this into a basis in which it is linear.

  • But we're going to pretend like you can't do that in order to force us to use a neural net.

  • So if you're watching very closely, you'll notice that this is the exclusive-or (XOR) function: given exclusively one 1 in its input, it will return one.

  • Anything else returns zero.

  • So you'll notice that two zeros, that's a zero, and two ones, that's also a zero.

  • And this can be generalized to the parity of an input.

  • So if I give it 100 zeros and then one 1, it will also return one. There are variants on exclusive or, kind of like applications of it, that also carry these properties.

  • But we're gonna start with the simplest possible data set.
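
A minimal sketch of the toy data set described above, assuming NumPy. The name `y`, its (4, 1) column shape, and the helper `parity` are assumptions made for illustration; the stream only spells out the first rows of the input array.

```python
import numpy as np

# Toy XOR data set: each row of X is one sample with two binary features.
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])

# Labels: 1 when exactly one input is 1, otherwise 0.
# The (4, 1) column shape is an assumption made for this sketch.
y = np.array([[0], [1], [1], [0]])

def parity(bits):
    """Generalized XOR: 1 if an odd number of inputs are 1, else 0."""
    return int(np.sum(bits) % 2)

# For two inputs, parity and XOR agree on every row.
xor_from_parity = np.array([[parity(row)] for row in X])
print(np.array_equal(xor_from_parity, y))
```

The same `parity` helper covers the "100 zeros and then one 1" case mentioned above, since it only counts how many inputs are 1.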

  • You'll notice we're ignoring things like batching.

  • We're not going to split this into training and validation and testing.

  • We're not gonna do any of that.

  • There's no unlabeled data.

  • This is very much just a toy set,

  • and we want to figure out how to build a neural net on top of that.

  • And speaking of which, "neural net" is this term that gets thrown around all the time.

  • It's not necessarily intuitive what it is.

  • It's a network of some sort.

  • But even that's not super intuitively defined. "Network" means things can talk to each other,

  • and there is some path by which they do that.

  • Now that path sounds kind of simple in this high-level overview.

  • But as we go into it, you'll notice that there's a lot of math involved.

  • And the neural part is where we'll start, which is, you know, what's a neuron? It's modeled after the idea of, you know, our heads.

  • In our heads, we have these billions of individual neurons that all communicate with each other, and they do something that was, for a long time, thought to be very simple.

  • You gave them some sort of stimulus, and they either fired an action potential or didn't. Those are the biological neurons.

  • Now that's not quite true.

  • There's actually a hidden complexity to neurons: they have dendrites that poke out, and those can also fire mini potentials.

  • And then certain accumulations of these potentials can fire the overall neuron's action potential.

  • What that means, and what's interesting, is that now we have a way to do nonlinear action potentials. We can do all sorts of wild things in our heads, and it partially explains why our brains are so complex, or at least gives some intuition for why.

  • Now, the neurons that we're building are a lot simpler.

  • They're modeled on the same concept, but it's not quite analogous.

  • So the girl with bigger just posted in our chat: "My brain is a neural network.

  • It only contains zeros.

  • No ones."

  • I'm sure that's not true, but a very funny joke.

  • So in our neural network, and neural networks in general, each neuron is essentially one weight and one bias.

  • Now, to say it that way sounds a little funny.

  • And if you're familiar with this at all, hearing that is a little wack, but that's the intuition: I have some weight, which is just a number that says, "Hey, anytime I'm given an input, I'm gonna apply this number to that input," and I have some bias.

  • So I know somehow that I am off linearly from whatever the actual answer is.

  • If you're familiar with y = mx + b, the bias would be like the b term, and m might be your weight.

  • Now that's not quite how this works.

  • It's all done through matrix math, so things get generalized a little bit more broadly.

  • But that's the intuition: every time a piece of our input, a feature, is given to a neuron, the neuron that gets it says, "I'm gonna multiply you by a weight, add a bias, and pass you on to the next layer."
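
A rough sketch of that intuition for a single neuron, assuming NumPy. The numbers and the names `x`, `w`, `b`, and `neuron_output` are made up for illustration, and the neuron here holds one weight per input feature, which is the matrix-style generalization the stream alludes to.

```python
import numpy as np

# Hypothetical values chosen only for illustration.
x = np.array([1.0, 0.0])      # one input sample with two features
w = np.array([0.5, -0.25])    # one weight per input feature
b = 0.1                       # bias shifts the result up or down

# The neuron multiplies each feature by its weight, sums, then adds the bias.
# This is the y = m*x + b idea with m and x generalized to vectors.
neuron_output = np.dot(w, x) + b
print(neuron_output)
```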

  • We're gonna build a single-layer neural network, to which some of you may go,

  • "That's silly."

  • And it is, to a degree. Maybe, like, 40 years ago, not so silly, but now it is kind of silly.

  • We have GPUs.

  • We can calculate hundreds of neurons in one layer at a time and things like that.

  • And this will not restrict us as far as the number of neurons we have.

  • But it will restrict us in the number of layers we have.

  • And the reason for that is there are a couple of algorithms that I want to explain very clearly.

  • But if we were to generalize this to a general network with many layers, some of that clarity might be lost in just the syntax of the code.

  • So I'm gonna try and avoid doing that.

  • So, yeah, what we're gonna do is we're gonna just... if I can spell.

  • If you've seen the streams before,

  • I can't spell for my life.

  • May really stop.

  • It's the tenants.

  • The curse, the curse of livestream coding.

  • I'm gonna get through this window, actually, so we have some neuron.

  • I'm gonna call it neurons, actually.

  • And that's just the number of neurons in our hidden layer.

  • We have our input dimension, which is our X.shape.

  • What is it?

  • Hidden layer.

  • So are Oh, right.

  • That's a great point.

  • So, in these networks, and I forgot to explain the network part of "network," the things that are communicating with each other are these different layers.

  • So if we imagine a network, I like to imagine it like I'm standing on top of it looking down, or I'm standing at the bottom looking up.

  • I don't know why.

  • That's what I imagine.

  • Usually I see it

  • as, like, left to right,

  • with, like, vertical layers.

  • You know what?

  • Let's do that.

  • Let's zoom agin.

  • It may look like a picture.

  • I'm going.

  • Yeah, that's, uh let's illustrate this a little bit.

  • Very nice.

  • Uh, changing background, by the way.

  • Thank you.

  • Thank you.

  • I found it.

  • Just pretty pictures.

  • It helps me remember the world is normal outside school.

  • Uh, thing in school.

  • So a lot of people, I think, imagine it like this.

  • And this is a good image for what we're doing.

  • Because instead of having two hidden layers, we have one. I see, the hidden layer is anything that's not the input or output.

  • Exactly.

  • So as an outside observer you can think of it as the black box part.

  • I have to configure my thing to work with your input layer, and I kind of think the hidden layer would be your brain.

  • It's kind of like most of your brain and that what you think of is the output layer, right?

  • So it's kind of like as it goes through, you eventually then figure out something you like O and that you're out layers like trunk.

  • Yeah.

  • Yeah, players like track.

  • Except in this Well, yeah, we'll say that's roughly true.

  • These are often referred to as dense layers, and there's a cool Facebook comment somewhere, I think from a software engineer, probably an ML engineer, who said that there is no such thing as a truly fully connected layer.

  • These are often referred to as fully connected layers.

  • Their point is that in order to do that would be almost impossible.

  • And, you know, you can go through that if you'd like, but essentially they are referred to as

  • dense layers, just neuron layers.

  • There's nothing super crazy about them.

  • They're basically just these weights that can be applied and biases that can be applied.

  • They all talk to one another, so every neuron from one layer influences the neurons in the next.

  • You can change that concept in order to get different behaviors.

  • There's all sorts of things.

  • This is just kind of like the very simple like, what exactly are we talking about?

  • So in this image, each circle represents one neuron.

  • In the input layer, each circle does not represent a neuron.

  • Instead, each circle represents one particular aspect, or feature, of the input.

  • And then the circle in the output layer represents our prediction.

  • Given some input, what class do we think it's in?

  • Um, so going back to our, I'm gonna organize that a little bit better to switch quickly between them.

  • Going back to this, we have our input dimensions.

  • We have our output dimensions, which is equal to our y.shape.

  • And from there, all we're trying to do is figure out how to set up a hidden layer of some sort.

  • Now, for initializing the weights for any sort of input dimension, output dimension, hidden layer, or prediction layer,

  • generally there are all sorts of, not activations, initializations that you can use. We're going to use just NumPy's random uniform to do that.

  • So what I'm gonna do is I'm going to initialize my hidden weights.

  • It's going to actually be equal to np.random.

  • I can never spell. I always type "random weights." np.random.uniform. Spelling is just killing me today. And I believe it's size, yeah, size equals.

  • So when we have these hidden layer weights, we know we're going to multiply them by whatever the input is, and our input's shape, you'll notice, is n by 2.

  • So we'll probably transpose that in order to get it to work properly.

  • But let's imagine that we're actually doing the input layer as our hidden layer, which is what we're going to throw things into as some number of neurons.

  • So this is just neurons by whatever the shape of our input is.

  • So this is input dimensions, index one, and there are better ways of generalizing this, but I don't want to spend too much time on matrix math and drawing these out.

  • I also don't have the ability to draw so well in general.

  • I also can't draw for my life.

  • So we're gonna just pretend that this is how this is going to work. And then we want

  • our output weights as well, and this is again going to be just np.random.uniform with a size.

  • And again, we want to consider the matrix math of this.

  • It's a little bit tricky if you're not super familiar with how matrices work.

  • There's all sorts of great tutorials online.

  • Khan Academy has some awesome linear algebra tutorials. I would recommend going through them.

  • A lot of machine learning is basically just linear algebra, at least at the mathematical level.

  • But to say that it is only linear algebra or only statistics misses out on the emergent behavior and other properties that arise while you're performing machine learning.

  • So what we're gonna do here, because we're now multiplying by whatever our input layer, sorry, hidden layer handed to us in order to output something, is say, OK, whatever our output dimension should be, again index one, by the neurons.

  • And so now, whatever we throw through this, we should be able to figure out the dimensions, and everything should roughly translate correctly.

  • For our own benefit, I'm going to print these out.

  • They're not particularly wild upside down the leopard stub, but we'll see why they work.

  • Uh, okay, so we printed out the hidden weights, which is this part of our code, and then the other half is the output weights.

  • And essentially what we're looking at is something that is four by two,

  • and then something that is one by four.

  • So that should make sense.

  • We have four neurons.

  • If I were to change this to eight neurons, you'll see this grow, and again,

  • we have this eight by two, which makes sense.

  • Our input shape's second half is going to be just two.

  • Then our output shape has this one by eight and that should also make sense.
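
The weight setup walked through above might look roughly like the following, assuming NumPy. The variable names (`neurons`, `input_dim`, `output_dim`, `hidden_weights`, `output_weights`) and the (4, 1) label shape are guesses at what is on screen, not a copy of the stream's file.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])   # assumed (4, 1) label shape

neurons = 4              # number of neurons in the hidden layer
input_dim = X.shape      # (4, 2): samples by features
output_dim = y.shape     # (4, 1): samples by outputs

# Uniform random weights in [0, 1).
# Hidden layer: one row per neuron, one column per input feature.
hidden_weights = np.random.uniform(size=(neurons, input_dim[1]))
# Output layer: one row per output, one column per hidden neuron.
output_weights = np.random.uniform(size=(output_dim[1], neurons))

print(hidden_weights.shape)   # (4, 2)
print(output_weights.shape)   # (1, 4)
```

Changing `neurons` to 8 gives shapes (8, 2) and (1, 8), matching the sizes read out above.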

  • Tons cool.

  • So once we have all of these things set up, we also need to initialize our biases. And biases,

  • you don't start out biased, in kind of a weird metaphysical,

  • philosophical sense.

  • You aren't born biased.

  • At least, well, supposedly. And so we're going to initialize our biases to just zeros.

  • And the thing with bias is that essentially you're adding the bias to the product of the weights and whatever they're being multiplied with.

  • So if we think about this matrix style, our hidden weights have this shape of whatever our number of neurons is by the input dimension.

  • So once you multiply that by the input, we then have some resulting shape.

  • If you're familiar with matrix math, then if I take something that is, we'll say, four by two, and I multiply that by something that is 2 by n, and I don't know why I'm illustrating it like this, but I think it's a very convenient way to imagine how this works, each element of the input is then going to go down that matrix and be summed up into the corresponding row and column pair.

  • If what I just said didn't make any sense, let me find an example to illustrate that.

  • Ah, it's a little tricky to see. I would like an example out here.

  • Yeah, so I'm a very visual person.

  • This is not necessarily the biggest image I could have found, but it might help.

  • Essentially, you're going to take a column from the thing being multiplied and multiply each entry by the corresponding entry in the row of the thing multiplying.

  • And then you're going to sum those up and put them in the corresponding i-th row, j-th column, and you'll do that each time.

  • And so you might intuit that. I was going to say n by m,

  • but it's really difficult to hear the difference.

  • I'll say n by p.

  • So let's say this is n rows by p columns, and I multiply that by something that is p rows by q columns.

  • You'll notice that p and p have to be the same in order for this to work.

  • Then I'll end up with something that's n by q, and that hopefully makes sense.

  • If not, uh, yeah, as talks man and I have mentioned, go to Khan Academy, maybe practice a little bit with your matrix math, things like that.

  • So it should be 7 plus 18 plus 33.

  • Uh, yes.

  • Uh, yeah, and so what we just did there is we said, OK,

  • 7 times 1, plus 9 times 2, plus 11 times 3.

  • And so on.
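
The on-screen image is not part of the transcript, so the matrices below are an assumed reconstruction: they are chosen so that the first entry works out to the sum read aloud above (7 times 1, plus 9 times 2, plus 11 times 3). A small NumPy sketch of the rule that an n by p matrix times a p by q matrix gives an n by q matrix:

```python
import numpy as np

# Assumed example matrices; only the first sum comes from the stream.
A = np.array([[1, 2, 3],
              [4, 5, 6]])     # shape (2, 3): n rows, p columns
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])      # shape (3, 2): p rows, q columns

C = A @ B                     # shape (2, 2): n rows, q columns

# Entry (0, 0) is row 0 of A times column 0 of B, summed up.
top_left = 1 * 7 + 2 * 9 + 3 * 11    # 7 + 18 + 33 = 58
print(C)
print(top_left == C[0, 0])
```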

  • We're gonna kind of fiddle around with how these work.

  • Even I struggle to keep track of them in my head at all times.

  • It's not super intuitive.

  • It's not the way humans really think.

  • So Don't worry about it if it's something that bothers you, but you should always be able to write it out and kind of like, Imagine how these work.

  • I'm positive you will see me mess things up Pretty much at least three times on this stream may be more.

  • You know what?

  • We're going to play golf with this.

  • That's what makes it fun.

  • Yeah, it'll be kind of entertaining.

  • I'm sure somebody will be like, "Wait, no, transpose that."

  • That's another thing: we use a lot of matrix terminology, like dot product, multiplication, matrix exponentiation, and transpose.

  • And I think that two of those are the really key ones: dot product and transpose.

  • So if we were to transpose this matrix, you can imagine you flip it across its diagonal, so instead of being this two by three matrix it would now be three by two, and you've just kind of rotated it around.

  • Now you will notice that if you transpose this first matrix, it is no longer possible to compute this multiplication, this dot product, because the rows and columns do not match up, so things won't work.

  • Square matrices always retain their dimensionality because they're n by n.

  • So if I transpose one, it's still n by n, but they don't necessarily retain their properties, because they're flipped.

  • But if it's a symmetric, square matrix, then the transpose is equivalent to the matrix itself.

  • So transposition is the identity function for a symmetric square matrix.
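
A short NumPy sketch of those transpose facts. The matrices `A`, `B`, and `S` are illustrative assumptions, not the ones shown on stream.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])                      # shape (2, 3)
B = np.array([[7, 8], [9, 10], [11, 12]])      # shape (3, 2)

print(A.T.shape)    # (3, 2): rows and columns swap

# A @ B works, but A.T @ B is (3, 2) @ (3, 2): inner sizes 2 and 3 differ.
transposed_product_failed = False
try:
    A.T @ B
except ValueError as err:
    transposed_product_failed = True
    print("shape mismatch:", err)

# A symmetric square matrix is equal to its own transpose.
S = np.array([[2, 1],
              [1, 3]])
print(np.array_equal(S, S.T))   # True
```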

  • What is the best book to learn?

  • JavaScript.

  • Oh, I don't think I haven't seen a recent book.

  • And Joe, I haven't read a recent book and JavaScript, but, um, there's a katana.

  • I feel like anybody now can go to YouTube and probably find, like, an eight-hour video on the basics of JavaScript.

  • Probably.

  • Yeah, especially, I recommend looking into newer versions of JavaScript, like ES6, a.k.a. ES2015, because that's kind of becoming the norm.

  • Also, apologies, because our Facebook stream did not go live.

  • So I was kind of typing and figuring stuff out on the side,

  • and I missed a lot of what you were explaining.

  • No, it's like it on.

  • I'm sure we'll have to explain it a couple of times, actually, in this stream.

  • So what we're doing right now is we're saying, OK, in our neural architecture, our layers have these weights and these biases, and we need to initialize them.

  • So we basically need to initialize all of our neurons in the hidden layer, and actually, just in general, I've initialized them all to these uniform weights.

  • We've been printing out what those look like. They're just uniformly distributed numbers from 0 to 1.

  • Um, and essentially, they're they're generally speaking.

  • There are many ways to initialize this I could use, like, I think there's a guy out there.

  • I could use a gamma function here, but I could also use, like, I think it's normal.

  • Yeah, I could use a normal function and do this.

  • There are many different ways to do it.

  • Uniform is just very kind of vanilla.

  • If you don't really know how your input's gonna behave,

  • a pretty decent place to start is normally distributed

  • random numbers. This is just a uniform distribution, but you could use a normal distribution if you like and give it some other parameters.

  • Like, if we wanted this to be a normally distributed one,

  • I believe I have to give it a location and scale.

  • Oh, no.

  • They're initialized to a standard normal, so I could equivalently put these as normal distributions.

  • And then if I print them out, what might also be informative is for me to print the mean of these.

  • Um, np.mean of the hidden weights, not the output weights, although it actually wouldn't matter.

  • And then if we, oops, also print the standard deviation of the same thing, np.std. It's really confusing: NumPy and R use different abbreviations for standard deviation, and it's always very, very confusing.

  • So you'll notice, with a pretty small data set,

  • the mean is actually non-trivially distant from zero.

  • The standard deviation is, again, non-trivially distant from one,

  • but it's pretty close.

  • So essentially, the larger these become, so if we were to use, like, 1600 of these neurons (I'm not gonna print those weights anymore,

  • they're huge),

  • you'll notice that the mean is now very close to zero.

  • The standard deviation is now very close to one.

  • If you take a stats class, there are ways of measuring what "very close" means.

  • But essentially, this should have a normal distribution, is what we're saying.

  • But if we switch this back to a uniform distribution, which I'm going to start with (we can see how swapping these out might change things, although in this example it might not be super easy), the expected value of a uniform distribution from 0 to 1 should be about 0.5.

  • And we see that here, the standard deviation, I don't know off the top of my head, but it's around 0.33 Sounds reasonable to me.
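
A hedged sketch of that comparison, assuming NumPy. It uses a seeded `np.random.default_rng` generator for repeatability, which is a substitution for the `np.random.uniform` and `np.random.normal` calls made on stream, and the dictionary `stats` is a hypothetical helper. For reference, the theoretical standard deviation of a uniform distribution on 0 to 1 is 1/sqrt(12), about 0.289, a little below the rough figure quoted off the top of the head.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded generator, an assumption for repeatability

stats = {}
for n in (4, 1600):
    normal_w = rng.normal(size=(n, 2))     # standard normal: mean 0, std 1
    uniform_w = rng.uniform(size=(n, 2))   # uniform on [0, 1)
    stats[n] = {
        "normal": (np.mean(normal_w), np.std(normal_w)),
        "uniform": (np.mean(uniform_w), np.std(uniform_w)),
    }
    print(n, stats[n])

# Theoretical values for the uniform case: mean 0.5, std 1/sqrt(12).
uniform_std_theory = 1 / np.sqrt(12)
print(uniform_std_theory)
```

With only 4 neurons the sample statistics can sit noticeably away from the theoretical values; with 1600 they settle close to them.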

  • So what are these weights eventually going to be used for?

  • So these weights are our model.

  • So if you are trying to say, "Oh, I'm saving my model," or "I want to use it later to make a prediction,"

  • this is what we're using.

  • Essentially, I have some set of weights and some set of biases, and I want to know how my model thinks the input that it's given should be classified, given these weights and biases.

  • And essentially, what it does is it's gonna multiply these weights by whatever the input is, and it's going to pass them through the layers.

  • And then once we get to the end, it's going to say, Based on what I know and what I am currently set at, I'm going to spit out a number.

  • Okay, so we're reading it for then, eh?

  • So what we would end up wanting is some format of this.

  • We want either zero or one.

  • So it's a binary classifier in this case.

  • But more generally speaking, if we give it a set of classes, we want it to spit out something that's close to one of those classes, and we'll pretend as if it is.

  • What it actually ends up spitting out is a probability that it's either of those classes.

  • And in this case, it actually spits out something a little strange, but we'll see how that looks in a moment.

  • And these biases, how do these operate?

  • What's the relevance? The bias is basically a way to correct for, like, "I'm totally right, but I'm just off every time."

  • Like, every time I answer, I've got the pattern right, but I'm off by a little bit, so I'm just gonna say, OK, we're gonna add in this bias term that says you're biased,

  • and we're gonna just correct back towards whatever you should have been, because you've got the pattern right.

  • You're just consistently shifted by one or by some number.

  • And so that's what that bias does.

  • It kind of just corrects you back towards wherever you should have been.
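
To make the "right pattern, consistently shifted" idea concrete, here is a tiny NumPy sketch with made-up numbers. The arrays `targets` and `raw_predictions` are hypothetical, and taking the mean offset as the bias is just one simple way to illustrate the correction; it is not how the network in the stream learns its biases.

```python
import numpy as np

# Hypothetical numbers: predictions follow the right pattern but sit 1.0 too low.
targets = np.array([2.0, 4.0, 6.0, 8.0])
raw_predictions = np.array([1.0, 3.0, 5.0, 7.0])

# A single bias term equal to the average offset corrects the shift.
bias = np.mean(targets - raw_predictions)
corrected = raw_predictions + bias

print(bias)
print(corrected)
```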

  • This is not to be confused with the regularizer, which says every time your model gets too complex, we punish you more. And we won't actually use a regularizer,

  • not really, in this case, but if you've looked at any sort of introduction to ML, usually people start with linear regression, and then they hop into ridge regression.

  • Ridge regression is literally just a regularized version of linear regression.

  • So I say, "Hey, if my model gets too complex,

  • you should have a higher loss for that."
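
As a side illustration only (the stream says no regularizer is actually used), a ridge-style loss can be sketched as mean squared error plus a penalty on the size of the weights. The function name `ridge_loss`, the parameter `lam`, and the data are assumptions for this example.

```python
import numpy as np

def ridge_loss(X, y, w, lam):
    """Mean squared error plus lam times the sum of squared weights."""
    residuals = X @ w - y
    return np.mean(residuals ** 2) + lam * np.sum(w ** 2)

# Hypothetical data where y = 2 * x exactly.
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
w = np.array([2.0])

plain = ridge_loss(X, y, w, lam=0.0)       # perfect fit, no penalty
penalized = ridge_loss(X, y, w, lam=0.1)   # same fit, penalty on weight size
print(plain, penalized)
```

With `lam=0` this reduces to ordinary mean squared error; a larger `lam` raises the loss for models with larger weights, which is the "punish complexity" idea described above.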

  • Um, oh, loss is another key term we should define.

  • Loss is a way of measuring how poorly our model is doing.

  • The higher the loss, the worse we're doing.

  • Loss in absolute terms, like the number three versus the number five,

  • is not super meaningful, but in a relative sense it is super meaningful.

  • So if I have a loss of five versus a loss of three versus a loss of 0.1, then the latter is the best version of our model,

  • given our data. A lot of the ways that people come up with these loss functions deal with, um, maybe for linear functions, root mean squared error.

  • So I take the square root of the mean of the squared errors, from some given knowledge that I have, and that would be supervised learning.

  • I know the answers of some of the data set, and I can tell you how far off your model is on those answers.
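
A minimal sketch of root mean squared error as described above, assuming NumPy. The function name `rmse` and the two sets of predictions are made up for illustration.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Square root of the mean of the squared errors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Known answers (supervised learning) and two hypothetical sets of predictions.
y_true = [0, 1, 1, 0]
good = [0.1, 0.9, 0.8, 0.2]
bad = [0.9, 0.2, 0.4, 0.7]

print(rmse(y_true, good))   # lower loss: closer to the known answers
print(rmse(y_true, bad))    # higher loss: further from the known answers
```

Only the comparison between the two numbers is meaningful here, which matches the point that loss is informative in a relative sense.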

  • There are other forms of lost like the one we use here, which is categorical cross entry.

  • Wow, that sounds awful when you say that love.

  • And that's not nearly as terrible sounding as well as it's not nearly as terrible as it sounds.

  • It basically just says Entropy is a measure of kind of disorder on DDE, in this sense can actually be interpreted as a form of error and categorical and cross mean that we have a categorical problem.

  • So we have several categories were trying to distinguish between them and then cross means across these categories.

  • What is the entropy is where that comes in.

  • And so we're essentially saying, given the categories that we have, given all the classes that we're trying to form, then what is my set of mistakes, like how large are my mistakes?

  • The reason we don't use a root mean squared error is that we're predicting on something that's nonlinear.

  • Trying to fit a linear function to something that is by nature not linear might result in us not having what's called a convex function.

  • And that's because we then can't minimize.

  • We can't come to some global minimum.
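
A minimal sketch of categorical cross-entropy, assuming one-hot labels and predicted probabilities whose rows sum to 1. The function name, the `eps` clipping constant, and the example arrays are assumptions for illustration, not the code from the stream.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean over samples of -sum_k y_true[k] * log(y_prob[k]).

    y_true: one-hot labels, shape (n_samples, n_classes)
    y_prob: predicted probabilities, same shape, each row summing to 1
    """
    y_prob = np.clip(y_prob, eps, 1.0)  # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_prob), axis=1))

# Hypothetical two-class problem with three samples
y_true = np.array([[1, 0], [0, 1], [1, 0]])
mostly_right = np.array([[0.9, 0.1], [0.1, 0.9], [0.8, 0.2]])
mostly_wrong = np.array([[0.1, 0.9], [0.9, 0.1], [0.2, 0.8]])

print(categorical_cross_entropy(y_true, mostly_right))
print(categorical_cross_entropy(y_true, mostly_wrong))
```

As with any loss, the interesting part is the comparison: confident wrong answers are penalized much more heavily than confident right ones.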

  • I think it's like the swag says I took too much Adderall.

  • Nope.

  • I actually don't even drink coffee.

  • I just have too much energy.

  • I wish I could drink coffee, because sometimes I get really tired, but well, um, no coffee for me when I do drink coffee.

  • I'm wild, like, going and nothing can stop me.

  • I just can't sleep so wild.

  • Anyway, I'm gonna just double check myself.

  • I think I'm right.

  • But I would like to just be sure.

  • Um cool.

  • And I call these other things in my reference sheet, but it's okay.

  • Um and yes, s R O C s.

  • Thank you.

  • Uh, they say, in defense, neural networks do involve a lot of math in the background of how they work.

  • Can't help it unless you want it to be a complete black box.

  • I thoroughly agree.

  • We're basically trying to avoid the pattern of a lot of tutorials on the Internet, which is like, just use TensorFlow or Keras and get away from, like, the math.

  • We're trying to show that the math is not necessarily super crazy, though it does require, like some algebra, some linear algebra and a little bit of like probability theory.

  • So you do need some of those things, but after that, it becomes relatively accessible.

  • So now that we have kind of our hidden weights, let's take a look at what it looks like to multiply things across.

  • So what I'm gonna do is I just want to print out what happens if I do, like, hidden weights multiplied by our X.T.

  • So we know that X.T is actually now 2 by n, where n is the number of samples.

  • It's two by four in this case, because four is how many samples we have, and our hidden weights is, excuse me, four, which is the number of neurons, by the input dimension.

  • This should match up, and we should get some reasonable number.

  • Now, this isn't necessarily exactly what we wanted.

  • Yeah, sorry.

  • Whoa.

  • That's a lot.

  • Oh, right.

  • I also put this up here.

  • Forget about that.

  • We're also going to actually just use np.dot to kind of clarify what exactly we're using. We don't want an element-wise multiplication.

  • So this is the actual dot product of the two.

  • When NumPy uses the multiplication operator, it is actually just, like, an element-wise multiplication.

  • So that's not what I wanted to do; the dot product is exactly what I wanted to do.

  • So I take my hidden weights and I say, okay, now multiply those by whatever our X, our input, is.

  • And once we have that, we can say, okay, well, if we were to do this for just one X input, we should end up with something that's just one of those rows, and that's pretty reasonable.

  • From there, we're gonna just use the rest of them.
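
The shape bookkeeping above can be checked with a small sketch. The sample values in `X` and the random weights are hypothetical; only the shapes (4 samples, 2 input features, 4 hidden neurons) come from the discussion. Note that with the weights on the left, a single sample corresponds to one column of the full result.

```python
import numpy as np

# Hypothetical data: 4 samples with 2 features each
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # shape (4, 2)

rng = np.random.default_rng(0)
hidden_weights = rng.normal(size=(4, 2))  # (number of neurons, input dimension)

# Dot product: (4, 2) . (2, 4) -> (4, 4), one column per sample
hidden_pre = np.dot(hidden_weights, X.T)
print(hidden_pre.shape)

# Doing it for a single sample reproduces one column of the full result
one_sample = np.dot(hidden_weights, X[1])  # shape (4,)
print(np.allclose(hidden_pre[:, 1], one_sample))

# The * operator is element-wise, so these shapes do not even line up
try:
    hidden_weights * X.T
except ValueError as err:
    print("element-wise multiply failed:", err)
```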

  • David's in the chat, making an appearance. Hi, David.

  • Um, yeah.

  • So I almost slipped back into, like, TF mode, where I was like, "any questions?" and wait. We're not in school, so we can get this dot product.

  • It's not necessarily super meaningful to us, but we do have one now.

  • From here, what we're trying to do is then add in our bias, so adding in bias in this case isn't really gonna tell us a whole lot.

  • Um, it doesn't necessarily offset anything.

  • We don't gain anything from that.

  • Bias is already still zero.

  • So we're like, well, cool, we added zero to it.

  • Nothing changes, so there's nothing interesting there.

  • But we do need to introduce something called, like, a nonlinearity to what we're doing.

  • And essentially, the nonlinearity functions we're using help us distinguish more clearly between two classes, but also, in general, the problem we're trying to solve is kind of, by definition, nonlinear.

  • So with the features that we're using, we want to ensure that we understand something about their interaction rather than assuming that their interaction is linear.

  • I believe what I used in my reference is a rectifier.

  • Oh, no, I used tanh.

  • Okay.

  • Cool.

  • So what I did was, and you can see this here, I took the tan...

  • Or the tanh, sorry.

  • The hyperbolic tangent of this result.

  • So what I'm gonna do is I'm gonna say, okay, tanh.

  • You'll notice that this comes up all the time when we're talking about machine learning stuff, and all of its, like, you know, siblings and cousins come up too.

  • So tanh is among a series of functions that are called activations.

  • What we're gonna do is implement those ourselves.

  • So we're gonna build tanh.

  • It's not crazy.

  • None of these functions are particularly wild, but we do have to define it.

  • And so tanh is essentially just sinh over cosh.

  • And if you're familiar with the approximations to those functions, you can actually manipulate them.

  • Using their non-real-valued equivalents, where, like, e to the ix is equivalent to, like, cosine of x plus i sine of x, or something along those lines.

  • Because of that property, we can actually get a numerator for tanh that looks something like np.exp of, like, two times x, minus one.

  • And then we'll get a denominator that looks something like this, np.exp of two times x, plus one, and then we'll return the numerator divided by the denominator.
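
The numerator and denominator just described follow directly from the exponential definitions sinh(x) = (e^x - e^-x) / 2 and cosh(x) = (e^x + e^-x) / 2: dividing one by the other and multiplying top and bottom by e^x gives (e^(2x) - 1) / (e^(2x) + 1). A minimal sketch, with the comparison against NumPy's built-in `np.tanh` added here as a sanity check:

```python
import numpy as np

def tanh(x):
    """tanh(x) = sinh(x) / cosh(x) = (e^(2x) - 1) / (e^(2x) + 1)."""
    numerator = np.exp(2 * x) - 1
    denominator = np.exp(2 * x) + 1
    return numerator / denominator

x = np.linspace(-3, 3, 7)
print(tanh(x))
print(np.allclose(tanh(x), np.tanh(x)))  # agrees with the library version
```

For very large inputs `np.exp(2 * x)` overflows, so this hand-written form is for understanding; `np.tanh` is the safer choice in practice.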

  • Now, if we want to see what this function looks like, I'll actually just comment this out for a second, and I'm going to graph what that function might look like. Actually, I'm gonna have to do this several times.

  • So I'm actually just going to define a plot_fx.

  • And this is, given some fx, then what I want to do is say our x is equivalent to np.linspace, and that is very, very, very close.

  • Not quite.

  • And then I'll do, like, a plt.figure, then plt.scatter of x by fx of x. That was wild.

  • Uh, that was confusing.

  • Don't worry.

  • I confuse myself as well, and then I just need plt.

  • So we're gonna import matplotlib.

  • Thank you.

  • Oh, no.

  • So close: as plt.

  • Okay, so now I can plot any function, more or less, without messing anything up.

  • Now, I should be able to plot_fx of tanh.

  • Let's see what it looks like.
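
A sketch of the plotting helper being dictated here. The name `plot_fx` and the calls to `np.linspace`, `plt.figure` and `plt.scatter` come from the discussion; the range, the number of points, the marker size, and returning the figure are assumptions, since the actual arguments are not recoverable from the captions.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_fx(fx, lo=-5.0, hi=5.0, n=200):
    """Scatter-plot fx over n evenly spaced points in [lo, hi] and return the figure.

    lo, hi and n are assumed defaults, not values from the stream.
    """
    x = np.linspace(lo, hi, n)
    fig = plt.figure()
    plt.scatter(x, fx(x), s=4)
    plt.xlabel("x")
    plt.ylabel("fx(x)")
    return fig

fig = plot_fx(np.tanh)
# plt.show()  # uncomment to open the plot window when running interactively
```

Any function that accepts a NumPy array, such as a hand-written tanh or sigmoid, can be passed in the same way.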

  • So if I run this, something happened.

  • And here's what our tanh function looks like.

  • I have definitely seen this kind of a curve.

  • Oh, yeah.

  • You'll notice.

  • This curve comes up all the time in activations.

  • It's actually really interesting, because you might be wondering, why is this curve so useful?

  • It's because of this property, where it roughly approximates what the step function might look like.

  • So, let's say for everything less than zero.

  • I'm just at negative one and everything greater than zero.

  • I'm at one, and there's no intersection in between.

  • That would be kind of a perfect classifier, right?

  • You're either one or the other like, uh, I guess.

  • Binary logic Like circuits?

  • Exactly.

  • Yeah, it's literally where it comes from, somewhere within this square wave of logic.

  • The problem is that that's not quite continuous.

  • I can't differentiate that conveniently now.

  • You might say, Oh, the differential of that is ex, but it's a little bit difficult to manipulate.

  • So we use this very nice, continuous function whose mathematical properties I then understand.

  • Now there are several of these.

  • Someone in the chat says it looks sigmoid.

  • You're right.

  • In fact, why don't we plot the sigmoid function just for fun?

  • We'll need it later anyway.

  • So let's define the sigmoid of x. I think we could just return, like, 1.0 divided by, I don't know,

  • it's like one plus np.exp of negative x. Correct me if I'm wrong, but I think that's about right.
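
The formula recalled above is indeed the standard logistic sigmoid. A minimal sketch, with hypothetical sample inputs:

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x)); outputs lie strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-5.0, 0.0, 5.0])
print(sigmoid(x))  # near 0 for very negative x, 0.5 at x = 0, near 1 for large x
```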

  • And if we plot that, plot_fx of sigmoid, then we can Control-C that. I always forget, pyplot doesn't really like Control-C.

  • So here's our tanh function, and you'll notice that it looks very similar to our sigmoid function.

  • Now tanh is a little bit stricter.

  • You'll notice I can't really switch back and forth between the two, I'm sorry, I didn't plot them quite like that, but you'll notice that our tanh function had a very harsh, almost vertical line in the middle.

  • Whereas our sigmoid function actually is a little bit softer, it's a little bit more generous, if you will, about how it plots things.

  • So because of that, it ends up being a little bit less convenient since we know we have exactly two classes, but you'll see sigmoid activations all the time.
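
The "stricter" versus "softer" comparison can be made concrete. tanh is a rescaled and recentred sigmoid, tanh(x) = 2 * sigmoid(2x) - 1, which is why its slope through the middle is steeper and why its outputs run from -1 to 1 rather than 0 to 1. A small sketch that checks both facts numerically (the step size `h` and the sample points are arbitrary choices):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-4, 4, 9)

# Identity relating the two activations
print(np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1))

# Slope at x = 0 via a central difference: tanh is four times steeper
h = 1e-5
tanh_slope = (np.tanh(h) - np.tanh(-h)) / (2 * h)
sigmoid_slope = (sigmoid(h) - sigmoid(-h)) / (2 * h)
print(tanh_slope, sigmoid_slope)
```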

  • These kinds of activation functions are useful because they push things into one or the other, except when it's uncertain, and this uncertainty can kind of be quantified by where you are on this curve.

  • So, basically, if you're in an extreme case, you're just one or the other.

  • If you're not in an extreme case, then I can't necessarily tell.

  • And I might give you some probability, using something called softmax, of where you are. Now, if that doesn't make any sense, or if all of these sorts of things look crazy, then don't worry, you're welcome to go Google them.

  • They have the same names kind of all over the place, so don't worry about it, but these are kind of the two.

  • We're gonna use tanh for now, but we'll use other ones later on.

  • Oh, I was gonna say thanks to Graham Walton, who's saying, "Good night, people, see you all tomorrow."

  • I want to say thanks for tuning in, and some other people, Annie Babic asked if it was some, oh, Taylor series they were talking about earlier.

  • Um, so bad ignite.

  • You're asking.

  • I'm not exactly sure which part you're asking is similar to a Taylor series.

  • For those that are familiar,

  • a Taylor series is a way of kind of approximating functions by expanding out their terms; for example, the Taylor expansion of e to the x is like one plus x plus x squared or something.

  • And you can essentially just use the first two terms and it always works.

  • Not quite.
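
For reference, the exact expansion of e^x around zero is 1 + x + x^2/2! + x^3/3! + ..., so the squared term carries a factor of one half. The sketch below (the helper name `exp_taylor` is hypothetical) shows why "just use the first two terms" is indeed "not quite": two terms are fine near zero and poor further away.

```python
import math

def exp_taylor(x, n_terms):
    """Partial sum of the Taylor series of e^x around 0: sum of x^k / k! for k < n_terms."""
    return sum(x ** k / math.factorial(k) for k in range(n_terms))

for x in (0.1, 1.0, 3.0):
    # columns: x, two-term approximation, ten-term approximation, true value
    print(x, exp_taylor(x, 2), exp_taylor(x, 10), math.exp(x))
```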

  • In this case, there is actually, like, a complex relation with the tanh approximation.

  • Yeah.

  • So this is Euler's formula.

  • Oh, cool.

  • I got it right.

  • So it's e to the ix equals, like, cosine of x plus

  • i sine of x.

  • And essentially, when you're going into, like, tanh, you can use, I believe, something like that. I haven't quite seen tanh in a long time, or at least the derivation of it.

  • How is your Wikipedia styled differently?

  • Oh, yeah.

  • So I have a plugin, or an extension.

  • I love extensions.

  • I customize everything.

  • My extension for Chrome reformats, like, the text and other things.

  • That's fascinating.

  • Yeah, So there's all sorts of things you can do here.

  • They kind of walk through how, like, hyperbolic sine could be approximated like this, and cosine, and so on.

  • And you divide one by the other.

  • So I guess my Euler's formula

  • usage was not quite right here, although you could presumably also do it that way. And I think someone said in the chat that Taylor series is for, like, polynomial functions. You can use it for polynomial functions; it's often used that way.

  • It also works totally fine as an approximation for, like, sinusoidal functions or, like, lower-order polynomial functions.

  • It works for general functions as well.

  • It becomes less useful in a lot of cases where the behavior changes away from zero.

  • But you're absolutely correct that it's generally used for polynomial functions. Like, usually if I have e to the x, or like some weird polynomial, and I just want to approximate its behavior, you can do that.

  • There's also, like, the Maclaurin series, which is just, like, a specific... I always forget

  • if it's a specific case of Taylor or the other way around. One of them is centered only

  • at zero. I believe Taylor is centered at zero.

  • Or maybe Maclaurin is centered at zero.

  • Taylor has other centers.

  • Essentially, because of that, you can, like, approximate the behavior of a function at any input.

  • And then you can generalize that to multidimensional inputs and also higher-order functions, so you can use it for, like, trigonometric functions as well.

  • It's not necessarily as, like, informative at all times, but it can be.

  • And then, as Al Gore said, I never thought I'd say that.

  • Uh, said, if you take, like, a signal processing or systems class, they use Fourier series, which are used to approximate periodic functions, and Fourier series and Fourier transforms have all sorts of beautiful properties, such that, like, random noise is really high frequency.

  • And if I do a fast Fourier transform on an input signal of amplitudes, then I can kind of ignore the noise, because it's going to be very high frequency; at least, usually noise is high frequency compared to the actual signal that I'm looking for.

  • And I can use that to clean signals.
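
A minimal sketch of that cleaning idea, under stated assumptions: a hypothetical 5 Hz sine wave sampled at 1 kHz, Gaussian noise added on top, and an arbitrary 20 Hz cutoff. Random noise is spread over all frequencies, so zeroing everything above the cutoff removes most of it while leaving the low-frequency signal intact.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sample_rate = 1000, 1000.0          # hypothetical: 1 second sampled at 1 kHz
t = np.arange(n) / sample_rate
clean = np.sin(2 * np.pi * 5.0 * t)    # the 5 Hz signal we care about
noisy = clean + rng.normal(scale=0.5, size=n)

# Transform to the frequency domain, drop the high frequencies, transform back
spectrum = np.fft.rfft(noisy)
freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
spectrum[freqs > 20.0] = 0             # assumed 20 Hz cutoff
filtered = np.fft.irfft(spectrum, n=n)

def rms(v):
    """Root mean square of an array."""
    return np.sqrt(np.mean(v ** 2))

# Error against the clean signal, before and after filtering
print(rms(noisy - clean), rms(filtered - clean))
```

The cutoff has to sit above the signal's own frequency; choosing it well in a real setting depends on knowing roughly where the signal lives.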

  • There's all sorts of cool things you can do with those.

  • There are many different kinds of series that could be used; Fourier series just happens to be very, very commonly used.

  • Signal processing is a very useful course to take.

  • I actually took it last semester for bioengineering, so I ended up absorbing lots and lots of information from a class that CS undergrads typically don't take. It's not required for CS.

  • It's required for bio.

  • For all of our engineering concentrators, actually; maybe not all engineering, I think just bioengineering.

  • So yeah, from there, we now have this kind of, like, hidden...

  • We'll call it results, if you will, and that's equivalent to just whatever our hidden layer spits back at us. Now, from there, what we want to do is then say, okay, well, what is our output result?

  • What do we do with this?

  • These hidden layer pieces.

  • And I'd just like to clarify which version I used.

  • Okay, I used a sigmoid for that.

  • Cool.

  • So then here, we want to see what our output is.

  • And I'm gonna actually just call this a prediction.

  • This is going to be... I'll do it the same way that I did the previous one.

  • We're going to again do a dot product, and again of weights.

  • Our output weights. But this time it's going to be of our hidden results.

  • Then we're going to add in the output bias, and here we can print out what these predictions kind of look like.

  • Calling it prediction is a little misleading because we currently have not set up a full neural network.

  • But you'll notice this is slightly different from what we had before, but not terribly distinct.

  • And then what we're gonna do is apply the sigmoid function to that.

  • And so again, it's a nonlinearized activation, and you'll notice that it also eliminates that, like, zero term here.
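
Putting the steps so far together, a forward pass might look like the sketch below. The layer sizes follow the discussion (2 inputs, 4 hidden neurons, 4 samples, two classes), and the tanh hidden activation, sigmoid output activation and zero biases are as described; the sample values, the random weight initialization, and the `(n_classes, n_hidden)` layout of the output weights are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # hypothetical (4 samples, 2 features)

n_hidden, n_classes = 4, 2
hidden_weights = rng.normal(size=(n_hidden, X.shape[1]))   # (4, 2)
hidden_bias = np.zeros((n_hidden, 1))                      # bias is still zero
output_weights = rng.normal(size=(n_classes, n_hidden))    # (2, 4), assumed layout
output_bias = np.zeros((n_classes, 1))

# Hidden layer: dot product, add bias, then the tanh nonlinearity
hidden_results = np.tanh(np.dot(hidden_weights, X.T) + hidden_bias)         # (4, 4)

# Output layer: same pattern, with a sigmoid activation
prediction = sigmoid(np.dot(output_weights, hidden_results) + output_bias)  # (2, 4)
print(prediction.shape)
print(prediction)
```

With untrained random weights the numbers are not meaningful predictions yet; the point is only that the shapes line up and every output lands between 0 and 1.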

  • Sorry.

  • Sorry.

  • L C s asks how long this stream will go: about two hours, two hours 15 minutes. Nearly all my streams in general run pretty much exactly that, and hopefully we'll get to the end.

  • Where we've created an actual neural network.

  • So next time we're gonna do the same thing.

  • What we did in, like, 50 lines of code, in, like, one, using TensorFlow.

  • But that is kind of the end in sight, if you will.

  • So, yeah, we've applied the sigmoid activation function to this and you might go Okay, that doesn't really look like a great prediction, but sure.

  • And what you'll then want to have the ability to do is say, how do I, like, get a probability out of these?

  • And probabilities should only range from 0 to 1, and so on. So what I'm going to do is define what's called a softmax function.

  • What this does is it says, okay, let's take e to the x.

  • That's just... that's how you math, right?

  • So e to the x is our numerator, and the denominator is actually just the sum of all of those.
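
A minimal sketch of softmax as just described: e to the x on top, the sum of all the e to the x values on the bottom. Subtracting the maximum before exponentiating is a common numerical-stability step added here, not something mentioned in the stream; it does not change the result. The `axis` argument and the example scores are also assumptions.

```python
import numpy as np

def softmax(x, axis=0):
    """e^x divided by the sum of e^x along `axis`.

    Subtracting the max first avoids overflow and leaves the output unchanged.
    """
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=axis, keepdims=True)

# Hypothetical scores laid out as (n_classes, n_samples)
scores = np.array([[2.0, 0.5],
                   [1.0, 0.5]])
probs = softmax(scores, axis=0)
print(probs)
print(probs.sum(axis=0))  # each column sums to 1
```

Each column can then be read as a probability distribution over the classes for one sample, with equal scores giving equal probabilities.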

  • S
