Classification lets us pick one label, or some small number of labels, for our data. The problem is that real life doesn't fit into these neat little categories. What if the label isn't yes or no, or A or B or C? Then we have what we call a regression problem: we're actually trying to predict numeric outputs. Given these inputs, what's the temperature at which something will occur? Or, given this movie on a streaming site, its attributes, and the people that have watched it, how much action is in it? Because that informs who should watch that movie. There are lots of times when you don't want to say "it's this and it isn't that"; you want to say it's a little bit of this and a little bit of that, and that's what regression is for. Some of the algorithms we use for regression are actually quite similar to classification. So for example, you can regress using a support vector machine, a support vector regressor, but we're more likely to use things like linear regression.

So let's start off with perhaps the simplest form of regression: linear regression. It might not occur to people who use linear regression that what you're doing is actually machine learning, but you are. Let's imagine we have data that's got just one input, attribute one, and our output, which is y. This is our table of data, just like before, and these are our instances, one, two, three, four, like this. What we want to do is input attribute one and output y, which instead of being a yes or no is going to be some number on a scale, let's say between nought and one.
So really what we're trying to do is this: we've got our graph with our input variable, attribute one, on one axis and our y output on the other, and these are the data points in our training set, and they sort of go up like this. Using linear regression, we're going to fit a line through this data. A line is of the form y = mx + c, where m is the gradient of our line and c is the y-intercept. In this case I'd guess something along the lines of m = 1, or maybe m = 1.2 to make it slightly more interesting, and c = 0.02. These are the values that we're going to learn using linear regression.

So how do we train something like this? We want to find the values for our unknowns, m and c, given a lot of x and y pairs. So we've got our x and y pairs here, and we want to find the optimal values for this data set. We're going to find values for m and c where this distance, the prediction error, is minimized: the better this line fits, the lower the average prediction error will be. If the line were way over here, it would have a huge error. The hope is that if we learn this correctly, and we have an m and a c, then when we come across a new value that we're trying to predict, we can pass it through this formula — multiply it by 1.2 and then add 0.02 — and that will produce our prediction for y, which will hopefully be quite close to the true value. So for example, let's imagine we have a new value for attribute one.
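The fitting step just described — finding the m and c that minimise the squared prediction error over all the (x, y) pairs — has a simple closed-form solution. Here's a minimal pure-Python sketch; the data points are invented for illustration, not taken from the video:

```python
# Fit y = m*x + c by ordinary least squares (closed form).
# The (x, y) pairs below are invented, chosen to lie roughly
# on the line y = 1.2*x + 0.02 mentioned in the video.
xs = [0.1, 0.3, 0.5, 0.7, 0.9]
ys = [0.15, 0.38, 0.62, 0.85, 1.10]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# m = covariance(x, y) / variance(x); c makes the line pass
# through the point of means (mean_x, mean_y).
m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
c = mean_y - m * mean_x

def predict(x):
    # Pass a new attribute value through the learned line.
    return m * x + c

print(m, c)  # m ≈ 1.185, c ≈ 0.0275 — close to the video's 1.2 and 0.02
```

Once m and c are learned, predicting for a new instance is just one multiply and one add, which is exactly the lookup described next.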
We come in here, look up here, and this is the prediction for our y: that's the output of our regressor. So linear regression is capable of producing predictions based on its attribute. Now, if we have more than one attribute, this is called multivariate linear regression, and the principle is exactly the same; we're just going to have lots of these multiplier m's. We could say something like y = m1·x1 + m2·x2 and so on for all of our different attributes. So it's going to be a linear combination — a bit like PCA — of these different attributes, and it's obviously going to be multi-dimensional.

One interesting thing about linear regression is that it's going to predict a straight line, regardless of how many dimensions we've got. Now, sometimes, if we want to use this for a classification purpose, we still can. I'm supposed to be talking about regression, not classification, but just briefly, if you'll indulge me: we can pass this function through something called a logistic function, or a sigmoid curve, and squash it into an S shape. Now what we're doing is pushing our values up towards 1 and down towards 0, and that is our classification between 1 and 0. So it is possible to use linear regression with this additional logistic function to perform classification, and this is called logistic regression. I just wanted to mention it, because that's something you will see being done on some data.

So let's talk a little bit about something more powerful: artificial neural networks. Any time in the media at the moment when you see the term AI, what they're actually talking about is machine learning, and usually some large neural network. Now, let's keep it a little bit smaller. Let's imagine what we want to do is take a number of attributes and map them to some prediction, some regressed value. How are we going to do this?
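Before the neural-network part, the logistic-regression aside above can be sketched in code: take the linear model's output, squash it with the sigmoid so it lands between 0 and 1, and threshold at 0.5 for a class label. The linear weights here are invented for illustration:

```python
import math

def sigmoid(z):
    # Logistic function: squashes any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def linear(x):
    # An invented linear model, reusing the video's example weights.
    return 1.2 * x + 0.02

def classify(x):
    # Logistic regression: sigmoid output read as the probability
    # of class 1, thresholded at 0.5.
    p = sigmoid(linear(x))
    return 1 if p >= 0.5 else 0

print(classify(5.0))   # large positive input -> class 1
print(classify(-5.0))  # large negative input -> class 0
```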
Well, what we can do is essentially combine a lot of different linear regressions, through some nonlinear functions, into a really powerful regression algorithm. So let's imagine that we have some data with three inputs: we've got our instances and we've got our attributes A, B and C. Our inputs are A, B and C, then we have some hidden neurons — and I'll explain a neuron in a moment — and then we have an output value that we'd like to regress; this is where we're trying to predict the value. You know, how much disease does something have, how hot is it, these kinds of things, depending on our attributes. This is where we put in A, this is where we put in B, and this is where we put in C, and then we perform a weighted sum of all of these things for each of these neurons. So for example, this neuron is going to have three inputs from these three here: this is going to have weight one, this is going to be weight two, this is going to be weight three, and we're going to do a weighted sum just like in linear regression. So we're going to do weight one times A, plus weight two times B, plus weight three times C, add them together, and then we're going to add any bias that we want to, so this is going to be plus some bias. That's going to give us a value for this neuron, which let's call hidden one, because generally speaking we don't look at these values; it's not too important. We're going to do a different sum for this one, so I'm going to draw them in different colours so we don't get confused.
So this one has got three weights: this is going to be a different weight, this is going to have another different weight, and we're going to do this much times A, plus this much times B, plus this much times C, add them all up, plus a bias, and we're going to get hidden two. And we're going to do the same thing with these ones here: this is going to be hidden three, hidden four, hidden five, and so on, for as far as we'd like to go.

Now, the nice thing about this is that each of these can calculate a different weighted sum. The problem is that if we just did this, what happens is we actually get a series of linear regressions, because each one is just multivariate linear regression, and in the end our algorithm doesn't end up any better: if you combine multiple linear functions together, you just get one different linear function. So we pass all of these hidden values through a nonlinear function, like a sigmoid or tanh. A sigmoid goes between nought and 1, and tanh, the hyperbolic tangent, goes between minus 1 and 1, things like this. What that does is add a sufficiently complex function that when we combine them all together, we can actually get quite a powerful algorithm. The way this works is: we put in A, B and C, we calculate all the weighted sums through these functions into our hidden units, and then we calculate another series of weighted sums which add together to give our final output prediction, y.

Now, the way we train this is we put in lots and lots of training data, where we have the values for A, B and C and we know what the output should have been. We go through the network and then we say, well, actually, we were a little bit off, so can we change all of these weights so that next time we're a little bit closer to the correct answer? We keep doing this over and over again, in a process called gradient descent, and slowly settle upon some weights where, for the most part, when we put in
our A, B and C, we get what we want out the other side. Now, it's unlikely to be perfect, but just like with the other machine learning we've talked about, we're going to be trying to make our prediction on our testing set as good as possible. So we put in a lot of training data, and hopefully when we take this network and apply it to some new data, it also performs well.

Let's look at an example. We looked at credit checks in the previous video, where we classified whether or not someone should be given credit. Well, something that we often calculate is a credit rating, which is a value from, let's say, nought to 1 of how good your credit score is. So A could be how much money you have in your bank, B could be whether you have any loans, and C could be whether you own a car, and obviously there are going to be more of these, because you can't make a decision on those three things alone. So what we do is we get a number of people that we've already made decisions about. So we know that person A has a bank account balance of five thousand, two thousand in loans, and he does own a car, and he has a credit rating of 75, or 0.75, whatever your scale is. So we put this in, we set these weights so that this is the correct prediction, and then hopefully when another person comes along with a different set of variables, we'll predict the right thing for them.

So you can make this network as deep or as big as you want. Typically, multi-layer perceptrons, or artificial neural networks like this, won't be very deep — one, two, three hidden layers deep, maybe — but what's been shown in the literature is that if you have a sufficient number of hidden units, you can basically model any function like this, as long as you've got sufficient training data to create it.

So we're going to use Weka again, because Weka has lots of regression algorithms built in, like artificial neural networks and linear regression. So let's open up a data set.
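The forward pass described above — three inputs, a few hidden neurons each doing a weighted sum plus a bias, a sigmoid nonlinearity, then one more weighted sum for the output — can be sketched as follows. The weights here are invented for illustration; in practice gradient descent would learn them from data like the credit-rating examples:

```python
import math

def sigmoid(z):
    # Nonlinearity applied to each hidden neuron's weighted sum.
    return 1.0 / (1.0 + math.exp(-z))

# Invented weights: 3 hidden neurons, each with one weight per input
# (A, B, C) and a bias term.
hidden_weights = [
    [0.5, -0.2, 0.1],
    [0.3, 0.8, -0.5],
    [-0.4, 0.2, 0.9],
]
hidden_biases = [0.1, -0.1, 0.0]

# Output neuron: one weight per hidden unit, plus its own bias.
output_weights = [0.6, -0.3, 0.7]
output_bias = 0.05

def forward(a, b, c):
    inputs = [a, b, c]
    hidden = []
    for weights, bias in zip(hidden_weights, hidden_biases):
        # Weighted sum, just like linear regression...
        z = sum(w * x for w, x in zip(weights, inputs)) + bias
        # ...then squashed through the nonlinearity.
        hidden.append(sigmoid(z))
    # Final prediction: another weighted sum over the hidden values.
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias

y = forward(0.5, 0.2, 0.8)
print(y)  # ≈ 0.72 with these invented weights
```

Training would repeatedly nudge every weight and bias in the direction that reduces the error between this output and the known answer — that nudging loop is the gradient descent mentioned above.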
The data set we're going to use this time is a data set on superconductivity. Now, obviously my knowledge of physics is, shall we say, average, but a superconductor is a material that, when you get it below a critical temperature, has no resistance, which is very useful for electrical circuits. So this is a data set about the properties of a material and the critical temperature below which it will be a superconductor. Now, I'm sure there are going to be some physicists in the comments who might point out some errors in what I just said, but we'll move on.

So we're reading in the file. This is quite a big data set, so we have a lot of input attributes, and then at the end we have this critical temperature that we're trying to predict. This temperature, if we look at the histogram, goes from 0 to 185. If we look at some of the other attributes — for example, we've got this entropy atomic radius, which I can pretend I know what that is — it goes from nought to 2.14. Is that good?

Right, what we're going to do is start by using multivariate linear regression to try and predict this critical temperature as a combination of these input features. So I'm going to go to Classify — there's just one Classify tab, even for regression — we're going to use our same percentage split as before, so 70%, and we're going to use a simple linear regression function for this. Let's go. So we've trained our linear regression, and what we want to do now is work out whether it's worked or not on our testing set.
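The check Weka runs here can be sketched in plain Python: hold out 30% of the data, fit on the remaining 70%, then measure the average prediction error on the held-out part, e.g. mean absolute error and root mean squared error. The data below is invented one-attribute data, standing in for the real superconductivity set:

```python
import math

# Invented 1-D data: y is roughly 2*x + 1 with small alternating noise.
data = [(x / 10.0, 2 * (x / 10.0) + 1 + ((-1) ** x) * 0.05)
        for x in range(20)]

# 70/30 percentage split, like Weka's percentage-split option:
# first 70% for training, the rest held out for testing.
split = int(len(data) * 0.7)
train, test = data[:split], data[split:]

# Fit y = m*x + c by least squares on the training portion only.
n = len(train)
mx = sum(x for x, _ in train) / n
my = sum(y for _, y in train) / n
m = sum((x - mx) * (y - my) for x, y in train) / \
    sum((x - mx) ** 2 for x, _ in train)
c = my - m * mx

# Evaluate on the held-out test portion.
errors = [m * x + c - y for x, y in test]
mae = sum(abs(e) for e in errors) / len(errors)
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
print(mae, rmse)
```

The point of splitting first and fitting second is the same as in Weka: the error numbers only mean something if the test instances played no part in choosing m and c.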