  • Classification lets us pick one or the other or some small number of labels for our data

  • The problem is that real life doesn't fit into these neat little categories

  • But when we have labeled data, what if there isn't a yes or no, or an A, B or C, some set of labels?

  • Right, then we have what we call a regression problem. We're actually trying to predict actual numerical outputs, right? So given these inputs

  • What's the temperature at which something will occur?

  • Given this movie on a streaming site, and its attributes, and the people that have watched it

  • What amount of action is in it? Right, because that informs who should watch that movie

  • There's lots of times when you don't want to say something is this and isn't that; you want to say it's a little bit of this

  • And a little bit of that

  • and that's what regression is for, and some of the algorithms we use for regression are actually quite similar to

  • Classification. So, for example, you can regress using a support vector machine, a support vector regressor, right?

  • But we also use other ones as well, so we're more likely to use things like linear regression and things like this

  • So let's start off with perhaps the simplest form of regression. That's linear regression, right?

  • It might not occur to people who use linear regression that what you're actually doing is machine learning

  • But you are. Let's imagine we have data that's got just one input

  • so one attribute, attribute one, and

  • Our output, which is y. This is our table of data just like before, and these are our instances

  • So we've got one, two, three, four, like this

  • so what we want to do is we want to input attribute one and

  • We want to output y, which instead of being a yes or no is going to be some number on a scale

  • Let's say between naught and one. So really, what we're trying to do: we've got our graph here of our input variable

  • Attribute one and we've got our Y output and these are our data points in our training set

  • So here like this and they sort of go up like this

  • what we're going to do using linear regression is fit a line through this data, and a line is of the form

  • y = mx + c

  • so in this case m is going to be the gradient of our line, and c is going to be the y-intercept. So in this

  • case, I guess, something along the lines of this, straight up like this

  • So if our m was one; in this case m

  • equals one, or maybe equals one point two to make it slightly more interesting. And then our c is going to be, let's say, c

  • is naught point naught two. These are the values that we're going to learn using linear regression

  • So, how do we train something like this?

  • What we're going to do is we want to find the values for our unknowns, which are m and c

  • Given a lot of x and y pairs, right?

  • So we've got our x and y pairs here, and we want to find the optimal values of these parameters for this data set

  • So we're going to find values for m and c where this distance, the prediction error, is minimized. The better the fit of

  • this line, the lower the average prediction error is going to be. If this line is over here

  • it's going to be a huge error. And so the hope is that if we fit this correctly and we have an m

  • And we have a C then when we come up with a new

  • Value that we're trying to predict we can pass it through this formula. We can multiply it by 1.2 and then add

  • 0.02 and that will produce our prediction for y and hopefully that would be quite close to what it is

  • So for example, let's imagine we have a new value for attribute 1. It comes in here

  • We're gonna look up here, and this is going to be the prediction for our y, and that's the output of our regressor
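
The fit just described can be sketched in a few lines using the closed-form least-squares solution for one attribute. The data values below are made up for illustration (chosen so the learned line matches the m = 1.2, c = 0.02 example):

```python
# Fit y = m*x + c by ordinary least squares for a single attribute
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # m = covariance(x, y) / variance(x)
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    c = mean_y - m * mean_x          # the line passes through the means
    return m, c

# Toy training set: y is exactly 1.2*x + 0.02, as in the example above
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [0.02, 0.32, 0.62, 0.92, 1.22]
m, c = fit_line(xs, ys)

# Predict y for a new attribute value by passing it through y = m*x + c
y_new = m * 0.4 + c
```

Minimizing the squared prediction error is what makes this the "best fit" line in the sense described above.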

  • So this linear regression is capable of producing

  • predictions based on its attribute. Now, if we have more than one attribute

  • this is called multivariate linear regression, and the principle is exactly the same: we're going to have lots of these multipliers, m

  • We could say something like

  • y = m1·x1 + m2·x2 + ... and so on for all of our different attributes

  • so it's going to be a linear combination, a bit like PCA; a linear combination of

  • these different attributes, and it's obviously going to be multi-dimensional
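
The multivariate prediction is just that weighted sum plus a bias. A minimal sketch, with invented weights standing in for the learned multipliers:

```python
# Multivariate linear regression prediction: y = m1*x1 + m2*x2 + ... + c
def predict(weights, bias, attributes):
    return sum(m * x for m, x in zip(weights, attributes)) + bias

weights = [0.5, -0.2, 1.1]   # one learned multiplier per attribute (illustrative values)
bias = 0.03
y = predict(weights, bias, [1.0, 2.0, 0.5])
```

Training finds the weights; prediction is always this single linear pass over the attributes.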

  • So one interesting thing about linear regression is that what it's going to do is predict us a straight line

  • regardless of how many dimensions we've got. Now, sometimes, if we want to use this for a classification

  • purpose, we still can, all right

  • Now I'm supposed to be talking about regression not classification

  • But just briefly, if you'll indulge me: we can pass this function through something called a logistic function, or sigmoid curve

  • and we can squash it into something that's this shape

  • And now what we're doing is we're pushing our values up to 1 and down to 0

  • Right and that is our classification between 1 and 0

  • So it is possible to perform linear regression using this additional logistic function to perform

  • classification, and this is called logistic regression. I

  • just wanted to mention it, because that's something you will see being done on some data
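
Squashing the linear output through the sigmoid, as described, looks like this. A minimal sketch; the m and c values are the illustrative ones from earlier, not learned logistic-regression coefficients:

```python
import math

# Logistic (sigmoid) function: squashes any real value into (0, 1)
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

m, c = 1.2, 0.02   # illustrative linear coefficients

# Pass the linear combination through the sigmoid, then threshold
# at 0.5 to push values up towards 1 or down towards 0
def classify(x):
    p = sigmoid(m * x + c)   # score between 0 and 1
    return 1 if p >= 0.5 else 0
```

In real logistic regression the coefficients are fitted to the labels directly (usually by maximizing likelihood), but the squashing step is exactly this.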

  • So let's talk a little bit about something more powerful

  • That's artificial neural networks

  • now

  • Anytime in the media at the moment when you see the term AI, what they're actually talking about is machine learning, and what they're talking

  • about is some large neural network. Now, let's keep it a little bit smaller

  • Let's imagine what we want to do is take a number of attributes and map them to some prediction, some regressed value, right?

  • How are we going to do this?

  • Well, what we can do is we can essentially combine a lot of different linear regressions through some nonlinear functions into a really powerful

  • Regression algorithm, right. So let's imagine that we have some data which has got three inputs

  • So we've got our instances and we've got our attributes a B and C. Our inputs are a B and C

  • And then we have some hidden neurons, right, and I'll explain a neuron in a moment

  • Then we have an output value that we'd like to regress. This is where we're trying to predict the value

  • So, you know, how much disease does something have, how hot is it, these kinds of things, depending on our attributes

  • this is where we put in a this is where we put in B and this is where we put in C and

  • Then we perform a weighted sum of all of these things for each of these neurons

  • So for example, this neuron is going to have three inputs from these three here, and this is going to have weight one

  • This is going to be weight two, this is going to be weight three

  • And we're gonna do a weighted sum just like in linear regression

  • So we're going to do weight one times a, plus weight two times b, plus weight three times c, add

  • them together, and then we're going to add any bias that we want to. So this is going to be plus some bias, and that's

  • going to give us a value for this neuron, which let's call hidden one, right, because generally speaking

  • we don't look at these values, it's not too important. We're going to do a different sum for this one

  • So I'm going to draw them in different colors so we don't get confused. So this has got three weights

  • So this is going to be a different weight

  • This is going to have another different weight

  • And we're going to do this much times a, plus this much times b, plus this much times c

  • Add them all up, add a bias, and we're going to get hidden two. And we're going to do the same thing

  • With these ones here like this

  • This is going to be hidden three, hidden four, hidden five, and so on, for as far as we like to go
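
The hidden-neuron sums just described can be sketched directly; each hidden value is a weighted sum of the same inputs plus a bias, with its own weights. All numbers below are invented for illustration:

```python
# Inputs a, b, c and one set of (illustrative) weights per hidden neuron
inputs = [1.0, 0.5, -1.0]          # a, b, c
hidden_weights = [
    [0.2, -0.5, 0.7],   # weights for hidden one
    [0.9,  0.1, -0.3],  # weights for hidden two
    [-0.4, 0.6, 0.8],   # weights for hidden three
]
hidden_biases = [0.1, -0.2, 0.05]

def weighted_sum(weights, bias, xs):
    # w1*a + w2*b + w3*c + bias, just like in linear regression
    return sum(w * x for w, x in zip(weights, xs)) + bias

hidden = [weighted_sum(w, b, inputs)
          for w, b in zip(hidden_weights, hidden_biases)]
```

Each entry of `hidden` is one neuron's pre-activation value; the nonlinearity discussed next is applied on top of these.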

  • All right

  • now

  • the nice thing about this is that each of these can calculate a different weighted sum. Now, the problem is that if we just did

  • this, then what happens is we actually get a series of linear regressions

  • All right

  • because this is just multivariate linear regression, and in the end our

  • algorithm doesn't end up any better, right? If you combine multiple linear functions together, you just get one different linear function

  • so we pass all of these hidden values through a nonlinear function, like a sigmoid or

  • tanh. So a sigmoid goes between naught and 1, so this is naught and 1, and a tanh

  • hyperbolic tangent, will go between minus 1 and 1

  • Things like this, and what that will do is add a sufficiently complex

  • function that when we combine them all together

  • we can actually get quite a powerful algorithm. The way this works is we put in a, b and c

  • We calculate all the weighted sums through these functions into our hidden units, and then we calculate another series of weighted sums

  • which add together to be our final output, and this will be our final output prediction, y. Now, the way we train this is

  • we're going to put in lots and lots of training data, where we have the values for a, b, c and we know what the

  • output should have been. We go through the network and then we say, well, actually we were a little bit off

  • So can we change all of these weights so that next time we're a little bit closer to the correct answer and let's keep doing

  • this over and over again in a process called gradient descent and

  • Slowly settle upon some weights where for the most part when we put in our a B and C

  • We get what we want out the other side now, it's unlikely to be perfect

  • but just like with the other machine learning algorithms we've talked about, we're going to be trying to make our

  • prediction on our testing set as good as possible
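
The loop just described (forward pass, measure the error, nudge the weights, repeat) can be sketched with a toy network: three inputs, two sigmoid hidden units, one output. Everything here is invented for illustration, and the gradient is taken numerically rather than by the backpropagation a real implementation would use:

```python
import math
import random

def sigmoid(z):
    # The nonlinearity: squash a weighted sum into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def forward(p, x):
    # Two hidden neurons: weighted sum of a, b, c plus a bias, then sigmoid
    h1 = sigmoid(p[0] * x[0] + p[1] * x[1] + p[2] * x[2] + p[3])
    h2 = sigmoid(p[4] * x[0] + p[5] * x[1] + p[6] * x[2] + p[7])
    # Output: another weighted sum, over the hidden values
    return p[8] * h1 + p[9] * h2 + p[10]

def mse(p, data):
    # Mean squared prediction error over the training set
    return sum((forward(p, x) - y) ** 2 for x, y in data) / len(data)

# Toy training set: the target is a simple function of the three inputs
data = [([a, b, c], 0.5 * a + 0.2 * b - 0.3 * c)
        for a in (0.0, 1.0) for b in (0.0, 1.0) for c in (0.0, 1.0)]

random.seed(0)
params = [random.uniform(-0.5, 0.5) for _ in range(11)]
lr, eps = 0.3, 1e-6
loss_before = mse(params, data)

for step in range(300):                      # gradient descent loop
    base = mse(params, data)
    grads = []
    for i in range(len(params)):
        # Numerical gradient: nudge one weight and see how the error moves
        bumped = params[:]
        bumped[i] += eps
        grads.append((mse(bumped, data) - base) / eps)
    # Move every weight a little against its gradient
    params = [w - lr * g for w, g in zip(params, grads)]

loss_after = mse(params, data)
```

After training, the loss is lower than at the random start: the network has slowly settled on weights that, for the most part, give the right output for each a, b, c.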

  • All right

  • So we've put in a lot of training data, and hopefully when we take this network and apply it to some new data it also

  • performs well. Let's look at an example

  • We looked at credit checks in the previous video, where we classified whether or not someone should be given credit

  • Well, something that we often calculate is a credit rating

  • which is a value from, let's say, naught to 1 of

  • how good your credit score is. So a could be how much money you have in your bank, b could be whether you have any

  • loans, and c could be whether you own a car, and obviously there's going to be more of these, because you can't make a decision

  • on just those three things. So what we do is we get a number of people that we've already made decisions about, right?

  • so we know that person A has a bank account balance of five thousand, two thousand in loans, and he does own a car, and

  • he has a credit rating of 75, or naught point seven five, whatever your scale is

  • So we put this in, and we say this is the correct

  • prediction, and then hopefully when another person comes along with a different set of variables we'll predict the right thing for them

  • So you can make this network as deep or as big as you want. Typically

  • multi-layer perceptrons, or artificial neural networks like this, won't be very deep

  • one, two, three hidden layers deep, maybe. But what's been shown in the literature is that actually

  • If you have a sufficient number of hidden units

  • you can basically model any function like this, right, as long as you've got sufficient training data to create it

  • So we're going to use Weka again

  • because Weka has lots of

  • regression algorithms built-in like artificial neural networks

  • And linear regression. So let's open up the data set we're going to use this time

  • So the data set we've got is a data set on superconductivity, right

  • Now, obviously my knowledge of physics is, shall we say, average

  • But a superconductor is something that, when you get it below a critical temperature, has no resistance

  • Right, which is very useful for electrical circuits

  • And so this is a data set about what the properties of a material are and what the critical temperature is

  • below which it will be a superconductor

  • Now, I'm sure there are going to be some physicists in the comments that might point out some errors in what I just said

  • But we'll move on. So we're reading a file. This is quite a big data set

  • So we have a lot of input attributes, and then at the end we have this critical temperature that we're trying to predict. This

  • temperature, if we look at this histogram, goes from 0 to

  • 185. If we look at some of the other attributes, so for example, we've got this entropy of atomic radius, which I can pretend

  • I know what that is, which goes from naught to two point one four. Is that good?

  • Right, what we're going to do is we're going to start by using

  • Multivariate linear regression to try and predict this critical temperature as a combination of these input features

  • So I'm going to go to Classify. There's just one Classify tab, even for regression

  • we're going to use our same percentage split as before, so 70%, and

  • We're going to use a simple linear regression function for this

  • Let's go

  • So we've trained our linear regression and what we want to do now is work out whether it's worked or not on our testing set
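
Whatever tool does the fitting, "has it worked" for regression usually comes down to comparing predictions against true values on the held-out split. A sketch of two common error measures; the temperature arrays below are invented, not Weka's actual output:

```python
import math

# Invented true vs. predicted critical temperatures for a held-out test set
y_true = [4.2, 30.0, 77.0, 92.0, 110.0]
y_pred = [6.0, 25.0, 80.0, 95.0, 100.0]

n = len(y_true)
# Mean absolute error: average size of the prediction error
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
# Root mean squared error: like MAE but penalizes large errors more
rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
```

Weka reports these (along with the correlation coefficient) for its test split; lower error means the regressor has worked better.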