Classification lets us pick one or the other or some small number of labels for our data
The problem is that real life doesn't fit into these neat little categories
When we have labelled data that isn't yes or no, or A, B or C, or some labels,
right, then we have what we call a regression problem. We're actually trying to predict actual outputs, right? So given these inputs,
what's the temperature at which something will occur? Or,
given this movie on a streaming site and the attributes and the people that have watched it,
what amount of action is it? Right, because that informs who should watch that movie.
There's lots of times when you don't want to say it's this and it isn't this; you want to say it's a little bit of this
and a little bit of this,
and that's what regression is for and some of the algorithms we use for regression are actually quite similar to
classification. So for example, you can regress using a support vector machine, or support vector regressor, right?
But we also use other ones, so we're more likely to use things like linear regression and things like this.
So let's start off with perhaps the simplest form of regression. That's linear regression, right?
It might not occur to people who use linear regression that actually what you're doing is machine learning,
but you are. Let's imagine we have just data that's got one input,
so one attribute, attribute one, and
our output, which is y. This is our table of data just like before and this is our instance data.
So we've got one, two, three, four like this,
so what we want to do is we want to input attribute one and
We want to output Y which instead of being a yes or no is going to be some number on a scale
let's say between nought and one. So really what we're trying to do is we've got our graph here of our input variable,
Attribute one and we've got our Y output and these are our data points in our training set
So here like this and they sort of go up like this
What we're going to do using linear regression is fit a line through this data, and a line is of the form
y = mx + c,
so in this case m is going to be the gradient of our line and c is going to be the intercept. So in this
case I guess something along the lines of this, straight up like this.
So if our m was one in this case, m
equals one, or maybe equals 1.2 to make it slightly more interesting, and then our c is going to be, let's say, c
is 0.02. These are the values that we're going to learn using linear regression.
So, how do we train something like this?
What we're going to do is we want to find the values for our unknowns, which are m and c,
given a lot of x and y pairs, right?
So we've got our x and y pairs here and we want to predict these values, the optimal values, for this data set.
So we're going to find values for m and c where this distance, the prediction error, is minimized. The better fit
this line is, the more the average prediction error is going to go down. If this line is over here,
it's going to be a huge error. And so the hope is that if we predict this correctly and we have an m
and we have a c, then when we come up with a new
Value that we're trying to predict we can pass it through this formula. We can multiply it by 1.2 and then add
0.02 and that will produce our prediction for y and hopefully that would be quite close to what it is
So for example, let's imagine we have a new value for attribute one. Let's say it comes in here.
We're going to look up here, and this is going to be the prediction for our y, and that's the output of our regressor.
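The fitting step described above can be sketched in a few lines of Python. This is only an illustration, assuming ordinary least squares as the way the prediction error is minimized; the function names `fit_line` and `predict` and the four data points are made up for the example and are not from the video.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = m*x + c for a single input attribute."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Gradient m: how x and y vary together, divided by how much x varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept c: chosen so the line passes through the mean point.
    c = mean_y - m * mean_x
    return m, c


def predict(x, m, c):
    """Pass a new attribute value through the learned formula."""
    return m * x + c


# Hypothetical training pairs that lie exactly on y = 1.2x + 0.02.
xs = [0.1, 0.2, 0.3, 0.4]
ys = [1.2 * x + 0.02 for x in xs]

m, c = fit_line(xs, ys)
new_prediction = predict(0.5, m, c)
```

On real data the points would not sit exactly on a line, so the recovered m and c would only approximate the underlying trend.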
So this linear regression is capable of producing
predictions based on its attribute. Now if we have more than one attribute,
this is called multivariate linear regression, and the principle is exactly the same as this. We're going to have lots of these multipliers, m.
We could say something like y is
m1 x1 plus
m2 x2 and so on for all of our different attributes,
so it's going to be a linear combination a bit like PCA a linear combination of
These different attributes and it's obviously going to be multi-dimensional
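As a rough sketch of that linear combination in Python: the function name `predict_multivariate` is hypothetical, and the intercept `c` is an assumption carried over from the one-attribute case, since the spoken formula only lists the m1 x1 + m2 x2 terms.

```python
def predict_multivariate(xs, ms, c=0.0):
    """Prediction y = m1*x1 + m2*x2 + ... + c for one instance."""
    if len(xs) != len(ms):
        raise ValueError("need exactly one multiplier per attribute")
    return sum(m * x for m, x in zip(ms, xs)) + c


# Two attributes with made-up multipliers and intercept.
y = predict_multivariate([1.0, 2.0], [0.5, 0.25], c=0.1)
```

Learning the multipliers follows the same idea as in the single-attribute case: choose them so the average prediction error over the training set is as small as possible.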
So one interesting thing about linear regression is that what it's going to do is predict us a straight line
regardless of how many dimensions we've got. Now sometimes if we want to use this for a classification
purpose, we still can, all right?
Now I'm supposed to be talking about regression not classification
But just briefly, if you'll indulge me, we can pass this function through something called a logistic function, or a sigmoid curve,
and we can squash it into something that's this shape.
And now what we're doing is we're pushing our values up to 1 and down to 0,
right, and that is our classification between 1 and 0.
So it is possible to perform linear regression using this additional logistic function to perform
classification, and this is called logistic regression. I
just thought I'd mention it, because that's something you will see being done on some data.
So let's talk a little bit about something more powerful
That's artificial neural networks
Anytime in the media at the moment when you see the term AI what they're actually talking about is machine learning and what they're talking
about is some large neural network. Now, let's keep it a little bit smaller.
Let's imagine what we want to do is take a number of attributes and map them to some prediction, some regressed value, right?
How are we going to do this?
Well, what we can do is we can essentially combine a lot of different linear regressions through some nonlinear functions into a really powerful
Regression algorithm, right. So let's imagine that we have some data which has got three inputs
So we've got our instances and we've got our attributes A, B and C. Our inputs are A, B and C.
And then we have some hidden neurons, right, and I'll explain a neuron in a moment.
Then we have an output value that we'd like to regress. This is where we're trying to predict the value.
So, you know, how much disease does something have, how hot is it, these kind of things, depending on our attributes.
This is where we put in A, this is where we put in B and this is where we put in C, and
then we perform a weighted sum of all of these things for each of these neurons.
So for example this neuron's going to have three inputs from these three here, and this is going to have weight one,
this is going to be weight two, this is going to be weight three.
And we're gonna do a weighted sum just like in linear regression
So we're going to do weight one times A plus weight two times B plus weight three times C, add
them together, and then we're going to add any bias that we want to, so this is going to be plus some bias, and that's
going to give us a value for this neuron, which let's call hidden one, right, because generally speaking
we don't look at these values. It's not too important. We're going to do a different sum for this one.
So I'm going to draw them in different colors so we don't get confused. So this has got three weights.
So this is going to be a different weight,
this is going to have another different weight,
and we're going to do this much times A plus this much times B plus this much times C,
add them all up, add a bias, and we're going to get hidden two, and we're going to do the same thing
with these ones here like this.
This is going to be hidden three, hidden four, hidden five and so on for as far as we like to go.
All right
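The weighted sums for the hidden neurons can be written out directly. In this Python sketch every number (inputs, weights and biases) is invented for illustration, and only two hidden neurons are shown.

```python
def weighted_sum(inputs, weights, bias):
    """weight one * A + weight two * B + weight three * C + bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


inputs = [0.5, 0.2, 1.0]  # A, B and C for one instance (made-up values)

# Each hidden neuron has its own three weights and its own bias.
hidden_weights = [
    [0.1, 0.4, -0.3],  # weights feeding hidden one
    [0.7, -0.2, 0.5],  # weights feeding hidden two
]
hidden_biases = [0.05, -0.1]

hidden = [
    weighted_sum(inputs, weights, bias)
    for weights, bias in zip(hidden_weights, hidden_biases)
]
```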
The nice thing about this is each of these can calculate a different weighted sum. Now the problem is that if we just did
this, then what happens is we actually get a series of linear regressions.
All right
because this is just multivariate linear regression and in the end our
Algorithm doesn't end up any good right? If you combine multiple linear functions together, you just get one different linear function
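A quick way to convince yourself of this is to chain two straight-line functions and compare the result with a single line. The sketch below uses made-up gradients and intercepts, and the helper `linear` is hypothetical.

```python
def linear(m, c):
    """Return the function x -> m*x + c."""
    return lambda x: m * x + c


m1, c1 = 2.0, 1.0   # first linear function
m2, c2 = -3.0, 0.5  # second linear function, applied to the first one's output

f = linear(m1, c1)
g = linear(m2, c2)

# g(f(x)) = m2*(m1*x + c1) + c2 = (m2*m1)*x + (m2*c1 + c2): still one line.
combined = linear(m2 * m1, m2 * c1 + c2)

differences = [abs(g(f(x)) - combined(x)) for x in (-2.0, 0.0, 0.5, 10.0)]
```

Every entry of `differences` is zero up to floating-point rounding, which is why stacking linear layers alone adds no modelling power.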
So we pass all of these hidden values through a nonlinear function like a sigmoid or
tanh. So a sigmoid goes between nought and 1, so this is nought and 1, and a tanh,
hyperbolic tangent, will go between minus 1 and 1,
things like this, and what that will do is add a sufficiently complex
function that when we combine them all together
we can actually get quite a powerful algorithm. The way this works is we put in A, B and C,
we calculate all the weighted sums through these functions into our hidden units, and then we calculate another series of weighted sums
to add together to be our final output, and this will be our final output prediction y. Now the way we train this is
we're going to put in lots and lots of training data where we have the values for A, B and C and we know what the
Output should have been we go through the network and then we say, well actually we were a little bit off
So can we change all of these weights so that next time we're a little bit closer to the correct answer and let's keep doing
this over and over again in a process called gradient descent and
slowly settle upon some weights where, for the most part, when we put in our A, B and C
we get what we want out the other side. Now, it's unlikely to be perfect,
but just like with the other machine learning algorithms we've talked about, we're going to be trying to make our
prediction on our testing set as good as possible.
All right
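To make the forward pass and the "change the weights a little each time" loop concrete, here is a small pure-Python sketch. It assumes one hidden layer with tanh, a plain weighted sum at the output, squared error, and one weight update per training instance; the class name `TinyNetwork`, the learning rate, the number of hidden units and the four training instances are all invented for the example and are not what Weka does internally.

```python
import math
import random


class TinyNetwork:
    """Three inputs -> a few tanh hidden units -> one regressed output."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                         for _ in range(n_hidden)]
        self.b_hidden = [0.0] * n_hidden
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b_out = 0.0

    def forward(self, x):
        # Weighted sum plus bias for each hidden unit, squashed by tanh.
        hidden = [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
                  for ws, b in zip(self.w_hidden, self.b_hidden)]
        # Another weighted sum of the hidden values gives the prediction.
        y_hat = sum(w * h for w, h in zip(self.w_out, hidden)) + self.b_out
        return hidden, y_hat

    def train_step(self, x, y, lr=0.1):
        """One gradient descent update on the error 0.5 * (y_hat - y)**2."""
        hidden, y_hat = self.forward(x)
        error = y_hat - y  # how far off we were, and in which direction
        for j, h in enumerate(hidden):
            # Gradient at hidden unit j, using the tanh derivative 1 - h*h.
            grad_pre = error * self.w_out[j] * (1.0 - h * h)
            self.w_out[j] -= lr * error * h
            self.b_hidden[j] -= lr * grad_pre
            for i, xi in enumerate(x):
                self.w_hidden[j][i] -= lr * grad_pre * xi
        self.b_out -= lr * error


def mean_squared_error(net, data):
    return sum((net.forward(x)[1] - y) ** 2 for x, y in data) / len(data)


# Made-up instances: inputs (A, B, C) and the output we know they should give.
data = [
    ([0.0, 0.0, 1.0], 0.2),
    ([1.0, 0.0, 0.0], 0.9),
    ([0.0, 1.0, 0.0], 0.1),
    ([1.0, 1.0, 1.0], 0.6),
]

net = TinyNetwork(n_inputs=3, n_hidden=4)
loss_before = mean_squared_error(net, data)
for _ in range(2000):  # keep doing this over and over again
    for x, y in data:
        net.train_step(x, y)
loss_after = mean_squared_error(net, data)
```

Comparing `loss_before` with `loss_after` shows the error on the training instances shrinking as the weights settle; how well the network does on unseen data is a separate question, which is why a testing set is held back.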
So we've put in a lot of training data and hopefully when we take this network and apply it to some new data it also
performs well. Let's look at an example.
We looked at credit checks in the previous video, where we would classify whether or not someone should be given credit.
Well, something that we often calculate is credit rating,
which is a value from, let's say, nought to 1 of
how good your credit score is. So A could be how much money you have in your bank, B could be whether you have any
loans and C could be whether you own a car, and obviously there's going to be more of these because you can't make a decision
on just those three things. So what we do is we get a number of people that we've already made decisions about, right?
So we know that person A has a bank account balance of five thousand, two thousand in loans, and he does own a car, and
he has a credit rating of 75, or 0.75, whatever your scale is.
So we put this in, we set the weights
so that this is the correct
prediction, and then hopefully when another person comes along with a different set of variables, we'll predict the right thing for them.
So you can make this network as deep or as big as you want. Typically,
multi-layer perceptrons or artificial neural networks like this won't be very deep,
one, two, three hidden layers deep maybe, but what's been shown in the literature is that actually,
if you have a sufficient number of hidden units,
you can basically model any function like this, right, as long as you've got sufficient training data to create it.
So we're going to use Weka again
because Weka has lots of
regression algorithms built-in like artificial neural networks
and linear regression. So let's open up a data set we're going to use this time.
So the data set we've got is a data set on superconductivity, right? Now,
obviously my knowledge of physics is, should we say, average,
but a superconductor is something that, when you get it to a critical temperature, has no resistance,
right, which is very useful for electrical circuits.
And so this is a data set about what are the properties of a material and what is the critical temperature
below which it will be a superconductor.
Now, I'm sure there's going to be some physicists in the comments that might point out some errors in what I just said,
but we'll move on. So we're reading a file. This is quite a big data set.
So we have a lot of input attributes and then at the end we have this critical temperature that we're trying to predict. This
temperature, if we look at this histogram, goes from 0 to
185. If we look at some of the other things, so for example we've got this entropy atomic radius, which I can pretend
I know what that is, which goes from nought to 2.14. Is that good?
Right, what we're going to do is we're going to start by using
Multivariate linear regression to try and predict this critical temperature as a combination of these input features
So I'm going to go to classify. There's just one classify tab even for regression.
We're going to use our same percentage split as before, so 70%, and
we're going to use a simple linear regression function for this.
Let's go
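Outside Weka, a percentage split can be sketched in a few lines of Python. This is an assumption-laden illustration rather than a reproduction of Weka's behaviour: the function name `percentage_split`, the shuffling and the seed are all choices made for the example.

```python
import random


def percentage_split(instances, train_fraction=0.7, seed=1):
    """Shuffle the instances, then cut them into training and testing sets."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Ten stand-in instances: seven are used to fit, three are held back to test.
train_set, test_set = percentage_split(range(10))
```

The model is fitted on `train_set` only, and the error measures discussed next are computed on `test_set`.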
So we've trained our linear regression and what we want to do now is work out whether it's worked or not on our testing set
We've got the values we wanted, y, and we've got the values that have been predicted, y hat, and
hopefully they're exactly the same. If they're exactly the same then they're going to be on a straight line like this.
So we were hoping to get a y down here and we got it. Now, of course, this won't actually happen.
What will happen is these y hats are ever so slightly different than the y's
we were expecting, so you might see a bit of noise around the center like this, and
the way we would normally measure this is something called mean absolute error or mean squared error or root mean squared error,
which are all very similar ways to measure the same thing.
It's to measure what is the average distance between what we wanted and what we got.
So if we were hoping to get a y of 0.2 but we actually got a y of 0.4, then our
mistake was we were 0.2 too high.
And so for every single instance in our test set we can sum up all of the errors we've got and we can work out
what the average error was, right? So we have a hundred in our test set,
we sum up the errors and we divide by a hundred, and that tells us our mean error was a certain amount.
What will sometimes happen is your predictions will be above or below right?
And so your actual mean error might be zero because half the time you predicted too high, half the time you predicted too low, and
so on average, you've got it exactly right. Obviously, that's not correct.
So what we tend to do is calculate something called mean absolute error.
So essentially if you're too small, we just remove the minus sign and call it an error of that amount.
All right
So if your mean absolute error is 0.4, then what that's saying is that on average you're 0.4 away,
either above or below, from where you were hoping to be.
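The difference between the plain mean error and the mean absolute error shows up with just two predictions, one too high and one too low. A short Python sketch with invented numbers and hypothetical function names:

```python
def mean_error(actual, predicted):
    """Average signed error: positive and negative mistakes cancel out."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    """Average error with the minus signs removed."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


actual = [0.2, 0.6]     # what we wanted (y)
predicted = [0.4, 0.4]  # what we got (y hat): 0.2 too high, then 0.2 too low

me = mean_error(actual, predicted)
mae = mean_absolute_error(actual, predicted)
```

Here `me` comes out at essentially zero even though neither prediction was right, while `mae` reports the 0.2 average miss.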
It's also quite common to see similar measures like root mean squared error. For every instance
we take our error, we square it, we sum them all up, and then right at the end
we take a square root, right?
And again, this is a very similar measure to mean absolute error; the squaring removes our negative symbols for us.
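A small Python sketch of root mean squared error. Note that the "mean" in the name implies dividing the summed squares by the number of instances before taking the square root, a step the spoken description passes over; the function name and the example numbers are made up.

```python
import math


def root_mean_squared_error(actual, predicted):
    """Square each error, average the squares, then take the square root."""
    squared_errors = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Errors of 3 and 4: the squares are 9 and 16, and their mean is 12.5.
rmse = root_mean_squared_error([0.0, 0.0], [3.0, 4.0])
```

Because the errors are squared before averaging, a few large misses push this measure up more than they would push up mean absolute error.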
It's also quite common, particularly in
fields like biology and medicine, to see something called R squared, or the R squared coefficient, and this is essentially the
correlation squared. It's a measure of how well or how tightly correlated our
predictions and our ground truth were. For example,
this would be a pretty good correlation of maybe 0.8 or 0.9. If these were our points like this and were
absolutely perfect, that would be an R squared of one. If our points were everywhere,
that would be an R squared of 0. And what I'm saying is it's a value between 0 and 1 that tells us
how well we predicted. Zero means you basically didn't predict anything at all,
it was completely random output. One means you predicted everything exactly correct.
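Taking the "correlation squared" description literally, R squared can be sketched in Python as the square of the Pearson correlation between the ground truth and the predictions. The function names and data are invented, and the sketch assumes neither list is constant (otherwise the division would fail).

```python
import math


def pearson_correlation(actual, predicted):
    """Pearson correlation between ground truth and predictions."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    covariance = sum((a - mean_a) * (p - mean_p)
                     for a, p in zip(actual, predicted))
    spread_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual))
    spread_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return covariance / (spread_a * spread_p)


def r_squared(actual, predicted):
    """The correlation squared: a value between 0 and 1."""
    return pearson_correlation(actual, predicted) ** 2


tight = r_squared([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])                # points on a line
scattered = r_squared([1.0, 2.0, 3.0, 4.0], [1.0, -1.0, -1.0, 1.0])  # no linear trend
```

One caveat of this definition: predictions that are perfectly correlated with the truth but on the wrong scale still score 1, so it is worth reading R squared alongside an error measure such as mean absolute error.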
Now, of course that's unlikely to happen on a test set.
What you'll find is you'll hope to get some number somewhere around 0.7, 0.8, right?
But it will depend on how difficult your problem is to solve
So maybe on a really difficult problem an r-squared of 0.5 is actually pretty good, right?
So it's just going to depend on the situation. So we've got our linear regression trained up
We know that the correlation coefficient is 0.85. We know that the mean absolute error for example is 13 degrees
What we haven't done is visualize this. Sometimes a simple thing to do
is just to plot a scatter plot of what we wanted and what we actually got from our predictor.
So I'm going to right click on linear regression. I'm going to say visualize
classifier errors.
It's going to be a scatter plot of the
expected value and the prediction we actually got from our network, so you can see, generally speaking, it's not too bad.
Obviously the data set is quite bunched up in some of these areas which means that it's sometimes harder to predict
But we've got a general upward trend which is exactly what we wanted
You can see that the prediction around zero is not good at all
The x-axis in this instance is the actual critical temperature of that particular substance
The y-axis is what the linear regression actually predicted
You can see that the range here is from about zero to about
136 on our actual values, and the predicted values are from about minus 30, which doesn't really make sense, to 131, but they're pretty close.
Most of the ones that caused a problem were the very low values, right, because you've essentially got lots and lots of values that have
a very small critical temperature on this scale but different attributes, and that's been hard to fit a line to.
Something more powerful, for example a
multi-layer perceptron, you know, an artificial neural network, might do a better job of those kind of instances.
But you can see that there's a general
upward slope in this particular scatter plot. The larger X's represent a larger error, so you can see this is the line
we're actually trying to fit here, down here with all these small X's, and there's quite a few of them on there.
So actually for a lot of these substances the prediction even by linear regression has been pretty good
Regression algorithms
let us predict real scalar data out of our input variables, and this can be really useful in a huge array of different
situations where we want to predict something that doesn't fit neatly into a yes-or-no answer or an ABC category label.
We've looked at linear regression and artificial neural networks, and obviously neural networks get pretty deep these days
But these are a great starting point, so
Thanks very much for watching. I hope you enjoyed this series on data analysis, something a little bit different from Computerphile.
I wanted to thank my colleague. Dr. Mercedes Torres Torres for helping me design the content
Please let us know what you liked what you didn't like let us know in the comments what you'd like to see more of
And we'll see you back again next time
Data Analysis 9: Data Regression - Computerphile

林宜悉 published on March 28, 2020