  • ANNOUNCER: The following program is brought to you by Caltech.

  • YASER ABU-MOSTAFA: Welcome back.

  • Last time, we introduced the learning problem.

  • And if you have an application in your domain that you wonder if machine

  • learning is the right technique for it, we found that there are three

  • criteria that you should check.

  • You should ask yourself: is there a pattern to begin

  • with that we can learn?

  • And we realize that this condition can be intuitively met in many applications,

  • even if we don't know mathematically what the pattern is.

  • The example we gave was the credit card approval.

  • There is clearly a pattern-- if someone has a particular salary, has

  • been in a residence for so long, and has that much debt, and so on, then

  • that is somewhat correlated to their credit behavior.

  • And therefore, we know that the pattern exists in spite of the fact

  • that we don't know exactly what the pattern is.

  • The second item is that we cannot pin down the pattern mathematically, like

  • the example I just gave.

  • And this is why we resort to machine learning.

  • The third one is that we have data that represents that pattern.

  • In the case of the credit application, for example, there are historical

  • records of previous customers, and we have the data they wrote in their

  • application when they applied, and we have some years' worth of records of

  • their credit behavior.

  • So we have data that are going to enable us to correlate what they wrote in the

  • application to their eventual credit behavior, and that is what we are

  • going to learn from.

  • Now, if you look at the three criteria, basically there are two that

  • you can do without, and one that is absolutely essential.

  • What do I mean?

  • Let's say that you don't have a pattern.

  • Well, if you don't have a pattern, then you can try learning.

  • And the only problem is that you will fail.

  • That doesn't sound very encouraging.

  • But the idea here is that, when we develop the theory of learning, we will

  • realize that you can apply the technique regardless of whether there

  • is a pattern or not.

  • And you are going to determine whether there's a pattern or not.

  • So you are not going to be fooled and think, I learned, and then give the

  • system to your customer, and the customer will be disappointed.

  • There is something you can actually measure that will tell you whether you

  • learned or not.

  • So if there's no pattern, there is no harm done in trying machine learning.

  • The other one, also, you can do without.

  • Let's say that we can pin the thing down mathematically.

  • Well, in that case, machine learning is not the recommended technique.

  • It will still work.

  • It may not be the optimal technique.

  • If you can outright program it, and find the result perfectly, then why

  • bother generating examples, and trying to learn, and going through all of that?

  • But machine learning is not going to refuse.

  • It is going to learn, and it is going to give you a system.

  • It may not be the best system in this case, but it's a system nonetheless.

  • The third one, I'm afraid you cannot do without.

  • You have to have data.

  • Machine learning is about learning from data.

  • And if you don't have data, there is absolutely nothing you can do.

  • So this is basically the picture about the context of machine learning.

  • Now, we went on to focus on one type, which is supervised learning.

  • And in the case of supervised learning, we have a target function.

  • The target function we are going to call f.

  • That is our standard notation.

  • And this corresponds, for example, to the credit application.

  • x is your application, and f of x is whether you are a good credit risk or

  • not, for the bank.

  • So if you look at the target function, the main criterion about the target

  • function is that it's unknown.

  • This is a property that we are going to insist on.

  • And obviously, unknown is a very generous assumption, which means that

  • you don't have to worry about what pattern you are trying to learn.

  • It could be anything, and you will learn it-- if we manage to do that.

  • There's still a question mark about that.

  • But it's a good assumption to have, or lack of assumption, if you will,

  • because then you know that you don't have to worry about the environment that

  • generated the examples.

  • You only worry about the system that you use to implement machine learning.

  • Now, you are going to be given data.

  • And the reason it's called supervised learning is that you are not only

  • given the input x's, as you can see here.

  • You're also given the output--

  • the target outputs.

  • So in spite of the fact that the target function is generally unknown,

  • it is known on the data that I give you.

  • This is the data that you are going to use as training examples, and that you

  • are going to use to figure out what the target function is.

  • So in the case of supervised learning, you have the targets

  • explicitly.

  • In the other cases, you have less information than the target, and we

  • talked about it-- like unsupervised learning, where you don't have

  • anything, and reinforcement learning, where you have partial information,

  • which is just a reward or punishment for a choice of a value of y that

  • may or may not be the target.

  • Finally, you have the solution tools.

  • These are the things that you're going to choose in order to solve the

  • problem, and they are called the learning model, as we discussed.

  • They are the learning algorithm and the hypothesis set.

  • And the learning algorithm will produce a hypothesis--

  • the final hypothesis, the one that you are going to give your customer, and

  • we give the symbol g for that.

  • And hopefully g approximates f, the actual target function,

  • which remains unknown.

  • And g is picked from a hypothesis set, and the general symbol for

  • a member of the hypothesis set is h.

  • So h is a generic hypothesis.

  • The one you happen to pick, you are going to call g.
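These objects are easy to write down concretely. Here is a minimal sketch (the target, thresholds, and data below are invented for illustration, not taken from the lecture): an unknown target f, training examples (x, f(x)), a hypothesis set H of candidate functions h, and a final hypothesis g picked by a simple learning rule:

```python
import random

# Illustrative stand-in for the unknown target f (invented for this
# sketch; in real learning we never get to see f).
def f(x):
    return 1 if x > 0.5 else -1

# Training examples: inputs x paired with the target outputs f(x).
# Having the outputs is what makes the setting supervised.
data = [(x, f(x)) for x in (random.random() for _ in range(20))]

# A hypothesis set H: simple threshold functions; each h is one hypothesis.
def make_h(t):
    return lambda x: 1 if x > t else -1

H = [make_h(t / 10) for t in range(11)]

# A naive learning algorithm: report the g in H with the fewest
# errors on the training data.
def errors(hyp):
    return sum(1 for x, y in data if hyp(x) != y)

g = min(H, key=errors)
assert errors(g) == 0  # H happens to contain the true threshold here
```

Here the "algorithm" simply picks the h with the fewest errors on the data; the perceptron algorithm discussed next is a smarter way to search a much larger hypothesis set.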

  • Now, we looked at an example of a learning model.

  • First, the perceptron itself, which is a linear

  • function, thresholded.

  • That happens to be the hypothesis set.

  • And then, there is an algorithm that goes with it that chooses which

  • hypothesis to report based on the data.

  • And the hypothesis in this case is represented by the purple line.

  • Different hypotheses in the set H will result

  • in different lines.

  • Some of them are good and some of them are bad, in terms of correctly

  • separating the examples, which are the pluses and minuses.

  • And we found that there's a very simple rule to adjust the current

  • hypothesis, while the algorithm is still running, in order to get a better

  • hypothesis.

  • And once you have all the points classified correctly, which is

  • guaranteed in the case of the perceptron learning algorithm if the

  • data was linearly separable in the first place,

  • then you will get there, and that will be the g that you are going to report.
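A hedged sketch of that procedure in code (the array shapes and the iteration cap are my assumptions, not the lecture's): repeatedly pick a misclassified point and nudge the weights by the rule w ← w + y_n x_n, stopping once every point is classified correctly:

```python
import numpy as np

def pla(X, y, max_iters=1000):
    """Perceptron learning algorithm sketch.
    X: (N, d) inputs, y: (N,) labels in {-1, +1}.
    Returns the weight vector w defining the final hypothesis g,
    with the bias stored in w[0]."""
    N, d = X.shape
    Xb = np.hstack([np.ones((N, 1)), X])    # prepend constant coordinate x0 = 1
    w = np.zeros(d + 1)
    for _ in range(max_iters):
        preds = np.sign(Xb @ w)
        wrong = np.flatnonzero(preds != y)  # indices of misclassified points
        if wrong.size == 0:
            break                           # all points correct: this w is g
        n = wrong[0]
        w += y[n] * Xb[n]                   # the update rule: w <- w + y_n * x_n
    return w
```

On linearly separable data this terminates with zero training errors, matching the guarantee just mentioned; `max_iters` is only a safety cap for non-separable inputs, where the loop would otherwise run forever.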

  • Now, we ended the lecture on sort of a sad note, because after all of this

  • encouragement about learning, we asked ourselves: well,

  • can we actually learn?

  • So we said it's an unknown function.

  • Unknown function is an attractive assumption, as I said.

  • But can we learn an unknown function, really?

  • And then we realized that if you look at it, it's really impossible.

  • Why is it impossible?

  • Because I'm going to give you a finite data set, and I'm going to give you

  • the value of the function on this set.

  • Good.

  • Now, I'm going to ask you what is the function outside that set?

  • How in the world are you going to tell what the function is outside, if the

  • function is genuinely unknown?

  • Couldn't it assume any value it wants?

  • Yes, it can.

  • I can give you 1000 points, a million points, and on the million-and-first point,

  • still the function can behave any way it wants.

  • So it doesn't look like the statement we made is feasible in terms of

  • learning, and therefore we have to do something about it.
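The impossibility argument is easy to make concrete. In this invented example, two candidate targets agree on every training point yet disagree on a new point, so the data alone cannot tell them apart:

```python
# Invented sample: three points on which the target value is known.
D = [0, 1, 2]

f1 = lambda x: x                    # one candidate target
f2 = lambda x: x if x in D else -1  # another candidate, agreeing only on D

# Both candidates match on every training point...
assert all(f1(x) == f2(x) for x in D)

# ...yet disagree on the very next point outside the data set.
assert f1(3) != f2(3)
```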

  • And what we are going to do about it is the subject of this lecture.

  • Now, the lecture is called Is Learning Feasible?

  • And I am going to address this question in extreme detail from

  • beginning to end.

  • This is the only topic for this lecture.

  • Now, if you want an outline--

  • it's really a logical flow.

  • But if you want to cluster it into points--

  • we are going to start with a very simple probabilistic situation.

  • It doesn't seem to relate to learning.

  • But it will capture the idea--

  • can we say something outside the sample data that we have?

  • So we're going to answer it in a way that is concrete, and where the

  • mathematics is very friendly.

  • And then after that, I'm going to be able to relate that probabilistic

  • situation to learning as we stated.

  • It will take two stages.

  • First, I will just translate the expressions into something that

  • relates to learning, and then we will move forward and make it correspond to

  • real learning.

  • That's the last one.

  • And then after we do that, and we think we are done, we find that there is

  • a serious dilemma that we have.

  • And we will find a solution to that dilemma, and then declare victory-- that

  • indeed, learning is feasible in a very particular sense.