  • YOSHUA BENGIO: [INAUDIBLE].

  • Thank you [INAUDIBLE].

  • So I'll talk about [INAUDIBLE].

  • I'll talk about representations and learning

  • representations.

  • And the word deep here, I'll explain what it means.

  • So my goal is to contribute to building intelligent machines,

  • also known as AI.

  • And how do we get a machine to be smart--

  • to make good decisions?

  • Well, it needs knowledge.

  • AI researchers

  • from the early days--

  • '50s, '60s, '70s--

  • tried to give the knowledge to the machine--

  • the knowledge we have explicitly.

  • And it didn't work quite as well as was hoped.

  • One reason is that a lot of our knowledge is not something

  • we can communicate verbally or

  • write down in a program.

  • So that knowledge has to come from somewhere else.

  • And basically what we have found is you can get that

  • knowledge through observing the world around us.

  • That means learning.

  • OK-- so we need learning for AI.

  • What is learning?

  • What is machine learning?

  • It's not about learning things by heart.

  • That's just a fact.

  • What it is about is generalizing from the examples

  • you've seen to new examples.

  • And what I like to tell my students is it's taking the

  • probability mass that sits on the training examples and

  • somehow guessing where it should go-- which new

  • configurations of the things we see make

  • sense or are plausible.

  • This is what learning is about.

  • It's guesswork.

  • At first we can measure [INAUDIBLE] we can guess.

  • And I'll mention something about dimensionality and

  • geometry that comes up when we think about this [INAUDIBLE].

  • And one of the messages will be that we can maybe fight

  • this dimensionality

  • problem by allowing the machine to discover underlying

  • causes-- the underlying factors that explain the data.

  • And this is a little bit like what deep learning is about.

  • So let's start from learning-- an easy example

  • of learning.

  • Let's say we observe x,y pairs where x is a number--

  • y is a number.

  • And the stars here represent the examples we've seen of x,y

  • configurations.

  • So we want to generalize to new configurations.

  • In other words, for example, in this problem, typically we

  • want to predict a y given a new x.

  • And there's an underlying relationship between y and x--

  • the expected value of y given x-- which is shown

  • by this purple curve.
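
In standard regression notation, the relationship just described is the conditional expectation of y given x:

```latex
f^{*}(x) = \mathbb{E}[\, y \mid x \,]
```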

  • But we don't know it.

  • That's the problem in machine learning:

  • we're trying to discover something

  • we don't already know.

  • And we can guess some function.

  • This is the predicted or learned function.

  • So how could we go about this?

  • One of the most basic principles by which machine

  • learning algorithms are able to do this is to assume something

  • very simple about the world around us-- about the data

  • we're getting or the function we're trying to discover.

  • It's just assuming that the function we're trying to

  • discover is smooth, meaning if I know the value of the

  • function at some point x, and I want to know the

  • value at some nearby point x prime, then it's reasonable to

  • assume that the value at x prime of the function we want to

  • learn is close to the value at x.

  • That's it.

  • I mean, you can formalize that in many

  • different ways and exploit it in many ways.

  • And what it means here is if I ask you what y

  • should be at this point--

  • what I'm going to do is look up the values of y that I

  • observed at nearby points.

  • And combining these--

  • I make a reasonable guess like this one.

  • And if I do that on problems like this, it's actually going

  • to work quite well.

  • And a large fraction of the applications that we're

  • seeing use this principle.

  • And [INAUDIBLE]

  • enough of just this principle.
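
A minimal sketch of this look-up-the-neighbors idea, in plain NumPy (the sine curve stands in for the unknown "purple" function, and k=3 is an arbitrary choice; none of this comes from the talk itself):

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k=3):
    # Smoothness assumption in action: the y values observed at the
    # k training points nearest to x_query should be close to the
    # unknown f(x_query), so we average them.
    nearest = np.argsort(np.abs(x_train - x_query))[:k]
    return y_train[nearest].mean()

rng = np.random.default_rng(0)
x_train = rng.uniform(0, 10, size=50)
y_train = np.sin(x_train) + 0.1 * rng.normal(size=50)  # noisy samples of a smooth function

print(knn_predict(x_train, y_train, x_query=2.5))  # close to sin(2.5) ~ 0.6
```

Averaging more neighbors (a larger k) smooths the guess further; the smoothness assumption is what licenses combining nearby observations at all.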

  • But if we only rely on this principle for generalization,

  • we're going to be in trouble.

  • That's one of the messages I want to explain here.

  • So why are we going to be in trouble?

  • Well, basically we're doing some kind of interpolation.

  • So if I see enough examples--

  • the green stars here-- to cover the ups and downs of the

  • function I'm trying to learn, then I'm going to be fine.

  • But what if the function I want to learn has many more

  • ups and downs than I can possibly observe through data?

  • Because even Google has a finite number of examples.

  • Even if you have millions or billions of examples, the

  • functions we want to learn for AI are not like this one.

  • They have--

  • the number of configurations of the variables of interest

  • may be exponentially large--

  • so something maybe bigger than the number of

  • atoms in the universe.

  • So there's no way we're going to have enough examples to

  • cover all the configurations.

  • For example, think of the number of different English

  • sentences, which is something that Google is interested in.

  • And this problem is illustrated by the so-called

  • curse of dimensionality where you consider what happens when

  • you have not just one variable but many variables and all of

  • their configurations.

  • How many configurations of n variables do you have?

  • Well, you have an exponential number of configurations.

  • So if I wanted to learn about a single

  • variable-- say it

  • takes real values--

  • I can just divide its values into intervals.

  • And I count how many times I've seen each of those bins in my data.

  • I can estimate the probability of different intervals coming up.

  • So that's easy, because I only want to know about a small

  • number of different configurations.

  • But if I'm looking at two variables, then the number of

  • configurations is

  • squared--

  • and with three, even more.

  • But typically, I'm going to have hundreds of variables-- if you're

  • thinking about images, it's thousands-- tens of

  • thousands-- hundreds of thousands.

  • So it's crazy how many configurations there are.
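
A quick back-of-the-envelope sketch of that blow-up (10 bins per variable is an arbitrary choice):

```python
# Number of histogram cells needed if each variable is split into 10 bins.
bins_per_dim = 10

for n_vars in (1, 2, 3, 10, 100):
    print(f"{n_vars} variables -> {bins_per_dim ** n_vars:.0e} cells")
# 100 variables already need 1e+100 cells -- far more than we could
# ever fill with examples, even with billions of them.
```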

  • So how do we possibly generalize to new

  • configurations?

  • We cannot just break up this space into small cells and

  • count how many things happen in each cell, because the new

  • examples that we care about-- the new

  • configurations that we're asked about-- might

  • be in some region where we hadn't seen any data.

  • So that's the problem of generalizing in high dimensions.

  • So there's one thing that can help us, but it's not going to

  • be sufficient.

  • It's something that happens with images.

  • It's very often the case in vision, in natural language

  • processing and understanding, and many other problems where

  • the set of configurations of variables that are plausible--

  • that can happen in the real world--

  • occupies a very small volume of the whole set of possible

  • configurations.

  • So let me give an example.

  • In images, if I choose the pixels in an image randomly--

  • in other words, if I sample an image from a completely uniform

  • distribution, I'm going to get things like this.

  • Just noise.

  • And I can repeat this for eons and eons.

  • And I'm never going to assemble something that looks

  • like a face.

  • So what it means is that faces--

  • images of faces--

  • are very rare in the space of images.

  • They occupy a very small volume, much less than what

  • this picture would suggest.

  • And so this is a very important hint.

  • It means that actually the task is to find out where this

  • distribution concentrates.
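
The uniform-sampling experiment he describes is easy to reproduce; a minimal sketch in NumPy, with 64x64 grayscale as an assumed image size:

```python
import numpy as np

rng = np.random.default_rng(42)

# One draw from the uniform distribution over 64x64 grayscale images:
# every pixel independent and uniform in [0, 1]. It comes out as pure
# static; faces occupy a vanishingly small corner of this
# 64 * 64 = 4096-dimensional pixel space.
random_image = rng.uniform(0.0, 1.0, size=(64, 64))
print(random_image.shape)
```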

  • I have another example here.

  • If you take the image of a four like this one and you apply

  • some geometric transformations to it, like rotating it or

  • scaling it, you get slightly different images.

  • And if at each point, you allow yourself to make any of

  • these transformations, you can create a so-called manifold--
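
Where the transcript cuts off, the construction being described can be sketched as follows; the hand-drawn strokes are a crude stand-in for a real image of a 4, and scipy.ndimage.rotate is one way to apply the rotations:

```python
import numpy as np
from scipy.ndimage import rotate

# Crude stand-in for a 28x28 image of a "4"; in practice this would
# come from real data such as MNIST.
four = np.zeros((28, 28))
four[4:24, 14:16] = 1.0   # vertical stroke
four[12:14, 8:20] = 1.0   # horizontal stroke

# Small rotations (and similarly scalings) each give a slightly
# different but still valid "4". Sweeping the transformation
# parameters traces out a low-dimensional manifold inside the
# 28 * 28 = 784-dimensional pixel space.
variants = [rotate(four, angle, reshape=False) for angle in range(-15, 16, 5)]
print(len(variants), variants[0].shape)  # 7 images, each 28x28
```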