
  • Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér.

  • A neural network is a very loose model of the human brain that we can program in a computer,

  • or it's perhaps more appropriate to say that it is inspired by our knowledge of the

  • inner workings of a human brain.

  • Now, let's note that artificial neural networks have been studied for decades by experts,

  • and the goal here is not to show all aspects, but one intuitive, graphical aspect that is

  • really cool and easy to understand.

  • Take a look at these curves on a plane. These curves are a collection of points, and these

  • points you can imagine as images, sounds, or any kind of input data that we try to learn.

  • The red and the blue curves represent two different classes - the red can mean images

  • of trains, and the blue, for instance, images of bunnies.

  • Now, after we have trained the network from this limited data, which is basically a bunch

  • of images of trains and bunnies, we will get new points on this plane, new images,

  • and we would like to know whether this new image looks like a train or a bunny. This

  • is what the algorithm has to find out.

  • And this we call a classification problem, to which a simple and bad solution would be

  • simply cutting the plane in half with a line. Images belonging to the red regions will be

  • classified as the red class, and the blue regions as the blue class. Now, as you see,

  • the red region cuts into the blue curve, which means that some trains would be misclassified

  • as bunnies.

  • It seems that if we look at the problem from this angle, we cannot separate the two classes

  • perfectly with a straight line.
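
A minimal sketch of this situation (my own illustration, not from the video), using scikit-learn's two-moons toy data as a stand-in for the red and blue curves: a classifier whose decision boundary is a straight line simply cannot get every point right.

```python
# A straight-line classifier on two curved classes (illustrative stand-in data).
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression

# Two interleaved crescents play the role of the red and blue curves.
X, y = make_moons(n_samples=400, noise=0.1, random_state=0)

# Logistic regression draws a single straight line through the plane.
linear = LogisticRegression().fit(X, y)
print("straight-line accuracy:", linear.score(X, y))  # noticeably below 1.0
```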

  • However, if we use a simple neural network, it will give us this result. Hey! But that's

  • cheating, we were talking about straight lines. This is anything but a straight line.

  • A key concept of neural networks is that they create an inner representation of the data

  • model and try to solve the problem in that space. What this intuitively means is that

  • the algorithm will start transforming and warping these curves, where their shapes start

  • changing, and it finds that if we do well with this warping step, we can actually draw

  • a line to separate these two classes. After we undo this warping and transform the line

  • here back to the original problem, it will look like a curve. Really cool, isn't it?

  • So these are lines, only in a different representation of the problem. Who said that the original

  • representation is the best way to solve a problem?
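
Here is a hedged sketch of that idea (my own construction; the video shows an animation, not code): train a small one-hidden-layer network, recompute its hidden "warped" representation by hand, and check that a plain straight-line classifier succeeds in that space.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=400, noise=0.1, random_state=0)

# One hidden layer of 8 tanh units does the warping.
net = MLPClassifier(hidden_layer_sizes=(8,), activation='tanh',
                    max_iter=5000, random_state=0).fit(X, y)

# Recompute the hidden ("warped") representation by hand: h = tanh(X W + b).
H = np.tanh(X @ net.coefs_[0] + net.intercepts_[0])

print("line in original space:", LogisticRegression().fit(X, y).score(X, y))
print("line in warped space:  ", LogisticRegression().fit(H, y).score(H, y))
```

The second score should be markedly higher: after the warp, the classes are (nearly) linearly separable, which is exactly the "line in a different representation" the narration describes.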

  • Take a look at this example with these entangled spirals. Can we separate these with a line?

  • Not a chance. But the answer is - not a chance with this representation. But if one starts

  • warping them correctly, there will be states where they can easily be separated.

  • However, there are rules in this game - for instance, one cannot just rip out one of the

  • spirals here and put it somewhere else. These transformations have to be homeomorphisms,

  • which is a term that mathematicians like to use - it intuitively means that the warpings

  • are not too crazy - meaning that we don't tear apart important structures, and as they

  • remain intact, the warped solution is still meaningful with respect to the original problem.
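
For concreteness, here is one way to construct such a dataset (my own assumption; the video does not spell out its exact spirals): two arms wound around each other, point-symmetric, so any single straight boundary lands near chance accuracy.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
t = np.sqrt(rng.uniform(0.05, 1.0, 200)) * 3 * np.pi   # angle along each arm
arm = np.c_[t * np.cos(t), t * np.sin(t)] / (3 * np.pi)
X = np.vstack([arm, -arm]) + rng.normal(0, 0.02, (400, 2))  # second arm: 180° rotation
y = np.array([0] * 200 + [1] * 200)

# Point-symmetric arms: any straight boundary stays close to chance accuracy.
print("best a line can do:", LogisticRegression().fit(X, y).score(X, y))
```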

  • Now comes the deep learning part. Deep learning means that the neural network has multiple

  • of these hidden layers and can therefore create much more effective inner representations

  • of the data. In an earlier episode, we saw in an image recognition task that as

  • we go further and further into the layers, first we'll see an edge detector, and as a

  • combination of edges, object parts emerge, and in the later layers, a combination of

  • object parts create object models.
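
Mechanically, "multiple hidden layers" just means composing warps: each layer transforms the output of the previous one. A schematic sketch with untrained, random weights, purely to show the structure (not the video's code):

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(2, 8)), rng.normal(size=8)  # layer 1: warp the 2-D plane into 8-D
W2, b2 = rng.normal(size=(8, 8)), rng.normal(size=8)  # layer 2: warp the warped data again
W3, b3 = rng.normal(size=(8, 1)), rng.normal(size=1)  # readout: a straight cut at the end

def forward(X):
    h1 = np.tanh(X @ W1 + b1)   # first inner representation
    h2 = np.tanh(h1 @ W2 + b2)  # second inner representation, built on the first
    return h2 @ W3 + b3         # linear separation in the final warped space

print(forward(np.array([[0.5, -1.0]])).shape)  # (1, 1): one score per input point
```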

  • Let's take a look at this example. We have a bullseye here, if you will, and you can see

  • that the network is trying to warp this to separate it with a line, but in vain.

  • However, if we have a deep neural network, we have more degrees of freedom, more directions

  • and possibilities to warp this data. And if you think intuitively, if this were a piece

  • of paper, you could put your finger behind the red zone and push it in, making it possible

  • to separate the two regions with a line. Let's take a look at a 1-dimensional example to

  • see better what's going on. This line is the 1D equivalent of the original problem, and

  • you see that the problem becomes quite trivial if we have the freedom to do this transformation.
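
A tiny worked version of the 1-D case (my own construction, in the spirit of the animation): the middle class sits between the two halves of the outer class, so no single cut point on the line works; lifting each point x to (x, x²) - a gentle, tear-free warp into one extra dimension - makes a single straight cut enough.

```python
import numpy as np

x = np.linspace(-2, 2, 9)               # points on a line
y = (np.abs(x) < 1).astype(int)         # class 1 in the middle, class 0 on both sides

lifted = np.c_[x, x**2]                 # warp: embed the line as a parabola in 2-D
# In the lifted space the horizontal line x2 = 0.5 cleanly separates the classes:
# every class-1 point has x**2 <= 0.25, every class-0 point has x**2 >= 1.
for point, label in zip(lifted, y):
    print(point, "->", label)
```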

  • We can easily encounter cases where the data is very severely tangled and we don't know

  • how good our best solution can be. There is a heavily academic subfield of mathematics,

  • called knot theory, which is the study of tangling and untangling objects. It is subject

  • to a lot of snarky comments for not being, well, too exciting or useful. What is really

  • mind blowing is that knot theory can actually help us study these kinds of problems and

  • it may ultimately end up being useful for recognizing traffic signs and designing self-driving

  • cars.

  • Now, it's time to get our hands dirty! Let's run a neural network on this dataset.

  • If we use a low number of neurons and one layer, you can see that it is trying ferociously,

  • but we know that it is going to be a fruitless endeavor. Upon increasing the number of neurons,

  • magic happens. And we now know exactly why! Yeah!
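
To close the loop in code (a stand-in sketch of mine; the video runs a live visual demo rather than a script): with one hidden neuron the network can effectively only draw a straight line, and adding neurons is what makes the "magic" possible.

```python
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=400, noise=0.1, random_state=0)

for neurons in (1, 2, 16):
    net = MLPClassifier(hidden_layer_sizes=(neurons,), activation='tanh',
                        max_iter=5000, random_state=0).fit(X, y)
    print(f"{neurons:>2} hidden neurons -> accuracy {net.score(X, y):.2f}")
```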

  • Thanks so much for watching and for your generous support. I feel really privileged to have

  • supporters like you, Fellow Scholars. Thank you, and I'll see you next time!
