  • [MUSIC PLAYING]

  • SERGIO GUADARRAMA: Today, we are going

  • to talk about reinforcement learning and how you can apply it

  • to many different problems.

  • So hopefully, by the end of the talk,

  • you will know how to use reinforcement learning

  • for your problem, for your applications, and what

  • other things we are doing at Google with all

  • this new technology.

  • So let me go a little bit--

  • do you remember when you tried to do something difficult,

  • something so hard that you needed to try a lot?

  • For example, when you learned how to walk, do you remember?

  • I don't remember.

  • But it's pretty hard because nobody tells you

  • exactly how to do it.

  • You just keep trying.

  • And eventually, you're able to stand up, keep the balance,

  • wobble around, and start walking.

  • So what if we want to teach this cute little robot how to walk?

  • Imagine-- how would you do that?

  • How would you tell this robot how to walk?

  • So what we are going to do today is

  • learn how we can do that with machine learning.

  • And the reason for that is because if we

  • want to do this by coding a set of rules,

  • it will be really hard.

  • What kind of rules would we put in code that can actually

  • make this robot walk?

  • We have to do coordination, balance.

  • It's really difficult. And then it probably

  • would just fall over.

  • And we don't know what to change in the code.

  • Instead of that, we're going to use machine

  • learning to learn from it.

  • So the agenda for today is going to be this.

  • We are going to cover very quickly what

  • is supervised learning, reinforcement learning, what

  • is TF-Agents, the things we just talked about.

  • And we will go through multiple examples.

  • So you can see we can build up different pieces to actually

  • go and solve this problem, teach this robot how to walk.

  • And finally, we will have some take-home messages that you

  • can take with you today.

  • So how many of you know what is supervised learning?

  • OK.

  • That's pretty good.

  • For those of you who don't know, let's go

  • to a very simple example.

  • So we're going to have some inputs, in this case,

  • like an image.

  • And we're going to pass through our model,

  • and it's going to produce some outputs.

  • In this case, it's going to be a cat or a dog.

  • And then, we're going to tell you what is the right answer.

  • So that's the key aspect.

  • In supervised learning, we tell you the label.

  • What is the right answer?

  • So you can modify your model and learn from these mistakes.

  • In this case, you might use a neural net.

  • We have a lot of weights that you can learn.

  • And you can modify those connections

  • to basically learn over time what is the right answer.
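
The labelled-data loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a single-neuron perceptron stands in for the neural net mentioned in the talk, and the one-feature "cat"/"dog" data, the learning rate, and the variable names are all made up for the example.

```python
# Toy labelled data (hypothetical): one feature per example,
# label 0 = "cat", label 1 = "dog".
examples = [(0.0, 0), (1.0, 0), (3.0, 1), (4.0, 1)]

weight, bias = 0.0, 0.0   # the "connections" the model is allowed to change
learning_rate = 0.1


def predict(x):
    """Return 1 ("dog") if the weighted input is positive, else 0 ("cat")."""
    return 1 if weight * x + bias > 0 else 0


for epoch in range(20):
    for x, label in examples:
        error = label - predict(x)            # non-zero only on a mistake
        weight += learning_rate * error * x   # nudge the weight toward the label
        bias += learning_rate * error         # nudge the bias the same way

predictions = [predict(x) for x, _ in examples]
print(predictions)
```

The key point is the `error` term: it can only be computed because every example comes with its right answer.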

  • The thing that supervised learning needs

  • is a lot of labels.

  • So many of you have probably heard about ImageNet.

  • It's a data set collected by Stanford.

  • It took like over two years and $1 million

  • to gather all this data.

  • And they could annotate millions of images with labels.

  • Say, in this image, there's a container ship.

  • There's a motor scooter.

  • There's a leopard.

  • And then, you label all these images

  • so your model can learn from it.

  • And that worked really well where

  • you can have all these labels, and then you

  • can train your model from it.

  • The question is like, how will you provide the labels

  • for this robot?

  • What are the right actions?

  • I don't know.

  • It's not that clear.

  • What will be the right answer for this case?

  • So we are going to take a different approach, which

  • is reinforcement learning.

  • Instead of trying to provide the right answer--

  • like in a classroom setting, you will go to class,

  • and they tell you what the right answers are.

  • You know, you study, this is the answer for this problem.

  • We already know what is the right answer.

  • In reinforcement learning, we assume

  • we don't know what is the right answer.

  • We need to figure it out ourselves.

  • It's more like a kid.

  • It's playing around, putting these blocks together.

  • And eventually, they're able to stack it up together, and stand

  • up.

  • And that gives you like some reward.

  • It's like, oh, you feel proud of it, and then you keep doing it.

  • Which are the actions you took?

  • Not so relevant.

  • So let's formalize a little more what reinforcement learning is

  • and how you can actually turn this

  • into more concrete examples.

  • Let's take a simpler example, like this little game

  • that you're trying to play.

  • You want to bounce the ball around, move the paddle

  • at the bottom left or right, and then you

  • want to hit all these bricks, and play this game,

  • clear up, and win the game.

  • So we're going to have this notion of an agent

  • or program that's going to get some observation.

  • In this case, the agent is going to look at the game:

  • where is the ball, where are the bricks, where is the paddle,

  • and take an action.

  • I'm going to move to the left or I'm going to move to the right.

  • And depending where you move, the ball will drop,

  • or you actually start keeping the ball bouncing back.
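
The observe-then-act cycle described above can be sketched as a small loop. Everything here is a hypothetical stand-in, not the game from the talk: `ToyPaddleEnv` is a made-up five-column environment, and `follow_ball_policy` is a hand-written rule used only as a placeholder for the behavior that reinforcement learning would have to learn on its own.

```python
import random


class ToyPaddleEnv:
    """Hypothetical toy game: a ball falls toward one of five columns and
    the paddle must be under it when it lands."""

    WIDTH = 5

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def reset(self):
        self.paddle = self.WIDTH // 2
        self.ball_x = self.rng.randrange(self.WIDTH)
        self.ball_height = 3
        return self._observation()

    def _observation(self):
        # What the agent "sees": ball column, ball height, paddle column.
        return (self.ball_x, self.ball_height, self.paddle)

    def step(self, action):
        # action is -1 (move left), 0 (stay), or +1 (move right).
        self.paddle = min(self.WIDTH - 1, max(0, self.paddle + action))
        self.ball_height -= 1
        done = self.ball_height == 0
        caught = done and self.paddle == self.ball_x
        return self._observation(), caught, done


def follow_ball_policy(observation):
    """Placeholder policy: step the paddle toward the ball's column."""
    ball_x, _, paddle = observation
    return (ball_x > paddle) - (ball_x < paddle)


env = ToyPaddleEnv(seed=0)
observation = env.reset()
done = False
while not done:
    action = follow_ball_policy(observation)      # agent picks an action
    observation, caught, done = env.step(action)  # environment responds
print("caught the ball:", caught)
```

In a real reinforcement learning setup the policy would start out knowing nothing and improve from the feedback the environment sends back.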

  • And we're going to have this notion of reward,

  • which means that when you do well, we

  • want you to get positive reward, so you reinforce that behavior.

  • And when you do poorly, you will get negative reward.

  • So we can define simple rules and simple things

  • to basically code this behavior as a reward function.

  • Every time you hit a brick, you get 10 points.

  • Which actions do you need to do to hit the brick?

  • I don't tell you.

  • That's what you need to learn.

  • But if you do it, I'm going to give you 10 points.

  • And if you clear all the bricks, I'm

  • going to give you actually a hundred

  • points to encourage you to actually play

  • this game very well.

  • And every time the ball drops, you

  • lose 50 points, which means, probably not

  • a good idea to do that.

  • And if you let the ball drop three times, game is over,

  • you need to stop the game.
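
The scoring rules just listed can be written down as a short reward function. The point values come from the talk. The function names, the argument names, and the choice to score one game step at a time are assumptions made for this sketch.

```python
def reward(bricks_hit: int, cleared_all: bool, ball_dropped: bool) -> int:
    """Score one game step using the point values quoted in the talk."""
    total = 10 * bricks_hit   # 10 points for every brick hit
    if cleared_all:
        total += 100          # bonus for clearing all the bricks
    if ball_dropped:
        total -= 50           # penalty every time the ball drops
    return total


def game_over(balls_dropped: int, max_drops: int = 3) -> bool:
    """The game stops once the ball has dropped three times."""
    return balls_dropped >= max_drops


print(reward(bricks_hit=1, cleared_all=False, ball_dropped=False))
print(reward(bricks_hit=0, cleared_all=False, ball_dropped=True))
print(game_over(balls_dropped=3))
```

Note that nothing in the function says which actions lead to a brick being hit. It only scores the outcome, and finding the actions is left to the agent.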

  • So the good thing about reinforcement learning is that

  • you can apply it to many different problems.

  • And here are some examples where, over the last year, people

  • have been applying reinforcement learning.

  • And it goes from recommender systems in YouTube, data

  • center cooling, real robots.

  • You can apply it to math, chemistry,

  • or a cute little robot in the middle, and things

  • as complex as Go.

  • DeepMind applied it in AlphaGo and beat

  • the best player in the world by using reinforcement learning.

  • Now, let me switch a little bit to TF-Agents and what it is.

  • So the main idea of TF-Agents is that doing reinforcement learning

  • is not very easy.

  • It requires a lot of tools and a lot of things

  • that you need to build on your own.

  • So we built this library that we use at Google,

  • and we open sourced it so everybody can

  • use it to make reinforcement learning a lot easier to use.

  • So we make it very robust.

  • It's scalable, and it's good for beginners.

  • If you are new to RL, we have a lot

  • of notebooks, examples, and documentation

  • that you can start working on.

  • And also, for complex problems, you

  • can apply it to real, complex problems

  • and use it for realistic cases.

  • For people who want to create their own algorithm,

  • we also make it easy to add new algorithms.

  • It's well tested and easy to configure.

  • And furthermore, we build it on top of TensorFlow 2.0

  • that you probably heard about at Google I/O before.

  • And we made it in such a way that developing and debugging

  • is a lot easier.