  • Hi everybody, and welcome to this episode of TensorFlow Meets.

  • I'm delighted to be chatting with Sergio Guadarrama.

  • You're from the TensorFlow Agents team, right?

  • Now, you did a talk at the TensorFlow Developer Summit

  • around TF Agents,

  • and could you tell us all about what TF Agents is and what it does?

  • Oh, definitely.

  • So, TF Agents is a reinforcement learning library for TensorFlow

  • that we have created inside Google

  • to solve many of the problems that we have.

  • We were struggling to get all these RL algorithms

  • that are being published every day,

  • to get them right, with all the little details.

  • So we decided to build this library with a lot of tests

  • to make sure everybody can use it.

  • Okay, so anybody can download TensorFlow Agents,

  • play around with all that kind of stuff.

  • Now, you mentioned it's about reinforcement learning.

  • To most of us, we kind of know a little bit about reinforcement learning,

  • but could you tell us what it really is and what it's all about?

  • So, the main idea behind reinforcement learning

  • is that you interact with an environment,

  • or some game or some task,

  • you're going to play different actions and then you're going to get a reward

  • when you do the things correctly, and then you're going to get

  • a negative reward when you do things incorrectly.

  • And from that you can learn.

  • Basically, on that reward, you can learn.

  • So, almost like the way a real person learns.

  • It's kind of like how a person learns when they get rewarded,

  • and things like that.

  • It's actually inspired by that.
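
To make the reward idea concrete, here is a minimal sketch in plain Python. It is not TF Agents code and was not shown in the interview. The two-action environment, its reward probabilities, and the name `run_bandit` are assumptions made up for illustration. The learner tries actions, receives +1 or -1, and keeps a running average reward per action.

```python
import random


def run_bandit(steps=2000, epsilon=0.1, seed=0):
    """Learn which of two actions is better purely from rewards."""
    rng = random.Random(seed)
    # Hypothetical environment: action 1 earns +1 with probability 0.8,
    # action 0 with probability 0.2; otherwise the reward is -1.
    success_prob = [0.2, 0.8]
    value = [0.0, 0.0]  # running average reward observed for each action
    count = [0, 0]      # how many times each action has been taken
    for _ in range(steps):
        # Explore occasionally; otherwise take the action that looks best.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = 0 if value[0] >= value[1] else 1
        reward = 1.0 if rng.random() < success_prob[action] else -1.0
        # Incremental mean: nudge the estimate toward the new reward.
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]
    return value, count


if __name__ == "__main__":
    value, count = run_bandit()
    print("estimated rewards:", value)
    print("times each action was taken:", count)
```

After enough steps the learner should favor the action that is rewarded more often, which is the same feedback loop described above, only on a much smaller problem.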

  • I see. Cool. So, now, one of the things

  • that you spoke about in your presentation--

  • and we have that on YouTube for people to watch--

  • but one of the things you spoke about that I thought was really cool,

  • and you showed a Breakout-style game

  • where there's the wall, then there's the bat,

  • then there's the bouncing ball,

  • how does that work from a reinforcement learning perspective?

  • So, in that case, what happens,

  • the agent will look at the environment,

  • in this case, the game, see where the bricks are,

  • where the ball is, and make a decision,

  • like where should I move the paddle-- to the left or to the right?

  • Make sure the ball doesn't fall.

  • And then by playing multiple times,

  • sometimes it will fall.

  • Eventually it will learn when you let it fall,

  • you don't win, you lose.

  • So you have to keep the ball moving up

  • and breaking all these bricks on top.

  • Right. Now, how does that work from a TF Agents perspective?

  • Is the environment there, the game board or--

  • Yeah, it's already predefined, you can load it.

  • We already have a lot of environments defined for you,

  • so you can just load all the Atari games, OpenAI, DeepMind Control,

  • many of those things are ready to go.

  • But you can also define your own environment.

  • When you have a specific task, we make it very easy to bring it in

  • and define your own task to solve.
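
As a rough illustration of what defining your own task involves, the sketch below writes a toy paddle game in plain Python. It deliberately does not use the TF Agents environment classes. The `CatchEnv` class, its `reset`/`step` methods, and the `follow_ball` policy are hypothetical names chosen for this example. The point is only that an environment hands out observations, accepts actions, and returns a reward.

```python
import random


class CatchEnv:
    """Toy paddle game: move the paddle under a falling ball."""

    def __init__(self, width=5, height=4, seed=0):
        self.width = width
        self.height = height
        self.rng = random.Random(seed)

    def reset(self):
        """Start a new episode and return the first observation."""
        self.ball_x = self.rng.randrange(self.width)
        self.ball_y = self.height  # rows left before the ball lands
        self.paddle_x = self.width // 2
        return (self.ball_x, self.ball_y, self.paddle_x)

    def step(self, action):
        """Apply an action (0 = left, 1 = stay, 2 = right)."""
        move = action - 1
        self.paddle_x = min(self.width - 1, max(0, self.paddle_x + move))
        self.ball_y -= 1
        done = self.ball_y == 0
        reward = 0.0
        if done:
            # The reward comes from the game itself: +1 for a catch, -1 for a miss.
            reward = 1.0 if self.paddle_x == self.ball_x else -1.0
        return (self.ball_x, self.ball_y, self.paddle_x), reward, done


def follow_ball(obs):
    """Hand-written policy: step the paddle toward the ball's column."""
    ball_x, _, paddle_x = obs
    return 1 + (ball_x > paddle_x) - (ball_x < paddle_x)


def play_episode(env, policy):
    """Run one episode and return the total reward collected."""
    obs = env.reset()
    done = False
    total = 0.0
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total


if __name__ == "__main__":
    env = CatchEnv()
    print("total reward:", play_episode(env, follow_ball))
```

A learning agent would take the place of the hand-written `follow_ball` policy and improve its choices from the rewards that `step` returns.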

  • Now, in something like the Breakout game, for example,

  • the reward is the score, right?

  • So as you knock off a brick, your score goes up,

  • so how does it see that, how is that getting labeled?

  • Is it reading the raw pixels on the screen,

  • or is it-- how does that actually work?

  • So, in that case, it's actually given from the game.

  • In other cases, more complicated, like in a recommender system,

  • it would be based on the interaction with the user, for example.

  • I see. Okay, cool. Wow, interesting stuff.

  • Now, this is all open source, you said, right?

  • So it's on github.com/tensorflow/agents?

  • That's it, that's correct.

  • Now I noticed when I was poking around in there

  • that there's a bunch of Colab notebooks.

  • Do you have any that you'd recommend people to play with, any favorites?

  • I think the best one to start with is the DQN Cartpole.

  • It's a full example-- you can go through all the steps,

  • and you can see the videos, you can play around with it,

  • you can modify it, see what happens,

  • and how it learns to keep a small cartpole

  • balanced.

  • Interesting. How long does it take to train that?

  • That one takes a few minutes.

  • - Oh really, just a few minutes? - Yeah.

  • Wow, so reinforcement learning in a notebook with TF Agents,

  • and it takes a few minutes to help me predict a cartpole.

  • - Yeah. - Wow. Okay.

  • Any others that people should check out?

  • Yeah, the other thing is we are looking forward to the community

  • contributing new environments, new tasks, or new algorithms

  • for people who have new ideas to contribute,

  • and we are looking forward to pull requests or GitHub issues.

  • Okay, nice.

  • Have you seen any scenarios that excite you?

  • Oh yeah, we applied it to some of the robotics tasks.

  • It was very interesting to see a robot actually learning how to grasp objects,

  • how to move around.

  • The first time you see it actually doing the task

  • that you tried is very rewarding.

  • Cool, I'll look out for that.

  • So if somebody wants to get started with this,

  • where should they go?

  • So, they can go to github.com/tensorflow/agents

  • and over there they can find all the Colabs, examples, and everything,

  • and download the code from there.

  • Okay, so they can just kick the tires and have fun with it, right?

  • - Oh yeah, definitely. - Cool, I'd love to see

  • what kind of things people will produce.

  • Yeah, me too.

  • Already the agent plays Breakout better than I do. (laughs)

  • So, thanks so much, Sergio. This has been a lot of fun.

  • Thank you.

  • And thanks everybody for watching this episode of TensorFlow Meets.

  • If you have any questions for me or any questions for Sergio,

  • please leave them in the comments below,

  • and be sure to check out his talk at the TF Dev Summit

  • which you'll also see on this channel.

  • So thank you.

  • ♪ (music) ♪
