  • ALEX PASSOS: Hi, my name is Alex, and I'm here again,

  • this time to talk about the TensorFlow eager execution

  • runtime.

  • This is a very broad topic and there

  • are lots and lots of things we could cover,

  • so I'm going to lightly graze many, many different parts

  • of our code base.

  • I'll give you a lot of function names

  • and file names and things like that

  • that you can use to familiarize yourself with something,

  • but if you're in the room right now, by all means,

  • ask questions.

  • Like, this is-- there's some buffer time at the end

  • to account for variability here.

  • And I think the more we can maximize shared understanding

  • of this stuff, the better.

  • So the way I thought we could go about this

  • is to do a very, very deep dive on what

  • actually happens in TensorFlow, starting from TF2--

  • when you type a very simple line of code, in this case

  • a tf.nn.relu of some Python numbers.
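For concreteness, this is the kind of call being traced through here:

```python
import tensorflow as tf

# Eagerly run ReLU on plain Python numbers; TensorFlow converts them
# to a tensor before dispatching the op.
y = tf.nn.relu([-2.0, 0.0, 3.0])
print(y)  # tf.Tensor([0. 0. 3.], shape=(3,), dtype=float32)
```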

  • And I think if you were to start doing this, probably

  • the first thing you'd do is grep the TensorFlow code

  • base to find where we define ReLU.

  • And if you do that, you will find

  • that we have some definitions of ReLU in Keras,

  • but you won't find a single definition of ReLU itself

  • in core TensorFlow, and that might be

  • a little surprising at first.

  • It might put a damper on this whole,

  • let's find out what actually happens when we run ReLU

  • business, but the reason is that ReLU is simple enough

  • that, once we've implemented it in C++, we didn't need to put

  • a complicated Python API around it.

  • We can just generate the Python code to call ReLU.

  • So the way it's actually defined,

  • it's defined using the same mechanism

  • we use to register all the ops in TensorFlow.

  • So for every core operation in TensorFlow

  • that's visible to the runtime, we

  • have a registration that looks like this--

  • REGISTER_OP.

  • It takes a name and you say how many inputs, how many outputs,

  • what attributes it has.

  • If you want to know more about attributes-- what things

  • are allowed in there-- they are how

  • we can make our ops polymorphic and have the same operation

  • have different types of outputs, different numbers of outputs,

  • and things like that.
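(A small illustration of that polymorphism: the same Relu op runs on different dtypes because of its T attribute.)

```python
import tensorflow as tf

# One op registration, multiple kernels selected by the T attr.
print(tf.nn.relu(tf.constant([-1.0, 2.0])))              # float32
print(tf.nn.relu(tf.constant([-1, 2], dtype=tf.int32)))  # int32
```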

  • There is a lot of documentation about this on TensorFlow.org,

  • if you search for how to define a new op.

  • And another interesting thing in there

  • is that we also register a shape_inference function.

  • ReLU, thankfully, is one of the simplest

  • ops we have-- it just has one input, one output,

  • they have the same shape.

  • So we can use a pre-built shape_inference function

  • that just says the shape does not change.

  • Other ops will have vastly more complicated

  • shape_inference functions.

  • And the nice thing is that we can

  • run these functions offline when we're building graphs

  • without actually having the values of any tensors

  • and still be able to prove things

  • about the shapes of intermediate tensors and outputs

  • of your computation.

  • This is maybe the best tool we have for catching bugs now.

  • So if you want to look at the shape_inference code,

  • that's where you'd hook into.
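(A quick way to see that offline shape inference at work is to trace a graph with tf.function and no real data; this is just a sketch.)

```python
import tensorflow as tf

# Trace a graph with an unknown batch dimension. No values exist yet,
# but shape inference already gives us the ReLU output shape.
@tf.function(input_signature=[tf.TensorSpec([None, 128], tf.float32)])
def f(x):
    y = tf.nn.relu(x)
    print("inferred shape:", y.shape)  # (None, 128), known at trace time
    return y

f.get_concrete_function()  # build the graph; the print above runs while tracing
```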

  • Now that we have that registration,

  • we run some complicated code that generates Python code

  • to actually call ReLU.

  • And if you look in bazel-genfiles,

  • you will find a file named gen_nn_ops.py, and this file has

  • the actual def relu that we call.

  • And as you can see, it's not pretty

  • and there's a lot of stuff going in there.
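(If you want to poke at that generated wrapper yourself, it's importable; gen_nn_ops is an internal, build-generated module, so the path may differ across versions.)

```python
from tensorflow.python.ops import gen_nn_ops

# tf.nn.relu ultimately calls this generated wrapper.
print(gen_nn_ops.relu([-2.0, 0.0, 3.0]))  # tf.Tensor([0. 0. 3.], ...)
```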

  • The first line deals with dispatching

  • so that we can define ReLU not just for normal tensors,

  • but also optionally for sparse tensors

  • and ragged tensors and other composite types.
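(For example, dispatching is what lets the same tf.nn.relu call work on a RaggedTensor in recent TF releases; a small sketch:)

```python
import tensorflow as tf

# The dispatch hook reroutes tf.nn.relu when it's handed a composite tensor.
rt = tf.ragged.constant([[-1.0, 2.0], [3.0]])
print(tf.nn.relu(rt))  # <tf.RaggedTensor [[0.0, 2.0], [3.0]]>
```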

  • The second line has tf_export and what this does

  • is define the TensorFlow public API.

  • Every symbol that you get when you are using TensorFlow via tf

  • dot something is defined somewhere

  • by a tf_export decorator like this one.

  • There will be a future video on how exactly this works

  • and why we do things this way instead

  • of relying on Python's normal, you know,

  • namespacing mechanism.

  • But you can probably guess that it's because TensorFlow

  • is very complicated.
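(Roughly, the decorator pattern looks like this; "nn.my_relu" is a made-up name for illustration, and the real tf.* namespace is generated from these annotations at build time, not at import time.)

```python
from tensorflow.python.util.tf_export import tf_export

@tf_export("nn.my_relu")  # hypothetical: would expose this as tf.nn.my_relu
def my_relu(features, name=None):
    ...
```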

  • But essentially, you'll see this,

  • and this generated code for ReLU has a bunch of cases in it--

  • there are roughly four.

  • You have an eager fast path, an eager slow path,

  • you have a graph mode path, and kind of a side hook

  • for the symbolic execution.

  • But here, let's focus on the eager paths.

  • In the first one, the first thing

  • that we're actually doing here is

  • we're checking to see if we're in eager mode or not.

  • And to do that, we look at this context thing.

  • This context thing is part of the core of the TensorFlow v2

  • runtime.

  • It's the moral equivalent to the session,

  • but it's longer lived than the session

  • and represents more things.

  • So what is it?

  • From Python, the context is this class

  • that's defined in a file called context.py.

  • And it's a collection of a lot of things

  • that your Python program needs to be aware of to connect

  • to the TensorFlow runtime.

  • It stores things like, am I in eager mode or in graph mode?

  • Or if someone used a with tf.device block, what

  • device am I supposed to be executing code in?

  • And it stores things like, what's

  • your name scope and many other things.
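(From user code you mostly see that state through the public API; a small sketch:)

```python
import tensorflow as tf

# Two pieces of per-thread state the context tracks:
print(tf.executing_eagerly())   # eager vs. graph mode -> True by default in TF2

with tf.device("/CPU:0"):       # pushes a device onto the context's device stack
    x = tf.constant([1.0, -1.0])
print(x.device)                 # .../device:CPU:0
```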

  • And all of this information that the context

  • stores, in general--

  • the things that can change during the execution

  • of a program--

  • they're all stored in ThreadLocal stacks.

  • Usually stacks, because we have these nested things

  • like with tf.device, with tf.device, with tf.device,

  • so you'd like to be able to pop the stack

  • to go back to where you were.

  • And ThreadLocal because it's very important to us

  • that a TensorFlow runtime itself be thread agnostic,

  • so that if you write two threads and one is doing

  • a reinforcement learning learner and the other's doing

  • an agent that's talking to some game, when the agent wants

  • to use its GPU, it shouldn't necessarily make the learner

  • use the GPU, and vice versa.

  • Providing some kind of isolation between the threads

  • is what we felt was the right way,

  • so that at least each single thread

  • can feel like it's its own single-threaded Python program.

  • We use this a lot in distribution strategies,

  • like MirroredStrategy uses a lot of threads under the hood,

  • so it's really important that things are thread-safe.
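(You can see the thread-local part directly: switching one thread into graph mode doesn't affect another thread. A small sketch:)

```python
import threading
import tensorflow as tf

def report(tag):
    print(tag, tf.executing_eagerly())

report("main before:")           # True
with tf.Graph().as_default():    # pushes graph mode onto this thread's stack
    report("main inside:")       # False
    t = threading.Thread(target=report, args=("other thread:",))
    t.start()                    # the other thread has its own stack: still True
    t.join()
report("main after:")            # True again once the stack is popped
```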

  • The Python context, essentially, is mostly

  • a wrapper around the C++ context,

  • which is available through the TensorFlow C API.

  • And this is the core thing.

  • It has a bunch more methods than just these--

  • like, you can do a lot more things

  • than just listing devices.
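(Device listing, for instance, is exposed publicly and goes through the context underneath:)

```python
import tensorflow as tf

print(tf.config.list_physical_devices())  # e.g. [PhysicalDevice(name='/physical_device:CPU:0', ...)]
print(tf.config.list_logical_devices())
```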

  • One thing I'd like to call out is that right now,

  • there are some things that are done in Python, like storing

  • whether we're in eager mode or graph mode,

  • and some things that are done in C++.

  • And the set of things that are done in Python and the set

  • of things that are done in C++ are likely to change.

  • I think as TensorFlow evolves, more and more things should

  • migrate from the Python context to the C++ context,

  • which will make things more language agnostic,

  • but also faster.

  • And you know, if everything in the context was in C++,

  • then all the generated Python code could just be C++ code,

  • and we'd be able to get out of the overhead of executing Python

  • much sooner and remove performance problems

  • in our APIs.

  • So once you know you're in eager mode,

  • we try to do this fast path execution.

  • Here, the fast path is some complicated C code

  • that mostly does the same things that the fallback case is

  • trying to do.

  • So I don't think it's necessarily worth reading that.

  • I would rather look at the simpler

  • code in the fallback path.
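(For orientation, here is a hedged sketch of roughly what that fallback does: convert the inputs to tensors in Python, then hand one fully described op to the runtime. The modules below are internal and their exact signatures vary by version.)

```python
import tensorflow as tf
from tensorflow.python.eager import context, execute
from tensorflow.python.framework import ops

ctx = context.context()
ctx.ensure_initialized()

# Convert the Python inputs to an eager tensor, as the fallback path would.
features = ops.convert_to_tensor([-1.0, 0.0, 2.0], dtype=tf.float32)

# Execute the op by name, passing its attrs explicitly ("T" is the dtype attr).
result = execute.execute(
    b"Relu", num_outputs=1, inputs=[features],
    attrs=("T", features.dtype.as_datatype_enum), ctx=ctx)
print(result[0])  # tf.Tensor([0. 0. 2.], shape=(3,), dtype=float32)
```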