
  • FRANCOIS CHOLLET: Hello, everyone.

  • I'm Francois.

  • And I work on the Keras team.

  • I'm going to be talking about TensorFlow Keras.

  • So this talk will mix information

  • about how to use the Keras API in TensorFlow

  • and how the Keras API is implemented under the hood.

  • So we'll cover an overview of the Keras architecture.

  • We'll do a deep dive into the layer class and the model

  • class.

  • We'll have an overview of the functional API

  • and a number of features that are

  • specific to functional models.

  • We'll look at how training and inference work.

  • And finally, we'll look at custom losses and metrics.

  • So this is the overview of the Keras architecture and all

  • the different submodules and the different classes

  • you should know about.

  • The core of the Keras implementation is the engine

  • module, which contains the layer class--

  • the base layer class from which all layers inherit,

  • as well as the network class, which basically models a directed acyclic graph of layers,

  • as well as the model class, which takes the network class but adds training and evaluation on top of it,

  • and also the sequential class, which is, again, another type of model, one which just wraps a list of layers.

  • Then we have the layers module, where all the actual, usable instances of layers go.

  • We have losses and metrics with a base class for each,

  • and a number of concrete instances

  • that you can use in your models.

  • We have callbacks, optimizers, regularizers, and constraints, which are structured much like the other modules.
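
For orientation, here is roughly how those pieces map onto the public tf.keras namespace (a sketch assuming TF 2.x; the engine module itself is internal, these are just the public entry points):

    import tensorflow as tf

    # Public entry points for the classes mentioned above (the engine module
    # that implements Layer, Network, and Model is internal to Keras).
    tf.keras.layers.Layer          # base layer class all layers inherit from
    tf.keras.Model                 # network of layers, plus training and evaluation
    tf.keras.Sequential            # a model that simply wraps a list of layers
    tf.keras.losses.Loss           # base class for losses
    tf.keras.metrics.Metric        # base class for metrics
    tf.keras.callbacks.Callback    # base class for callbacks
    tf.keras.optimizers.Optimizer  # base class for optimizers
    tf.keras.regularizers.Regularizer
    tf.keras.constraints.Constraint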

  • So in this presentation, we go mostly

  • over what's going on in the Engine module, and also losses

  • and metrics, not so much callbacks, optimizers,

  • regularizers, and constraints.

  • So in general, for any of these topics,

  • you could easily do a one-hour talk.

  • So I'm just going to focus on the most important information.

  • So let's start with the layer class.

  • So the layer is the core abstraction in the Keras API.

  • I think if you want to have a simple API,

  • then you should have one abstraction

  • that everything is centered on.

  • And in the case of Keras, it's a layer.

  • Everything in Keras pretty much is a layer or something that interacts closely with layers, like models, for instance.

  • So a layer has a lot of responsibilities,

  • lots of built-in features.

  • At its core, a layer is a container for some computation.

  • So it's in charge of transforming a batch of inputs

  • into a batch of outputs.

  • Very importantly, this is batchwise computation,

  • meaning that you expect N samples as inputs,

  • and you're going to be returning N output samples.

  • And the computation should typically not involve any interaction between samples.

  • And so it's meant to work with both eager execution and graph execution.

  • All the built-in layers in Keras support both.

  • But user-written layers could be only eager, potentially.

  • We support having layers that work in two different modes--

  • AUDIENCE: So this would mean that different layers can support either graph or eager?

  • FRANCOIS CHOLLET: Yes.

  • AUDIENCE: Yeah, OK.

  • FRANCOIS CHOLLET: That's right.

  • And typically, most layers are going to be supporting both.

  • If you only support eager, it typically means that you're doing things that are impossible to express as graphs, such as recursive layers--Tree-LSTMs, for instance.

  • This is actually something that we'll

  • cover in this presentation.

  • So, yeah, layers also support two modes--a training mode and an inference mode--and can do different things in each mode, which is something you need for the dropout layer or the batch normalization layer.
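
For illustration, a minimal sketch of a custom layer that uses the training argument to behave differently in each mode, in the spirit of dropout (the class name and rate are made up for the example):

    import tensorflow as tf

    class SimpleDropout(tf.keras.layers.Layer):
        """Illustrative dropout-like layer: random masking in training, identity at inference."""

        def __init__(self, rate=0.5, **kwargs):
            super().__init__(**kwargs)
            self.rate = rate

        def call(self, inputs, training=None):
            if training:
                # Training mode: randomly drop units.
                return tf.nn.dropout(inputs, rate=self.rate)
            # Inference mode: pass inputs through unchanged.
            return inputs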

  • There's support for built-in masking, which is about specifying certain time steps in your inputs that you want to ignore.

  • This is very useful, in particular,

  • if you're doing sequence processing with sequences where

  • you have padded time steps or where

  • you have missing time steps.
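
For example, with the built-in layers, masking padded time steps can look like this (a minimal sketch; shapes and sizes are arbitrary):

    import tensorflow as tf

    # Integer sequences padded with 0; mask_zero=True flags those steps to be ignored.
    inputs = tf.keras.Input(shape=(None,), dtype="int32")
    x = tf.keras.layers.Embedding(input_dim=1000, output_dim=16, mask_zero=True)(inputs)
    # The mask is propagated automatically, so the LSTM skips the padded time steps.
    outputs = tf.keras.layers.LSTM(32)(x)
    model = tf.keras.Model(inputs, outputs)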

  • A layer is also a container for state, meaning variables.

  • So, in particular, trainable state--the trainable weights of the layer, which are what parametrizes the computation of the layer and which you update during backpropagation;

  • and the nontrainable weights, which

  • could be anything else that is manually managed by the layer

  • implementer.
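
A minimal sketch of both kinds of state in a custom layer (the layer itself is made up for illustration):

    import tensorflow as tf

    class ScaledSum(tf.keras.layers.Layer):
        """Illustrative layer with a trainable scale and a non-trainable accumulator."""

        def build(self, input_shape):
            # Trainable: updated by backpropagation.
            self.scale = self.add_weight(
                name="scale", shape=(), initializer="ones", trainable=True)
            # Non-trainable: managed manually by the layer implementer.
            self.total = self.add_weight(
                name="total", shape=(), initializer="zeros", trainable=False)

        def call(self, inputs):
            self.total.assign_add(tf.reduce_sum(inputs))  # manual state update
            return inputs * self.scale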

  • It's also potentially a container

  • that you can use to track losses and metrics that you define

  • on the fly during computation.

  • This is something we'll cover in detail.
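
As a quick preview of that mechanism (a sketch; add_loss and add_metric are discussed in more detail later in the talk):

    import tensorflow as tf

    class ActivityRegularized(tf.keras.layers.Layer):
        """Illustrative layer that records a loss and a metric on the fly in call()."""

        def call(self, inputs):
            # Tracked in layer.losses and added to the total loss during training.
            self.add_loss(0.01 * tf.reduce_sum(tf.square(inputs)))
            # Tracked and averaged over the epoch, shown in training logs.
            self.add_metric(tf.reduce_mean(inputs),
                            aggregation="mean", name="activation_mean")
            return inputs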

  • Layers can also do a form of static type checking.

  • So there is infrastructure built in to check the assumptions that the layer is making about its inputs, so that we can raise nice and helpful error messages in case of user error.
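
In practice this goes through the layer's input_spec attribute; a minimal sketch (the layer is made up for illustration):

    import tensorflow as tf

    class Project(tf.keras.layers.Layer):
        """Illustrative layer that declares assumptions about its inputs via InputSpec."""

        def __init__(self, units=8, **kwargs):
            super().__init__(**kwargs)
            self.units = units
            # Inputs must be rank 2: (batch, features). Violations raise a clear error.
            self.input_spec = tf.keras.layers.InputSpec(ndim=2)

        def build(self, input_shape):
            # Once built, also pin down the expected feature dimension.
            self.input_spec = tf.keras.layers.InputSpec(
                ndim=2, axes={-1: input_shape[-1]})
            self.kernel = self.add_weight(
                name="kernel", shape=(input_shape[-1], self.units),
                initializer="glorot_uniform")

        def call(self, inputs):
            return tf.matmul(inputs, self.kernel)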

  • We support state freezing for layers,

  • which is useful for things like fine-tuning,

  • and transfer learning, and GANs.
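
Freezing is just a matter of setting trainable = False on a layer or model; a sketch of a typical transfer-learning setup (the choice of base model and head is arbitrary):

    import tensorflow as tf

    # Any pretrained model or layer works the same way; MobileNetV2 is just an example.
    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, pooling="avg")
    base.trainable = False  # freeze: its variables move to non_trainable_weights

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dense(10, activation="softmax"),  # only the head is trained
    ])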

  • We have infrastructure for serializing and deserializing layers and saving and loading their state.
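
Concretely, this is the get_config / from_config pair for the configuration and get_weights / set_weights for the state; a minimal sketch:

    import tensorflow as tf

    layer = tf.keras.layers.Dense(32, activation="relu")
    layer(tf.zeros((1, 4)))                             # build the layer so it has weights

    config = layer.get_config()                         # serialize the configuration
    clone = tf.keras.layers.Dense.from_config(config)   # same configuration, unbuilt
    clone(tf.zeros((1, 4)))                             # build with the same input shape
    clone.set_weights(layer.get_weights())              # copy the saved state across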

  • We have an API that you can use to build directed

  • acyclic graphs of layers.

  • It's called the functional API.

  • We'll cover it in detail.
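
As a preview, building a small DAG of layers with the functional API looks like this (a minimal sketch; sizes are arbitrary):

    import tensorflow as tf

    # Call layers on symbolic inputs to build a directed acyclic graph of layers.
    inputs = tf.keras.Input(shape=(784,))
    x = tf.keras.layers.Dense(64, activation="relu")(inputs)
    x = tf.keras.layers.Dense(64, activation="relu")(x)
    outputs = tf.keras.layers.Dense(10, activation="softmax")(x)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)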

  • And in the near future, layers will also

  • have built-in support for mixed precision.

  • So layers do lots of things.

  • They don't do everything.

  • They have some assumptions.

  • They have some restrictions.

  • In particular, gradients are not something

  • that you specify on the layer.

  • You cannot specify a custom backwards pass on the layer,

  • but this is something we're actually considering adding,

  • potentially, something like a gradient method on the layer.

  • So it's not currently a feature.

  • They do not support most low-level considerations,

  • such as device placement, for instance.

  • They do not generally take into account distribution.

  • So they do not include distribution-specific logic.

  • At least, that should be true.

  • In practice, it's almost true.

  • So they're as distribution agnostic as possible.

  • And very importantly, they only support batchwise computation,

  • meaning that anything a layer does

  • should start with a tensor containing--

  • or a nested structure of tensors containing N samples

  • and should also output N samples.

  • That means, for instance, you're not going to do non-batch computation, such as bucketing samples of the same length when you're doing time series processing.

  • You're not going to process [INAUDIBLE] data sets with layers.

  • You're not going to have layers that don't have an input

  • or don't have an output outside of a very specific case, which

  • is the input layer, which we will cover.

  • So this is the most basic layer you could possibly write: it has a constructor in which you create two tf.Variables,

  • you say these variables are trainable, and you assign them as attributes on the layer.

  • And then it has a call method, which takes the batch of inputs and applies the computation--in this case, just w x plus b.

  • So what happens when you instantiate

  • this layer is that it's going to create these two variables,

  • set them as attributes.

  • And they are automatically tracked into this list,

  • trainable_weights.

  • And when you call the layer using the __call__ operator, it's just going to defer to this call method.
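
The slide itself isn't reproduced in the transcript, but a layer along these lines would look roughly like this (a sketch; the name and sizes are illustrative):

    import tensorflow as tf

    class Linear(tf.keras.layers.Layer):
        """Illustrative 'most basic' layer: variables created eagerly in the constructor."""

        def __init__(self, units=32, input_dim=32, **kwargs):
            super().__init__(**kwargs)
            # Two tf.Variables, marked trainable and set as attributes:
            # they are automatically tracked in self.trainable_weights.
            self.w = tf.Variable(tf.random.normal((input_dim, units)), trainable=True)
            self.b = tf.Variable(tf.zeros((units,)), trainable=True)

        def call(self, inputs):
            # Batchwise computation: N input samples in, N output samples out.
            return tf.matmul(inputs, self.w) + self.b

    layer = Linear(units=4, input_dim=2)
    y = layer(tf.ones((3, 2)))              # __call__ defers to call()
    assert len(layer.trainable_weights) == 2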

  • So in practice, most layers you're going to write

  • are going to be a little bit more refined.

  • They're going to look like this.

  • So this is a lazy layer.

  • So in the constructor, you do not create weights.

  • And the reason you do not create weights

  • is because you want to be able to instantiate

  • your layer without knowing what the input shape is going to be.
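
A sketch of such a lazy layer, with weight creation deferred to a build method (again, names and sizes are illustrative):

    import tensorflow as tf

    class LazyLinear(tf.keras.layers.Layer):
        """Illustrative lazy layer: no weights in __init__, weights created in build()."""

        def __init__(self, units=32, **kwargs):
            super().__init__(**kwargs)
            self.units = units              # no input shape needed yet

        def build(self, input_shape):
            # Called automatically on first use, once the input shape is known.
            self.w = self.add_weight(name="w", shape=(input_shape[-1], self.units),
                                     initializer="random_normal", trainable=True)
            self.b = self.add_weight(name="b", shape=(self.units,),
                                     initializer="zeros", trainable=True)

        def call(self, inputs):
            return tf.matmul(inputs, self.w) + self.b

    layer = LazyLinear(units=4)             # instantiated without knowing the input shape
    y = layer(tf.ones((3, 2)))              # build() runs here, then call()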

  • Whereas in the previous case, here--