FRANCOIS CHOLLET: Hello, everyone.
I'm Francois.
And I work on the Keras team.
I'm going to be talking about TensorFlow Keras.
So this talk will mix information
about how to use the Keras API in TensorFlow
and how the Keras API is implemented under the hood.
So we'll cover another view of the Keras architecture.
We'll do a deep dive into the layer class and the model
class.
We'll have an overview of the functional API
and a number of features that are
specific to functional models.
We'll look at how training and inference work.
And finally, we'll look at custom losses and metrics.
So this is the overview of the Keras architecture and all
the different submodules and the different classes
you should know about.
The core of the Keras implementation is the engine
module, which contains the layer class--
the base layer class from which all layers inherit,
as well as the network class, which basically
models a directed acyclic graph of layers;
the model class, which takes the network class
and adds training and evaluation on top of it;
and also the sequential class, which is, again,
another type of model, which just wraps a list of layers.
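The sequential class described here can be sketched with standard tf.keras usage (the layer sizes are made up for illustration):

```python
import tensorflow as tf

# A minimal sketch: Sequential just wraps an ordered list of layers
# into a single trainable model.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10),
])
# The same stack could also be built incrementally with model.add(...).
```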
Then we have the layers module, where all the concrete,
usable layer instances go.
We have losses and metrics with a base class for each,
and a number of concrete instances
that you can use in your models.
We have callbacks, optimizers, regularizers,
and constraints, which are smaller modules.
So in this presentation, we go mostly
over what's going on in the Engine module, and also losses
and metrics, not so much callbacks, optimizers,
regularizers, and constraints.
So in general, for any of these topics,
you could easily do a one-hour talk.
So I'm just going to focus on the most important information.
So let's start with the layer class.
So the layer is the core abstraction in the Keras API.
I think if you want to have a simple API,
then you should have one abstraction
that everything is centered on.
And in the case of Keras, it's a layer.
Everything in Keras pretty much is a layer or something
that interacts closely with layers,
like models.
So a layer has a lot of responsibilities,
lots of built-in features.
At its core, a layer is a container for some computation.
So it's in charge of transforming a batch of inputs
into a batch of outputs.
Very importantly, this is batchwise computation,
meaning that you expect N samples as inputs,
and you're going to be returning N output samples.
And the computation should typically not
see any interaction between samples.
And it's meant to work with both eager execution
and graph execution.
All the built-in layers in Keras support both.
But user-written layers could be only eager, potentially.
We support having layers that have two different modes--
AUDIENCE: So this would mean that different layers
can support either graph or eager?
FRANCOIS CHOLLET: Yes.
AUDIENCE: Yeah, OK.
FRANCOIS CHOLLET: That's right.
And typically, most layers are going to be supporting both.
If you only support eager, it typically
means that you're doing things that
are impossible to express as graphs,
such as recursive layers.
This is actually something that we'll
cover in this presentation.
So, yeah, so layers also support two modes-- a training mode
and an inference mode--
in which they do different things.
This matters for layers like the dropout layer
or the batch normalization layer.
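The two modes just mentioned can be sketched with a dropout layer, where the `training` argument selects the behavior at call time (shapes here are made up for illustration):

```python
import tensorflow as tf

# Sketch: the same Dropout layer does different things depending on
# the `training` argument passed when it is called.
x = tf.ones((2, 4))
dropout = tf.keras.layers.Dropout(0.5)

y_train = dropout(x, training=True)   # training mode: entries randomly zeroed
y_infer = dropout(x, training=False)  # inference mode: inputs pass through unchanged
```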
There's support for built-in masking, which
is about specifying certain time steps in the inputs
that you want to ignore.
This is very useful, in particular,
if you're doing sequence processing with sequences where
you have padded time steps or where
you have missing time steps.
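Masking over padded time steps can be sketched as follows, assuming the common pattern of an Embedding layer with `mask_zero=True` (the vocabulary and sequence values are made up):

```python
import tensorflow as tf

# Sketch: with mask_zero=True, the Embedding layer generates a mask
# so downstream layers ignore padded (zero) time steps.
inputs = tf.constant([[3, 7, 0, 0],    # last two steps are padding
                      [5, 2, 9, 1]])   # no padding
emb = tf.keras.layers.Embedding(input_dim=10, output_dim=8, mask_zero=True)

x = emb(inputs)                  # shape (2, 4, 8)
mask = emb.compute_mask(inputs)  # boolean mask, False where input == 0
```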
A layer is also a container for state, meaning variables.
So, in particular, a trainable state--
the trainable weights on the layer,
which is what parametrizes the computation of the layer
and that you update during back propagation;
and the nontrainable weights, which
could be anything else that is manually managed by the layer
implementer.
It's also potentially a container
that you can use to track losses and metrics that you define
on the fly during computation.
This is something we'll cover in detail.
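Tracking losses on the fly can be sketched with `self.add_loss()` inside a layer's call method; the layer name and penalty here are made up for illustration:

```python
import tensorflow as tf

# Sketch (hypothetical layer): losses defined on the fly during
# computation are recorded with self.add_loss() and collected
# in layer.losses.
class ActivityPenalty(tf.keras.layers.Layer):
    def __init__(self, rate=1e-2):
        super().__init__()
        self.rate = rate

    def call(self, inputs):
        # Track a loss that depends on this call's inputs.
        self.add_loss(self.rate * tf.reduce_sum(tf.square(inputs)))
        return inputs

layer = ActivityPenalty()
out = layer(tf.ones((2, 3)))  # records one scalar loss: 0.01 * 6
```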
Layers can also do a form of static type checking.
There is built-in infrastructure
to check the assumptions that the layer is making
about its inputs, so that we can raise nice and helpful error
messages in case of user error.
We support state freezing for layers,
which is useful for things like fine-tuning,
and transfer learning, and GANs.
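State freezing can be sketched by toggling a layer's `trainable` flag, which moves its variables out of `trainable_weights` (the layer sizes here are made up):

```python
import tensorflow as tf

# Sketch: setting layer.trainable = False freezes the layer's state,
# which is useful for fine-tuning, transfer learning, and GANs.
layer = tf.keras.layers.Dense(4)
layer.build(input_shape=(None, 8))  # creates the kernel and bias variables

before = len(layer.trainable_weights)   # kernel + bias
layer.trainable = False
after = len(layer.trainable_weights)    # frozen: nothing left to train
```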
We have infrastructure for serializing and deserializing
layers and saving and loading their state.
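A sketch of that serialization machinery, assuming the standard `get_config()`/`from_config()` pattern (the layer arguments are made up):

```python
import tensorflow as tf

# Sketch: get_config() returns the constructor arguments as a plain
# dict, and from_config() rebuilds an equivalent layer from it.
layer = tf.keras.layers.Dense(16, activation="relu")

config = layer.get_config()                   # serializable dict
clone = tf.keras.layers.Dense.from_config(config)  # fresh, unbuilt copy
```

State (the weight values) is handled separately, e.g. via `get_weights()`/`set_weights()`.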
We have an API that you can use to build directed
acyclic graphs of layers.
It's called a functional API.
We'll cover it in detail.
And in the near future, layers will also
have built-in support for mixed precision.
So layers do lots of things.
They don't do everything.
They have some assumptions.
They have some restrictions.
In particular, gradients are not something
that you specify on the layer.
You cannot specify a custom backward pass on the layer,
but this is something we're actually considering adding,
potentially, something like a gradient method on the layer.
So it's not currently a feature.
They do not support most low-level considerations,
such as device placement, for instance.
They do not generally take into account distribution.
So they do not include distribution-specific logic.
At least, that should be true.
In practice, it's almost true.
So they're as distribution agnostic as possible.
And very importantly, they only support batchwise computation,
meaning that anything a layer does
should start with a tensor containing--
or a nested structure of tensors containing N samples
and should output also N samples.
That means, for instance, you're not
going to do non-batchwise computation, such as bucketing
together samples of the same length
when you're doing sequence processing.
You're not going to process [INAUDIBLE] data
sets with layers.
You're not going to have layers that don't have an input
or don't have an output outside of a very specific case, which
is the input layer, which we will cover.
So this is the most basic layer
you could possibly write. It has a constructor in which you
create two tf.Variables.
And you say these variables are trainable.
And you assign them as attributes on the layer.
And then it has a call method, which
takes the batch of inputs and applies the computation
batchwise-- in this case, just w x plus b.
So what happens when you instantiate
this layer is that it's going to create these two variables,
set them as attributes.
And they are automatically tracked into this list,
trainable_weights.
And when you call the layer using the __call__ operator,
it's going to defer to this call method.
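The basic layer just described can be sketched like this (the sizes and initializers are assumptions for illustration):

```python
import tensorflow as tf

# Sketch of the most basic layer: two tf.Variables created in the
# constructor and assigned as attributes, which Keras automatically
# tracks in trainable_weights.
class Linear(tf.keras.layers.Layer):
    def __init__(self, units=32, input_dim=32):
        super().__init__()
        self.w = tf.Variable(
            tf.random.normal((input_dim, units)), trainable=True)
        self.b = tf.Variable(tf.zeros((units,)), trainable=True)

    def call(self, inputs):
        # Batchwise computation: N input samples in, N output samples out.
        return tf.matmul(inputs, self.w) + self.b

layer = Linear(units=4, input_dim=2)
y = layer(tf.ones((3, 2)))  # __call__ defers to call()
```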
So in practice, most layers you're going to write
are going to be a little bit more refined.
They're going to look like this.
So this is a lazy layer.
So in the constructor, you do not create weights.
And the reason you do not create weights
is because you want to be able to instantiate
your layer without knowing what the input shape is going to be.
Whereas in the previous case, here--
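The lazy pattern described above can be sketched with a `build()` method, so the layer can be instantiated before the input shape is known (sizes are assumptions for illustration):

```python
import tensorflow as tf

# Sketch of the lazy layer: weights are created in build(), which runs
# automatically on the first call, once the input shape is known.
class Linear(tf.keras.layers.Layer):
    def __init__(self, units=32):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        # input_shape is only known here, at first call time.
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal", trainable=True)
        self.b = self.add_weight(
            shape=(self.units,), initializer="zeros", trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

layer = Linear(4)           # no input shape needed yet
y = layer(tf.ones((2, 8)))  # build() runs here, with input dim 8
```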