JIRI SIMSA: Hi, everyone.
My name is Jiri.
I'm a software engineer on the TensorFlow team.
And today, I'm going to be talking to you about tf.data
and tf.distribute, which are TensorFlow's APIs for input
pipeline and distribution strategy, respectively.
To set the stage for what I'm going to be talking about,
let's think about what are the basic building
blocks for a machine learning workflow?
Machine learning operates over data.
It runs some computation.
And it uses some sort of hardware to do this task.
This hardware can either be a single CPU on your laptop.
Or possibly it can be on your workstation that
has either one or multiple accelerators,
either GPUs or TPUs, attached to it.
But you can also run the computation
across a large number of machines
that each have one or multiple accelerators attached to it.
Now, let's talk about how these machine learning building
blocks are reflected in the APIs
that TensorFlow provides.
So for the data handling part of the machine learning task,
TensorFlow provides a tf.data API.
It's the input pipeline API for TensorFlow.
For the computation itself, such as supervised learning,
TensorFlow offers a number of different both high level
and low level APIs.
You might be familiar with Keras or Estimators--
they've been mentioned in earlier talks today--
as well as lower level APIs for building custom training loops.
And finally, to hide the hardware details
of your computation, TensorFlow provides a tf.distribute API,
which allows you to create your input pipeline
and model in a way that's agnostic to the environment
in which it's going to execute.
The idea is that you write your program as if it's
going to run, perhaps, on a single device,
and then, with minimal changes, you can deploy it
on a large set of different devices
and a variety of distributed architectures.
In this talk, I'm going to talk about the tf.data input
pipeline API.
And then, in the second part, I'm
also going to talk about the tf.distribute, the distribution
strategy API.
I'm not going to talk about Keras, and Estimator,
and other APIs for the modeling itself,
as that has been covered in previous talks.
So without further ado, let's get
started with tf.data, TensorFlow's input pipeline API.
So let's ask ourselves a question.
Why do we need an input pipeline API in the first place?
Why don't we just load the data into memory,
maybe in our Python program as a NumPy array,
and pass it into a Keras model?
Well, there is actually a number of good reasons
why we need an API or why using one will benefit us.
First of all, the data might not fit into memory.
For example, the ImageNet data set
is 140 gigabytes of data, which does not necessarily
fit into memory on every laptop or workstation.
The data itself might also require randomized
preprocessing, which means that we cannot preprocess everything
ahead of time offline and then have the data ready
for training.
We actually need an input pipeline that
performs the preprocessing, such as, in the case of ImageNet,
image cropping or randomized image distortions
or transformations, on the fly as we're
running the machine learning computation.
Having an input pipeline API as an abstraction might also
allow us to, in the runtime of this API,
implement things in a way that allows the computation
to efficiently utilize the underlying hardware.
And I'm actually going to spend a fair amount of the first part
of my talk talking about how to efficiently utilize
the hardware through the tf.data input pipeline abstraction.
Last, but not least, and this is something that ties the tf.data
API to the tf.distribute API, using an input pipeline API
abstraction allows us to decouple
the task of loading and preprocessing of the data
from the task of distributing the computation.
We are using the abstraction, which
allows you to create your input pipeline assuming
it's going to run on one place.
And then the distribution strategy
will somehow distribute the data without you
having to worry about the fact that the input pipeline might
actually be evaluated in multiple places in parallel.
So for those reasons, we created tf.data, TensorFlow's input
pipeline API.
And the way I like to think about an input
pipeline created through tf.data is as an ETL process.
What I mean by that is that E, T, and L
stand for the different stages of the input pipeline.
E stands for Extract.
This is the stage in which we read the data,
either from memory or from local or remote storage.
And we possibly parse the file format
that the data is stored in.
Perhaps it's compressed.
Then comes the T, the Transform stage. In this stage,
we perform either domain specific or domain
agnostic transformations.
So the domain specific transformations
are specific to the type of data we're dealing with.
So, for instance, text vectorization,
image transformation, or temporal video sampling
are examples of domain specific transformations.
While domain agnostic transformations
include things like shuffling of your data
during training or batching.
That is combining multiple elements
into a single higher dimensional element.
And, finally, the last stage of the input pipeline, Loading,
pertains to efficiently transferring
the data onto the accelerator, which is either a GPU or TPU.
What I should point out here is that, traditionally, the input
pipeline portion of your machine learning computation
happens on a CPU.
That's because some of the operations are naturally
only possible on the CPU, which leaves the GPU and TPU
resources available for your machine
learning specific computations, such as the model computation itself.
This puts extra pressure
on the efficiency with which the input pipeline performs.
The reason for that, which is what I'm trying to illustrate here
with the graph, is that over time CPU performance
has plateaued, while the computational power of GPUs
and TPUs, thanks to recent hardware advances,
continues to grow at an exponential rate.
This opens up a performance gap between the raw CPU
and the GPU/TPU processing power available in a single machine.
The consequence is that the CPU part of your machine learning
computation, namely the input pipeline,
can become the bottleneck of your computation.
So it's really important that the CPU input pipeline performs
as efficiently as it can.
So let's take a look at an example of what a tf.data-based
input pipeline actually looks like.
Here, I'm using an example of what a common image
processing input pipeline would look like.
We're first creating a data set using the TFRecordDataset
operation.
It's a data set constructor that takes a set of file names
or a set of file patterns and produces
elements that are stored in those files
in a sequence-like manner.
And once you create a data set, you
can chain transformations onto the data set,
thus creating new types of data sets.
A very common one and very powerful
one is the map transformation, which
allows you to apply arbitrary preprocessing to the elements
of the data set.
And this preprocessing can be expressed
as a function that ends up being traced
using the mechanisms available in TensorFlow,
meaning this function that is being used to transform
elements of the data set is executed as a data flow
graph, which has important implications
for the performance and how the runtime can actually
execute this function.
And the last thing that I illustrate here
is the batch transformation, which
combines multiple elements of the input data set
and produces a single element as an output that
has a higher dimension, which is a common practice for training
efficiency.
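To make this concrete, here is a minimal sketch of the kind of pipeline being described; the file name and the record schema in parse_fn are hypothetical placeholders, not the ones from the slide.

```python
import tensorflow as tf

def parse_fn(serialized_example):
    # Hypothetical record schema; adjust to your own TFRecord features.
    features = tf.io.parse_single_example(
        serialized_example,
        {"image": tf.io.FixedLenFeature([], tf.string),
         "label": tf.io.FixedLenFeature([], tf.int64)})
    image = tf.image.resize(tf.io.decode_jpeg(features["image"]), [224, 224])
    return image, features["label"]

# Read serialized records, apply the (traced) preprocessing function,
# and combine elements into higher dimensional batches.
dataset = tf.data.TFRecordDataset(["train-00000.tfrecord"])  # hypothetical file
dataset = dataset.map(parse_fn)
dataset = dataset.batch(32)
```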
Now, one thing that's not illustrated here,
but that actually happens under the hood inside
of the tf.data runtime, is that for certain combinations
of transformations, tf.data provides more efficient
fused implementations.
For instance, if a map transformation is followed
by a batch transformation, we actually have a highly
efficient C++ based implementation
for the combination of the two that can give you up to 2x
speed up in the performance of your input pipeline.
And that happens kind of magically behind the scenes.
And the important bit that I want to highlight here
is that the user doesn't need to worry about it.
The user here doesn't really need
to do anything with respect to optimizing the performance.
They focus on creating an input pipeline with the functional
preprocessing in mind.
And once you create the data set that you would like,
you can pass it into TensorFlow high level API such as Keras
or Estimator, which all support data set abstraction
as an input for the data.
So let's talk a bit more about the input pipeline performance.
If you were to implement the input pipeline
in a naive fashion, using the CPU for the input pipeline processing
or data preparation and the GPU or TPU for the training
computation, you might end up in a situation
like the one illustrated on the slide, where
at any given point in time you're
only utilizing one of the two resources available to you.
And you could probably tell that this seems rather inefficient.
Well, a common technique that can
be used to make this style of computation more efficient
is called software pipelining.
The idea is that while you're
working on the current training step
on the GPU or TPU, you've already
started preprocessing the data for the next training
step on the CPU.
And thus, you overlap the computation
that happens on the two devices or two
resources available to you.
Achieving the effect of software pipelining in tf.data
is pretty straightforward.
All you do is you chain a .prefetch transformation
to a particular point in your input pipeline.
And the effect of doing that will
be that the producer of the data up to that point
will be decoupled from the consumer of the data,
in this case, the Keras model.
And the two will be operating independently,
coordinating through an internal buffer.
And this will have the desired effect of software
pipelining that I illustrated in the previous slide.
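As a sketch, continuing the hypothetical pipeline from before, software pipelining is just one extra chained call:

```python
# Decouple the producer (the input pipeline) from the consumer (the model),
# coordinating through an internal buffer; here one batch is prepared ahead.
dataset = dataset.prefetch(buffer_size=1)
```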
Another opportunity for improving
the performance of your input pipeline
is to parallelize the transformation.
So the top part of this diagram illustrates
sequential processing: applying the map
transformation to the individual elements of the batch
that we are then going to create.
But there is no reason you need
to do that unless there is some sort of data
or control dependency, and commonly there is not.
And in that case, you can parallelize
and overlap the preprocessing of all the individual elements
out of which we're going to create the batch.
So let's take a look at how we would
do that using the tf.data API.
And similar to the software pipelining idea,
this is pretty straightforward.
You simply add a single argument,
num_parallel_calls, to the map transformation,
which indicates to the tf.data runtime
that it should, in fact, preprocess
elements of the input data set in parallel.
The important bit here is that the user doesn't really
need to worry about threading
or multiprocessing, use complicated Python APIs,
or be aware of things like the global interpreter lock.
It just happens inside of the tf.data runtime,
which is implemented in C++.
And thus, it sidesteps the complexities
that the user would need to go through.
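In code, on the same hypothetical pipeline, this is the one-argument change being described; the value 4 is just an illustrative choice.

```python
# Preprocess up to 4 elements in parallel inside the C++ tf.data runtime.
dataset = dataset.map(parse_fn, num_parallel_calls=4)
```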
And a last best practice for optimizing
the performance of an input pipeline
is that of parallel extraction.
So similar to the parallel transformation,
where the sequential mapping of the data
might have been the bottleneck, another potential source
of a bottleneck in your input pipeline
is the sequential nature with which data is being read.
If you're just reading elements one file at a time,
the I/O could actually be the bottleneck
of your input pipeline.
And the answer to that is, well, you
don't have to do it sequentially.
You can do it in parallel.
And to do that using the tf.data API, well,
this time it's not a one line change.
It's a two line change.
And so what changes is that we're
going to replace the TFRecordDataset
source with two lines.
The first line uses a list_files transformation,
which creates a data set that is going to contain all the file
names to which the particular pattern that we specify
evaluates to.
And then we're going to apply the interleave transformation
to this data set, which takes a user defined function--
a data set factory operating on the inputs,
in this case file names, and producing data sets,
in this case TFRecord data sets for that particular file name.
And the num_parallel_calls argument
determines how many files we
should be reading in parallel at any given point in time.
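A sketch of that two-line change, assuming the same hypothetical TFRecord files as before:

```python
# Build a dataset of matching file names, then read several files in parallel,
# interleaving the records they produce.
files = tf.data.Dataset.list_files("train-*.tfrecord")  # hypothetical pattern
dataset = files.interleave(
    tf.data.TFRecordDataset,   # dataset factory: file name -> TFRecordDataset
    cycle_length=4,
    num_parallel_calls=4)
```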
Now, I kind of cheated up to this point in my presentation.
Because I said, well, the user doesn't really
have to worry about performance and the aspects
of their environment.
And it turns out that in order to choose
optimal values for these num_parallel_calls arguments
or the buffer size for prefetch, you actually
have to understand your environment.
At least that's how it used to be, historically, when
this API was first introduced.
And over the past year or so, we actually
worked on lifting this restriction
and making the performance of tf.data great out of the box.
And the way this is achieved is that
instead of manually specifying what the right buffer
size or the right number of parallel calls
should be for these different transformations,
you can specify a special constant called
tf.data.experimental.AUTOTUNE.
And if you do that, this will indicate to the tf.data runtime
that you want to delegate the task of choosing
the optimal level of parallelism or buffer size
to the tf.data runtime.
And it will do that on your behalf.
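Concretely, the earlier hypothetical pipeline with the tuning delegated to the runtime would look something like this (parse_fn is the hypothetical parsing function from before):

```python
AUTOTUNE = tf.data.experimental.AUTOTUNE

# Let the tf.data runtime pick the parallelism and buffer sizes on our behalf.
dataset = tf.data.TFRecordDataset(["train-00000.tfrecord"])  # hypothetical file
dataset = dataset.map(parse_fn, num_parallel_calls=AUTOTUNE)
dataset = dataset.batch(32)
dataset = dataset.prefetch(buffer_size=AUTOTUNE)
```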
I should mention that autotuning, at this point,
is enabled by default. But you still
have to specify the constant if you actually
want to indicate which of these knobs should be autotuned.
You can also disable autotuning if you would like
to try to do this manually.
And the mechanism for disabling autotuning is tf.data.Options.
The tf.data.Options is an object that
specifies global options that should be used for your input
pipeline.
And besides controlling autotuning,
it can also be used to control things
like static optimizations that are not
enabled by default, because they are not always
a win, such as map vectorization or map parallelization,
or, for instance, specifying whether your input pipeline is
allowed to produce elements out of order; by default,
your input pipeline will be deterministic.
The options object also allows you to, for example,
collect statistics about data in your input pipeline.
And for the performance experts in the audience,
it also allows you to fine tune threading parameters of tf.data
internals.
And the way you would use tf.data.Options
is that once you create your data set,
you also create an instance of the options object
and set whatever options that you're interested in.
In this example, I'm turning the map_parallelization
optimization on.
And then, importantly, you associate
the options object with the data set
using the with_options transformation, which,
similar to all the other transformations that I talked
about up to this point, returns back a new data set that
now has the options applied.
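A minimal sketch of that options flow, matching the example described (the out-of-order option is included just to show the knob mentioned above):

```python
options = tf.data.Options()
# Turn on a static optimization that is not enabled by default.
options.experimental_optimization.map_parallelization = True
# Optionally allow out-of-order elements (the default is deterministic).
options.experimental_deterministic = False
# with_options returns a new dataset with the options applied.
dataset = dataset.with_options(options)
```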
Last thing pertaining to tf.data that I would like to talk about
is the TensorFlow data sets project.
So up to this point, I've been talking about just the core tf.data
API, which can be used to create input pipelines
starting from raw data.
However, for a lot of common existing data sets,
this is a repetitive task.
And machine learning novices especially
do not necessarily want to do that just to get
started with machine learning.
To make it easier to onboard new users,
as well as to make it easier to use existing data sets,
the TensorFlow data sets project provides canned data
sets that are ready to be used with the rest of TensorFlow.
The way you could use the TensorFlow data sets
project is that once you import it as a module,
you can, for example, list the set of available data sets
through the list_builders call.
And I think, at this point, there are something like 60
plus different data sets spanning text, image, audio,
and video that are supported through the TensorFlow data
sets project.
Then, through the load command, using the name
as the identifier of the data set
you would like to load and optionally
the split argument, which you can use to identify whether you
want the training or the test portion of the data
set, you get back an instance of a tf.data data set
that can be immediately used with your model.
Or, because it's a tf.data data set instance,
you can optionally apply some custom transformations to it,
such as, in this case, shuffling and batching.
Or, if you would like to just inspect
what's inside of the data set, you
could do so using a simple Python-like iteration where
you can print the elements of the data set.
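Putting those steps together, here's a sketch of that usage; the data set name "mnist" is just an illustrative choice:

```python
import tensorflow_datasets as tfds

# List the canned data sets that are available.
print(tfds.list_builders())

# Load the training split of a data set by name; this returns a tf.data dataset.
dataset = tfds.load("mnist", split="train")

# Because it is a tf.data dataset, we can chain custom transformations...
dataset = dataset.shuffle(1024).batch(32)

# ...or simply inspect its elements with Python iteration.
for example in dataset.take(1):
    print(example)
```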
So this concludes the first part of my talk
in which I talked about tf.data.
And in the second part of my talk,
we're going to talk about the distribution strategy API.
So similar to the first part of our talk, where we asked ourselves,
why do we need an input pipeline API?
Let's start by asking ourselves, why
do we need to do distributed training?
Why do we need distribution strategy API?
Well, it turns out that if we do training on one machine,
on one device, it can take a pretty long time.
This graph illustrates that by showing the accuracy achieved
by the ResNet model over time on the ImageNet
data set using a single GPU.
And you can see that it takes close to 90 hours
to get to an accuracy around 75%, while the most performant
implementations, or deployments,
of the same model actually take less than 10 minutes,
using an amazing amount of resources to parallelize this computation.
What going down from 87 hours to 10 minutes enables
is that you can actually experiment
with ideas very quickly as opposed
to starting an experiment and waiting for one or two
days before you can do the next iteration.
And I think this is game changing.
So I hope I convinced you that distributing your computation,
if it takes a very long time, is a very good idea.
So let's talk about how you do that with TensorFlow's
distribution strategy API.
There are three main goals for the distribution strategy
API.
First of all, it should be easy to use.
What this means is that it should be possible for you
to create your input pipeline and your model assuming
that it's going to run on one device and then,
with minimal code changes, be able to deploy
to different architectures, either multiple GPUs
on your workstation or possibly even a cluster of workstations
that either have GPUs or TPUs attached to it.
It should also provide great out-of-the-box performance.
This means that the performance that you get out
of using distribution strategy should
be close to the performance you would get if you were manually
targeting a specific architecture
with your implementation.
And finally, it should be versatile.
So it should support different types of architectures,
different types of hardware, and different types of APIs
for your input pipeline or model.
The use cases for the distribution strategy API
can be roughly categorized as follows,
ranging from the simplest to perhaps the most advanced.
So the simplest one is you have a model that
uses either the Keras or Estimator API.
And you would like to distribute it.
And this is what we are going to cover in this talk.
The second one is you have a model for which you
used lower level TensorFlow APIs to create a custom training
loop.
And you would like to distribute it.
And we're also going to cover that in this talk.
Now, the last two, the more advanced ones,
namely making a layer, library, or infrastructure
distribution-aware-- so, for example, how
would you make something like Keras distribution aware?
Or how would you make a new strategy,
where a strategy is an abstraction that
hides or decouples the model and input
pipeline from the particular architecture?
Those two use cases we will not cover in this talk.
But they're covered by guides and tutorials
on the TensorFlow website.
So in case that you would like to learn more,
I direct your attention to the TensorFlow website.
So let's start by talking about the use case
where you have a model that's created
either in Keras or Estimator.
And you would like to distribute it using the distribution
strategy.
And in this section, I'm also going
to introduce the distribution strategies that are actually
available in TensorFlow.
So the first strategy that's available, called
mirrored strategy, is one that allows you to distribute
your program across multiple GPUs attached
to a single worker.
This particular strategy is implemented
using something called
all-reduce synchronous training, where the synchronous part
means that all of the devices will be performing steps
in lock-step, so in a coordinated fashion,
while the all-reduce portion pertains
to how the different devices exchange information
about the local updates that they compute in each step.
To shed a little more light onto how the all-reduce algorithm
works, on this slide, I illustrate
what happens in the all-reduce algorithm
when you have three GPUs that each perform a single step that
updates a mirrored version of three variables.
So each of the boxes, the blue, the green, and the pink,
corresponds to a variable that, in a single step,
receives different updates on different devices.
And once the step is performed on all the devices,
we can propagate the updates in a circular fashion
between the different devices.
And at that point, all of the devices
will have all of the updates from all the devices for all
of the variables, requiring N minus one transfers for N
devices.
And then, once all the updates have been collected,
a reduce function can be used to combine
the updates to a single global value
where the common reduced functions are either
a sum or an average of those updates.
And with that knowledge in mind, a single step
of a synchronous training can be illustrated
on this example, where let's assume we
have a model with two layers.
And each layer has two variables.
And the variables are mirrored on each device.
We have two devices.
Now, in the forward pass, data is
propagated through the layers.
And then in a backward pass, the gradients for the variables
are computed.
And at that point, the updates to the two variables
on the different devices might be different.
Because we actually use two different pieces
of data on each device.
And that is the point where we use the all-reduce algorithm
to share the updates across the devices,
thus achieving a consistent global state across the two devices.
And this is what synchronous training refers to.
Now, let's take a look at how you would actually
go about using a mirrored strategy with the Keras
and Estimator APIs.
So to create an instance of a mirrored strategy,
there are a couple of different factories: a default one, or one
where you can explicitly name the devices
that you would like to create the mirrored strategy for.
I believe that the default, if you don't specify anything,
is all the GPUs attached to your worker.
And you can also optionally configure
the all-reduce algorithm through the cross_device_ops
argument of the MirroredStrategy constructor.
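A sketch of those constructor variants; the device names and the choice of all-reduce implementation are illustrative:

```python
# Default: mirrors across all GPUs attached to this worker.
strategy = tf.distribute.MirroredStrategy()

# Explicitly name the devices to mirror across.
strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"])

# Optionally pick the all-reduce implementation via cross_device_ops.
strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())
```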
Now, how would we use mirrored strategy or any other strategy,
for that matter, with Keras API?
Well, here's a common or a simple example
of a Keras model for ResNet 50 with a stochastic gradient
descent optimizer.
We create the model.
We specify the optimizer.
And then we use the compile and fit APIs to perform training
over our training data set, which is an instance of tf.data
data set.
Now, this runs on a single machine,
possibly using a local GPU.
In case we have multiple GPUs, we
can simply define an instance of MirroredStrategy
and then make sure that all of the model creation
is wrapped inside of a strategy.scope.
And with these two lines, your program
will now be able to run on all the GPUs
available on the worker.
And the key here is that the strategy.scope
will take care of variable creation inside of your model,
making sure that all the variables are mirrored
on the different GPU devices.
And the body of the strategy.scope
will be distribution aware.
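As a sketch, assuming a ResNet50 classifier and an existing train_dataset (a tf.data dataset defined elsewhere), the two-line change looks roughly like this:

```python
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Variables created here are mirrored on every GPU.
    model = tf.keras.applications.ResNet50(weights=None)
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"])

# train_dataset is a tf.data dataset defined elsewhere (hypothetical).
model.fit(train_dataset, epochs=90)
```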
So recall that one of the goals for the distribution strategy
API was that it provides great out-of-the-box performance.
So on this slide, I would like to convince you
that it does, at least for mirrored strategy
on the ResNet 50.
So what we're looking at here is the performance
of a ResNet 50 based training using Keras,
running TensorFlow 2.0 on Google Cloud.
The vertical axis of the graph plots images per second.
And the horizontal axis ranges the number of GPUs from one
to two to eight.
And we can see that using mirrored strategy
achieves close to linear scaling,
starting with a single GPU achieving roughly 1,250
images per second to eight GPUs achieving close to 10,000
images per second.
Now, we've covered the Keras API usage with distribution strategy.
Let's also cover the Estimator API usage.
So this is a common example of how you would use the Estimator
API for your training.
Namely, you define a classifier using the Estimator constructor
that you provision with a model function.
And then, through the train call,
you specify an input function, which can return, for instance,
a tf.data data set.
And it performs the training.
In order to parameterize the Estimator API with a strategy,
all you need to do is, again, to create
an instance of the strategy, in this case, MirroredStrategy,
and pass it in through the RunConfig option
into the Estimator API.
And once that happens, the RunConfig
with a strategy will make sure
that the model function is created once per replica.
And replica, in this case, refers to the GPU.
So you're going to have copies of the model on each GPU
as well as of all the variables inside of the model.
And you will perform the all-reduce synchronous training
across the multiple GPUs.
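A sketch of how the strategy plugs into the Estimator API; model_fn and input_fn stand in for your own model and input functions:

```python
strategy = tf.distribute.MirroredStrategy()

# Pass the strategy to the Estimator through RunConfig.
config = tf.estimator.RunConfig(train_distribute=strategy)

classifier = tf.estimator.Estimator(
    model_fn=model_fn,      # your model function (hypothetical)
    config=config)

# input_fn returns, for instance, a tf.data dataset (hypothetical).
classifier.train(input_fn=input_fn)
```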
So distributing your computation across multiple GPUs
on a single machine can get you up to an N-times speedup,
where N is the number of accelerators attached
to your machine.
But there is a physical limit to how many
accelerators you can have.
And to go beyond that limit, the natural next thing
is to actually use multiple machines,
each with one or multiple accelerators.
And that's what the multi-worker mirrored strategy
is intended to help you with.
And it's very similar to the mirrored strategy.
The only difference is that instead of distributing
your computation over GPUs on a single machine,
it distributes the computation over GPUs on many machines.
And it performs the all-reduce computation
not just across GPUs on a single workstation,
but across the GPUs on all the different workstations.
And it does so through TensorFlow collective ops,
which allow you to send
data in a broadcast fashion between the different
TensorFlow workers.
The way you would use this API is
similar to the mirrored strategy API.
So you can create a default instance
of a MultiWorkerMirroredStrategy.
Or you can specify a specific CollectiveCommunication
algorithm to be used.
Unlike the mirrored strategy, you also
need to specify information about the different workers
that are participating in your computation.
And this is done through a JSON encoded string that
identifies the hosts and ports of your different workers,
as well as their task types.
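A sketch, with placeholder host:port addresses, of how that worker information and the strategy might be set up; the API names are the experimental ones from the TF 2.0 era, and the collective implementation choice is illustrative:

```python
import json
import os

# Describe the cluster and this machine's role; addresses are placeholders.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["host1:12345", "host2:12345"]},
    "task": {"type": "worker", "index": 0}})

# Default collective communication...
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()

# ...or pick a specific algorithm.
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy(
    tf.distribute.experimental.CollectiveCommunication.RING)
```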
The third strategy that I'm going to talk about,
and that's available in the TensorFlow distribution
strategy API is the TPU strategy.
And this one is very similar to the mirrored strategy.
The main difference is that it allows
you to perform the all-reduce synchronous training on TPUs,
which are the hardware accelerators made by Google
specifically for TensorFlow.
But at this point, there are also
other frameworks that are capable of leveraging them.
And you can do so through the Google Cloud platform.
And unlike the mirrored strategy,
it uses the cross_replica_sum to perform the all-reduce
on TPUs, which reflects a difference between GPUs and TPUs.
And you can use this strategy for training
on a single TPU or an entire pod,
which is a term that refers to a set of TPU cores
connected in a topology.
Using a TPU strategy is a little more complicated.
It's also somewhat of an area of active development,
which is what the experimental portions of the API reflect.
But the high level idea is that you create a TPU cluster
resolver, which allows you to gather information
about your TPU hardware.
And then you create the TPU strategy
with this cluster resolver argument,
which then allows the TPU strategy to be
aware of the TPU hardware location and specifics.
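A rough sketch of that setup as it looked around the TF 2.0 era; the exact experimental API names have changed across releases, and the TPU name is a placeholder:

```python
# Gather information about the TPU hardware.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="my-tpu")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# Create the strategy so it knows where the TPU devices live.
strategy = tf.distribute.experimental.TPUStrategy(resolver)
```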
And so up to this point, I've been
talking about synchronous training, where
all the devices in your training loop
are performing or operating in a lock-step, one step at a time.
An alternative to synchronous training,
which might be suitable for certain types of machine
learning tasks, is so-called asynchronous training,
where the different devices or different workers
in your set of workers performing your computation
are actually running at different rates.
And one of the architectures that
enables asynchronous training is the so-called parameter server
and worker architecture, where your machines
play one of two roles: parameter server tasks or worker tasks.
The parameter server tasks are where the global variable state
is stored and either updated or fetched
by the individual workers,
while the workers perform the machine learning
computation one step at a time, but not necessarily
at the same rate.
And this architecture can be targeted
for your machine learning program using the parameter
server strategy.
You create it using this factory.
And similar to the multi-worker strategy,
you need to specify information about the workers and the types
of tasks that the machines should play,
namely the worker task or the parameter server task.
And again, this is done through the TF_CONFIG
environment variable.
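A sketch of that configuration, with placeholder addresses; the constructor shown is the experimental one from the TF 2.0 era:

```python
import json
import os

# Tell TensorFlow which machines are workers and which are parameter servers.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {
        "worker": ["host1:12345", "host2:12345"],
        "ps": ["host3:12345"]},
    "task": {"type": "worker", "index": 0}})

strategy = tf.distribute.experimental.ParameterServerStrategy()
```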
And a last strategy that I want to talk about
and that's available through the distribution strategy API
is the central storage strategy.
This is a special case of the parameter server strategy
where there is a single parameter server.
And its role is being fulfilled by a CPU
of the machine on which the other devices reside.
And the benefit of this strategy is
that any single GPU might not be able to fit
all the embeddings, all the variable state, inside of it.
But the CPU might.
And in cases where this is a good fit,
the central storage strategy is available.
And this is how you would create one.
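For completeness, a one-line sketch of creating it, again using the experimental namespace from the TF 2.0 era:

```python
# Variables live on the CPU; compute is replicated across the local GPUs.
strategy = tf.distribute.experimental.CentralStorageStrategy()
```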
And that concludes the part talking about the Keras
and Estimator API support, as well as
the enumeration of the different types of strategies
that are available in the tf.distribute API.
And in the last part of my talk, I'm
going to talk about how you would
go about distributing a model that you created using a custom
training loop, which is effectively a model
created out of lower level TensorFlow APIs.
The prerequisite for your custom training loop
to be distributable using the distribution strategy API
is that it has to adhere to the following programming model.
In particular, as far as data sources are concerned,
your variables may be read from any replica.
But the input data that's used for training
will be sharded, meaning divided into disjoint sets that
will be accessed exclusively by one replica.
Each replica performs computation on its own shard of the data.
And then the computation is combined using a reduction.
So, in essence, this programming model
is that of all-reduce synchronous training.
But it can be implemented using lower level TensorFlow APIs.
So let's take a look at what an example of a custom training
loop distributed through a distribution strategy
would look like.
So we create an instance of a distribution strategy.
And then we create a data set using your own create data set
method that takes a batch size.
The important bit here is that the batch size
should be the global batch size, that is, a batch size that you
choose independently of the number of replicas or devices
on which you are going to run your computation.
And it's going to be the responsibility
of the distribution strategy API to actually divide
this global batch size into per replica batch sizes.
And this is done through the experimental_distribute_dataset
invocation, which wraps the, quote unquote, sequential, data
set in what's called a distributed data set.
But as far as the custom training loop is concerned,
there is no difference between the two.
And your model, similar to the Keras API usage,
should be created under the strategy.scope, which
means that all the variables must be created
under this scope so that they're properly mirrored
across the different replicas.
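A sketch of that setup; create_dataset is a hypothetical helper that builds a tf.data dataset batched with the global batch size, and the model and optimizer choices are illustrative:

```python
strategy = tf.distribute.MirroredStrategy()

GLOBAL_BATCH_SIZE = 256  # chosen independently of the number of replicas

# create_dataset is your own dataset-building function (hypothetical).
dataset = create_dataset(GLOBAL_BATCH_SIZE)

# Wrap the dataset so the strategy can split each global batch across replicas.
dist_dataset = strategy.experimental_distribute_dataset(dataset)

with strategy.scope():
    # Variables created here are mirrored across the replicas.
    model = tf.keras.applications.ResNet50(weights=None)
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
```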
As an alternative to delegating the distribution
of your data set to the distribution strategy,
you can also use an alternative API, distribute datasets
from function, which gives the user the control
to decide which portions of the data set
should be distributed to which replica and how.
Instead of providing a data set,
you provide a data set factory, which takes as input
a distribution strategy input context,
which has information such as the particular replica
index or the total number of replicas.
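A sketch of that alternative, using the experimental method name from the TF 2.0 era (it was later renamed); dataset_fn, the file name, and the sharding choice are illustrative:

```python
def dataset_fn(input_context):
    # Derive the per-replica batch size from the global batch size.
    batch_size = input_context.get_per_replica_batch_size(GLOBAL_BATCH_SIZE)
    d = tf.data.TFRecordDataset(["train-00000.tfrecord"])  # hypothetical file
    # Shard the data so each input pipeline sees a disjoint slice.
    d = d.shard(input_context.num_input_pipelines,
                input_context.input_pipeline_id)
    return d.map(parse_fn).batch(batch_size)

dist_dataset = strategy.experimental_distribute_datasets_from_function(dataset_fn)
```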
And then in the rest of my presentation,
we're going to take a look at how you would actually
build a custom training loop in a kind of bottom up fashion.
So the first thing, the lowest building block
is the logic that performs a single training
step on a replica.
And here's an example of how you would do that.
So you would use a GradientTape, perform some computation,
and then, with the help of the tape,
compute gradients, apply them to the model variables,
and return the loss.
And this is something that happens on a single replica.
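A sketch of such a per-replica step, assuming the model, optimizer, and GLOBAL_BATCH_SIZE from the earlier sketch and a hypothetical per-example loss:

```python
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
    reduction=tf.keras.losses.Reduction.NONE)

def replica_step(inputs):
    images, labels = inputs
    with tf.GradientTape() as tape:
        predictions = model(images, training=True)
        # Scale the per-example losses by the *global* batch size so that
        # summing across replicas later yields the correct global loss.
        loss = tf.reduce_sum(loss_object(labels, predictions)) / GLOBAL_BATCH_SIZE
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss
```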
Now, to tie together this computation
that happens on different replicas
into a single training epoch,
you can enumerate the individual elements of the data set
using Python iteration.
And then use the run API of the distribution strategy
with the replica step function and the input
to collect the loss for that particular replica.
And combine the individual losses
using the reduce call with a particular reduce operation.
And at that point, you could do any per-step processing
inside of this for loop.
For example, you could print the loss.
But you could do other types of computations here as well.
Now, one thing you might notice is
that this train_epoch function has a tf.function decorator.
The effect of this decorator is that TensorFlow will interpret
this Python function as a graph computation,
optionally using autograph to convert Python idioms,
such as the Python iteration of our data set,
into equivalent graph building methods.
And the reason we recommend using tf.function decorator
for your train_epoch here is that it will generally result
in much better performance.
Because the entire training epoch
will be executing as a data flow graph as
opposed to a Python function.
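Continuing the sketch, here is a training epoch tying the replica steps together; note that at the time of this talk the per-replica call was the experimental_run_v2 API (in later releases it is simply strategy.run):

```python
@tf.function
def train_epoch(dist_dataset):
    total_loss = 0.0
    for inputs in dist_dataset:
        # Run the step on every replica and collect the per-replica losses.
        per_replica_losses = strategy.experimental_run_v2(
            replica_step, args=(inputs,))
        # Combine them into a single scalar with a reduction.
        loss = strategy.reduce(
            tf.distribute.ReduceOp.SUM, per_replica_losses, axis=None)
        total_loss += loss
        # Any per-step processing, such as logging the loss, could go here.
    return total_loss
```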
Now, the last step of your custom training loop
is the iteration over multiple epochs, which
is pretty straightforward.
And this just illustrates how you
do that, optionally inserting per-epoch processing
inside of the outer loop, such as checkpointing your model
or running an eval of the model.
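And a sketch of that outer loop, continuing from above, with a hypothetical checkpoint as the per-epoch processing:

```python
checkpoint = tf.train.Checkpoint(model=model, optimizer=optimizer)

for epoch in range(90):  # illustrative number of epochs
    loss = train_epoch(dist_dataset)
    print("Epoch", epoch, "loss:", float(loss))
    # Per-epoch processing, e.g. checkpointing (the path is a placeholder).
    checkpoint.save("/tmp/ckpt")
```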
So before I end, I want to give you
an overview of what's supported in TF 2.0 beta
as far as distribution strategy is concerned.
And this is a screenshot from the TensorFlow website.
So you can either take a picture now,
or you can also go to the website.
In the first column, we see the three types
of model building APIs, namely Keras, Estimator,
and custom training loop.
And then on the top row, we have the different types
of strategies.
And, as you can see, the Estimator API
is well supported across different types of strategies
while the other combinations are supported or on the way.
Most of them are targeting the 2.0 release candidate
for availability.
And that brings me to the end of my talk.
Thank you very much for your attention.
Throughout the talk,
I've been sharing links to different tutorials.
All of the tutorials can be found on the TensorFlow website
under the resources link shown here.
And in case you have any questions
or you would like to request a feature or report issues,
our GitHub repository is the correct forum for that.
So thank you very much for your attention.
[APPLAUSE]
[MUSIC PLAYING]
