
ROHAN JAIN: Hi, all. I'm Rohan, and I'm here to talk to you about how you can scale up your input data processing with tf.data.

So let's start with a high-level view of your ML training job. Typically, your ML training step will have two phases. The first is data preprocessing, where you look at the input files and do all kinds of transformations on them to make them ready for the next phase, which is model computation. Data preprocessing happens on the CPU, and it covers things such as cropping for images, or sampling frames for videos.
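As a small illustration of this kind of CPU-side preprocessing, here is a minimal sketch of a random image crop with TensorFlow; the image contents and the crop size are made up for the example, not taken from the talk:

```python
import tensorflow as tf

# Stand-in for a decoded training image (values and shape are arbitrary).
image = tf.random.uniform([128, 128, 3])

# A typical CPU-side preprocessing step: take a random 100x100 crop.
cropped = tf.image.random_crop(image, size=[100, 100, 3])
print(cropped.shape)  # (100, 100, 3)
```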

So if your training is slow, you could have a bottleneck in either one of these two phases. The talk on profiling should give you an indication of how to figure out which of the two phases is slowing you down. And I'm here to talk to you about the first kind of bottleneck, the one in data preprocessing.

So let's try to look into what this bottleneck really is. In the last few years, we've done a fantastic job making accelerators that do the ML operations really fast, so the amount of time it takes us to do a matrix operation and all the linear algebra operations is a lot smaller. But the hosts and the CPUs that feed the data to these accelerators have not been able to keep up with them, and so there ends up being a bottleneck.

We thought we could mitigate this by making the models more complex, but accelerators have constraints on how much RAM they have, and, more importantly, these models tend to be deployed on something like a mobile device, which restricts the amount of complexity you can introduce into your model. So that hasn't really panned out. The second approach people take is to use larger batch sizes. But larger batch sizes require a larger amount of preprocessing to assemble each batch, which puts further pressure on the hosts. So this is becoming an increasingly large problem within Alphabet and even externally, and I'm going to talk to you about how you can solve it using tf.data.

tf.data is TensorFlow's data preprocessing framework. It's fast, it's flexible, and it's easy to use, and you can learn more about it in our guide. As background for the rest of the talk, I'm going to go through a typical tf.data pipeline, and that'll help us in the later stages.

So suppose your training data lives in some TFRecord files. You can start off with a TFRecordDataset over that data. After that, you do your preprocessing, which is typically the bulk of the logic; for images, that might be cropping, maybe flipping, all sorts of things. Then you shuffle the data so that you don't train on the order in which the examples appear in the input, which helps with your training accuracy. After that, you batch the data so that the accelerator can make use of vectorized computations. Then you add some software pipelining, which ensures that while the model is off working on one batch of data, the preprocessing side can produce the next batch, so that everything works very efficiently. Finally, you can feed this tf.data dataset to a Keras model and start doing your training.
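The pipeline described above might look like the following sketch. The record-writing preamble and the parsing/preprocessing function are made-up stand-ins so the example is self-contained; note that `tf.data.AUTOTUNE` is spelled `tf.data.experimental.AUTOTUNE` in older TensorFlow releases:

```python
import os
import tempfile

import tensorflow as tf

# Write a tiny TFRecord file so the sketch runs without external data.
path = os.path.join(tempfile.mkdtemp(), "train.tfrecord")
with tf.io.TFRecordWriter(path) as writer:
    for i in range(10):
        example = tf.train.Example(features=tf.train.Features(feature={
            "x": tf.train.Feature(
                float_list=tf.train.FloatList(value=[float(i)]))
        }))
        writer.write(example.SerializeToString())

def parse_and_preprocess(record):
    # Stand-in for the "bulk of the logic" (cropping, flipping, etc.).
    feats = tf.io.parse_single_example(
        record, {"x": tf.io.FixedLenFeature([1], tf.float32)})
    return feats["x"] * 2.0

dataset = (
    tf.data.TFRecordDataset(path)                # read the TFRecords
    .map(parse_and_preprocess,
         num_parallel_calls=tf.data.AUTOTUNE)    # preprocessing
    .shuffle(buffer_size=10)                     # decorrelate example order
    .batch(2)                                    # enable vectorized compute
    .prefetch(tf.data.AUTOTUNE))                 # software pipelining
```

The resulting `dataset` can then be passed straight to Keras, e.g. `model.fit(dataset)`.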

So given that basic pipeline, if you have a bottleneck, the first thing I'd recommend is to go through our single-host performance guide and try to utilize every trick and transformation available in tf.data to extract the maximum possible performance, so that you're using all the [INAUDIBLE] and whatever. There's excellent information in the guide that we have here, and [INAUDIBLE] gave a great talk at the ML Tokyo Summit, which you can take a look at to learn more about this. So that's the first thing I'd recommend you do.
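As a sketch of the kind of tricks that performance guide covers (parallel maps, caching, and prefetching), here is a toy pipeline; the tiny `range` dataset and the squaring map are just placeholder workloads:

```python
import tensorflow as tf

dataset = (
    tf.data.Dataset.range(100)
    .map(lambda x: x * x,
         num_parallel_calls=tf.data.AUTOTUNE)  # run the map in parallel
    .cache()                  # reuse the expensive work across epochs
    .shuffle(buffer_size=100)
    .batch(10)
    .prefetch(tf.data.AUTOTUNE))  # overlap producer and consumer

total = sum(int(tf.reduce_sum(batch)) for batch in dataset)
print(total)  # sum of squares 0..99 = 328350
```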

But suppose you've done that and tried all the different recommendations we have here, and you're still bottlenecked on the data preprocessing part. Don't worry, you're not alone; this is very common, and we've increasingly seen it with a lot of internal customers. So now I'm very pleased to present a couple of solutions that we've been working on in the team to help you solve that problem.

So the first idea is: why don't we just reuse the computation? Suppose you're playing around with different model architectures. Your input preprocessing part remains more or less the same, and if it's expensive and time-consuming, why not do it once, save it, and then on every subsequent run just read from it quickly? We noticed a bunch of internal customers, teams within Alphabet, who were trying to do this on their own outside of tf.data, and we decided to bring it into tf.data and make it incredibly fast, flexible, and easy to use. This is what we call Snapshot. The idea is what I just explained: you materialize the output of your data preprocessing once, and then you can use it many, many times. This is incredibly useful for playing around with different model architectures and, once you settle on an architecture, for doing hyperparameter tuning. So you can get that speedup using Snapshot.

Next, I'm going to go through the pipeline we talked about before and see how you can add Snapshot to make it faster. So that's the original pipeline we had, and notice that there's this preprocessing step, which is expensive. With Snapshot, you just add a snapshot transformation right after that, with a directory [INAUDIBLE]. With this, everything before the snapshot will be written to disk the first time the pipeline runs, and on every subsequent run we will just read from it and then go through the rest of the steps as usual. One thing I'd like to point out is that we place the snapshot at a particular location, before the shuffle, because if it's after the shuffle, everything gets frozen: all the randomization you get out of shuffle is lost, because every subsequent time you're just going to be reading the same exact order again and again. So that's why we introduce it at that stage in the pipeline.
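Concretely, the placement described above might look like the following sketch. The exact spelling is an assumption about your release: recent TensorFlow exposes `Dataset.snapshot(path)` directly, while around TF 2.3 the same feature lived under `tf.data.experimental.snapshot`. The `map` here is just a stand-in for the expensive preprocessing:

```python
import tempfile

import tensorflow as tf

snapshot_dir = tempfile.mkdtemp()

dataset = (
    tf.data.Dataset.range(10)
    .map(lambda x: x + 1)      # stand-in for expensive preprocessing
    .snapshot(snapshot_dir)    # written to disk on the first run,
                               # read back on subsequent runs
    .shuffle(buffer_size=10)   # after the snapshot, so per-epoch
                               # randomization survives
    .batch(5))

elements = sorted(int(v) for batch in dataset for v in batch)
print(elements)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```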

So Snapshot, we developed it internally; there are internal users and teams that are using it and deriving benefit from it, and now we're bringing it to the open source world. We published an RFC, which has more information about it and some other technical details. This will be available in TensorFlow 2.3, but I believe it will be available in the [INAUDIBLE] shortly.

So remember, I talked about two ideas. The second idea comes from the fact that not all computation is reusable: suppose you had some randomized crops in there. If you wrote those to disk and read them back, you'd again lose that randomization, so Snapshot is probably not applicable in that scenario. The second idea, then, is to distribute the computation. The initial setup is that you have one host CPU driving a bunch of these accelerators, but now you can offload this computation from that host to maybe a cluster. And now you can utilize the ability and the computational power that you have for all these different--