
  • [MUSIC PLAYING]

  • CLEMENS MEWALD: My name is Clemens.

  • I'm the product lead for TensorFlow Extended,

  • the end-to-end machine learning platform

  • that we built for TensorFlow.

  • And we have a lot of exciting announcements,

  • so let's jump right in.

  • A lot of you may be familiar with this graph.

  • We published this in a paper in 2017.

  • And the main point that I usually make on this graph

  • is that there's more to machine learning than just the training

  • part.

  • In the middle, the trainer piece,

  • that's where you train your machine learning model.

  • But if you want to do machine learning in production

  • reliably and in a robust way, you actually

  • need all of these other components

  • before, after, and in parallel to the training

  • algorithm.

  • And often I hear, sometimes from researchers, well,

  • I really only do research.

  • I only care about training the machine learning model

  • and I don't really need all of these upstream and downstream

  • things.

  • But what I would argue is that research often

  • leads to production.

  • And what we want to avoid is researchers

  • having to re-implement their hard work,

  • in a model that they've built, when they want to put

  • the model into production.

  • That's actually one of the main reasons

  • why we open sourced TensorFlow because we really

  • wanted the research community to build the models in a framework

  • that we can then use and actually move into production.

  • A second comment that I hear often

  • is, well, I only have a very small data set

  • that fits in a single machine.

  • And all of these tools are built to scale up

  • to hundreds of machines.

  • And I don't really need all of these heavy tools.

  • But what we've seen time and time again at Google

  • is that small data today becomes large data tomorrow.

  • And there's really no reason why you

  • would have to re-implement your entire stack just

  • because your data set grew.

  • So we really want to make sure that you

  • can use the same tools early on in your journey

  • so that the tools can actually grow with you and your product,

  • with the data, so that you can scale the exact same code

  • to hundreds of machines.

  • So we've built TensorFlow Extended as a platform

  • at Google, and it has had a profound impact

  • on how we do machine learning in production

  • and on becoming an AI-first company.

  • So TFX really powers some of our most important Alphabet

  • companies.

  • Of course, Google is just one of the Alphabet companies.

  • So TFX is used at six different Alphabet companies.

  • And within Google, it's really used

  • with all of the major products.

  • And also, all of the products that

  • don't have billions of users [INAUDIBLE] this slide.

  • And I've said before that we really

  • want to make TFX available to all of you

  • because we've seen the profound impact it

  • has had on our business.

  • And we're really excited to see what

  • you can do with the same tools in your companies.

  • So a year ago we talked about the libraries

  • that we had open sourced at that point in time.

  • So we talked about TensorFlow Transform, the training

  • libraries, Estimators and Keras, TensorFlow Model Analysis,

  • and TensorFlow Serving.

  • And I made the point that, back then, as today, all of these

  • are just libraries.

  • So they're low-level libraries that you still

  • have to use independently and stitch together

  • to actually make them work for your own use cases.

  • Later that year, we added TensorFlow Data Validation.

  • So that made the picture a little more complete.

  • But we're still far away from actually being done yet.

  • However, it was extremely valuable to release

  • these libraries at that point in time

  • because some of our most important partners

  • externally have also had a profound impact with some

  • of these libraries.

  • So we've just heard from our friends at Airbnb.

  • They use TensorFlow Serving in that case study

  • that they mentioned.

  • Our friends at Twitter just published this fascinating blog

  • post of how they used TensorFlow to rank tweets

  • on their home timeline.

  • And they've used TensorFlow Model Analysis to analyze

  • that model on different segments of the data

  • and used TensorFlow Hub to share some of the word embeddings

  • that they've used for these models.

  • So coming back to this picture.

  • For those of you who've seen my talk last year,

  • I promised everyone that there will be more.

  • Because, again, this is only the partial platform.

  • It's far away from actually being an end-to-end platform.

  • It's just a set of libraries.

  • So today, for the very first time,

  • we're actually sharing the horizontal layers

  • that integrate all of these libraries

  • into one end-to-end platform, into one end-to-end product,

  • which is called TensorFlow Extended.

  • But first, we have to build components out

  • of these libraries.

  • So at the top of this slide, you see in orange, the libraries

  • that we've shared in the past.

  • And then in blue, you see the components

  • that we've built from these libraries.

  • So one observation to be made here is that, of course,

  • libraries are very low level and very flexible.

  • So with a single library, we can build many different components

  • that are part of a machine learning pipeline.

  • So in the example of TensorFlow Data Validation,

  • we used the same library to build

  • three different components.

  • And I will go into detail on each one of these components

  • later.

  • So what makes a component?

  • A component is no longer just a library.

  • It's a packaged binary or container

  • that can be run as part of a pipeline.

  • It has well-defined inputs and outputs.

  • In the case of Model Validation, it's

  • the last validated model, a new candidate model,

  • and the validation outcome.

  • And that's a well-defined interface

  • of each one of these components.

  • It has a well-defined configuration.

  • And, most importantly, it's one configuration model

  • for the entire pipeline.

  • So you configure a TFX pipeline end to end.
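
A rough sketch of that idea in Python. This is not the actual TFX API; all names here (Component, Pipeline, the config keys) are hypothetical, meant only to illustrate components with well-defined inputs and outputs sharing one pipeline-wide configuration:

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    """A pipeline step with a declared interface: named inputs and outputs."""
    name: str
    inputs: list
    outputs: list

@dataclass
class Pipeline:
    """One configuration object covers the entire pipeline end to end."""
    config: dict
    components: list = field(default_factory=list)

    def add(self, component):
        self.components.append(component)
        return self

# Configure the whole pipeline once, then declare each component's interface.
pipeline = Pipeline(config={"train_steps": 10000, "eval_split": 0.1})
pipeline.add(Component("Trainer", inputs=["examples"], outputs=["model"]))
pipeline.add(Component("ModelValidator",
                       inputs=["last_validated_model", "candidate_model"],
                       outputs=["validation_outcome"]))

print([c.name for c in pipeline.components])  # ['Trainer', 'ModelValidator']
```

The point of the sketch is the shape: each component exposes a well-defined interface, and there is a single configuration for the whole pipeline rather than one per library.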

  • And some of you may have noticed,

  • because Model Validation needs the last validated model,

  • it actually needs some context.

  • It needs to know what was the last model that was validated.

  • So we need to add a metadata store that actually provides

  • this context, that keeps a record of all

  • of the previous runs so that some of these more advanced

  • capabilities can be enabled.

  • So how does this context get created?

  • Of course, in this case, the trainer produces new models.

  • Model Validator knows about the last validated model

  • and the new candidate model.

  • And then downstream from the Validator,

  • we take that new candidate model and the validation outcome.

  • And if the validation outcome is positive,

  • we push the model to the serving system.

  • If it's negative, we don't.

  • Because usually we don't want to push

  • a model that's worse than our previous model

  • into our serving system.
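
The validate-then-push decision just described can be sketched as follows. The function names and the simple "no regression" criterion are assumptions for illustration; a real validator compares richer metrics across data slices:

```python
def validation_outcome(last_metric, candidate_metric):
    """Positive when the candidate does not regress (assumed criterion)."""
    return candidate_metric >= last_metric

def maybe_push(last_metric, candidate_metric, serving_models):
    """Push the candidate to serving only if validation is positive."""
    if validation_outcome(last_metric, candidate_metric):
        serving_models.append(candidate_metric)  # "push" to the serving system
        return True
    return False  # worse model: keep serving the previous one

serving_models = [0.91]
pushed = maybe_push(serving_models[-1], 0.93, serving_models)
print(pushed, serving_models)  # True [0.91, 0.93]
```

Note that the decision needs the last validated model's quality as an input, which is exactly the context the metadata store discussed next provides.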

  • So the Metadata Store is new.

  • So let's discuss why we need this

  • and what the Metadata Store does.

  • First, when most people talk about machine learning

  • workflows and pipelines, they really

  • think about task dependency.

  • They think there's one component and when that's finished,

  • there's another component that runs.

  • However, all of you who actually do machine learning

  • in production know that we actually need data dependency,

  • because all of these components consume artifacts and create

  • artifacts.

  • And as the example of Model Validation has shown,

  • it's incredibly important to actually know

  • these dependencies.

  • So we need a system that's both task- and data-aware so

  • that each component has a history of all

  • of the previous runs and knows about all of the artifacts.
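
A minimal sketch of such a store, assuming a made-up interface (this is not the real ML Metadata API): every component run is recorded with its input and output artifacts, so a later component can look up context such as the last validated model:

```python
class MetadataStore:
    """Hypothetical store keeping a history of component runs and artifacts."""

    def __init__(self):
        self.runs = []

    def record_run(self, component, inputs, outputs):
        """Record one execution: which artifacts went in, which came out."""
        self.runs.append({"component": component,
                          "inputs": inputs,
                          "outputs": outputs})

    def latest_output(self, component, key):
        """Walk the history backwards to find a component's most recent output."""
        for run in reversed(self.runs):
            if run["component"] == component and key in run["outputs"]:
                return run["outputs"][key]
        return None

store = MetadataStore()
store.record_run("Trainer", {"examples": "data/v1"}, {"model": "model_1"})
store.record_run("ModelValidator",
                 {"candidate_model": "model_1"},
                 {"validated_model": "model_1"})
store.record_run("Trainer", {"examples": "data/v2"}, {"model": "model_2"})

# The validator can now recover its context: the last validated model.
print(store.latest_output("ModelValidator", "validated_model"))  # model_1
```

Because runs are keyed by the artifacts they consume and produce, this gives data dependency on top of plain task dependency: each component can inspect the full lineage of prior runs, not just whether its predecessor finished.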