  • [MUSIC PLAYING]

  • TRIS WARKENTIN: Hi, everyone.

  • I'm Tris Warkentin and I'm a product manager

  • on TensorFlow Extended, or TFX.

  • ZHITAO LI: Hi, my name is Zhitao Li.

  • I'm a tech lead manager from TFX Open Source.

  • TRIS WARKENTIN: At Google, putting machine learning models

  • into production is one of the most important things

  • our engineers and researchers do.

  • But to achieve that global reach and production readiness,

  • a reliable production platform is

  • critical to Google's success.

  • And that's the goal of TensorFlow Extended--

  • to create a stable platform for production ML at Google,

  • and a stable platform for you to build production-ready ML

  • systems, too.

  • So how does that work?

  • Our philosophy is to take modern software engineering

  • and combine it with what we've learned about machine learning

  • development at Google.

  • So what's the difference between writing code and doing machine

  • learning engineering?

  • In coding, you might build something

  • that one person can create end to end.

  • You might have untested code, undocumented code,

  • and code that's hard to reuse.

  • In modern software engineering, we

  • have solutions for all of those problems-- test-driven

  • development, modular designs, scalable performance

  • optimization, and much more.

  • So how is that different in machine learning development?

  • Well, a lot of the problems from coding still apply to ML,

  • but we also have a variety of new problems.

  • We might have no clear problem statement.

  • We might need some continuous optimization.

  • We might need to understand when changes in data

  • will result in different shapes of our models.

  • We've been doing this at Google for a long time.

  • In 2007, we launched Sibyl,

  • which was our scalable platform for production ML

  • here at Google.

  • And since 2016, we've been working on TFX,

  • and last year we open sourced it to make

  • it even easier for you to build production

  • ML on your own platforms.

  • What does it look like in practice?

  • Well, TFX as a platform spans everything

  • from best practices all the way through

  • to end-to-end pipelines.

  • On the best-practices end, you don't even

  • have to use a single line of Google-developed code

  • to get some of the benefits of TFX,

  • and at the other end, full pipelines

  • allow you to deploy scalable, production-grade ML.

  • This is what a pipeline might look like.

  • On the left side of the screen, you'll see data intake,

  • then it runs through the pipeline,

  • doing things like data validation, schema generation,

  • and much more, in order to make sure

  • that you're working in a repeatable, testable,

  • consistent way and producing production-ready results.
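
As a rough illustration, a pipeline like this might be defined in Python along the following lines, assuming a TFX release from around the time of this talk (~0.21); component signatures have changed across versions, and all paths and names here are illustrative:

```python
# Sketch of a TFX pipeline like the one on screen (TFX ~0.21 APIs;
# paths, names, and step counts are illustrative).
from tfx.components import (CsvExampleGen, ExampleValidator, Pusher,
                            SchemaGen, StatisticsGen, Trainer, Transform)
from tfx.orchestration import pipeline
from tfx.proto import pusher_pb2, trainer_pb2
from tfx.utils.dsl_utils import external_input

example_gen = CsvExampleGen(input=external_input('gs://my-bucket/data'))  # data intake
statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])  # dataset statistics
schema_gen = SchemaGen(statistics=statistics_gen.outputs['statistics'])   # schema generation
example_validator = ExampleValidator(                                     # data validation
    statistics=statistics_gen.outputs['statistics'],
    schema=schema_gen.outputs['schema'])
transform = Transform(                                                    # feature engineering
    examples=example_gen.outputs['examples'],
    schema=schema_gen.outputs['schema'],
    module_file='preprocessing.py')
trainer = Trainer(                                                        # model training
    module_file='model.py',
    examples=transform.outputs['transformed_examples'],
    transform_graph=transform.outputs['transform_graph'],
    schema=schema_gen.outputs['schema'],
    train_args=trainer_pb2.TrainArgs(num_steps=10000),
    eval_args=trainer_pb2.EvalArgs(num_steps=5000))
pusher = Pusher(                                                          # model deployment
    model=trainer.outputs['model'],
    push_destination=pusher_pb2.PushDestination(
        filesystem=pusher_pb2.PushDestination.Filesystem(
            base_directory='gs://my-bucket/serving')))

tfx_pipeline = pipeline.Pipeline(
    pipeline_name='my_pipeline',
    pipeline_root='gs://my-bucket/pipeline_root',
    components=[example_gen, statistics_gen, schema_gen,
                example_validator, transform, trainer, pusher])
```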

  • So it's hard to believe that our end-to-end

  • pipeline offering has only been in open source

  • for one year, but we have

  • a lot of interesting things that we've done in 2019,

  • including building the foundations of metadata,

  • building basic TensorFlow 2.0 support for things like Estimators,

  • as well as launches of Fairness Indicators and TFMA.

  • But we're definitely not done.

  • In 2020, we have a wide variety of interesting developments

  • coming, including native Keras on TFX,

  • which you'll hear more about from Zhitao later,

  • as well as a TensorFlow Lite trainer rewrite

  • and warm starting, which can make your machine learning

  • training up to a hundred times faster by using caching.

  • But we have something really exciting

  • that we're announcing today, which

  • you may have heard from Megan about in the keynote, which

  • is end to end ML pipelines.

  • These are our Cloud AI Platform Pipelines.

  • We're really excited about these, because they combine

  • a lot of the best of Google AI Platform with TFX

  • to create Cloud AI Platform Pipelines, available today.

  • Please check out our blog for more information.

  • You should be able to find it if you just Google

  • "Cloud API Platform Pipelines."

  • And now, can we please cut to the demo?

  • ZHITAO LI: So I'll be walking you through this demo.

  • This is the Cloud AI Platform Pipelines page.

  • As you can see, this page lists all your existing

  • Cloud AI Pipelines clusters.

  • We've already created one, and this page

  • can be found under the AI Platform Pipelines tab

  • on the left of the Google Cloud Console.

  • If you don't have any pipeline clusters created yet,

  • you can use the New Instance button to create a new one.

  • This gives you a one-click experience

  • for creating clusters, which used to be

  • one of the more difficult jobs.

  • You can use the Configure button to create

  • Cloud AI Pipelines on Google Cloud.

  • This deploys Cloud AI Pipelines on Kubernetes,

  • running on Google's GKE.

  • You can choose the cluster it will run in,

  • the namespace where you want it created,

  • and a name for the cluster.

  • Once you are done, you can simply click Deploy and Done.

  • Since I already have a cluster, I

  • will open up the Pipeline dashboard here.

  • In this page, you can see a list of demo pipelines

  • that you can play with.

  • You can see tutorials about creating pipelines

  • and applying various techniques, and you can use the Pipelines tab

  • from the left to view all your existing pipelines here.

  • Since this cluster is newly created,

  • there are no TFX pipelines in it yet.

  • We are going to use the newly launched TFX

  • templates to create a pipeline in this cluster.

  • This is a Cloud AI Platform Notebook.

  • I'm pretty much using it as a Python shell

  • to run some simple Python commands.

  • First, you set up your environment,

  • making sure TFX is properly installed, together

  • with some other dependencies.

  • Make sure you have environment variables

  • like PATH set up, and that the TFX version is up to date.
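
In notebook form, that setup amounts to a few cells roughly like these; the package list and paths are illustrative rather than the exact cells from the demo:

```python
# Install TFX and the Kubeflow Pipelines SDK (illustrative; pin
# versions as appropriate for your environment).
!pip install --user --upgrade tfx kfp

# Locally installed CLI binaries land in ~/.local/bin; make sure
# they are on PATH.
import os
os.environ['PATH'] = (os.path.expanduser('~/.local/bin')
                      + os.pathsep + os.environ['PATH'])

# Confirm the TFX version is up to date.
from tfx import version
print('TFX version:', version.__version__)
```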

  • Next, make sure you have a Google Cloud project

  • properly configured.

  • Then configure the Cloud AI Pipelines cluster endpoint--

  • simply copy it from the dashboard URL

  • into the notebook shell.

  • Now, we also make sure we create a Google Container

  • Registry image repo so that we can upload our container images to it.

  • Once that is done, we configure the pipeline name

  • and the project directory.
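
Concretely, those configuration cells look something like this, where every value is a placeholder for your own project, endpoint, and names:

```python
import os

# All values below are placeholders for your own setup.
GOOGLE_CLOUD_PROJECT = 'my-gcp-project'

# Endpoint of the Cloud AI Platform Pipelines cluster, copied from
# the pipeline dashboard's URL.
ENDPOINT = 'xxxxxxxx-dot-us-central1.pipelines.googleusercontent.com'

# Image path in Google Container Registry (GCR) where the pipeline's
# container image will be uploaded.
CUSTOM_TFX_IMAGE = 'gcr.io/' + GOOGLE_CLOUD_PROJECT + '/tfx-pipeline'

PIPELINE_NAME = 'my_pipeline'
PROJECT_DIR = os.path.join(os.path.expanduser('~'), PIPELINE_NAME)
```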

  • Now we can use template creation

  • to scaffold a new pipeline from a template.
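
Scaffolding from the predefined taxi template is a single notebook cell along these lines (flag spellings have varied across TFX releases between dashes and underscores):

```python
# Copy the predefined 'taxi' template into the project directory.
!tfx template copy \
  --pipeline-name={PIPELINE_NAME} \
  --destination-path={PROJECT_DIR} \
  --model=taxi
```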

  • Once it's created, I'm going

  • to show the content generated by the template.

  • As you can see, there is pipeline code

  • in the pipeline.py file.

  • This includes our classic taxi pipeline

  • from TFX with all the components necessary to do

  • production machine learning.

  • There is configs.py, with some configurations

  • related to Google Cloud as well as some configuration about TFX

  • itself.

  • Once that is done, we enter the template directory,

  • making sure all the template files are there.

  • You can even run the pre-generated unit tests

  • on the features to make sure the configuration looks right.
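
For example, the taxi template ships a small test for its feature definitions; a cell like this runs it (the module path follows the template layout and may differ between versions):

```python
# Run the template's pre-generated unit tests from the project root.
%cd {PROJECT_DIR}
!python -m models.features_test
```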

  • Once that's done, you can then use the TFX CLI

  • to create a TFX pipeline on the Cloud AI Platform Pipelines page.

  • This will create a container image

  • with all your code and dependencies,

  • upload it to GCR, then create a pipeline using this container

  • image on the Pipelines page.
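
The CLI invocation looks roughly like this; again, exact flag names differ slightly between TFX releases:

```python
# Build the container image, upload it to GCR, and register the
# pipeline with the Cloud AI Platform Pipelines cluster.
!tfx pipeline create \
  --pipeline-path=kubeflow_dag_runner.py \
  --endpoint={ENDPOINT} \
  --build-target-image={CUSTOM_TFX_IMAGE}
```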

  • As we see, the pipeline compiles and the creation is successful,

  • and we go back to the Pipeline page.

  • Click on Refresh.

  • Boom-- we have our new pipeline.

  • Now, if we click through the pipeline,

  • you are going to see all the TFX components here

  • readily available.

  • We can create a run to test this pipeline,

  • and click on Run.
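
A run can also be kicked off from the same notebook with a cell along these lines:

```python
# Start a new run of the registered pipeline.
!tfx run create --pipeline-name={PIPELINE_NAME} --endpoint={ENDPOINT}
```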

  • We are going to see each of the components.

  • When they run, they will gradually

  • show up on the web page.

  • The first component should be ExampleGen.

  • So it has to be ExampleGen. Yes, it is there.

  • This component has started running.

  • You can click on it.

  • In the tab, you can look at the artifacts, inputs and outputs,

  • which Kubernetes volumes are used by the component, its manifest,

  • and you can even inspect the logs of a component run.

  • The pipeline runs through ExampleGen, StatsGen, SchemaGen.

  • And now the pipeline enters feature transform

  • and example validation at the same time.

  • So now all the data preparation is finished.

  • The pipeline enters into a training stage,

  • which is producing a TensorFlow model.

  • If we click on the trainer component,

  • we can even inspect these logs.

  • Now, once trainer is complete, we

  • do some model validation and evaluation

  • using TFX components.

  • And once all the model evaluation is done,

  • we use Pusher to push the generated model

  • to an external serving system.

  • So with that, you have a model

  • ready to use in production.

  • You can also use the tabs on the left

  • to navigate on existing experiments, artifacts,

  • and executions.

  • We are going to take a look at the artifacts generated

  • from this pipeline using the Artifacts tab.

  • So here, you can see you have a pipeline.

  • If you click on the model output artifact from Trainer,

  • that represents a TensorFlow model.

  • This is the artifact's ML Metadata.

  • We can see it's a model artifact produced by trainer.

  • And this is a lineage view of the model--

  • it explains which component produced this model

  • from which input artifacts, how this artifact is further

  • used by other components that take it as input,

  • and what outputs are generated by those downstream components.
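
The same lineage information can also be queried programmatically through the ML Metadata (MLMD) library; here is a rough sketch, with an illustrative connection config:

```python
# Sketch of querying model lineage via ML Metadata (MLMD); the
# connection details below are illustrative placeholders.
from ml_metadata.metadata_store import metadata_store
from ml_metadata.proto import metadata_store_pb2

config = metadata_store_pb2.ConnectionConfig()
config.mysql.host = 'metadata-db-host'  # placeholder
config.mysql.port = 3306
config.mysql.database = 'mlmd'
store = metadata_store.MetadataStore(config)

# Fetch all Model artifacts, then the events linking them to the
# executions (components) that produced or consumed them.
models = store.get_artifacts_by_type('Model')
events = store.get_events_by_artifact_ids([m.id for m in models])
```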

  • OK, that is all of the demo.

  • Now I'm going to talk about another important development

  • in TFX, which is supporting native Keras.

  • For those of you who are not very familiar with TensorFlow

  • 2, let me recap a little of the history.

  • TensorFlow 2 was released in Q3 2019

  • with a focus on providing a more Pythonic experience.

  • That includes supporting the Keras

  • API, eager execution by default, and Pythonic function execution.

  • So this is a timeline of how TFX Open Source has been working

  • on supporting all of this.

  • We released the first version of TFX Open Source

  • at the last Dev Summit, which only

  • supported Estimator-based TensorFlow training code.

  • Back at last TensorFlow World, TensorFlow 2.0 was launched,

  • and we started working on supporting the Keras API.

  • Previous slide, please?

  • Back in Q4 2019,

  • we released basic TensorFlow 2.0 support in TFX 0.20.

  • In that version, we supported the TensorFlow 2.0

  • package end to end, with limited Keras support

  • through the Keras Estimator.

  • And now, in the latest TFX release,

  • I'm happy to say we are releasing experimental support

  • of native Keras training end to end.

  • So what does that mean?

  • Let's take a deeper look.

  • For data ingestion and analysis, everything pretty much remains

  • the same, because TFDV, our data analysis library,

  • is model agnostic.

  • For feature transform, we added a new Keras-compatible layer

  • in the TFT library so that we can transform features

  • in a Keras model.

  • This layer also takes care of asset management

  • and model exporting.
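
In user code, that layer is obtained from the Transform component's output and applied inside the model's serving function. A sketch, where the path is illustrative and `model` stands for the trained Keras model:

```python
import tensorflow as tf
import tensorflow_transform as tft

# Load the Transform component's output; the path is illustrative.
tf_transform_output = tft.TFTransformOutput('gs://my-bucket/transform_output')
tft_layer = tf_transform_output.transform_features_layer()

@tf.function
def serve_tf_examples_fn(serialized_tf_examples):
  """Parses serialized tf.Examples and applies the TFT transform graph."""
  feature_spec = tf_transform_output.raw_feature_spec()
  parsed_features = tf.io.parse_example(serialized_tf_examples, feature_spec)
  transformed_features = tft_layer(parsed_features)
  return model(transformed_features)  # `model` is the trained Keras model
```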

  • For training, we created a new generic trainer executor,

  • which can be used to run any TensorFlow training code

  • that exports a SavedModel.

  • This also covers training using the native Keras API.
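
Wiring this up looks roughly like the following, using TFX ~0.21 import paths (they have moved between releases); `transform` and `schema_gen` are upstream components from a pipeline like the one sketched earlier, and the module file must define a `run_fn` that exports a SavedModel:

```python
from tfx.components import Trainer
from tfx.components.base import executor_spec
from tfx.components.trainer.executor import GenericExecutor
from tfx.proto import trainer_pb2

trainer = Trainer(
    module_file='model.py',  # defines run_fn, which exports a SavedModel
    custom_executor_spec=executor_spec.ExecutorClassSpec(GenericExecutor),
    examples=transform.outputs['transformed_examples'],
    transform_graph=transform.outputs['transform_graph'],
    schema=schema_gen.outputs['schema'],
    train_args=trainer_pb2.TrainArgs(num_steps=10000),
    eval_args=trainer_pb2.EvalArgs(num_steps=5000))
```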

  • For model analysis and validation,

  • we created a new Evaluator component,

  • which combines both the evaluation and the model validation

  • capabilities.

  • This new component supports native Keras out of the box.
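
A sketch of configuring the combined Evaluator with a TFMA EvalConfig; the label key, metric, and threshold below are illustrative, and `example_gen` and `trainer` are the upstream components from the earlier sketch:

```python
import tensorflow_model_analysis as tfma
from tfx.components import Evaluator

# Evaluate over the whole dataset and bless the model only if binary
# accuracy clears an illustrative threshold.
eval_config = tfma.EvalConfig(
    model_specs=[tfma.ModelSpec(label_key='tips')],
    slicing_specs=[tfma.SlicingSpec()],
    metrics_specs=[tfma.MetricsSpec(metrics=[
        tfma.MetricConfig(
            class_name='BinaryAccuracy',
            threshold=tfma.MetricThreshold(
                value_threshold=tfma.GenericValueThreshold(
                    lower_bound={'value': 0.6})))])])

evaluator = Evaluator(
    examples=example_gen.outputs['examples'],
    model=trainer.outputs['model'],
    eval_config=eval_config)
```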

  • And finally, when it comes to model serving validation,

  • we will release a new component called InfraValidator.

  • This component