[MUSIC PLAYING]
-
TRIS WARKENTIN: Hi, everyone.
-
I'm Tris Warkentin and I'm a product manager
-
on TensorFlow Extended, or TFX.
-
ZHITAO LI: Hi, my name is Zhitao Li.
-
I'm a tech lead manager from TFX Open Source.
-
TRIS WARKENTIN: At Google, putting machine learning models
-
into production is one of the most important things
-
our engineers and researchers do.
-
But to achieve this global reach and production readiness,
-
a reliable production platform is
-
critical to Google's success.
-
And that's the goal of TensorFlow Extended--
-
to create a stable platform for production ML at Google,
-
and a stable platform for you to build production-ready ML
-
systems, too.
-
So how does that work?
-
Our philosophy is to take modern software engineering
-
and combine it with what we've learned about machine learning
-
development at Google.
-
So what's the difference between writing code and doing machine
-
learning engineering?
-
In coding, you might build something
-
that one person can create end to end.
-
You might have untested code, undocumented code,
-
and code that's hard to reuse.
-
In modern software engineering, we
-
have solutions for all of those problems-- test-driven
-
development, modular designs, scalable performance
-
optimization, and much more.
-
So how is that different in machine learning development?
-
Well, a lot of the problems from coding still apply to ML,
-
but we also have a variety of new problems.
-
We might have no clear problem statements.
-
We might need some continuous optimization.
-
We might need to understand when changes in data
-
will result in different shapes of our models.
-
We've been doing this at Google for a long time.
-
In 2007, we launched Sibyl,
-
our scalable platform for production ML
-
here at Google.
-
And since 2016, we've been working on TFX,
-
and last year we open sourced it to make
-
it even easier for you to build production
-
ML in your platforms.
-
What does it look like in practice?
-
Well, TFX as an end-to-end platform
-
spans everything from best practices through to end-to-end pipelines.
-
On the best-practices end, you don't even
-
have to use a single line of Google-developed code
-
to get some of the benefits of TFX, all the way
-
through to end-to-end pipelines that let you deploy
-
scalable, production-scale ML.
-
This is what a pipeline might look like.
-
On the left side of the screen, you'll see data intake,
-
then it runs through the pipeline,
-
doing things like data validation, schema generation,
-
and much more in order to make sure
-
that you're doing things in a repeatable, testable,
-
consistent way and producing production-ready ML results.
-
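A pipeline like the one on the slide can be declared in Python. The following is a configuration sketch, not the exact demo code: component argument names vary across TFX versions, and `data_root`, `module_file`, and `pipeline_root` are placeholder paths.

```python
# Sketch of a TFX pipeline definition (argument names vary by TFX version;
# data_root, module_file, and pipeline_root are placeholder paths).
from tfx.components import (CsvExampleGen, StatisticsGen, SchemaGen,
                            ExampleValidator, Transform, Trainer)
from tfx.orchestration import pipeline

example_gen = CsvExampleGen(input_base=data_root)            # data intake
statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])
schema_gen = SchemaGen(statistics=statistics_gen.outputs['statistics'])
example_validator = ExampleValidator(                        # data validation
    statistics=statistics_gen.outputs['statistics'],
    schema=schema_gen.outputs['schema'])
transform = Transform(examples=example_gen.outputs['examples'],
                      schema=schema_gen.outputs['schema'],
                      module_file=module_file)
trainer = Trainer(examples=transform.outputs['transformed_examples'],
                  transform_graph=transform.outputs['transform_graph'],
                  schema=schema_gen.outputs['schema'],
                  module_file=module_file)

my_pipeline = pipeline.Pipeline(
    pipeline_name='taxi_pipeline',
    pipeline_root=pipeline_root,
    components=[example_gen, statistics_gen, schema_gen,
                example_validator, transform, trainer])
```

Each component consumes the outputs of the ones before it, which is what makes the runs repeatable and testable.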
So it's hard to believe that we've only
-
been one year in open source for our end
-
to end pipeline offering, but we have
-
a lot of interesting things that we've done in 2019,
-
including building the foundations of metadata,
-
building basic 2.0 support for things like estimators,
-
as well as launches of Fairness Indicators and TFMA.
-
But we're definitely not done.
-
In 2020, we have a wide variety of interesting developments
-
coming, including native Keras on TFX,
-
which you'll hear more about from Zhitao later,
-
as well as a TensorFlow Lite trainer rewrite
-
and some warm starting, which can make your machine learning
-
training a hundred times faster by using caching.
-
But we have something really exciting
-
that we're announcing today, which
-
you may have heard about from Megan in the keynote:
-
end-to-end ML pipelines.
-
These are our Cloud AI Platform Pipelines.
-
We're really excited about these, because they combine
-
a lot of the best of Google AI Platform with TFX
-
to create Cloud AI Platform Pipelines, available today.
-
Please check out our blog for more information.
-
You should be able to find it if you just Google
-
"Cloud AI Platform Pipelines."
-
And now, can we please cut to the demo?
-
ZHITAO LI: So I'll be giving an explanation about this demo.
-
This is the Cloud AI Platform Pipelines page.
-
Here, you can see all your existing Cloud AI Pipelines
-
clusters on this page.
-
We've already created one, and this page
-
can be found under the AI Platform Pipelines tab
-
on the left of the Google Cloud Console.
-
If you don't have any pipelines cluster created yet,
-
you can use the New Instance button to create a new one.
-
This gives you a one-click experience
-
for creating clusters, which used to be
-
one of the more difficult jobs.
-
You can use the Config button to create a Cloud AI
-
Pipelines instance on Google Cloud.
-
This deploys Cloud AI Pipelines on Kubernetes,
-
running on Google's GKE.
-
You can choose the cluster it will run in, choose
-
the namespace where you want to create it,
-
and choose a name for the cluster.
-
Once you are done, you can simply click Deploy and Done.
-
Since I already have a cluster, I
-
will open up the Pipeline dashboard here.
-
In this page, you can see a list of demo pipelines
-
that you can play with.
-
You can see tutorials about creating pipelines and doing
-
various techniques, and you can use the Pipelines tab
-
from the left to view all your existing pipelines here.
-
Since this cluster is newly created,
-
there are no TFX pipelines in it yet.
-
We are going to use the newly launched TFX
-
templates to create a pipeline in this cluster.
-
This is the Cloud AI Notebook.
-
I'm pretty much using this as a Python shell
-
to write some simple Python commands.
-
First, you set up your environment,
-
making sure TFX is properly installed, together
-
with some other dependencies.
-
Make sure you have environment variables
-
like PATH properly set up, and that the TFX version is up to date.
-
Next, make sure you have a Google Cloud project
-
properly configured.
-
Then configure the Cloud AI Pipelines cluster endpoint--
-
simply copy it from the dashboard URL into the notebook shell.
-
We also make sure we create the Google Container
-
Registry repo so that we can upload our container images to it.
-
Once that is done, we configure the pipeline name and the project
-
directory.
-
Now we can use template creation
-
to create a new pipeline scaffold.
-
Once it's created, I'm going
-
to show the content generated by the template.
-
As you see, there is pipeline code in the pipeline.py file.
-
This includes our classical taxi pipeline
-
from TFX with all the components necessary to do
-
production machine learning.
-
There is configs.py, with some configurations
-
related to Google Cloud as well as some configuration about TFX
-
itself.
-
Once that is done, we enter the template directory,
-
making sure all the template files are there.
-
You can even run some pre-generated unit tests
-
on the features to make sure the configuration looks right.
-
Once that's done, you can then use the TFX CLI
-
to create a TFX pipeline on the Cloud AI Platform Pipelines page.
-
This will create a container image
-
with all your code and dependencies,
-
upload it to GCR, then create a pipeline using this container
-
image on the Pipelines page.
-
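The CLI steps just described look roughly like this. This is a sketch of TFX 0.21-era commands, so flag names may differ in other versions; `ENDPOINT` stands for the Kubeflow Pipelines endpoint copied from the dashboard URL, and `my_pipeline` is a placeholder name.

```shell
# Copy the predefined taxi template into a local project directory:
tfx template copy --model=taxi \
    --pipeline_name=my_pipeline \
    --destination_path=./my_pipeline

# Build the container image and register the pipeline on the cluster:
tfx pipeline create --engine=kubeflow \
    --pipeline_path=./my_pipeline/kubeflow_dag_runner.py \
    --endpoint=$ENDPOINT

# Kick off a run of the newly created pipeline:
tfx run create --engine=kubeflow \
    --pipeline_name=my_pipeline \
    --endpoint=$ENDPOINT
```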
As we see, the pipeline compiles and the creation is successful,
-
and we go back to the Pipeline page.
-
Click on Refresh.
-
Boom-- we have our new pipeline.
-
Now, if we click through the pipeline,
-
you are going to see all the TFX components here
-
readily available.
-
We can create a test run of this pipeline
-
and click on Run.
-
We are going to see each of the components.
-
When they run, they will gradually
-
show up on the web page.
-
The first component should be ExampleGen.
-
So it has to be ExampleGen. Yes, it is there.
-
This component has started running.
-
You can click on it.
-
In this tab, you can look at the artifacts, inputs and outputs,
-
which Kubernetes volumes are used for the component, its manifest,
-
and you can even inspect the logs of the component run.
-
So it runs ExampleGen, StatsGen, SchemaGen.
-
And now the pipeline enters feature transform
-
and example validation at the same time.
-
So now all the data preparation is finished.
-
The pipeline enters into a training stage,
-
which is producing a TensorFlow model.
-
If we click on the trainer component,
-
we can even inspect its logs.
-
Now, once trainer is complete, we
-
do some model validation and evaluation
-
using TFX components.
-
And once all the model evaluation is done,
-
we use Pusher to push the generated model
-
onto an external serving system.
-
So now you have a model
-
ready to use in production.
-
You can also use the tabs on the left
-
to navigate your existing experiments, artifacts,
-
and executions.
-
We are going to take a look at the artifacts generated
-
from this pipeline using the Artifacts tab.
-
So here, you can see you have a pipeline.
-
If you click on the Model Output Artifacts from Trainer,
-
that represents a TensorFlow model.
-
This is the artifact's ML Metadata.
-
We can see it's a model artifact produced by Trainer.
-
And this is a lineage view of the model--
-
it explains which component produced this model
-
from which input artifacts, how this artifact is further
-
used by other components that take it as input,
-
and what outputs are generated by the downstream components.
-
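Conceptually, the lineage view works because ML Metadata records, for each component execution, which artifacts went in and which came out, so the graph can be walked in both directions. The following is a toy sketch of that idea in plain Python, not the real ML Metadata API; all names are illustrative.

```python
# Toy sketch of the lineage idea behind ML Metadata: each component
# execution records its input and output artifacts, so a model's
# lineage can be walked upstream and downstream. Illustrative only.
executions = [
    {"component": "ExampleGen", "inputs": [], "outputs": ["examples"]},
    {"component": "Trainer", "inputs": ["examples"], "outputs": ["model"]},
    {"component": "Pusher", "inputs": ["model"], "outputs": ["pushed_model"]},
]

def produced_by(artifact):
    """Return the component whose execution output this artifact."""
    for ex in executions:
        if artifact in ex["outputs"]:
            return ex["component"]
    return None

def consumers(artifact):
    """Return the components that take this artifact as an input."""
    return [ex["component"] for ex in executions if artifact in ex["inputs"]]

print(produced_by("model"))   # Trainer
print(consumers("model"))     # ['Pusher']
```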
OK, that's all of the demo.
-
Now I'm going to talk about another important development
-
in TFX, which is supporting native Keras.
-
For those of you who are not very familiar with TensorFlow
-
2, let me recap a little of the history.
-
TensorFlow 2 was released in Q3 2019
-
with a focus on providing a more Pythonic experience.
-
That includes supporting the Keras
-
API, eager execution by default, and Pythonic execution.
-
So this is a timeline of how TFX Open Source has been working
-
on supporting all of this.
-
We released the first version of TFX Open Source
-
at the last Dev Summit, which only
-
supported Estimator-based TensorFlow training code.
-
Back at last TensorFlow World, TensorFlow 2.0 was launched
-
and we started working on supporting the Keras API.
-
Previous slide, please?
-
Back in Q4 2019,
-
we released basic TensorFlow 2.0 support
-
in TFX 0.20. In that version, we supported the TensorFlow 2.0
-
package end to end, with limited Keras support via
-
the Keras Estimator.
-
And now, in the latest TFX release, I'm happy to announce--
-
we are releasing experimental support for native Keras training
-
end to end.
-
So what does that mean?
-
Let's take a deeper look.
-
For data ingestion and analysis, everything pretty much remains
-
the same, because TFDV, our data analysis library,
-
is model agnostic.
-
For feature transformation, we added a new Keras-compatible layer
-
in the TFT library so that we can transform features
-
in a Keras model.
-
This layer will also take care of asset management
-
and model exporting.
-
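The usual pattern for that layer looks roughly like the sketch below, built on `TFTransformOutput.transform_features_layer()` from the TFT library. `transform_output_dir` and `model` are placeholders for the Transform component's output directory and a trained Keras model; this is an illustration of the pattern, not the demo's exact code.

```python
# Sketch: applying a tf.Transform preprocessing graph inside a Keras
# serving function. transform_output_dir and model are placeholders.
import tensorflow as tf
import tensorflow_transform as tft

tft_output = tft.TFTransformOutput(transform_output_dir)
tft_layer = tft_output.transform_features_layer()  # Keras-compatible layer

@tf.function
def serve_tf_examples_fn(serialized_examples):
    # Parse raw tf.Examples, apply the same transform used at training
    # time, then run the Keras model on the transformed features.
    feature_spec = tft_output.raw_feature_spec()
    raw_features = tf.io.parse_example(serialized_examples, feature_spec)
    transformed = tft_layer(raw_features)
    return model(transformed)
```

Because the layer is saved with the model, the exported SavedModel carries the transform's assets along with it, which is the asset management mentioned above.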
For training, we created a new generic trainer executor
-
which can be used to run any TensorFlow training code that
-
exports a SavedModel.
-
This also covers training using the native Keras API.
-
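Wiring the generic executor into a Trainer looks roughly like this sketch against the TFX 0.21-era API. The `module_file`, step counts, and upstream component outputs are placeholders; the user module is assumed to define a `run_fn` that trains and exports a SavedModel.

```python
# Sketch: configuring Trainer with the generic executor so it runs a
# user-provided run_fn instead of an Estimator (TFX ~0.21-era API).
from tfx.components import Trainer
from tfx.components.base import executor_spec
from tfx.components.trainer.executor import GenericExecutor
from tfx.proto import trainer_pb2

trainer = Trainer(
    module_file=module_file,  # user module defining run_fn(...)
    custom_executor_spec=executor_spec.ExecutorClassSpec(GenericExecutor),
    examples=transform.outputs['transformed_examples'],
    transform_graph=transform.outputs['transform_graph'],
    schema=schema_gen.outputs['schema'],
    train_args=trainer_pb2.TrainArgs(num_steps=1000),
    eval_args=trainer_pb2.EvalArgs(num_steps=100))
```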
For model analysis and validation,
-
we created a new Evaluator component,
-
which combines both evaluation and model validation
-
capabilities.
-
This new component supports native Keras out of the box.
-
And finally, when it gets to model serving validation,
-
we will release a new component called InfraValidator.
-
This component