
ROBERT CROWE: I'm Robert Crowe. And we are here today to talk about production pipelines, ML pipelines. So we're not going to be talking about ML modeling too much, or different architectures. This is really all focused on when you have a model and you want to put it into production, so that you can offer a product, a service, or some internal service within your company, and it's something that you need to maintain over the lifetime of that deployment.

So normally when we think about ML, we think about modeling code, because it's the heart of what we do. Modeling, and the results that we get from the amazing models we're producing these days, are the reason we're all here. It's what papers are overwhelmingly written about, for the most part: architectures, results, and different approaches to doing ML. It's great stuff. I love it. I'm sure you do too.

But when you move to putting something into production, you discover that there are a lot of other pieces that are very important to making that model you spent so much time putting together available and robust over the lifetime of a product or service that you're going to offer to the world, so that people can really experience the benefits of the model you've worked on. And those pieces are what TFX is all about.

In machine learning, we're familiar with a lot of the issues that we have to deal with. Where do I get labeled data? How do I generate labels for the data that I have? I may have terabytes of data, but I need labels for it. Does my labeled data cover the feature space that I'm going to see when I actually run inference? Is my dimensionality minimized, or can I do more to simplify my feature vector and make my model more efficient? Does the data I'm choosing really contain the predictive information I need?

And then we need to think about fairness as well. Are we serving all of the customers that we're trying to serve fairly, no matter where they are, what religion they are, what language they speak, or what demographic they might be? You want to serve those people as well as you can; you don't want to unfairly disadvantage anyone.

And we may have rare conditions too, especially in areas like health care, where we're making a prediction that's going to be pretty important to someone's life, and it may be about a condition that occurs very rarely.

But a big one when you go into production is understanding the data lifecycle. Because once you've gone through that initial training and you've put something into production, that's just the start of the process. You're now going to try to maintain that over a lifetime, and the world changes. Your data changes. Conditions in your domain change.

Along with that, you're now doing production software deployment. So you have all of the normal things you have to deal with in any software deployment, things like scalability: will I need to scale up, and is my solution ready to do that? Extensibility: can I extend it, and is it something I can build on? Modularity, best practices, testability: how do I test an ML solution? And security and safety, because we know there are attacks on ML models that are getting pretty sophisticated these days.

Google created TFX for us to use. We created it because we needed it. It was not the first production ML framework that we developed; we've actually learned over many years, because we have ML all over Google, taking in billions of inference requests, really on a planet scale. And we needed something that would be maintainable and usable at a very large production scale, with large data sets and large loads, over a lifetime. So TFX has evolved from earlier attempts, and it is now what most of the products and services at Google use. And now we're also making it available to the world as an open-source product, available to you now to use for your production deployments.

It's also used by several of our partners, as well as other companies that have adopted TFX. You may have heard talks from some of them at the conference already. And there's a nice quote from Twitter, where they did an evaluation: they were coming from a Torch-based environment, looked at the whole ecosystem of TensorFlow, and moved everything that they did to TensorFlow. One of the big contributors to that was the availability of TFX.

The vision is to provide a platform for everyone to use. Along with that, there are some best practices and approaches that we're trying to popularize, things like strongly-typed artifacts, so that when your different components produce artifacts, those artifacts have a strong type; pipeline configuration; and workflow execution, being able to deploy on different distributed pipeline platforms using different orchestrators and different underlying execution engines, trying to make that as flexible as possible.

There are some horizontal layers that tie together the different components in TFX. We'll talk about components here in a little bit, and we have a demo as well that will show you some of the code and some of the components that we're talking about. An important horizontal layer is metadata storage. Each of the components produces and consumes artifacts, and you want to be able to store those. You may want to do comparisons across months or years to see how things changed, because change becomes a central theme of what you're going to do in a production deployment.
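The metadata store behind this layer is provided by the open-source ML Metadata (MLMD) library that ships alongside TFX. As a rough configuration sketch, assuming a local SQLite-backed store at a hypothetical path (TFX normally creates and manages this store for you), connecting and listing recorded artifacts looks something like this:

```python
# Sketch: connecting to a local, SQLite-backed ML Metadata store.
# The database path is hypothetical; in a real pipeline TFX creates
# this store from the pipeline's metadata connection config.
from ml_metadata.metadata_store import metadata_store
from ml_metadata.proto import metadata_store_pb2

config = metadata_store_pb2.ConnectionConfig()
config.sqlite.filename_uri = '/tmp/tfx_metadata/metadata.db'
config.sqlite.connection_mode = (
    metadata_store_pb2.SqliteMetadataSourceConfig.READWRITE_OPENCREATE)

store = metadata_store.MetadataStore(config)

# Every artifact a component has produced or consumed is recorded here,
# which is what makes comparisons across months or years possible.
for artifact in store.get_artifacts():
    print(artifact.type_id, artifact.uri)
```

Because the store persists across runs, the same queries work whether the pipeline ran yesterday or a year ago.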

This is a conceptual look at the different parts of TFX. On the top, we have tasks, conceptually: things like ingesting data, training a model, or serving the model. Below that, we have libraries that are available, again, as open-source components that you can leverage; they're leveraged by the components within TFX to do much of what they do. And on the bottom row, in orange (a good color for Halloween), we have the TFX components. And we're going to get into some detail about how your data will flow through the TFX pipeline, to go from ingesting data to a finished, trained model on the other side.

So what is a component? A component has three parts. This is a particular component, but it could be any of them. Two of those parts, the driver and the publisher, are largely boilerplate code that you could change, but probably won't. A driver consumes artifacts and begins the execution of your component. A publisher takes the output from the component and puts it back into metadata. The executor is really where the work is done in each of the components, and that's also a part that you can change. So you can take an existing component, override the executor in it, and produce a completely different component that does completely different processing.
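As a sketch of that executor-override idea: a custom executor subclasses TFX's base executor and implements its `Do` method, which receives the input artifacts, output artifacts, and execution properties. The class and method names below follow TFX's executor API, but the exact hook for attaching a custom executor to a component has varied between TFX releases, so treat this as an outline rather than a definitive recipe.

```python
# Sketch: a custom executor that could replace the stock executor of an
# existing component (attachment mechanism varies by TFX release).
from tfx.dsl.components.base import base_executor


class MyExecutor(base_executor.BaseExecutor):
    """Same inputs and outputs as the original component,
    completely different processing inside."""

    def Do(self, input_dict, output_dict, exec_properties):
        # input_dict / output_dict map channel names to lists of artifacts;
        # exec_properties carries the component's configuration values.
        # Read the input artifacts, do your own processing, and write
        # results to the output artifacts' URIs.
        ...
```

The driver and publisher stay as-is: they still resolve inputs from metadata and record outputs back to it, which is why only the executor needs to change.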

Each of the components has a configuration, and for TFX, that configuration is written in Python. It's usually fairly simple. Some of the components are a little more complex, but most of them take just a couple of lines of code to configure.
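To give a feel for that Python configuration, here is a minimal pipeline sketch using TFX's v1 APIs. The pipeline name and all file paths are hypothetical placeholders; each component really is just a line or two of configuration, with its inputs wired to the outputs of the component before it.

```python
# Sketch: configuring a small TFX pipeline in Python (TFX v1 APIs assumed;
# names and paths below are placeholders, not a real deployment).
from tfx import v1 as tfx

# Ingest CSV data into the pipeline as examples.
example_gen = tfx.components.CsvExampleGen(input_base='/path/to/data')

# Compute statistics over the ingested examples.
statistics_gen = tfx.components.StatisticsGen(
    examples=example_gen.outputs['examples'])

pipeline = tfx.dsl.Pipeline(
    pipeline_name='my_pipeline',
    pipeline_root='/path/to/pipeline_root',
    components=[example_gen, statistics_gen],
    metadata_connection_config=(
        tfx.orchestration.metadata.sqlite_metadata_connection_config(
            '/path/to/metadata.db')))

# Run locally; the same pipeline definition can be handed to other
# orchestrators, such as Kubeflow Pipelines or Airflow.
tfx.orchestration.LocalDagRunner().run(pipeline)
```

The same pattern extends to the rest of the standard components (SchemaGen, Transform, Trainer, Evaluator, Pusher), each configured with a few lines and chained through its input and output artifacts.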

The key essential aspect here that I've alluded to is that there is a metadata store. The component will pull data from that store as it becomes available. So there's a set of dependencies that determine which artifacts