[LOGO MUSIC]

KONSTANTINOS KATSIAPIS: Hello, everyone. My name is Gus Katsiapis. And together with my colleagues Kevin Haas and Tulsee Doshi, we will talk to you today about TensorFlow Extended. The talk covers two areas. Let's go into that.

I think most of you here who have used ML in the real world realize that machine learning is a lot more than just a model, a lot more than just training a model. And especially when machine learning powers your business, or powers a product that actually affects your users, you absolutely need reliability.

So, today, we'll talk about how, in the face of massive data and the real world, you build applications that use machine learning and are robust to the world they operate in.

And today's talk will actually have two parts. The first part will be about TensorFlow Extended, otherwise known as TFX. This is an end-to-end machine learning platform. And the second part will be about model understanding, and how you can actually get insights into your business by understanding how your model performs in real-world situations.

OK. So let's get started with the first part, TensorFlow Extended, otherwise known as TFX. We built TFX at Google. We started building it approximately two and a half years ago. And a lot of the knowledge that went into building TFX actually came from experience we had building other machine learning platforms within Google that preceded it, that preceded TensorFlow even.

So TFX has had a profound impact on Google, and it's used throughout several Alphabet companies and also in several products within Google itself. And several of those products that you can see here, like Gmail or Ads or YouTube, have pretty large scale. They affect billions of users. So this is one more reason for us to pay extra attention to building systems that use ML reliably.

Now, when we started building TFX, we published a paper about it, and we promised we would eventually make it available to the rest of the world. So over the last few years, we've been open sourcing aspects of it, and several of our external partners have actually been pretty successful deploying this technology, several of the libraries we have offered over time.

So just to call out an interesting case study from Twitter: they made a fascinating blog post where they spoke about how they ranked tweets with TensorFlow, and how they used TensorFlow Hub to do transfer learning, create shared word embeddings, and share them within their organization. And they also showcased how they use TensorFlow Model Analysis to get a better understanding of their model: how their model performs not just globally over the population, but over several slices of the population that were important to the business. We'll be talking more about this later, especially in the model understanding part of the talk.
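The core idea behind sliced evaluation can be shown in a few lines of plain Python. This is only a conceptual sketch, not TensorFlow Model Analysis code, and the slice key, records, and accuracy metric here are made up for illustration:

```python
from collections import defaultdict

# Hypothetical evaluation records: (slice_value, label, prediction).
examples = [
    ("US", 1, 1), ("US", 0, 0), ("US", 1, 0),
    ("BR", 1, 1), ("BR", 0, 1),
]

def accuracy(records):
    """Fraction of records where the prediction matches the label."""
    return sum(label == pred for _, label, pred in records) / len(records)

def sliced_accuracy(records):
    """Group records by their slice key and compute the metric per group."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[0]].append(rec)
    return {key: accuracy(recs) for key, recs in groups.items()}

overall = accuracy(examples)          # one global number can hide...
per_slice = sliced_accuracy(examples)  # ...large differences per slice
```

Here the global accuracy looks acceptable while one slice performs noticeably worse, which is exactly the kind of gap sliced metrics are meant to surface.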

Now, I think most of you here are either software developers or software engineers, or are very much familiar with software processes and technologies. So I think most of you probably recognize several of the themes presented in this slide, like scalability, extensibility, modularity, et cetera. But my conjecture is that most people think about those concepts in terms of code, and how to build software.

Now, with the advent of machine learning, we are building applications that are powered by machine learning, which means that those applications are fundamentally powered by data. So if you just think about code and you don't think about data, you're only thinking about half of the picture. You can optimize one half amazingly, but if you don't think about the other half, you cannot do better than that half.

So I would actually encourage everyone to take each of those concepts and ask: how does this concept apply to data, as opposed to just code? And if you can apply these concepts to both data and code, you can build a holistic application that is actually robust and powers your products. So we will go into each of those individually and see how they apply to machine learning.

OK. Let's start with scalability. Most of you know that when you start your business, it might be small. But the reality is that as your business grows, so might your data. So, ideally, you want a solution that is able to scale over time together with your business. Ideally, you would be able to write a single library, or piece of software, that could operate on a laptop, because you want to experiment quickly, but that could also operate on a beefy machine with tons of processors or accelerators. And you could also scale it over hundreds or thousands of machines if you need to. So this flexibility in terms of scale is quite important.

And ideally, each time you hop from kilobytes to megabytes to gigabytes to terabytes, you wouldn't have to switch to different tools, because you face a huge learning curve each time you change the technology under the covers. So the ideal here is a machine learning platform that is able to work on your laptop but can also scale on any cloud you would like.
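As a toy illustration of that "same code at any scale" idea (this is not TFX code; TFX's data-parallel libraries get this property from Apache Beam, which can run one pipeline locally or on a distributed backend), the analysis logic below stays fixed while only the executor configuration changes:

```python
from concurrent.futures import ThreadPoolExecutor

def count_tokens(chunk):
    """Per-chunk analysis: the 'business logic' never changes."""
    return sum(len(line.split()) for line in chunk)

def total_tokens(chunks, workers=1):
    # Only the degree of parallelism differs between a laptop run
    # (workers=1) and a beefier machine (workers=N); the code does not.
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return sum(ex.map(count_tokens, chunks))
```

In a real distributed setting the executor would be backed by a cluster rather than local threads, but the user-facing shape of the computation is the same.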

OK. Now, let's talk about extensibility. Everyone here understands that you can have libraries and components that make up your system, and you can have things that work out of the box. But you always want to customize them a little bit to meet your goals. You always want to put custom business logic in some part of your application. And this is similar for machine learning.

  • And this is similar for machine learning.

  • So if you think about the concrete example,

  • when you fit data into machine learning model,

  • you need to do multiple transformations

  • to put the data in a format that the model expects.

  • So as a developer of an ML application,

  • you want to have the transformation flexibility

  • that an ML platform can provide to you--

  • whether that's bucketizing, creating vocabularies,

  • et cetera.

  • And that's just one example, but this applies pervasively

  • throughout the ML process.
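In TFX this kind of flexibility comes from TensorFlow Transform, where a user-defined preprocessing function calls analyzers such as `tft.bucketize` or `tft.compute_and_apply_vocabulary`. The dependency-free sketch below only illustrates what those two transformations conceptually do (a full analyze pass over the data, then a cheap per-example transform); the example data and bucket count are made up:

```python
def fit_vocabulary(values):
    """Full pass over the data: assign each distinct value an integer id."""
    return {v: i for i, v in enumerate(sorted(set(values)))}

def fit_bucket_boundaries(values, num_buckets):
    """Full pass: pick quantile boundaries so buckets hold similar counts."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // num_buckets]
            for i in range(1, num_buckets)]

def bucketize(value, boundaries):
    """Per-example transform: map a number to its bucket index."""
    return sum(value >= b for b in boundaries)

countries = ["us", "br", "us", "in"]
ages = [15, 22, 37, 48, 63, 71]

vocab = fit_vocabulary(countries)            # analyze phase
boundaries = fit_bucket_boundaries(ages, 3)  # analyze phase
encoded = [vocab[c] for c in countries]      # transform phase
age_buckets = [bucketize(a, boundaries) for a in ages]
```

The important property, which tf.Transform gives you for real, is that the analyze phase runs once over the whole dataset while the resulting transform can be applied identically at training and serving time.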

  • OK.

  • Let's talk a little bit about modularity.

  • All of you probably understand the importance

  • of having nice APIs and reusable libraries that

  • allow you to build bigger and bigger systems.

  • But, going back to our original question,

  • how does this apply to artifacts produced by machine learning

  • pipelines?

  • How does this apply to data?

So ideally, I would be able to take the reusable part of a model that was trained to recognize images and put it in my model that predicts kinds of chairs. Ideally, we would be able to reuse parts of models as easily as we reuse libraries. So check out TensorFlow Hub, which allows you to reuse the reusable parts of machine learning models and plug them into your own infrastructure.
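With TensorFlow Hub, the reused part is typically loaded as a layer and composed with a new task-specific head. The sketch below is only a dependency-free illustration of that composition; the "pretrained" feature extractor, the head's weights, and the threshold are all invented for the example:

```python
def pretrained_features(image):
    """Stand-in for a reusable, already-trained feature extractor."""
    return [sum(image) / len(image),        # a crude 'brightness' feature
            max(image) - min(image)]        # a crude 'contrast' feature

def chair_head(features, weights=(0.5, 0.25)):
    """New task-specific head, trained on top of the reused part."""
    score = sum(w * f for w, f in zip(weights, features))
    return "chair" if score > 1.0 else "not a chair"

def classify(image):
    # The extractor is reused as-is; only the head is new.
    return chair_head(pretrained_features(image))
```

The design point is that the expensive-to-train part is shared unchanged, the same way a library is, while each downstream model supplies only its own thin head.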

And going a step further, how does this apply to artifacts? Machine learning platforms usually produce lots of data artifacts, whether that's statistics about your data or something else. And many times, those platforms operate in a continuous fashion: data continuously arrives into the system, and you have to continuously produce models that mimic reality quickly, that understand reality faster. And if you have to redo computation from scratch, it means you sometimes can't keep up with real time. So somehow you need to be able to take artifacts that