  • Hi, I'm Robert Crowe,

  • and today I'm going to be talking about TensorFlow Extended,

  • also known as TFX,

  • and how it helps you put your amazing machine-learning models

  • into production.

  • This is Episode 2 of our five-part series

  • on "Real World Machine Learning in Production."

  • We covered a lot in Episode 1, so, if you haven't seen that yet,

  • I'd really recommend watching it.

  • In today's episode, we'll be asking the question,

  • "How do these pipeline things work?"

  • Let's find out.

  • TFX pipelines are created as a sequence of components,

  • each of which performs a different task.

  • Components are organized into directed acyclic graphs, or "DAGs."

  • But what exactly is a component?

  • A TFX component has three main parts:

  • a driver, an executor, and a publisher.

  • Two of these parts, the driver and publisher,

  • are mostly boilerplate code that you could change

  • but probably will never need to.

  • Where you insert your code and do your customization

  • is really in the executor.

  • The driver handles coordinating job execution

  • and feeding data to the executor.

  • The publisher takes the results of your executor

  • and updates the metadata store,

  • which we'll talk about more in the next episode.

  • But the executor is really where the work is done

  • for each of the components.

  • So, first, we need a configuration for our component,

  • and with TFX, that configuration is done using Python.

  • Next, we need some input for our component

  • and a place to send our results.

  • That's where the metadata store comes in.

  • We'll talk more about the metadata store in our next episode,

  • but, for now, just be aware that, for most components,

  • the input will come from the metadata store

  • and the result will be written back to the metadata store.

  • So, as our data moves through the pipeline,

  • components will read metadata

  • that was produced by an earlier component

  • and write metadata that will probably be used

  • by a component farther down the pipeline.

  • There are some exceptions,

  • like at the beginning and end of the pipeline,

  • but, for the most part,

  • that's how data flows through a TFX pipeline.

  • To organize all these components and manage these pipelines,

  • we need orchestration.

  • But what is orchestration, exactly, and how does it help us?

  • If all that you need to do is kick off the next stage of the pipeline,

  • task-aware architectures are enough.

  • You can simply start the next component

  • as soon as the previous component finishes.

  • But a task- and data-aware architecture is much more powerful,

  • and really almost a requirement for any production system,

  • because it stores all the artifacts of every component

  • over many executions.

  • Having that metadata creates a much more powerful pipeline

  • and enables a lot of things

  • which would otherwise be very difficult.

  • So TFX implements a task- and data-aware pipeline architecture.

  • We'll be discussing that in detail in the next episode,

  • so stay tuned.

  • To put an ML pipeline together,

  • define the sequence of components that make up the pipeline,

  • and manage their execution,

  • we need an orchestrator.

  • An orchestrator provides a management interface

  • that we can use to trigger tasks and monitor our components.

  • One of the ways that TFX is open and extendable is with orchestration.

  • We provide support for Apache Airflow and Kubeflow

  • out of the box.

  • But you can write code to use a different orchestrator if you need to.

  • So if you've already got an orchestrator that you like,

  • you can use it with TFX.

  • We don't force you to change.

  • Here's what a TFX DAG, or directed acyclic graph, looks like

  • in two different orchestrators: Airflow and Kubeflow.

  • It's the same DAG,

  • just two slightly different ways of displaying it.

  • In our next episode,

  • we'll discuss the role of metadata

  • and how it helps us create much more powerful pipelines.

  • For more information on TFX,

  • visit us at tensorflow.org/tfx,

  • and don't forget to comment and like us below,

  • and thanks for watching.

  • ♪ (music) ♪
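
The episode describes a TFX pipeline as a sequence of components organized into a directed acyclic graph (DAG). Here is a minimal sketch of that idea in plain Python. It is not TFX code, and the component names and dependencies are hypothetical. The point is that an acyclic graph of components always has at least one order in which every component runs after everything it depends on. The standard-library `graphlib.TopologicalSorter` (Python 3.9+) can compute that order.

```python
# Hypothetical sketch, NOT the TFX API: component names and edges are illustrative.
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key is a component; its value is the set of components it depends on.
pipeline_dag = {
    "ingest": set(),
    "statistics": {"ingest"},
    "validate": {"statistics"},
    "transform": {"ingest", "validate"},
    "train": {"transform"},
    "evaluate": {"train"},
    "push": {"evaluate"},
}


def execution_order(dag):
    """Return one valid run order (dependencies first).

    Raises graphlib.CycleError if the graph is not acyclic.
    """
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    print(execution_order(pipeline_dag))
```

If the graph contained a cycle, `static_order()` would raise `graphlib.CycleError`. That is one practical reason pipelines are restricted to acyclic graphs.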
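
The driver / executor / publisher split described in the episode can be illustrated with a toy model. This is a hedged sketch, not the TFX component API. The `Component` and `MetadataStore` classes, the dict-based store, and the `count_rows` executor are all hypothetical. In this model, the driver reads inputs from the store, the executor holds the user code and receives configuration written in Python, and the publisher writes results back so that a later component can read them.

```python
# Hypothetical sketch of the driver / executor / publisher split.
# NOT the TFX API: class names, the dict-based "metadata store", and artifact
# names are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class MetadataStore:
    """Toy stand-in for a metadata store: artifact name -> every version written."""
    artifacts: Dict[str, List[Any]] = field(default_factory=dict)

    def latest(self, name: str) -> Any:
        return self.artifacts[name][-1]

    def publish(self, name: str, value: Any) -> None:
        self.artifacts.setdefault(name, []).append(value)


@dataclass
class Component:
    name: str
    inputs: List[str]                        # artifacts read from the store
    outputs: List[str]                       # artifacts written back to the store
    executor: Callable[..., Dict[str, Any]]  # your code: where customization happens
    config: Dict[str, Any] = field(default_factory=dict)  # configuration written in Python

    def driver(self, store: MetadataStore) -> Dict[str, Any]:
        # Boilerplate: gather the latest version of each declared input.
        return {artifact: store.latest(artifact) for artifact in self.inputs}

    def publisher(self, store: MetadataStore, results: Dict[str, Any]) -> None:
        # Boilerplate: record each declared output so later components can read it.
        for artifact in self.outputs:
            store.publish(artifact, results[artifact])

    def run(self, store: MetadataStore) -> None:
        inputs = self.driver(store)
        results = self.executor(**inputs, **self.config)
        self.publisher(store, results)


def count_rows(examples: List[Any], min_rows: int) -> Dict[str, Any]:
    """Hypothetical executor: summarize the incoming examples."""
    return {"stats": {"rows": len(examples), "ok": len(examples) >= min_rows}}


if __name__ == "__main__":
    store = MetadataStore()
    store.publish("examples", [1, 2, 3])  # seeded by hand, like the start of a pipeline
    Component("stats", ["examples"], ["stats"], count_rows, {"min_rows": 2}).run(store)
    print(store.latest("stats"))  # {'rows': 3, 'ok': True}
```

In the usage example the input artifact is seeded by hand. This loosely mirrors the exceptions at the start of a pipeline mentioned in the episode, where input does not come from an earlier component.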
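
The contrast between a task-aware and a task- and data-aware orchestrator can be sketched as two toy runners. Both functions and the execution-log format are illustrative assumptions. So is the reuse-on-unchanged-input behavior, which was chosen to show one thing that stored artifacts make possible. None of this describes how TFX implements it. The task-aware runner only knows that a step finished. The data-aware runner keeps every execution's input and output across runs.

```python
# Toy contrast between task-aware and task- and data-aware orchestration.
# NOT TFX code: the runner functions and the execution-log format are assumptions.
import hashlib
import json
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]  # (component name, function of previous output)


def run_task_aware(steps: List[Step], data: Any) -> Any:
    """Starts each step when the previous one finishes; nothing is recorded."""
    for _name, fn in steps:
        data = fn(data)
    return data


def fingerprint(value: Any) -> str:
    # Stable hash of a JSON-serializable artifact.
    return hashlib.sha256(json.dumps(value, sort_keys=True).encode()).hexdigest()


def run_data_aware(steps: List[Step], data: Any, log: List[Dict[str, Any]]) -> Any:
    """Records every execution's input fingerprint and output artifact in `log`.

    Because history persists across runs, a step whose input was seen before
    reuses the stored output instead of executing again. (This toy key ignores
    changes to the step's code or configuration.)
    """
    for name, fn in steps:
        key = fingerprint(data)
        previous = next((e for e in log if e["component"] == name and e["input"] == key), None)
        if previous is not None:
            data = previous["output"]
            continue
        output = fn(data)
        log.append({"component": name, "input": key, "output": output})
        data = output
    return data


if __name__ == "__main__":
    steps = [("double", lambda xs: [2 * x for x in xs]), ("total", sum)]
    history: List[Dict[str, Any]] = []
    print(run_data_aware(steps, [1, 2, 3], history), len(history))  # 12 2
    print(run_data_aware(steps, [1, 2, 3], history), len(history))  # 12 2 (reused)
```

The second call to `run_data_aware` finds matching input fingerprints in the log and returns the stored outputs without calling either step again. A real system would also need to account for code or configuration changes, which this toy key ignores.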
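
TFX lets you bring your own orchestrator, so it can help to picture the pipeline definition and the scheduler as separate pieces. The sketch below is hypothetical and does not reflect TFX's actual extension points. `Pipeline`, `Orchestrator`, `InProcessOrchestrator`, and `DryRunOrchestrator` are invented names. The same pipeline object is handed to two different orchestrators: one executes it and the other only prints the plan. This loosely echoes the point that Airflow and Kubeflow show the same DAG in two different ways.

```python
# Hypothetical sketch: separating the pipeline definition from the orchestrator.
# NOT the TFX extension API; class and method names are illustrative assumptions.
from abc import ABC, abstractmethod
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Set


class Pipeline:
    def __init__(self, tasks: Dict[str, Callable[[], None]], deps: Dict[str, Set[str]]):
        self.tasks = tasks  # component name -> callable that does the work
        self.deps = deps    # component name -> names it must wait for

    def order(self) -> List[str]:
        # Include every task, even ones with no declared dependencies.
        graph = {name: self.deps.get(name, set()) for name in self.tasks}
        return list(TopologicalSorter(graph).static_order())


class Orchestrator(ABC):
    """Anything that can schedule a Pipeline; implementations are interchangeable."""

    @abstractmethod
    def run(self, pipeline: Pipeline) -> List[str]:
        """Run (or plan) the pipeline and return component names in order."""


class InProcessOrchestrator(Orchestrator):
    def run(self, pipeline: Pipeline) -> List[str]:
        order = pipeline.order()
        for name in order:
            pipeline.tasks[name]()  # each task starts after its dependencies ran
        return order


class DryRunOrchestrator(Orchestrator):
    def run(self, pipeline: Pipeline) -> List[str]:
        order = pipeline.order()
        for name in order:
            waits_for = ", ".join(sorted(pipeline.deps.get(name, ()))) or "nothing"
            print(f"{name} (waits for {waits_for})")  # show the plan, run nothing
        return order


if __name__ == "__main__":
    ran: List[str] = []
    p = Pipeline(
        tasks={"ingest": lambda: ran.append("ingest"), "train": lambda: ran.append("train")},
        deps={"train": {"ingest"}},
    )
    InProcessOrchestrator().run(p)
    DryRunOrchestrator().run(p)  # prints the plan without calling any task
    print(ran)
```

Because the pipeline never refers to a specific orchestrator, adding another scheduler in this model means writing one more `Orchestrator` subclass. The pipeline definition stays unchanged.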
