Subtitles section Play video
SPEAKER: A machine learning workflow
can involve many steps, from data prep to training
to evaluation and more.
It's hard to track these in an ad hoc manner,
like in a set of notebooks or scripts.
On top of that, monitoring and version tracking
can become a challenge.
Kubeflow Pipelines lets data scientists codify their ML
workflows so that they're easily composable, shareable,
and reproducible.
Let's check out how it can help you achieve
ML engineering best practices.
[MUSIC PLAYING]
If you want to check out the documentation directly,
check out the link below to read more about Kubeflow Pipelines.
Kubeflow Pipelines help drive data scientists
to adopt a disciplined pipeline mindset when developing ML code
and scaling it up to the cloud.
It's a Kubernetes native solution
that helps you with a number of things,
like simplifying the orchestration of machine
learning pipelines, making experimentation easy
for you to try ideas, reproduce runs, and share pipelines.
And you can stitch together and reuse components and pipelines
to quickly create end-to-end solutions
without having to rebuild every time, like building blocks.
It also comes with framework support
for things like execution monitoring, workflow
scheduling, metadata logging, and versioning.
Kubeflow Pipelines is one of the Kubeflow core components.
It's automatically deployed during Kubeflow deployment.
Now, there are many ways of defining pipelines
when it comes to data science.
But for Kubeflow, a pipeline is a description
of an ML workflow.
Under the hood, it runs on containers,
which provide portability, repeatability,
and encapsulation, because it decouples the execution
environment from your code runtime.
When you break a pipeline down, it
includes all the components in the workflow
and how they combine, and that makes a graph.
It includes the definition of all the inputs or parameters
needed to run the pipeline.
A pipeline component is one step in the workflow that
does a specific task, which means it takes inputs and can
produce outputs.
An output of a component can become the input
of other components and so forth.
Think of it like a function, in that it has a name, parameters,
return values, and a body.
For example, a component can be responsible for data
preprocessing, data transformation, model training,
and so on.
Now, this is where Kubeflow Pipelines shines.
A pipeline component is made up of code, packaged
as a docker image, and that performs
one step in the pipeline.
That's right.
The system launches one or more Kubernetes pods corresponding
to each step in your pipeline.
You can also leverage prebuilt components found
on the Kubeflow GitHub page.
Under the hood, the pods start docker containers,
and the containers start your programs.
Containers can only give you composability and scalability,
but this means your teams can focus
on one aspect of the pipeline at a time.
While you can use the Kubeflow Pipelines
SDK to programmatically upload pipelines
and launch pipeline runs--
for example, directly from a notebook--
you can also work with pipelines via the Kubeflow UI.
That way you can leverage some of its powerful features,
like visualizing the pipeline through a graph.
As you execute a run, the graph shows the relationships
between pipelines and their status.
Once a step completes, you'll also see the output artifact
in the UI.
As you can see here, the UI takes the output
and can actually render it as a rich visualization.
You can also get statistics on the performance
of the model for performance evaluation,
quick decision-making, or comparison
across different runs.
And we can't forget about the experiments in the UI,
which let you group a few of your pipeline runs
and test different configurations
of your pipelines.
You can compare the results between experiments
and even schedule recurring runs.
Because they're built on the flexibility of the containers,
Kubeflow Pipelines are useful for all sorts of tasks,
like ETL and CI/CD, but they're most popularly used
for ML workflows.
While you can deploy it on your own installation of Kubeflow,
a new hosted version of Pipelines on Google Cloud's AI
platform lets you deploy a standalone version
of Kubeflow Pipelines on a GKE cluster in just a few clicks.
You can start using it by checking out
the AI platform section in the Google Cloud console.
Kubeflow Pipelines handles the orchestration of ML workflows
and hides the complexities of containers as much as possible.
You get continuous training and production,
automatic tracking of metadata, and reusable ML components.
You can clone and iterate on Pipelines and leverage
the power of the UI to visualize and compare models.
Stay tuned to learn more about what you can do with Kubeflow.
[MUSIC PLAYING]