NICK: Hi, everyone. My name is Nick. I am an engineer on the TensorBoard team. And I'm here today to talk about TensorBoard and Summaries.
So first off, just an outline of what I'll be talking about. First, I'll give an overview of TensorBoard, what it is and how it works, just mostly sort of as background. Then I'll talk for a bit about the tf.summary APIs, in particular, how they've evolved from TF 1.x to TF 2.0. And then finally, I'll talk a little bit about the summary data format, log directories, event files, some best practices and tips. So let's go ahead and get started.
So TensorBoard-- hopefully, most of you have heard of TensorBoard. If you haven't, it's the visualization toolkit for TensorFlow. That's a picture of the web UI on the right. Typically, you run this from the command line as the tensorboard command. It prints out a URL, you view it in your browser, and from there on, you have a bunch of different controls and visualizations. And the sort of key selling point of TensorBoard is that it provides cool visualizations out of the box, without a lot of extra work. You basically can just run it on your data and get a bunch of different kinds of tools and different sorts of analyses you can do.
So let's dive into the parts of TensorBoard from the user perspective a little bit. First off, there are multiple dashboards. So we have this sort of tabs setup with dashboards across the top. In the screenshot, it shows the scalars dashboard, which is kind of the default one. But there are also dashboards for images, histograms, graphs, and a whole bunch more are being added almost every month.

And one thing that many of the dashboards have in common is this ability to sort of slice and dice your data by run and by tag. A run, you can think of that as a single run of your TensorFlow program, or your TensorFlow job. And a tag corresponds to a specific named metric, or a piece of summary data. So here, for the runs, we have a train and an eval run in the lower left corner in the run selector. And then we have different tags, including the cross [INAUDIBLE] tag, which is the one being visualized.
And one more thing I'll mention is that a lot of what TensorBoard emphasizes is seeing how your data changes over time. So most of the data takes the form of a time series. And in this case, with the scalars dashboard, the time series is plotted against a step count across the x-axis.
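To make those concepts a bit more concrete, here is a minimal sketch of how runs, tags, and steps typically map onto what you write, using the TF 1.x FileWriter API that comes up later in the talk. The directory names and the cross_entropy metric are just hypothetical examples:

```python
import tensorflow as tf  # assumes the TF 1.x graph-mode API

# Runs: each run is usually its own subdirectory of the log directory, and
# a FileWriter pointed at that subdirectory produces that run.
train_writer = tf.summary.FileWriter("logs/train")  # shows up as run "train"
eval_writer = tf.summary.FileWriter("logs/eval")    # shows up as run "eval"

# Tags: the tag is the name given to a summary op. "cross_entropy" here is
# just a hypothetical metric name.
loss = tf.constant(0.25)
loss_summary = tf.summary.scalar("cross_entropy", loss)

# Steps: the step value passed when a summary is written (shown later in
# the talk) becomes the x-axis of the time series in the scalars dashboard.
```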
So we might ask, what's going on behind the scenes to make this all come together? And so here is our architecture diagram for TensorBoard. We'll start over on the left with your TensorFlow job. It writes data to disk using the tf.summary API. And we'll talk about both the summary API and the event file format a little more later.

Then the center component is TensorBoard itself. We have a background thread that loads event file data. And because the event file data itself isn't efficient for querying, we construct a subsample of the data in memory that we can query more efficiently. And then the rest of TensorBoard is a web server that has a plugin architecture. So each dashboard on the frontend has a specific plugin backend. So for example, the scalars dashboard talks to a scalars backend, and the images dashboard to an images backend. And this allows the backends to do pre-processing or otherwise structure the data in an appropriate way for the frontend to display.

And then each plugin has a frontend dashboard component, and these are all compiled together by TensorBoard and served as a single page, index.html. And that page communicates back and forth with the backends through standard HTTP requests. And then finally, hopefully, we have our happy user on the other end seeing their data, analyzing it, getting useful insights.
And I'll talk a little more about just some details of the frontend. The frontend is built on the Polymer web component framework, where you define custom elements. So the entirety of TensorBoard is one large custom element, tf-tensorboard. But that's just the top. From there on, each plugin frontend-- each dashboard-- is its own frontend component. For example, there's a tf-scalar-dashboard. And then it goes all the way down to shared components for more basic UI elements. So you can think of things like a button, or a selector, or a card element, or a collapsible pane. And these components are shared across many of the dashboards. And that's one of the key ways in which TensorBoard achieves what is hopefully a somewhat uniform look and feel from dashboard to dashboard.
The actual logic for these components is implemented in JavaScript. Some of that's actually TypeScript that we compile to JavaScript. Especially for the more complicated visualizations, TypeScript helps build them up as libraries without having to worry about some of the pitfalls you might get writing them in pure JavaScript.

And then the actual visualizations are a mix of different implementations. Many of them use Plottable, which is a wrapper library over D3, the standard JavaScript visualization library. Some of them use native D3. And then for some of the more complex visualizations, there are libraries that do some of the heavy lifting. So the graph visualization, for example, uses a directed graph library to do layout. The projector uses a WebGL wrapper library to do the 3D visualizations. And the recently introduced What-If Tool plugin uses the Facets library from the [INAUDIBLE] folks. So the way to think about the frontend is that we bring a whole bunch of different visualization technologies together under one TensorBoard umbrella.
So now that we have an overview of TensorBoard itself, I'll talk about how your data actually gets to TensorBoard. So how do you unlock all of this functionality? And the spoiler answer to that is the tf.summary API. So to summarize the summary API, you can think of it as structured logging for your model. The goal is really to make it easy to instrument your model code: to allow you to log metrics, weights, details about predictions, input data, performance metrics, pretty much anything that you might want to instrument. And you can log all of these and save them to disk for later analysis.
And you won't necessarily always be calling the summary API directly. Some frameworks call the summary API for you. So for example, Estimator has the summary saver hook, and Keras has a TensorBoard callback, which takes care of some of the nitty gritty. But underlying that is still the summary API. So most data gets to TensorBoard in this way.
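For instance, here is a minimal sketch of the Keras route, where the TensorBoard callback does the summary writing for you. The model, data, and log directory are just placeholders for illustration:

```python
import numpy as np
import tensorflow as tf

# A tiny stand-in model and dataset, just so there is something to fit.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="sgd", loss="mse")

x = np.random.rand(64, 4).astype("float32")
y = np.random.rand(64, 1).astype("float32")

# The callback calls the summary machinery for you, writing event files
# under log_dir that TensorBoard can then read.
tb_callback = tf.keras.callbacks.TensorBoard(log_dir="logs/keras_run")
model.fit(x, y, epochs=5, callbacks=[tb_callback])
```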
There are some exceptions. Some dashboards have different data flows. The debugger is a good example of this. The debugger dashboard integrates with tfdbg. It has a separate back channel that it uses to communicate information. It doesn't use the summary API. But many of the commonly used dashboards do.
And so the summary API actually has several variations. And when talking about the variations, it's useful to think of the API as having two basic halves. On one half, we have the instrumentation surface. So these are logging ops that you place in your model code. They're pretty familiar to people who have used the summary API: things like scalar, histogram, image. And then the other half of the summary API is about writing that log data to disk, and creating a specially formatted log file which TensorBoard can read and extract the data from.
And so, just to give a sense of how those relate to the different versions, there are four variations of the summary API from TF 1.x to 2.0. And the two key dimensions on which they vary are the instrumentation side and the writing side. And we'll go into this in more detail. But first off, let's start with the most familiar summary API from TF 1.x.
So just as a review-- again, if you've used the summary API before, this will look familiar. But this is kind of a code sample of using the summary API in 1.x. The instrumentation ops, like scalar, actually output summary protos directly. And then those are merged together by a merge_all op that generates a combined proto output. The combined output, you can fetch using session.run. And then that output, you can write to a FileWriter for a particular log directory using this add_summary call that takes the summary proto itself and also a step. So this is, in a nutshell, the flow for TF 1.x summary writing.
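Putting that flow together, a minimal sketch of that kind of code sample might look like this. The loss placeholder and tag name are just illustrative:

```python
import tensorflow as tf  # assumes the TF 1.x graph-mode API

# Instrumentation ops like scalar output serialized summary protos directly.
loss = tf.placeholder(tf.float32, shape=[])
tf.summary.scalar("loss", loss)

# merge_all collects every summary op in the graph into one combined output.
merged = tf.summary.merge_all()

writer = tf.summary.FileWriter("logs/train")
with tf.Session() as sess:
    for step in range(100):
        # Fetch the combined summary proto with session.run...
        summary_proto = sess.run(merged, feed_dict={loss: 1.0 / (step + 1)})
        # ...and hand it to a FileWriter along with the step.
        writer.add_summary(summary_proto, global_step=step)
writer.close()
```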
There are some limitations to this, which I'll describe in two parts. The first set of limitations has to do with the kinds of data types that we can support. So in TF 1.x, there's a fixed set of data types, and adding new ones is a little involved. It requires changes to TensorFlow: you would need a new proto definition field, a new op definition, a new kernel, and a new Python API symbol. And this is a barrier to extensibility for adding new data types to support new TensorBoard plugins.
It's led people to do creative workarounds: for example, rendering a matplotlib plot in your training code and then logging it as an image summary.
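As a rough sketch of that kind of workaround (assuming the TF 1.x proto classes and FileWriter; the figure and tag name are purely illustrative):

```python
import io

import matplotlib.pyplot as plt
import tensorflow as tf  # assumes the TF 1.x API and its Summary proto classes

# Render an arbitrary matplotlib figure to PNG bytes in memory.
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [3, 1, 2])
buf = io.BytesIO()
fig.savefig(buf, format="png")
width, height = fig.canvas.get_width_height()
plt.close(fig)

# Wrap the encoded PNG in an image summary proto and write it out like any
# other summary, so it shows up in the images dashboard.
image = tf.Summary.Image(encoded_image_string=buf.getvalue(),
                         height=height, width=width)
summary = tf.Summary(value=[tf.Summary.Value(tag="my_plot", image=image)])

writer = tf.summary.FileWriter("logs/plots")
writer.add_summary(summary, global_step=0)
writer.close()
```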
And the prompt here is, what if we instead had a single op, or a set of ops, that could generalize across data formats? And this brings us to our first variation, which is the TensorBoard summary API, where we try and make this extensible to new data types. And in the TensorBoard API, the mechanism here is that we use the tensor itself as a generic data container. For example, we can represent a histogram, an image, or a scalar itself-- we can represent all of these in certain formats as tensors. And what this lets us do is use a shared tensor summary API, with some metadata that we can use to describe the tensor format, as our one place to send summary data.
So the principle TensorBoard.summary takes is actually that you can reimplement the tf.summary ops and APIs as Python logic that calls TensorFlow ops for pre-processing, followed by a call to tensor_summary. And this is a win in the sense that you no longer need individual C++ kernels and proto fields for each individual data type.
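Here is a rough sketch of that pattern using the TF 1.x tensor_summary op. The plugin name and the pre-processing are hypothetical, not the actual TensorBoard implementation:

```python
import tensorflow as tf  # assumes the TF 1.x API


def my_fancy_summary(name, values):
    """Hypothetical summary op built on the generic tensor_summary."""
    # Pre-processing is just ordinary TensorFlow ops.
    processed = tf.reshape(tf.cast(values, tf.float32), [-1])

    # Metadata tells TensorBoard which plugin should interpret this tensor.
    metadata = tf.SummaryMetadata()
    metadata.plugin_data.plugin_name = "my_fancy_plugin"  # hypothetical plugin

    return tf.summary.tensor_summary(name, processed,
                                     summary_metadata=metadata)
```

The summary ops described in the talk follow the same shape: Python pre-processing plus a call to the generic tensor summary, with metadata describing the format for the plugin that will read it.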
So the TensorBoard plugins today actually do this, and they have for a while. They have their own summary ops defined in TensorBoard. And the result of this has been that for new TensorBoard plugins, where this is the only option, there's been quite a bit of uptake. For example, the pr_curve plugin has a pr_curve summary, and that's the main route people use. But for existing data types, there isn't really much reason to stop using tf.summary, and so, for those, that's been what people have used, which makes sense. But tf.summary still has some other limitations, and that's what we're going to look at next.
So the second set of limitations in tf.summary is around this requirement that the summary data flows through the graph itself. So merge_all uses a hidden graph collection, essentially, to achieve the effect, from the user's point of view, that your summary ops have the side effect of writing data.
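Concretely, in TF 1.x each summary op registers its output in a graph collection, and merge_all just reads that collection back. A small sketch:

```python
import tensorflow as tf  # assumes the TF 1.x graph-mode API

x = tf.constant(1.0)
tf.summary.scalar("x", x)
tf.summary.histogram("x_hist", x)

# The summary ops don't write anything themselves; by default they register
# their outputs in the (hidden) SUMMARIES graph collection...
print(tf.get_collection(tf.GraphKeys.SUMMARIES))  # two summary tensors

# ...and merge_all gathers that collection into a single fetchable output,
# which you still have to run and hand to a FileWriter yourself.
merged = tf.summary.merge_all()
```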