
  • NICK: Hi, everyone.

  • My name is Nick.

  • I am an engineer on the TensorBoard team.

  • And I'm here today to talk about TensorBoard and Summaries.

  • So first off, just an outline of what I'll be talking about.

  • First, I'll give an overview of TensorBoard, what it is

  • and how it works, just mostly sort of as background.

  • Then I'll talk for a bit about the tf.summary APIs.

  • In particular, how they've evolved from TF 1.x to TF 2.0.

  • And then finally, I'll talk a little bit

  • about the summary data format, log directories, event files,

  • some best practices and tips.

  • So let's go ahead and get started.

  • So TensorBoard-- hopefully, most of you

  • have heard of TensorBoard.

  • If you haven't, it's the visualization toolkit

  • for TensorFlow.

  • That's a picture of the web UI on the right.

  • Typically, you run this from the command line as the TensorBoard

  • command.

  • It prints out a URL.

  • You view it in your browser.
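For reference, a minimal launch from a shell might look like the following sketch. The log directory name is hypothetical, and the command is echoed rather than executed so the snippet stands alone:

```shell
# Hypothetical log directory; point this wherever your job writes summaries.
LOGDIR="runs/my_experiment"

# Typical invocation: TensorBoard prints a URL (by default
# http://localhost:6006) that you open in your browser.
# Echoed here rather than run.
echo "tensorboard --logdir ${LOGDIR} --port 6006"
```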

  • And from there on, you have a bunch of different controls

  • and visualizations.

  • And the sort of key selling point of TensorBoard

  • is that it provides cool visualizations out of the box,

  • without a lot of extra work.

  • You basically can just run it on your data

  • and get a bunch of different kinds of tools

  • and different sort of analyses you can do.

  • So let's dive into the parts of TensorBoard

  • from the user perspective a little bit.

  • First off, there are multiple dashboards.

  • So we have this sort of tabs setup

  • with dashboards across the top.

  • In the screenshot, it shows the scalars dashboard, which

  • is kind of the default one.

  • But there are also dashboards for images, histograms, graphs,

  • and a whole bunch more, with new ones being added almost every month.

  • And one thing that many of the dashboards have in common

  • is this ability to sort of slice and dice

  • your data by run and by tag.

  • And a run, you can think of that as a single run

  • of your TensorFlow program,

  • or your TensorFlow job.

  • And a tag corresponds to a specific named metric,

  • or a piece of summary data.

  • So here, for the runs, we have a train

  • and an eval run in the lower left corner in the run selector.

  • And then we have different tags, including the cross [INAUDIBLE]

  • tag, which is the one being visualized.

  • And one more thing I'll mention is that TensorBoard

  • emphasizes seeing how

  • your data changes over time.

  • So most of the data takes the form of a time series.

  • And in this case, with the scalars dashboard,

  • the time series is plotted with the step count across the x-axis.

  • So we might ask, what's going on behind the scenes

  • to make this all come together?

  • And so here is our architecture diagram for TensorBoard.

  • We'll start over on the left with your TensorFlow job.

  • It writes data to disk using the tf.summary API.

  • And we'll talk both about the summary API and the event file

  • format a little more later.

  • Then the center component is TensorBoard itself.

  • We have a background thread that loads event file data.

  • And because the event file data itself

  • isn't efficient for querying, we construct a subsample

  • of the data in memory that we can query more efficiently.

  • And then the rest of TensorBoard is a web server that

  • has a plugin architecture.

  • So each dashboard on the frontend

  • has a specific plugin backend.

  • So for example, the scalars dashboard talks

  • to a scalars backend, images to an images backend.

  • And this allows the backends to do pre-processing or otherwise

  • structure the data in an appropriate way

  • for the frontend to display.

  • And then each plugin has a frontend dashboard component,

  • which are all compiled together by TensorBoard

  • and served as a single page, index.html.

  • And that page communicates back and forth with the backends

  • through standard HTTP requests.

  • And then finally, hopefully, we have our happy user

  • on the other end seeing their data,

  • analyzing it, getting useful insights.

  • And I'll talk a little more about just some details

  • about the frontend.

  • The front end is built on the Polymer web component

  • framework, where you define custom elements.

  • So the entirety of TensorBoard is one large custom element,

  • tf-tensorboard.

  • But that's just the top.

  • From there on, each plugin front end is--

  • each dashboard is its own frontend component.

  • For example, there's a tf-scalar-dashboard element.

  • And then all the way down to shared components

  • for more basic UI elements.

  • So these can be things like a button, or a selector,

  • or a card element, or a collapsible pane.

  • And these components are shared across many of the dashboards.

  • And that's one of the key ways in which TensorBoard

  • achieves what is hopefully a somewhat uniform look

  • and feel from dashboard to dashboard.

  • The actual logic for these components

  • is implemented in JavaScript.

  • Some of that's actually TypeScript

  • that we compile to JavaScript.

  • Especially the more complicated visualizations,

  • TypeScript helps build them up as libraries

  • without having to worry about some of the pitfalls

  • you might get writing them in pure JavaScript.

  • And then the actual visualizations

  • are a mix of different implementations.

  • Many of them use Plottable, which

  • is a wrapper library over D3, the standard JavaScript

  • visualization library.

  • Some of them use native D3.

  • And then for some of the more complex visualizations,

  • there are libraries that do some of the heavy lifting.

  • So the graph visualization, for example,

  • uses a directed graph library to do layout.

  • The projector uses a WebGL wrapper library

  • to do the 3D visualizations.

  • And the recently introduced What-If Tool plugin

  • uses the facets library from [INAUDIBLE] folks.

  • So you can think of the frontend as bringing a whole bunch

  • of different visualization technologies together

  • under one TensorBoard umbrella.

  • So now that we have an overview of TensorBoard itself,

  • I'll talk about how your data actually gets to TensorBoard.

  • So how do you unlock all of this functionality?

  • And the spoiler answer to that is the tf.summary API.

  • So to summarize the summary API, you

  • can think of it as structured logging for your model.

  • The goal is really to make it easy to instrument your model

  • code.

  • So to allow you to log metrics, weights,

  • details about predictions, input data, performance metrics,

  • pretty much anything that you might want to instrument.

  • And you can log these all, save them

  • to disk for later analysis.

  • And you won't necessarily always be calling the summary API

  • directly.

  • Some frameworks call the summary API for you.

  • So for example, Estimator has the summary saver hook.

  • Keras has a TensorBoard callback,

  • which takes care of some of the nitty gritty.

  • But underlying that is still the summary API.
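As a rough sketch of the Keras route, assuming TF 2.x (the log directory name and the tiny model are made up for illustration):

```python
import tensorflow as tf

# The Keras TensorBoard callback wraps the summary machinery for you:
# pass it to model.fit() and it logs losses and metrics each epoch.
# "logs/fit" is a hypothetical log directory name.
tensorboard_cb = tf.keras.callbacks.TensorBoard(
    log_dir="logs/fit",
    histogram_freq=1,  # also write weight histograms every epoch
)

# A tiny model, just to show where the callback plugs in.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="sgd", loss="mse")
# model.fit(x, y, callbacks=[tensorboard_cb])  # underlying this: tf.summary
```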

  • So most data gets to TensorBoard in this way.

  • There are some exceptions.

  • Some dashboards have different data flows.

  • The debugger is a good example of this.

  • The debugger dashboard integrates with tfdbg.

  • It has a separate back channel that it uses

  • to communicate information.

  • It doesn't use the summary API.

  • But many of the commonly used dashboards do.

  • And so the summary API actually has sort of--

  • there are several variations.

  • And when talking about the variations,

  • it's useful to think of the API as having two basic halves.

  • On one half we have the instrumentation surface.

  • So these are logging

  • ops that you place in your model code.

  • They're pretty familiar to people

  • who have used the summary API, things like scalar, histogram,

  • image.

  • And then the other half of the summary API

  • is about writing that log data to disk,

  • creating a specially formatted log

  • file from which TensorBoard can read and extract the data.

  • And so, just to give a sense of how those relate

  • to the different versions, there are

  • four variations of the summary API from TF 1.x to 2.0.

  • And the two key dimensions on which they vary

  • are the instrumentation side and the writing side.

  • And we'll go into this in more detail.

  • But first off, let's start with the most familiar summary

  • API from TF 1.x.

  • So just as a review-- again, if you've

  • used the summary API before, this will look familiar.

  • But this is kind of a code sample

  • of using the summary API 1.x.

  • The instrumentation ops, like scalar, actually output summary

  • protos directly.

  • And then those are merged together

  • by a merge all op that generates a combined proto output.

  • The combined output, you can fetch using session.run().

  • And then, that output, you can write to a FileWriter

  • for a particular log directory using

  • the add_summary call, which takes the summary proto itself

  • and also a step.

  • So this is, in a nutshell, the flow

  • for TF 1.x summary writing.

  • There's some limitations to this, which

  • I'll describe in two parts.

  • The first set of limitations has to do with the kinds of data

  • types that we can support.

  • So in TF 1.x, there's a fixed set of data types.

  • And adding new ones is a little involved.

  • It requires changes to TensorFlow: you

  • would need a new proto definition field,

  • a new op definition, a new kernel, and a new Python

  • API symbol.

  • And this is a barrier to extensibility

  • for adding new data types to support new TensorBoard

  • plugins.

  • It's led people to do creative workarounds.

  • For example, like rendering a matplotlib plot

  • in your training code.

  • And then logging it as an image summary.

  • And the prompt here is, what if we instead

  • had a single op or a set of ops that could

  • generalize across data formats?

  • And this brings us to our first variation.

  • Which is the TensorBoard summary API,

  • where we try and make this extensible to new data types.

  • And the TensorBoard API, the mechanism

  • here is that we use the tensor itself as a generic data

  • container.

  • Which can correspond to--

  • for example, we can represent a histogram, an image,

  • or a scalar itself.

  • We can represent these all in certain formats as tensors.

  • And what this lets us do is use a shared tensor summary API

  • with some metadata that we can use

  • to describe the tensor format, giving us one place

  • to send summary data.

  • So TensorBoard.summary, the principle it takes

  • is actually that you can reimplement the tf.summary ops

  • and APIs as Python logic to call TensorFlow

  • ops for pre-processing and then a call to tensor summary.

  • And this is a win in the sense that you no longer need

  • individual C++ kernels and proto fields for each individual data

  • type.
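A minimal sketch of that generic-container idea, using the tensor_summary op from the compat.v1 API (the tag name and payload are invented; a real plugin op would also attach SummaryMetadata describing how to interpret the tensor):

```python
import tensorflow.compat.v1 as tf1

tf1.disable_eager_execution()

# Any data that can be encoded as a tensor can flow through the one
# generic op. Here the payload is just a float vector; real plugin ops
# do their pre-processing with Python/TF ops first, then call this.
values = tf1.constant([0.1, 0.5, 0.9])
summary_op = tf1.summary.tensor_summary("my_generic_data", values)

with tf1.Session() as sess:
    serialized = sess.run(summary_op)

# The result is a serialized Summary proto wrapping the tensor.
```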

  • So the TensorBoard plugins today actually do this.

  • They have for a while.

  • They have their own summary ops defined in TensorBoard.

  • And the result of this has been that for new TensorBoard

  • plugins, where this is the only option,

  • there's been quite a bit of uptake.

  • For example, the pr_curve plugin has a pr_curve summary.

  • And that's the main route people use.

  • But for existing data types, there

  • isn't really much reason to stop using tf.summary.

  • And so, for those, it makes sense that

  • tf.summary has remained what people use.

  • But then tf.summary, it still has some other limitations.

  • And so that's what we're going to look at next.

  • So the second set of limitations in tf.summary

  • is around this requirement that the summary data

  • flows through the graph itself.

  • So merge_all uses a hidden graph collection essentially

  • to give the user the effect

  • that your summary ops have the side effect of writing data.