[MUSIC PLAYING]

CHRISTINA GREER: Hi. My name is Christina, and I'm a software engineer on the Google Brain team. I'm here today to tell you about some tools that my team and I have built to make the end-to-end lifecycle of a machine learning pipeline easier. I'm going to start by talking about model analysis and validation. These are two different components in TFX, but they are very similar in how they're actually executed. The main difference is how you as an end user will use them.

I'll start with the evaluator. So why is model evaluation important? Well, we have gathered data, we've cleaned that data, and we've trained a model, but we really want to make sure that the model works. Model evaluation helps you assess the overall quality of your model. You also may want to analyze how your model performs on specific slices of the data. In the Chicago taxi example that Clemens started this off with: why are my tip predictions sometimes wrong? Slicing the data and looking at where you're doing poorly can be a real benefit, because it identifies low hanging fruit, segments where you can gain accuracy by adding more data or making other changes. You also want to track your performance over time. You're going to be continuously training models and updating them with fresh data so that your models don't get stale, and you want to make sure that your metrics are improving over time, not regressing. Model evaluation can help you with all of this.

The component of TFX that supports this is called the Evaluator, and it is based on a library called TensorFlow Model Analysis. From the pipeline perspective, the inputs are the eval split that was generated by your ExampleGen and the SavedModel output by the Trainer. You also specify the splits in your data that you find most interesting, so that the Evaluator can precompute metrics for those slices. Your data then goes into the Evaluator, and a process runs to generate metrics for the overall slice and for each slice you specified. The output of the Evaluator is evaluation metrics: a structured data format that ties together the splits you specified and the metrics that correspond to each one. The TensorFlow Model Analysis library also has a visualization tool that lets you load up these metrics and dig around in your data in a user-friendly way.

Going back to our Chicago taxi example, you can see how the Evaluator helps you look at your top-line objective: how well can you predict trips that result in large tips? The TFMA visualization shows the overall slice of data here. The numbers are probably small on screen, but accuracy is 94.7%. That's pretty good; you'd get an A for that. But maybe you want 95%, which sounds a lot better than 94.7%, so you want to bump that up a bit. So then you can dig into why your tip predictions are sometimes wrong. Here we have sliced the data by the hour of day the trip starts, and sorted by poor performance. When I look at this data, I see that trips starting around 2:00 or 3:00 AM perform quite poorly. Because of the statistics generation tool that Clemens talked about, I know that the data is sparse here. But if I didn't know that, perhaps I would think there's something that people who take taxis at 2:00 or 3:00 in the morning have in common that causes erratic tipping behavior. Someone smarter than me is going to have to figure that one out.
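To make that concrete, here is a minimal sketch of what this kind of sliced analysis could look like with the TensorFlow Model Analysis library, assuming the pre-1.0 TFMA Python API from around the time of this talk; the file paths are illustrative, the `trip_start_hour` column follows the Chicago taxi example, and exact signatures vary by TFMA version:

```python
import tensorflow_model_analysis as tfma

# Load the eval model exported by the trainer (path is illustrative).
eval_shared_model = tfma.default_eval_shared_model(
    eval_saved_model_path='/tmp/taxi/eval_saved_model')

# Compute metrics for the overall slice and per trip-start-hour slice.
eval_result = tfma.run_model_analysis(
    eval_shared_model=eval_shared_model,
    data_location='/tmp/taxi/eval_examples.tfrecord',
    slice_spec=[
        tfma.slicer.SingleSliceSpec(),  # the overall slice
        tfma.slicer.SingleSliceSpec(columns=['trip_start_hour']),
    ])

# In a notebook, render the interactive slicing browser shown in the talk.
tfma.view.render_slicing_metrics(
    eval_result, slicing_column='trip_start_hour')
```

Sorting that view by accuracy is how you would surface the poorly performing 2:00-3:00 AM slices described above.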
You also want to know whether you're getting better at predicting trips over time. You're continuously training these models on new data, and you're hoping that you improve. The TensorFlow Model Analysis tooling that powers the Evaluator in TFX can show you the trends of your metrics over time. Here you see three different models and the performance of each, with accuracy and AUC.

Now I'm going to move on to the ModelValidator component. With the Evaluator, you were an active user: you generated the metrics, loaded them up in the UI, dug around in your data, and looked for issues you could fix to improve your model. But eventually you're going to iterate. Your data is going to get better, your model is going to improve, and you're going to be ready to launch. You're also going to have a pipeline continuously feeding new data into this model, and every time you generate a new model with new data, you don't want a manual process for pushing it to a server somewhere. The ModelValidator component of TFX acts as a gate that keeps you from pushing bad versions of your model, while allowing you to automate the pushing of quality models.

So why is model validation important? We really want to avoid pushing models with degraded quality, specifically in an automated fashion. If you train a model with new data and overall performance drops, but it improves on certain segments of the data that you really care about, maybe you make the judgment call that this is an improvement overall, so you launch it. But you don't want that to happen automatically; you want to have some say before you do it. So this acts as your gatekeeper. You also want to avoid breaking downstream components. If your model suddenly started outputting something that your server binary couldn't handle, you'd want to know that before you push, too.

The TFX component that supports this is called the ModelValidator. It takes very similar inputs and outputs to the Evaluator, and the libraries that compute the metrics are pretty much the same under the hood. However, instead of one model, you provide two: the new model that you're trying to evaluate, and the last good evaluated model. It then runs on your eval split and compares the metrics between the two models on the same data. If your metrics have stayed the same or improved, it blesses the model. If the metrics you care about have degraded, it will not bless the model, and you get some information about which metrics failed so that you can do further analysis. The output of this is a validation outcome; it simply says "blessed" if everything went right. Another thing to note about the ModelValidator is that it allows you to do a next-day eval of your previously pushed model: the last model you blessed was trained on older data, and the ModelValidator evaluates it on the new data.

And finally, the pusher. The Pusher is probably the simplest component in the entire TFX pipeline, but it serves quite a useful purpose. It has one input, which is the blessing that you got from the ModelValidator. The output is that, if you passed your validation, the Pusher copies your SavedModel into a file system destination that you've specified. And now you're ready to serve your model and make it useful to the world at large.
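Wired into a pipeline, these last three components might look something like the sketch below. This assumes the TFX 0.13-era Python DSL from around the time of this talk (argument names such as `model_exports` changed in later releases), and it assumes `example_gen`, `trainer`, and `serving_model_dir` were defined by the earlier pipeline stages Clemens described:

```python
from tfx.components import Evaluator, ModelValidator, Pusher
from tfx.proto import evaluator_pb2, pusher_pb2

# Compute metrics overall and sliced by the hour the trip started.
evaluator = Evaluator(
    examples=example_gen.outputs.examples,
    model_exports=trainer.outputs.output,
    feature_slicing_spec=evaluator_pb2.FeatureSlicingSpec(specs=[
        evaluator_pb2.SingleSlicingSpec(
            column_for_slicing=['trip_start_hour']),
    ]))

# Compare the new model against the last blessed model on the eval split.
model_validator = ModelValidator(
    examples=example_gen.outputs.examples,
    model=trainer.outputs.output)

# Copy the SavedModel to the serving directory only if it was blessed.
pusher = Pusher(
    model_export=trainer.outputs.output,
    model_blessing=model_validator.outputs.blessing,
    push_destination=pusher_pb2.PushDestination(
        filesystem=pusher_pb2.PushDestination.Filesystem(
            base_directory=serving_model_dir)))
```

The key design point is the blessing dependency: the Pusher consumes the ModelValidator's output, so an unblessed model never reaches the serving directory.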
I'm going to talk about model deployment next. So this is where we are: we have a trained SavedModel. A SavedModel is a universal serialization format for TensorFlow models. It contains your graph, your learned variable weights, and your assets, like embeddings and vocabularies. But to you, this is just an implementation detail. Where you really want to be is having an API: a server that you can query to get answers in real time, or that provides those answers to your users.

We provide several deployment options, and many of them are discussed in other talks in this session. TensorFlow.js is optimized for serving in the browser or on Node.js. TensorFlow Lite is optimized for mobile devices; we already heard a talk about how Google Assistant is using TensorFlow Lite to support model inference on Google Home devices. TensorFlow Hub is something new, and Andre is going to come on in about five minutes and tell you about that, so I'm not going to step on his toes. I'm going to talk about TensorFlow Serving.

If you want to put up a REST API that serves answers from your model, you would want to use TensorFlow Serving. And why would you want to use it? For one thing, TensorFlow Serving has a lot of flexibility. It supports multi-tenancy: you can run multiple models on a single server instance, and you can also run multiple versions of the same model. This can be really useful when you're trying to canary a new model. Say you have a tried and tested version of your model, and you've created a new one. It's passed your Evaluator; it's passed your validation. But you still want to do some A/B testing with real users before you completely switch over. TensorFlow Serving supports this. We also support optimization with GPUs and TensorRT, and you can expose a gRPC or a REST API.

TensorFlow Serving is also optimized for high performance. It provides low latency; request batching, so that you can optimize your throughput while still respecting latency requirements; and traffic isolation, so that if you are serving multiple models on a single server, a traffic spike in one of those models won't affect the serving of the others.

And finally, TensorFlow Serving is production-ready. This is what we use to serve many of our models inside Google; we've served millions of QPS with it. You can scale in minutes, particularly if you use the Docker image and scale up on Kubernetes. And we support dynamic version refresh: you can specify a version refresh policy to either take the latest version of your model or pin to a specific version. This can be really useful for rollbacks if you find a problem with the latest version after you've already pushed.

I'm going to go into a little more detail about how you might deploy a REST API for your model. We have two different options presented here. The first, the top command, uses Docker, which we really recommend. It requires a little bit of ramp-up at the beginning, but you will really save time in the long run by not having to manage your environment and your own dependencies. You can also run locally on your own host, but then you do have to do all of that yourself long term. Looking more closely at the docker run command: you choose a port to bind your API to, you provide the path to the SavedModel that was generated by your trainer (hopefully, it was pushed by the Pusher), you provide the model name, and you tell Docker to run the TensorFlow Serving binary.
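Once a server like that is up, you can query it over HTTP. Here's a small sketch in Python, assuming TensorFlow Serving's conventional REST port (8501) and a model named "taxi"; the feature names in the request are purely illustrative:

```python
import json
import urllib.request

# TensorFlow Serving exposes REST predictions at /v1/models/<name>:predict.
# Port 8501 is the conventional REST port; 8500 serves gRPC.
url = 'http://localhost:8501/v1/models/taxi:predict'

# One example per entry in "instances"; feature names are illustrative.
payload = {'instances': [{'trip_start_hour': 2, 'trip_miles': 5.2}]}

request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode('utf-8'),
    headers={'Content-Type': 'application/json'})

with urllib.request.urlopen(request) as response:
    print(json.load(response))  # e.g. {"predictions": [...]}
```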
Another advantage of using Docker is that you can easily enable hardware acceleration. If you're running on a host with a GPU and the NVIDIA Docker runtime installed, you can modify that command line by a few tokens and be running on accelerated hardware. If you need even further optimization, we now support optimizing your model for serving using TensorRT. TensorRT is NVIDIA's platform for optimized deep learning inference. The Chicago taxi example that we've been using here probably wouldn't benefit from this, but if you had, say, an image recognition model like a ResNet, you could really get some performance gains and cost savings by using TensorRT. We provide a command line that converts the SavedModel into a TensorRT-optimized model. Then, again, a very simple change to that original command line, and you're running on accelerated GPU hardware with TensorRT optimization.
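The talk refers to a command-line converter; as a rough sketch, the equivalent conversion from Python could look like the following, assuming the TrtGraphConverter API that shipped with TensorFlow around version 1.13/1.14 (the paths are illustrative, and this API has changed across TF releases):

```python
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Convert a trained SavedModel into a TensorRT-optimized SavedModel.
converter = trt.TrtGraphConverter(
    input_saved_model_dir='/models/resnet/1',
    precision_mode='FP16')  # lower precision for faster inference
converter.convert()
converter.save('/models/resnet_trt/1')
```

After conversion, you would point the same docker run command at the new model directory to serve the optimized version.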
So, to put it all together: we introduced TensorFlow Extended, or TFX, and we showed you how the different components that TFX consists of work together to help you manage the end-to-end lifecycle of your machine learning pipeline. First, you have your data, and we have tools to help you make sense of it, process it, and prepare it for training. We then support training your model. After you train your model, we provide tools that let you make sense of what your model is doing, make improvements, and make sure that you don't regress. Then we have the Pusher, which lets you push to various deployment options and make your model available to serve users in the real world.

To get started with TensorFlow Extended, please visit us on GitHub. There is also more documentation at TensorFlow.org/tfx. And some of my teammates are running a workshop tomorrow; they'd love to see you there. You don't need to bring a laptop. We have machines that are set up and ready to go, and you can get some hands-on experience using TensorFlow Extended.

[MUSIC PLAYING]