
  • TIM DAVIS: Dance Like enables you to learn

  • how to dance on a mobile phone.

  • CHRIS MCCLANAHAN: TensorFlow

  • can take our smartphone camera and turn it

  • into a powerful tool for analyzing body poses.

  • ANDREW SELLE: We had a team at Google

  • that had developed an advanced model

  • for doing pose segmentation.

  • So we're able to take their implementation,

  • and convert it into TensorFlow Lite.

  • Once we had it there, we could use it directly.

  • SHIKHAR AGARWAL: To run all the AI and machine learning models,

  • to detect body parts is a very computationally expensive

  • process, where we need to use the on-device GPU.

  • The TensorFlow Lite library made it possible for us to

  • leverage all these resources-- the compute on the device--

  • and give a great user experience.

  • ANDREW SELLE: Teaching people to dance

  • is just the tip of the iceberg.

  • Anything that involves movement would be a great candidate.

  • TIM DAVIS: So that means people who have skills

  • can teach other people those skills.

  • And AI is just this layer that really just interfaces

  • between the two things.

  • When you empower people to teach people,

  • I think that's really when you have something

  • that is game changing.

  • NUPUR GARG: When Tim originally did this,

  • he did this in slow motion.

  • We use these models that are running on device in order

  • to speed up his dance performance to match

  • the professional dancer.

  • We also snapshotted a few motions

  • in order to understand what motions he was doing well

  • and what he needed to improve on. Applications like this can

  • be used on device for educational purposes

  • for not only dance but other applications as well.

  • New cutting-edge models are also pushing the boundaries

  • of what's available on device.

  • BERT is a method of pre-training language representations,

  • which obtains state-of-the-art results on a wide array

  • of natural language processing tasks.

  • Today, we're launching MobileBERT.

  • BERT has been completely re-architected to not only

  • be smaller, but also faster, without losing any accuracy.

  • Running MobileBERT with TensorFlow Lite

  • is 4.4 times faster on the CPU than BERT,

  • and 77% smaller while maintaining the same accuracy.

  • Let's take a look at a demo application.

  • So this is a question and answer demo application that

  • takes snippets from Wikipedia.

  • It has a user ask questions on a particular topic,

  • or it suggests a few preselected questions to ask.

  • And then it searches a text corpus for the answers

  • to the questions, all on device.

  • We encourage you to take a look at both of these demo

  • applications at our booth.

  • So we've worked hard to bring these features of Dance Like

  • and MobileBERT to your applications

  • by making it easy to run machine learning models on device.

  • In order to deploy on device, you first

  • need to get a TensorFlow Lite model.

  • Once you have the model, then you

  • can load it into your application,

  • transform the data in the way that the model requires it,

  • run the model, and use the resulting output.
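
For a concrete picture of that pipeline, here is a minimal sketch using the Python tf.lite.Interpreter API; the model path and the dummy input are placeholders, and the Android and iOS equivalents of these steps are shown later in the talk.

    import numpy as np
    import tensorflow as tf

    # Load a TensorFlow Lite model and allocate its tensors.
    interpreter = tf.lite.Interpreter(model_path="model.tflite")  # placeholder path
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # Transform the data into the shape and dtype the model requires.
    input_data = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])

    # Run the model and use the resulting output.
    interpreter.set_tensor(input_details[0]["index"], input_data)
    interpreter.invoke()
    output = interpreter.get_tensor(output_details[0]["index"])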

  • In order to get the model, we've created a rich model

  • repository.

  • We've added many new models that can

  • be utilized in your applications in production right now.

  • These models include the basic models such as MobileNet

  • and Inception.

  • The repository also includes MobileBERT, Style Transfer, and DeepLab v3.

  • Once you have your models, you can use our TensorFlow Lite

  • Support Library that we're also launching this week.

  • It's a new library for processing and transforming

  • data.

  • Right now, it's available for Android for image models.

  • But we're working on adding support for iOS

  • as well as additional types of models.

  • The support library simplifies the pre-processing and

  • post-processing logic on Android.

  • This includes functions such as rotating the image 90 degrees

  • or cropping the image.

  • We're also working on auto-generation APIs

  • that target your specific model and generate

  • simple APIs tailored to it.

  • With our initial launch, as I mentioned,

  • it'll be focused on image use cases.

  • However, we're working on expanding the use cases

  • to a broader range of models.

  • So let's take a look at how this looks in code.

  • So before the support library, in order

  • to add TensorFlow Lite to your application,

  • you needed to write all of this code,

  • mostly for data pre-processing and post-processing.

  • However, with the auto-generation support

  • libraries, all of this code is simplified

  • into five lines.

  • The first two lines are loading the model.

  • Then, you can load your image bitmap data into the model.

  • And it'll transform the image as required.

  • Next, you can run your model and it'll

  • output a map of string labels

  • to their float probabilities.

  • This is how the code will look with auto-generation APIs

  • that we'll be launching later this year.

  • One of the biggest frustrations with using models

  • was not knowing the inputs and outputs of the models.

  • Now, model authors can include this metadata

  • with the model so that it's available from the start.

  • This is an example of a JSON file

  • that the model author can package into the model.

  • This will be launched with the auto-generate APIs.

  • And all of our models in the model garden

  • will be updated to have this metadata.

  • In order to make it easy to use all of the models in our model

  • garden and leverage the TF support library,

  • we have added example applications for Android

  • and iOS for all of the models, and made

  • the applications use the TF Support Library

  • wherever possible.

  • We're also continuing to build out our demo applications

  • on both the Raspberry Pi and the Edge TPU.

  • So now, what if your use case wasn't

  • covered, either by our model garden or the support library?

  • Revisiting all the use cases, there are many use cases

  • we haven't talked about that aren't covered

  • by the specific models we listed.

  • So the first thing you need to do is either find a model

  • or generate a model yourself from TensorFlow APIs,

  • either Keras APIs or Estimator APIs.

  • Once you have a SavedModel, which is the unified file

  • format for 2.0, you can take the model,

  • pass it through the TensorFlow Lite Converter.

  • And then you'll get a TensorFlow Lite FlatBuffer model as output.

  • In code, it's actually very simple.

  • You can generate your model, save it with one line,

  • and use two lines of code to take in the same model

  • and convert it.

  • We also have APIs that directly convert Keras models.

  • All the details of those are available on our website.
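
As a rough sketch of those two conversion paths in Python (here `model` stands in for any trained Keras model, and the file paths are placeholders):

    import tensorflow as tf

    # Save the trained model as a SavedModel, the unified format for 2.0.
    tf.saved_model.save(model, "saved_model_dir")  # `model` is a placeholder Keras model

    # Two lines to convert the SavedModel into a TensorFlow Lite FlatBuffer.
    converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
    tflite_model = converter.convert()

    # Or convert a Keras model directly.
    tflite_model = tf.lite.TFLiteConverter.from_keras_model(model).convert()

    with open("model.tflite", "wb") as f:
        f.write(tflite_model)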

  • Over the last few months, we've worked really hard

  • on improving our converter.

  • We've added a new converter, which has better debuggability,

  • including source file location identification.

  • This means you know exactly which part of your code cannot be

  • converted to TF Lite.

  • We've also added support for Control Flow

  • v2, which is the default control flow in 2.0.

  • In addition, we're adding new operations, as well as

  • support for new models, including

  • Mask R-CNN, Faster R-CNN, MobileBERT, and Deep Speech v2.

  • In order to enable this new feature, all you have to do

  • is set the experimental_new_converter flag to True.
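
In Python, that looks roughly like this (a minimal sketch, assuming a converter created from a SavedModel as above):

    converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
    converter.experimental_new_converter = True  # opt in to the new converter back end
    tflite_model = converter.convert()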

  • We encourage everyone to participate

  • in the testing process.

  • We plan to make this new converter the default back end

  • at some point in the future.

  • So let's look at the debuggability of this new converter.

  • So when running this model, it gives an error

  • that the TF Reciprocal op is neither a custom op nor a flex op.

  • Then it provides a stack trace, allowing

  • you to understand where in the code this operation is called.

  • And that way, you know exactly what line to address.

  • Once you have your TF Lite model,

  • it can be integrated into your application

  • the same way as before.

  • You have to load the model, pre-process the input data, run the model,

  • and use the resulting output.

  • Let's take a look at a pared-down version of this code in Kotlin.

  • So the first two lines, you have to load the model.

  • And then you have to run it through our interpreter.

  • Once you have loaded the model, then you

  • need to initialize the input array and the output array.

  • The input should be a byte buffer.

  • And the output array needs to contain

  • all of the probabilities.

  • So it's a general float array.

  • Then you can run it through the interpreter

  • and do any post-processing as needed.

  • To summarize these concepts, you have the converter

  • to generate your model and the interpreter to run your model.

  • The interpreter calls into op kernels and delegates,

  • which I'll talk about in detail in a bit.

  • And guess what?

  • You can do all of this in a variety of language bindings.

  • We've released a number of new, first class language bindings,

  • including Swift and Objective-C for iOS,

  • C# for Unity developers, and C for native developers on any

  • platform.

  • We've also seen the creation of a number

  • of community-owned language bindings

  • for Rust, Go, and Dart.

  • Now that we've discussed how TensorFlow Lite works

  • at a high level, let's take a closer look under the hood.

  • One of the first hurdles developers

  • face when deploying models on device is performance.

  • We've worked very hard, and we're

  • continuing to work hard on making

  • this easy out of the box.

  • We worked on improvements on the CPU, GPU, and many custom

  • hardware accelerators, as well as adding tooling

  • to make it easy to improve your performance.

  • So this slide shows TF Lite's performance

  • at Google I/O in May.

  • Since then, we've had a significant performance

  • improvement across the board, from float models on the CPU

  • to models on the GPU.

  • Just to reemphasize how fast this is,

  • a float model for MobileNet v1 takes 37 milliseconds

  • to run on the CPU.

  • If you quantize that model, it takes only 13 milliseconds

  • on the CPU.

  • On the GPU, a float model takes six milliseconds.

  • And on the Edge TPU, in quantized fixed point,

  • it takes two milliseconds.

  • Now let's discuss some common techniques to improve the model

  • performance.

  • There are five main approaches to doing this--

  • quantization, pruning, leveraging hardware accelerators,

  • using mobile-optimized architectures,

  • and per-op profiling.

  • The first way to improve performance

  • is to use quantization.

  • Quantization is a technique used to reduce

  • the precision of static parameters,

  • such as weights, and dynamic values, such as activations.

  • For most models, training and inference use float 32.

  • However, in many use cases, using int 8 or float 16

  • instead of float 32 improves latency

  • without a significant decrease to accuracy.

  • Using quantization enables the use of many hardware accelerators that only

  • support 8-bit computations.

  • It also allows further acceleration

  • on the GPU, which is able to do two float 16 computations

  • in the time of one float 32 computation.

  • We provide a variety of techniques

  • for performing quantization as part of the model optimization

  • toolkit.

  • Many of these techniques can be performed

  • after training for ease of use.
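
For example, a minimal sketch of post-training quantization through the converter (the SavedModel path is a placeholder, and the float 16 line is the optional variant rather than a required step):

    converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")

    # Post-training quantization: let the converter reduce the precision
    # of the weights (dynamic-range quantization to int 8 by default).
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

    # Optional variant: quantize weights to float 16 instead, which pairs
    # well with GPU acceleration.
    # converter.target_spec.supported_types = [tf.float16]

    quantized_tflite_model = converter.convert()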

  • The second technique for improving model performance

  • is pruning.

  • During model pruning, we set unnecessary weight values

  • to zero.

  • By doing this, we're able to remove

  • what we believe are unnecessary connections between layers

  • of a neural network.

  • This is done during the training process

  • in order to allow the neural network

  • to adapt to the changes.

  • The resulting weight tensors will have a lot more zeros,

  • and therefore will increase the sparsity of the model.

  • With the addition of sparse tensor representations,

  • the memory bandwidth of the kernels can be reduced,

  • and faster kernels can be implemented for the CPU

  • and custom hardware.
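
A minimal sketch of that training-time pruning using the TensorFlow Model Optimization Toolkit (the sparsity schedule values and `model` are placeholders):

    import tensorflow_model_optimization as tfmot

    # Wrap the Keras model so that low-magnitude weights are gradually
    # set to zero as it trains, increasing the sparsity of the model.
    pruning_schedule = tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0,
        final_sparsity=0.5,
        begin_step=0,
        end_step=10000)

    pruned_model = tfmot.sparsity.keras.prune_low_magnitude(
        model, pruning_schedule=pruning_schedule)

    # pruned_model is then compiled and trained as usual, with the
    # tfmot.sparsity.keras.UpdatePruningStep() callback passed to fit().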

  • For those who are interested, Raziel

  • will be talking about pruning and quantization in-depth

  • after lunch in the Great American Ballroom.

  • Revisiting the architecture diagram more closely,

  • the interpreter calls into op kernels and delegates.

  • The op kernels are highly optimized for the ARM NEON

  • instruction set.

  • And the delegates allow you to access

  • accelerators, such as the GPU, DSP, and Edge TPU.

  • So let's see how that works.

  • Delegates allow parts of the graph, or the entire graph,

  • to execute on specialized hardware instead

  • of the CPU.

  • In some cases, certain operations may not

  • be supported by the accelerator.

  • So portions of that graph that can be offloaded

  • for acceleration are delegated.

  • And the remaining portions of the graph are run on the CPU.

  • However, it's important to note that when the graph is

  • split into too many delegated components,

  • it can slow down graph execution in some cases.

  • The first delegate we'll discuss is the GPU delegate,

  • which enables faster execution for float models.

  • It's up to seven times faster than the floating point

  • CPU implementations.

  • Currently, the GPU delegate uses OpenCL when possible,

  • or otherwise OpenGL on Android.

  • And it uses Metal on iOS.

  • One trade-off with delegates is the increase

  • to the binary size.

  • The GPU delegate adds about 250 kilobytes to the binary size.

  • The next delegate is a Qualcomm Hexagon DSP delegate.

  • In order to support a greater range of devices,

  • especially mid- to low-tier devices,

  • we have worked with Qualcomm to develop a delegate

  • for the Hexagon chipset.

  • We recommend using the Hexagon delegate on devices running Android O

  • and below, and the NN API delegate,

  • which I'll talk about next, on devices running Android P and above.

  • This delegate accepts integer models

  • and increases the binary size by about two megabytes.

  • And it'll be launching soon.

  • Finally, we have the NN API delegate, for the Android Neural

  • Networks API.

  • The NN API delegate supports over 30 ops on Android P,

  • and over 90 ops on Android Q. This delegate

  • accepts both float and integer models.

  • And it's built into Android devices

  • and therefore has no binary size increase.

  • The code for all the delegates is very similar.

  • All you have to do is create the delegate

  • and add it to the TF Lite options for the interpreter

  • when using it.

  • Here's an example with a GPU delegate.

  • And here's an example with an NN API delegate.
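
The GPU and NN API snippets shown on the slides are Android (Java/Kotlin) code and aren't reproduced in this transcript. As a rough, platform-neutral sketch of the same pattern, the Python interpreter can also attach a delegate loaded from a shared library; the library filename here is a placeholder:

    import tensorflow as tf

    # Create the delegate and hand it to the interpreter when constructing it.
    delegate = tf.lite.experimental.load_delegate("libexample_delegate.so")  # placeholder
    interpreter = tf.lite.Interpreter(
        model_path="model.tflite",
        experimental_delegates=[delegate])
    interpreter.allocate_tensors()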

  • The next way to improve performance

  • is to choose a model with a suitable

  • architecture.

  • For many image classification tasks,

  • people generally use Inception.

  • However, when running on device, MobileNet

  • is 15 times faster and nine times smaller.

  • It's therefore important to investigate

  • the trade-off between accuracy and model

  • performance and size.

  • This applies to other applications as well.

  • Finally, you want to ensure that you're

  • benchmarking and validating all of your models.

  • We offer simple tools to enable this,

  • including per-op profiling, which helps

  • determine which ops are taking the most computation time.

  • This slide shows a way to execute the per-op profiling

  • tool through the command line.

  • This is what our tool will output when you're doing

  • per-op profiling for a model.

  • And it enables you to narrow down your graph execution

  • and go back and tune performance bottlenecks.

  • Beyond performance, we have a variety of techniques

  • relating to op coverage.

  • The first allows you to utilize TensorFlow ops that are not

  • natively supported in TF Lite.

  • And the second allows you to reduce your binary size

  • if you only want to include a subset of ops.

  • So one of the main issues that users

  • face when converting a model from TensorFlow to TensorFlow

  • Lite is unsupported ops.

  • TF Lite has native implementations

  • for a subset of the TensorFlow ops

  • that are optimized for mobile.

  • In order to increase op coverage,

  • we have added a feature called TensorFlow Lite Select, which

  • adds support for many of the TensorFlow ops.

  • The one trade-off is that it can increase binary size

  • by six megabytes, because we're pulling in the full TensorFlow

  • runtime.

  • This is a code snippet showing how you

  • can use TensorFlow Lite Select.

  • You have to set the target_spec.supported_ops

  • to include both built-in and select ops.

  • So built-in ops will be used when possible in order

  • to utilize optimized kernels.

  • And select ops will be used in all other cases.
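
In the Python converter, that setting looks roughly like this (a sketch, assuming a converter created from a SavedModel as earlier):

    converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")

    # Use the optimized TF Lite built-in ops where possible, and fall back
    # to select TensorFlow ops (pulled in from the TensorFlow runtime) otherwise.
    converter.target_spec.supported_ops = [
        tf.lite.OpsSet.TFLITE_BUILTINS,
        tf.lite.OpsSet.SELECT_TF_OPS,
    ]
    tflite_model = converter.convert()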

  • On the other hand, for TF Lite developers

  • who deeply care about their binary footprint,

  • we've added a technique that we call selective registration,

  • which only includes the ops that are required by the model.

  • Let's take a look at how this works in code.

  • You create a custom op resolver that you

  • use in place of the TF Lite built-in op resolver.

  • And then in your build file, you specify

  • your model and the custom op resolver that you created.

  • And TF Lite will scan over your model

  • and create a registry of ops contained within your model.

  • When you build the interpreter, it'll

  • only include the ops that are required by your model,

  • therefore reducing your overall binary size.

  • This technique is similar to the technique that's

  • used to provide support for custom operations, which

  • are user-provided implementations for ops

  • that we do not support as built-in ops.

  • And next, we have Pete talking about microcontrollers.

  • PETE WARDEN: As you've seen, TensorFlow

  • has had a lot of success in mobile devices,

  • like Android and iOS.

  • We're in over three billion devices in production.

  • Oh, I might actually have to switch back to--

  • let's see-- yes, there we go.

  • So what is really interesting, though,

  • is that there are actually over 250 billion

  • microcontrollers out there in the world already.

  • And you might not be familiar with them

  • because they tend to hide in plain sight.

  • But these are things that you get

  • in your cars and your washing machines,

  • in almost any piece of electronics these days.

  • They are extremely small.

  • They only have maybe tens of kilobytes of RAM and Flash

  • to actually work with.

  • They often don't have a proper operating system.

  • They definitely don't have anything like Linux.

  • And they are incredibly resource-constrained.

  • And you might think, OK, I've only

  • got tens of kilobytes of space.

  • What am I going to be able to do with this?

  • A classic example of using microcontrollers is actually--

  • and you'll have to forgive me if anybody's phone goes off--

  • but, OK Google.

  • That's driven by a microcontroller

  • that runs on an always-on DSP.

  • And the reason that it's running on a DSP,

  • even though you have this very powerful ARM CPU sitting there

  • is that a DSP only uses tiny amounts of battery.

  • And if you want your battery to last

  • for more than an hour or so, you don't want the CPU

  • on all the time.

  • You need something that's going to be able to sit there and sip

  • almost no power.

  • So the setup that we tend to use for that is you

  • have a small, comparatively low accuracy model

  • that's always running on this very low energy DSP that's

  • listening out for something that might

  • sound a bit like OK Google.

  • And then if it thinks it's heard that,

  • it actually wakes up the main CPU,

  • which is much more battery hungry,

  • to run an even more elaborate model to just double check

  • that.

  • So you're actually able to get this cascade of deep learning

  • models to try and detect things that you're interested in.

  • And this is a really, really common pattern.

  • Even though you might not be able to do an incredibly

  • accurate model on a microcontroller or a DSP,

  • if you actually have this kind of architecture,

  • it's very possible to do really interesting and useful

  • applications while still preserving your battery life.

  • So we needed a framework that would actually

  • fit into this tens of kilobytes of memory.

  • But we didn't want to lose all of the advantages we

  • get from being part of this TensorFlow Lite

  • ecosystem and this whole TensorFlow ecosystem.

  • So what we've actually ended up doing

  • is writing an interpreter that fits

  • within just a few kilobytes of memory,

  • but still uses the same APIs, the same kernels, the same

  • FlatBuffer file format that you use with regular TensorFlow Lite

  • for mobile.

  • So you get all of these advantages, all

  • of these wonderful tooling things

  • that Nupur was just talking about that are coming out.

  • But you actually get to deploy on these really tiny devices.

  • [VIDEO PLAYBACK]

  • - Animation.

  • OK, so now it's ready.

  • And so it even gives you instructions.

  • So instead of listening constantly,

  • which we thought some people don't like because of the privacy

  • side effects, you have to press button A here.

  • And then you speak into this microphone

  • that I've just plugged into this [INAUDIBLE] port here.

  • It's just a standard microphone.

  • And it will display a video and animation and audio.

  • So let's try it out.

  • I'm going to press A and speak into this mic.

  • Yes.

  • Moo.

  • Bam.

  • - You did it.

  • - Live demo.

  • - So that's what we wanted to show.

  • And it has some feedback on the screen.

  • It shows the version, it shows what we're using.

  • And this is all hardware that we have now-- battery power,

  • [INAUDIBLE] power.

  • - Yes, yes, yes, yes, yes.

  • This is all battery powered.

  • [END PLAYBACK]

  • PETE WARDEN: So what's actually happening

  • though is it plays an animation when she says the word yes,

  • because it's recognized that.

  • That's actually an example of using

  • TensorFlow Lite for microcontrollers,

  • which is able to recognize simple words like yes or no.

  • And it's really a tutorial on how you can create

  • something very similar to the OK Google model

  • that we run on DSPs and phones to recognize short words.

  • If you want to recognize breaking glass,

  • or any other audio noises,

  • there's a complete tutorial that you can actually grab

  • and then deploy on these kinds of microcontrollers.

  • And if you're lucky and you stop by the TensorFlow Lite booth,

  • we might even have a few of these microcontrollers left

  • to give away from Adafruit.

  • So I know some of you out there in the audience already

  • have that box, but thanks to the generosity of ARM,

  • we've actually been able to hand some of those out.

  • So come by and check that out.

  • So let's see if I can actually--

  • yes.

  • And the other good thing about this

  • is that you can use this on a whole variety

  • of different microcontrollers.

  • We have an official Arduino library.

  • So if you're using the Arduino IDE,

  • you can actually grab it immediately.

  • Again, AV-- much harder than AI.

  • Let's see.

  • So we'll have the slides available.

  • So you can grab them.

  • But we actually have a library that you can grab directly

  • through the Arduino IDE.

  • And you just choose it like you would any other library,

  • if you're familiar with that.

  • But we also have it available through systems like Mbed,

  • if you're used to that on the ARM devices.

  • And through places like SparkFun and Adafruit,

  • you can actually get boards.

  • And what this does--

  • you'll have to trust me because you

  • won't be able to see the LED.

  • But if I do a W gesture, it lights up the red LED.

  • If I do an O, it lights up the blue LED.

  • Some of you in the front may be able to vouch for me.

  • And then if I do an L--

  • see if I get this right--

  • it lights up the yellow LED.

  • As you can tell, I'm not an expert wizard.

  • We might need to click on the play focus.

  • Let's see this.

  • Fingers crossed.

  • Yay.

  • I'm going to skip past--

  • oh my god, we have audio.

  • This is amazing.

  • It's a Halloween miracle.

  • Awesome.

  • So, yes, you can see here--

  • Arduino.

  • A very nice video from them.

  • And they have some great examples out there too.

  • You can just pick up their board and get

  • running in a few minutes.

  • It's pretty cool.

  • And as I was mentioning with the magic wand,

  • here we're doing an accelerometer gesture

  • recognition.

  • You can imagine there's all sorts of applications for this.

  • And the key thing here is that this is

  • running on a coin battery, and can keep running

  • for days or even weeks or months,

  • if we get the power optimization right.

  • So this is really the key to this ubiquitous ambient

  • computing that you might be hearing a lot about.

  • And what other things can you do with these kind of MCUs?

  • They are really resource limited.

  • But you can do some great things like simple speech recognition,

  • like we've shown.

  • We have a demo at the booth of doing person detection using

  • a 250 kilobyte MobileNet model that just detects

  • whether or not there's a person in front of the camera, which

  • is obviously super useful for all sorts of applications.

  • We also have predictive maintenance,

  • which is a really powerful application.

  • If you think about machines in factories,

  • even if you think about something like your own car,

  • you can tell when it's making a funny noise.

  • And you might need to take it to the mechanics.

  • Now if you imagine using machine learning models

  • on all of the billions of machines that are running

  • in factories and industry all around the world,

  • you can see how powerful that can actually be.

  • So as we mentioned, we've got these examples

  • out there now as part of TensorFlow Lite

  • that you can run on Arduino, SparkFun, Adafruit,

  • all these kinds of boards, recognizing yes/no

  • with the ability to retrain using TensorFlow

  • for your own words you care about.

  • Doing person detection is really interesting

  • because we've trained it for people,

  • but it will actually also work for a whole bunch

  • of other objects in the COCO data set.

  • So if you want to detect cars instead of people,

  • it's very, very easy to just re-target it for that.

  • And gesture recognition.

  • We've been able to train it to recognize

  • these kinds of gestures.

  • Obviously, if you have your own things

  • that you want to recognize through accelerometers,

  • that's totally possible to do as well.

  • So one of the things that's really

  • helped us do this has been our partnership with ARM,

  • who designed all the devices that we've actually

  • been showing today.

  • So maybe the ARM people up front,

  • if you can just give a wave so people can find you.

  • And thank you.

  • They've actually been contributing a lot of code.

  • And this has been a fantastic partnership for us.

  • And stay tuned for lots more where that came from.

  • So that's it for the microcontrollers.

  • Just to finish up, I want to cover a little bit

  • about where TensorFlow Lite is going in the future.

  • So what we hear more than anything is people

  • want to bring more models to mobile and embedded devices.

  • So more ops, more supported models.

  • They want their models to run faster.

  • So we're continuing to push on performance improvements.

  • They want to see more integration with TensorFlow

  • and things like TensorFlow Hub.

  • And easier usage of all these, which

  • means better documentation, better examples, better

  • tutorials.

  • On-device training and personalization

  • is a really, really interesting area

  • where things are progressing.

  • And we also really care about trying to figure out

  • where your performance is going, and actually trying

  • to automate the process of profiling and optimization,

  • and helping you do a better job with your models.

  • And to help with all of that, we also

  • have a brand new course that's launched on Udacity

  • aimed at TensorFlow Lite.

  • So please check that out.

  • So that's it from us.

  • Thank you for your patience through all

  • of the technical hiccups.

  • I'm happy to answer any of your questions.

  • I think we're going to be heading over to the booth

  • after this.

  • So we will be there.

  • And you can email us at tflite@tensorflow.org

  • if you have anything that you want to ask us about.

  • So thank you so much.

  • I'll look forward to chatting.

  • [APPLAUSE]
