
  • [MUSIC PLAYING]

  • SARA ROBINSON: Hi, everyone.

  • Thank you for coming.

  • Today we're going to talk about machine learning APIs

  • by example.

  • I'm going to teach you how you can

  • access pre-trained machine learning models

  • with a single API call.

  • My name is Sara Robinson.

  • I'm a developer advocate on the Google Cloud Platform

  • team, which basically means I get to help build demos, give

  • talks about them, and bring product feedback back

  • to the engineering teams.

  • You can find me on Twitter, @SRobTweets.

  • And I live in New York.

  • So before we get started, let's talk

  • about what machine learning is at a high level.

  • So at a high level, machine learning

  • is teaching computers to recognize patterns

  • in the same way that our brains do.

  • So it's really easy for a child to recognize

  • the difference between a picture of a cat and a dog,

  • but it's much more difficult to teach computers

  • to do the same thing, right?

  • So we could write rules to look for specific things,

  • but we can almost always find a condition that's

  • going to break those rules.

  • So instead, what we want to do is

  • write code that finds these rules for us and improves

  • over time through examples and experience.

  • Here we have a neural network that's

  • identifying a picture as either a picture of a cat or a dog.

  • And we can think of the input to this network

  • as pixels in the image.

  • And then each neuron is looking for

  • a specific identifying feature.

  • Maybe it's the shape of the ear or the hair.

  • And then the output is a prediction--

  • in this case, that is a dog.
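
To make that concrete, here is a minimal sketch (not part of the talk) of such a network in tf.keras; the layer sizes are purely illustrative.

```python
# A tiny illustrative classifier: pixels in, learned features in the middle,
# a cat-vs-dog prediction out. Layer sizes here are made up for illustration.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 3)),           # input: the image's pixels
    tf.keras.layers.Conv2D(16, 3, activation="relu"),   # units learn identifying features (ears, fur, ...)
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2, activation="softmax"),     # output: a prediction -- cat or dog
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(images, labels, ...)  # the "rules" are learned from examples, not hand-written
```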

  • Let's take a step back from this for a moment

  • and let's try some human-powered image detection,

  • if we were to do this on our own.

  • We'll take this picture of an apple and an orange.

  • And let's say we were going to start

  • writing an algorithm that would identify

  • the difference between these two.

  • What are some features that you might look for?

  • You can shout it out.

  • AUDIENCE: Color.

  • SARA ROBINSON: Color.

  • I heard a bunch of "color."

  • AUDIENCE: Shape.

  • SARA ROBINSON: Shape.

  • AUDIENCE: Texture.

  • SARA ROBINSON: Texture-- lots of good ones.

  • So color's a good one, but then what

  • would happen if we had black and white images?

  • Then we might have to start all over again.

  • So in that case, we could look for a stem.

  • Texture would be good.

  • But then what happens if we add a third fruit?

  • If we add a mango, we have to start all over again as well.

  • But these pictures are all pretty similar, right?

  • So what would happen if we had pictures of two things that

  • were very different?

  • This should be really easy, right?

  • A dog and a mop have pretty much nothing in common

  • from these pictures that I can see.

  • But it's actually a little tricky.

  • So what we have here is pictures of sheep dogs and mops.

  • And it's actually kind of hard to tell the difference, right?

  • If we were going to write code that identified these two,

  • it would be pretty tricky to do.

  • And then what happens if we have photos of everything?

  • We don't want to write specific rules

  • to identify each little thing that we're trying to look for.

  • And in addition to photos, we could have many other types

  • of unstructured data.

  • We could have video, audio, text,

  • lots of different types of data we'd be dealing with.

  • And really, we want some tools to help

  • us make sense of all of this unstructured data.

  • And Google Cloud Platform has several products

  • to help you benefit from machine learning.

  • On the left-hand side here, you can use your own data

  • to build and train your own machine learning model.

  • So we have TensorFlow and Cloud Machine Learning Engine for you

  • to do that.

  • And on the right-hand side, this is

  • where I'm going to focus today.

  • This is what I like to call friendly machine learning.

  • These are machine learning APIs that give you

  • access to a pre-trained machine learning model with one

  • single REST API request.

  • So you make a request.

  • You send it some data.

  • You get some data back from this pre-trained model.

  • And you don't have to worry about building and training

  • your own model or anything that's going on under the hood.

  • I'm going to give you an introduction

  • to each of these five APIs.

  • I'm going to start with Vision, and I'm

  • going to end with Video Intelligence, which

  • is our newest API you may have seen in the keynote yesterday.

  • So let's get started with Vision.

  • The Vision API lets you do complex image detection

  • with a simple REST request.

  • And I'm going to start each section

  • by talking about a customer or customers

  • that are using each API.

  • So for the Vision API, the first example on the left

  • is Disney, which used the Vision API for a game

  • to promote the movie "Pete's Dragon."

  • And the way the game worked is that users were given a quest.

  • So they had a clue, and they had to take

  • a picture of that word--

  • maybe couch, computer, everyday objects.

  • And if they took the picture correctly,

  • they would superimpose an image of the dragon on that object.

  • So the problem was they needed a way

  • to verify that the user took a picture of the correct object

  • that they were prompted for.

  • And the Vision API was a perfect fit to do that.

  • They used the label detection feature, which basically tells

  • you, what is this a picture of?

  • And they were able to verify images in the game that way.

  • Realtor.com uses the Vision API for their mobile application.

  • So it's a real estate listing service, and people

  • can go around as they are looking for houses

  • and take pictures of a "for sale" sign.

  • And they use the Vision API's OCR, Optical Character

  • Recognition, to read the text in the image

  • and then pull up the relevant listing for that house.

  • So those are two examples of the Vision API in production.

  • Let's talk a little bit more about the specific features

  • of the Vision API.

  • So as I mentioned, we have label detection,

  • which is kind of the core feature which

  • tells-- you send it an image.

  • In this case, it's a cheetah.

  • And it'll tell you what this is a picture of.

  • It'll give you a bunch of different labels back.

  • Face detection will identify faces in an image.

  • It'll tell you where those faces are in the image.

  • And it'll even tell you if they're happy, sad, surprised,

  • or angry.

  • OCR is what I mentioned with realtor.com's use case.

  • This can identify text in an image.

  • It will tell you where the text is, what the text says,

  • and what language it's in.

  • Explicit content detection will tell you,

  • is this image appropriate or not--

  • really useful if you've got a site

  • with a lot of user-generated content

  • and you don't want to manually filter images.

  • You can use this API method to do that really easily.

  • Landmark detection will tell you, is this a common landmark?

  • If so, what is the latitude and longitude?

  • And then logo detection, pretty self-explanatory,

  • will identify logos in an image.
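
As a concrete sketch, a label detection call might look like this with the google-cloud-vision Python client library (the filename is a placeholder, and exact field names can vary by library version).

```python
# A rough sketch, assuming the google-cloud-vision client library;
# "cheetah.jpg" is a placeholder filename.
from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("cheetah.jpg", "rb") as f:
    image = vision.Image(content=f.read())

# Label detection: "what is this a picture of?"
for label in client.label_detection(image=image).label_annotations:
    print(label.description, label.score)

# The same client exposes the other features mentioned above, for example:
#   client.face_detection(image=image)
#   client.text_detection(image=image)          # OCR
#   client.safe_search_detection(image=image)   # explicit content
#   client.landmark_detection(image=image)
#   client.logo_detection(image=image)
```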

  • So a quick look at some of the JSON response

  • you might get back for these different features.

  • This is face detection.

  • This is actually a selfie that I took with two teammates

  • on a trip to Jordan last year.

  • And the response you're looking at in the slide is for my face.

  • So it'll return an object for each face

  • it finds in the image.

  • And we can see it says, headwear likelihood, very unlikely,

  • which is true.

  • I'm not wearing a hat.

  • But for both of my teammates, it did return headwear likelihood,

  • very likely.

  • And then we can see it's highlighted below.

  • It says, joy likelihood is very likely, which is true.

  • I am smiling in the picture.

  • The next feature I want to show you the response for

  • is landmark detection.

  • So we have a picture here of what

  • looks like the Eiffel Tower.

  • It's actually the Paris Hotel and Casino in Las Vegas.

  • I wanted to see if the Vision API was fooled.

  • And it was not.

  • It correctly identified this as the Paris Hotel and Casino.

  • You can see that MID in the JSON response.

  • That's an ID that maps to Google's Knowledge Graph

  • API, which will just give you a little more data

  • about the entity.

  • And it also tells us the latitude and longitude of where

  • this Paris Hotel and Casino is.

  • In addition to these features, we launched some new features

  • this week, which you may have heard

  • about in yesterday's keynote.

  • I'm going to quickly talk about what they are.

  • And then I'll show you some examples.

  • The first one is crop hints, which

  • will give you suggested crop dimensions for your photos.

  • Web annotations-- I'm super excited about this one.

  • This will give you some granular data

  • on entities, web entities that are found in your image.

  • It'll also tell you all the other pages where the image

  • exists on the internet.

  • So if you need to do copyright detection,

  • it'll give you the URL of the image and the URL

  • of the page where the image is.

  • And then finally, we announced document text annotations.

  • So in addition to the OCR we had before, this is improved

  • OCR for large blocks of text.

  • If you have an image of a receipt or something

  • with a lot of text, it'll give you

  • very granular data on the paragraphs and the words

  • and the symbols in that text.

  • Some examples of the new features,

  • I want to highlight web annotations.

  • So here we have a picture of a car.

  • It's in a museum.

  • I'm actually a big "Harry Potter" fan,

  • so this is a car from one of the "Harry Potter" movies.

  • And I wanted to see what the Vision

  • API was able to find from the web entities in this image.

  • So it was able to identify it correctly

  • as a Ford Anglia, which is correct.

  • This is from the second "Harry Potter" movie when they

  • tried to fly a car to school.

  • Second entity it returned is art science museum.

  • And this is a museum in Singapore

  • where this car is on display.

  • And it is finally able to tell me

  • that this car is from "Harry Potter," which

  • is a literary series.

  • So lots of great metadata you can get back from web

  • annotations.

  • Even more data that it returns--

  • it tells you the full matching image URLs.

  • Where else does this image exist on the internet?

  • Partial matching images.

  • And then finally, all the pages that point to this image.

  • It's really useful information you get back

  • with web annotations.
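
A hedged sketch of a web detection request, under the same client-library assumption; the image URL is a placeholder.

```python
# A hedged sketch of web detection, same client-library assumption;
# the image URL is a placeholder.
from google.cloud import vision

client = vision.ImageAnnotatorClient()
image = vision.Image()
image.source.image_uri = "https://example.com/flying-car.jpg"

web = client.web_detection(image=image).web_detection
for entity in web.web_entities:
    print(entity.description, entity.score)        # web entities, e.g. "Ford Anglia"
for match in web.full_matching_images:
    print(match.url)                                # other places this exact image appears
for page in web.pages_with_matching_images:
    print(page.url)                                 # pages that include the image
```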

  • And you can try this in the browser with your own images

  • before writing any code.

  • Just go to cloud.google.com/vision.

  • You can upload your images, play around,

  • see all the responses you get back

  • from the different features.

  • And you can actually do this with all of the APIs

  • that I'm going to talk about today.

  • There's a way to try them in the browser.

  • And in case you were wondering how the Vision API did

  • with the sheep dogs and mops, so this is the response it got

  • from that picture on the right.

  • So it's 99% sure it's a dog.

  • It actually even is able to identify the breed of the dog--

  • Komondor.

  • I may be saying that wrong.

  • And the mops, it successfully identified this

  • as a broom or a tool.

  • And skipping ahead, it did pretty well overall.

  • In the top row, the third one, it didn't identify it as a dog.

  • It just said "fur."

  • So I don't know if that's a hit or a miss.

  • And then the third mop, it said "textile."

  • So it didn't quite get that it was a mop or a broom.

  • But overall, Vision API performed pretty well

  • on these tricky images that are even hard for us to decipher

  • what they are exactly.

  • So that was the Vision API showing you

  • how you can get a lot of data on your images

  • with a pre-trained machine learning model.

  • Next I want to talk about audio data.

  • And the Speech API essentially exposes the functionality

  • of "OK, Google" to developers.

  • It lets you do speech-to-text transcription in over 80

  • languages.

  • One app that's using the Speech API is called Azar.

  • It's an app to find friends and chat,

  • and they have connected 15 million matches.

  • And they use the Speech API for all of the messages

  • that involve audio snippets.

  • And they're also using this in combination

  • with the Cloud Translation API, which I'm

  • going to talk about later on.

  • So there's a lot of potential use cases

  • where you could combine different machine learning APIs

  • together.

  • So in cases where the matches don't speak the same language,

  • they'll use the Speech API to transcribe the audio

  • and then the Translation API to translate that text.

  • The best way to experience the Speech API is with a demo.

  • Before I get into it, I want to explain a bit how it works.

  • So we're going to make a recording.

  • I wrote a bash script.

  • We're going to use SoX to do that.

  • It's a command line utility for audio.

  • So what we'll do is we'll record our audio.

  • We'll create an API request in a JSON file.

  • And we'll send it to the Speech API.

  • And then we'll see the JSON response.
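
For reference, a rough Python equivalent of the request the bash script builds, assuming the google-cloud-speech client library; the filename and sample rate are placeholders.

```python
# A rough Python equivalent of the request the bash script builds, assuming the
# google-cloud-speech client library; filename and sample rate are placeholders.
from google.cloud import speech

client = speech.SpeechClient()
with open("recording.flac", "rb") as f:
    audio = speech.RecognitionAudio(content=f.read())

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.FLAC,  # the encoding type
    sample_rate_hertz=16000,                               # the sample rate in hertz
    language_code="en-US",                                 # optional; defaults to English
)

response = client.recognize(config=config, audio=audio)
for result in response.results:
    best = result.alternatives[0]
    print(best.transcript, best.confidence)
```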

  • So if we could go ahead and switch to the demo--

  • OK, let me make the font a little bigger.

  • So I'm going to call my file with bash request.sh.

  • And it says, "Press Enter when you're ready to record."

  • It's going to ask me to record a five-second audio file.

  • So here we go.

  • I built a Cloud Speech API demo using SoX.

  • OK, so this is the JSON request file that it just created.

  • We need to tell the Speech API the encoding type.

  • In this case, we're using FLAC encoding,

  • the sample rate in hertz.

  • The language code is optional.

  • If you leave it out, it will default to English.

  • Otherwise, you need to tell it what language your audio is in.

  • And then the speech context, I'm going

  • to talk about that in a little bit.

  • So I'm going to call the Speech API.

  • It's making a curl request.

  • And let's see how it did.

  • OK, so it did pretty good.

  • It said, "I built a Cloud Speech API demo using socks."

  • But you'll notice it got the wrong "SoX"

  • because "SoX" is a proper noun.

  • It was 89% confident that it got this correct.

  • It even was able to get "API" as an acronym.

  • So I mentioned this speech context parameter before.

  • And what this actually lets you do is let's

  • say you have a proper noun or a word

  • that you're expecting in your application that's

  • unique that you wouldn't expect the API to recognize normally.

  • You can actually pass it as a parameter,

  • and it'll look out for that word.
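
A small sketch of that speech context hint, under the same assumptions as the previous snippet.

```python
# Same assumptions as above: pass "SoX" as a phrase so the API looks out for it.
from google.cloud import speech

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
    sample_rate_hertz=16000,
    language_code="en-US",
    speech_contexts=[speech.SpeechContext(phrases=["SoX"])],  # hint for the proper noun
)
```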

  • So I'm going to hop on over to Sublime,

  • and I'm going to add "SoX" as a phrase to look out for.

  • And let's see if it's able to identify it.

  • I'm going to say the same thing again.

  • I'm going to record.

  • I built a Cloud Speech API demo using SoX.

  • And we can see it's now got that phrase in there.

  • And we will call the Speech API.

  • And it was able to get it correctly

  • using the phrases parameter, which is pretty cool.

  • Just one REST API request, and we are easily

  • transcribing an audio file, even with a unique entity.

  • And you can also pass the API audio files

  • in over 80 different languages.

  • You just need to tell it, again, the language code that you'd

  • like it to transcribe.

  • So that is the Speech API.

  • You can hop back to the slides.

  • So we've just transcribed our audio.

  • We have text.

  • So what happens if you want to do more analysis on that text?

  • And that is where the Natural Language API comes into play.

  • It lets you extract entities, sentiment, and syntax

  • from text.

  • A company that's using the Natural Language API

  • is called Wootric.

  • And they are a customer feedback platform

  • to help businesses improve their customer service.

  • And they do this by collecting millions of survey responses

  • each week.

  • A little more detail on how it works

  • is if you look at that box in the top right--

  • so a customer would place this on different pages

  • throughout their app.

  • Maybe if you're a developer, you would see it

  • on a documentation page.

  • And it would ask you to rate your experience on that page

  • from 0 to 10, which is what we call the Net Promoter

  • Score, NPS.

  • So they gather that score, and then they

  • ask you for some open-ended feedback to expand on what

  • you thought of the experience.

  • So as you can imagine, it's pretty easy for them

  • to average out the Net Promoter Score among tons

  • of different responses.

  • But what's much more difficult is looking

  • at that open-ended feedback.

  • And that's where they use the Natural Language API.

  • And they actually made use of all three methods

  • in the Natural Language API.

  • They used sentiment analysis to calibrate the numeric score

  • that users gave with their feedback to see if it aligned.

  • And then they use the entity and syntax annotation to figure out

  • what the subject was of the feedback

  • and then route it accordingly.

  • So maybe somebody was unhappy about pricing.

  • Then they could route that feedback

  • to the necessary person and respond pretty fast.

  • So using the Natural Language API,

  • they were able to route and respond to feedback

  • in near real-time rather than having

  • someone need to read each response,

  • classify it, and then route it.

  • So let's look at each of the methods of the Natural Language

  • API in a bit more detail.

  • As I mentioned, I'm a big "Harry Potter" fan.

  • So I took this sentence from JK Rowling's Wikipedia page.

  • And let's see what happens if we send this to the entity

  • extraction endpoint.

  • So it's able to pull these five entities from the sentence.

  • And the JSON we get back looks like this.

  • So what's interesting here is JK Rowling's name

  • is written in three different ways.

  • Robert Galbraith is actually a pen name

  • she used for a later book series.

  • And it's able to point all of these to the same entity.

  • So if you had things like "San Francisco" and "SF,"

  • it would point those to the same entities

  • so that you could count the different ways of mentioning

  • the same thing as the same entity.

  • So we can see it finds her name, Joanne "Jo" Rowling.

  • It tells you what type of entity it is--

  • a person.

  • And then if the entity has metadata,

  • it'll give you more metadata about it.

  • So here we get an MID, which maps to JK Rowling's Knowledge

  • Graph entry.

  • And then we get the Wikipedia URL to the page about her.

  • The JSON response looks similar for the other entities

  • we found--

  • British.

  • It maps it to a location.

  • And then notice it connects it to the United Kingdom Wikipedia

  • URL.

  • So if it had instead said "UK" or "United Kingdom,"

  • it would point it to the same page.

  • And then for "Harry Potter," we also get person.

  • And we get the Wikipedia page for that entity as well.

  • So that's entity extraction.

  • That's one method you could use in the Natural Language API.
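
A rough sketch of an entity extraction call, assuming the google-cloud-language Python client library; the sentence variable is a placeholder.

```python
# A rough sketch, assuming the google-cloud-language client library;
# `sentence` stands in for the sentence from J.K. Rowling's Wikipedia page.
from google.cloud import language_v1 as language

client = language.LanguageServiceClient()
sentence = "..."  # placeholder text

doc = language.Document(content=sentence, type_=language.Document.Type.PLAIN_TEXT)
response = client.analyze_entities(document=doc)

for entity in response.entities:
    print(entity.name, entity.type_.name)             # e.g. PERSON, LOCATION
    print(entity.metadata.get("mid"),                 # Knowledge Graph ID, if any
          entity.metadata.get("wikipedia_url"))       # Wikipedia URL, if any
```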

  • The next one is sentiment analysis.

  • So this is a review you might see.

  • It says, "The food was excellent,

  • I would definitely go back."

  • And we get two things here.

  • We get a score value, which will tell us,

  • is the sentiment positive or negative?

  • It's a value ranging from negative 1 to 1.

  • So we can see here it's almost completely positive.

  • And then magnitude will tell you how

  • strong is the sentiment regardless of being

  • positive or negative.

  • This can range from 0 to infinity.

  • And it's based on the length of the text.

  • So since this is a pretty short block of text,

  • the value is pretty low.
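
And a similar hedged sketch for sentiment analysis, under the same client-library assumption.

```python
# A hedged sketch of sentiment analysis, same client-library assumption.
from google.cloud import language_v1 as language

client = language.LanguageServiceClient()
doc = language.Document(
    content="The food was excellent, I would definitely go back.",
    type_=language.Document.Type.PLAIN_TEXT,
)

sentiment = client.analyze_sentiment(document=doc).document_sentiment
print(sentiment.score)      # -1.0 (negative) to 1.0 (positive)
print(sentiment.magnitude)  # 0 and up; grows with the amount of emotional text
```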

  • And then finally, you can analyze syntax.

  • So this method is a bit more complex.

  • It gets into the linguistic details of a piece of text.

  • It returns a bunch of different data here.

  • This visualization is actually created

  • from the in-browser demo.

  • So if you want to try it out in the browser,

  • you can create a similar visualization

  • with your own text.

  • And what it returns is on that top row, those green arrows,

  • that's what we call a dependency parse tree.

  • And that will tell us how each of the words in a sentence

  • relate to each other, which words they depend on.

  • In the second row, we see the orange row.

  • That's the parse label, which tells us

  • the role of each word in the sentence.

  • So we can see that "helps"-- the sentence

  • is "The natural language API helps us understand text."

  • "Helps" is the root verb.

  • "Us" is the nominal subject.

  • We can see the role of all the other words in the sentence

  • as well.

  • That third row where we only have one word is the lemma.

  • We can see here it says "help."

  • And what that is is the canonical form of the word.

  • So the canonical form of "helps" is "help."

  • So this way, if you're trying to count

  • how many times a specific word occurs,

  • it won't count "helps" and "help" as two different words.

  • It'll consolidate them into one word.

  • And then in red, we have the part of speech,

  • whether it's a noun, verb, adjective, or punctuation.

  • And then in blue, we have some additional morphology details

  • on the word.

  • There are more of these returned for Spanish and Japanese,

  • which are the other two languages that the API

  • currently supports.

  • So the syntax annotation feature might

  • be a little harder to grasp when you might

  • use this in an application.

  • So I wanted to show you a demo specifically focused

  • on that feature.

  • And for this demo, over the course of the past few days--

  • oh, I think the--

  • there we go.

  • Mic is back.

  • I've been using the Twitter Streaming API

  • to stream tweets about Google Next.

  • So I've streamed tweets with the hashtag #googlenext17

  • and a couple of other search terms into a Node server

  • that's running on Compute Engine.

  • And I'm streaming those tweets.

  • So the Streaming API gathers just a subset

  • of those tweets, not all the tweets with that hashtag.

  • And I'm sending the text of the tweet

  • to the Natural Language API.

  • And then I'm storing the response in BigQuery.

  • BigQuery's our big data analytics warehouse tool.

  • It lets you do analytics on really large data sets.

  • So I'm storing it in BigQuery.

  • And then from there, I can gather some data

  • on the parts of speech in a sentence.

  • So I can find, for example, the most common adjectives

  • that people are using to tweet about Google Next.
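
As a rough sketch of that kind of processing (the demo itself used a Node server plus a BigQuery JavaScript UDF, which isn't reproduced here), counting adjectives with analyze_syntax might look like this in Python.

```python
# A rough Python sketch of that idea: run analyze_syntax on a tweet's text and
# count adjectives. The real demo did this with a Node server and a BigQuery
# JavaScript UDF, which isn't reproduced here; names are placeholders.
from collections import Counter
from google.cloud import language_v1 as language

client = language.LanguageServiceClient()

def count_adjectives(tweet_text):
    doc = language.Document(content=tweet_text, type_=language.Document.Type.PLAIN_TEXT)
    tokens = client.analyze_syntax(document=doc).tokens
    return Counter(
        token.lemma.lower()                                   # lemma: canonical form
        for token in tokens
        if token.part_of_speech.tag == language.PartOfSpeech.Tag.ADJ
    )

print(count_adjectives("The new demo was awesome and the keynote was great"))
```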

  • So if we could go to the demo--

  • cool.

  • So this is the BigQuery web UI which lets you run queries

  • directly in the browser.

  • So here, I just did a limit 10 to show you

  • what the table looks like.

  • I think I've got about 6,000 to 7,000 tweets in here so far.

  • So here I'm collecting the ID of the tweet, the text,

  • the created at, how many followers the user has,

  • and the hashtags returned from the Twitter API.

  • And then I've got this giant JSON string of the response

  • from the Natural Language API.

  • So you're probably wondering, how

  • am I going to write a SQL query to parse that?

  • Well, BigQuery has a feature called user-defined functions,

  • which lets you write custom JavaScript functions that you

  • can run on rows in your table.

  • So over here, I've got a query that's

  • going to run on every tweet in this table.

  • And my JavaScript function is right here.

  • And what it's going to do is it's

  • going to count all of the adjectives

  • and then return that in my output table.

  • So if I run this query here, it's

  • running this custom JavaScript function

  • on all the tweets in my table, which I

  • think is about 6,000 right now.

  • It ran pretty fast, and I'm not using cached results.

  • So let's take a look.

  • We've got 405 uses of the word "more," "new," "great,"

  • "good," "late," "awesome."

  • You can see some more here.

  • So that's one example of a use case for the syntax annotation

  • feature of the Natural Language API.

  • You can go back to the slides.

  • So the Natural Language API lets you do analysis on text.

  • One other thing that you might want to do with text

  • is translate it.

  • You likely have users of your application all over the world.

  • It'd be useful to translate it into their own language.

  • And the Translation API exposes the functionality

  • of Google Translate to developers

  • and lets you translate text in 100-plus languages.

  • And for a second, let's talk about Google Translate.

  • I'm a big fan of Google Translate.

  • Has anyone here used it before?

  • It looks like a lot of people.

  • I use it when I travel all the time.

  • So a couple of months ago, I was on a trip to Japan.

  • I was at a restaurant where nobody spoke English,

  • and I really wanted to order octopus.

  • So I typed it into Google Translate.

  • It turns out the word for that is "tako,"

  • which confused me a little bit.

  • I didn't want to order an octopus taco,

  • although maybe that would be good.

  • So I just showed the person at the restaurant

  • my Google Translate app, and successfully got my octopus.

  • That's a picture of it right there.

  • But likely, you want to do more than translate the word

  • for octopus.

  • And that's why we have the Translation API, which

  • lets you translate text in your application

  • in many different languages.

  • And Airbnb is an example of a company that's

  • using the Translation API.

  • And what you might not know is that 60% of Airbnb bookings

  • connect people that are using the app in different languages

  • because people use Airbnb a lot, especially when they travel

  • internationally.

  • And so for all of those connections,

  • they're using the Translation API to translate not only listings,

  • but also reviews and conversations

  • into each person's own language.

  • And they found that this significantly

  • improves a guest's likelihood to book

  • if it's translated into their own language.

  • And that's one example of someone using the Translation

  • API.

  • It's pretty self-explanatory, but I

  • wanted to show you a code snippet of how easy it

  • is to call the API.

  • This is some Python code that's making a request

  • to the Translation API.

  • And you can see here we just create a translate client.

  • And then we pass it the phrase we'd like to translate,

  • the target language.

  • It'll detect the original language for us.

  • And then we can print the result.
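
A sketch of what that snippet might look like, assuming the google-cloud-translate (v2-style) client library.

```python
# A sketch of the snippet described on the slide, assuming the
# google-cloud-translate client library (v2-style client).
from google.cloud import translate_v2 as translate

client = translate.Client()
result = client.translate("We are using the Translation API. It is awesome.",
                          target_language="ja")

print(result["detectedSourceLanguage"])  # the original language is detected for us
print(result["translatedText"])
```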

  • And one thing we've added to the Translation API

  • recently is neural machine translation.

  • And this greatly improves the underlying translation model.

  • And basically, the way it works is

  • with first-generation translation,

  • which is what we had before, it was translating

  • each word in a sentence separately.

  • So let's say you had a dictionary

  • and you were looking up, word for word,

  • each word in the sentence and translating it

  • without understanding the context around that word, that

  • works pretty well.

  • But what neural machine translation does

  • is it actually looks at the surrounding words,

  • and it's able to understand the context

  • of the word in the sentence.

  • And it's able to produce much higher quality translations.

  • There's a great "New York Times" article with a lot more details

  • on how that model works.

  • You can find it at that Bitly link there.

  • If anyone wants to take a picture,

  • I'll leave it up for a second.

  • And just to show you some of the improvements

  • that neural machine translation brings--

  • this is a lot of text.

  • I know.

  • But what I did is I took a paragraph

  • from the Spanish version of "Harry Potter."

  • So this is the original text that the Spanish translator

  • wrote.

  • And then I showed you in first-generation translation

  • in the middle how it'll translate that

  • to English, and then in neural machine translation all

  • the way on the right-hand side.

  • And I bolded the improvements.

  • So we can look at a few of them.

  • The first bold word is--

  • it says, "which made drills."

  • It's describing the company where he works.

  • And then neural machine translation

  • is able to change that word to "manufactured,"

  • which is much more specific to the context of the sentence.

  • Another example is if we look at where

  • it's describing Mrs. Dursley's neck,

  • the first generation says "almost twice longer

  • than usual."

  • In the second version, it says "almost twice as long as

  • usual," which is a slight improvement.

  • And then if we look at the bottom,

  • it goes from "fence of the garden" to "garden fence."

  • And then in the last example, the first generation

  • used the pronoun "their."

  • And then neural machine translation

  • is able to identify the correct pronoun "her"

  • more specifically.

  • So just a quick example highlighting some improvements

  • that neural machine translation brings to the Translation API.

  • And on its own, the Translation API is pretty self-explanatory,

  • but I wanted to show you a small demo of a Python script

  • I wrote of how to combine different APIs together.

  • So in this demo, it'll take three types of text input.

  • It could take either raw text, audio, or an image of text.

  • And then we'll pass it through the Natural Language API.

  • And then finally, we'll translate it

  • into a few languages.

  • And then we'll translate it back to English

  • so that you can see the result.
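
For illustration, a condensed sketch of how such a chaining script might be wired up, reusing the client-library assumptions from the earlier snippets; the real textify.py isn't shown here.

```python
# A condensed sketch of the chaining idea, reusing the client-library
# assumptions from the earlier snippets; the real textify.py isn't shown here.
from google.cloud import vision, translate_v2 as translate
from google.cloud import language_v1 as language

def analyze_and_translate(image_path, target="de"):
    # 1. Vision API: pull the text out of an image (document text detection / OCR)
    vision_client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())
    text = vision_client.document_text_detection(image=image).full_text_annotation.text

    # 2. Natural Language API: pull entities out of that text
    lang_client = language.LanguageServiceClient()
    doc = language.Document(content=text, type_=language.Document.Type.PLAIN_TEXT)
    entities = [e.name for e in lang_client.analyze_entities(document=doc).entities]

    # 3. Translation API: translate the text into the target language
    translated = translate.Client().translate(text, target_language=target)
    return entities, translated["translatedText"]
```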

  • So if we could switch back to the demo--

  • it looks like it's up here.

  • So I'm going to run the script--

  • python textify.py.

  • And it's going to tell me we're going

  • to send some text to the Natural Language API.

  • It supports English, Spanish, and Japanese.

  • And I have three options.

  • I can either type my text, record a file,

  • or send a photo of text.

  • So I'm going to type some text.

  • I'm going to say "We are using the Translation API.

  • It is awesome."

  • And we got a bunch of data back here.

  • So this is what the JSON response

  • looks like just for one token.

  • I didn't want to print the whole JSON blob here.

  • This is just for the token "we."

  • This is all the data it returned.

  • So it tells us it's a pronoun.

  • And a lot of this part-of-speech morphology data

  • is going to be unknown for English,

  • but it relates to other languages

  • that the API supports.

  • But it is able to tell us that it is a plural token

  • and it is in first person.

  • And it is the nominal subject of the sentence.

  • And we ran some sentiment analysis on it.

  • It says you seem very happy.

  • It was an excited sentence.

  • And it tells us the entities it found.

  • So it found Translation API.

  • And the way the entity analysis endpoint works

  • is it's able to identify entities even if they

  • don't have a Wikipedia URL.

  • So Translation API doesn't, but it's still

  • able to pull this out as an entity.

  • So if we were, for example, building an app to maybe route

  • customer feedback, we could say, OK, this feedback

  • is asking about the Translation API.

  • And then we could route it to the appropriate person.

  • So now we're going to translate this text.

  • And let's translate it into Japanese.

  • There we go.

  • So this is the version translated into Japanese.

  • I'm guessing most of us don't speak Japanese,

  • so I've translated it back to English.

  • And you can see that it did a pretty good job.

  • So I'm going to run the script once more.

  • And [AUDIO OUT] use an image.

  • So if we look over here, I've got just a generic resume

  • image.

  • We're going to pass it to the API.

  • So I'll clear the screen and run the script again.

  • We're going to send a photo this time.

  • And it is resume.jpg.

  • Sending it to the Vision API.

  • And the Vision API found, if we scroll up,

  • all this text in the image.

  • Using the new document text extraction,

  • it was able to pull essentially all the text from that resume.

  • There's an example of one token it returned.

  • And it found all these different entities.

  • And now we can translate it.

  • Let's translate it to German.

  • It's a lot of text there; I know.

  • But this is the resume translated into German,

  • and then again back to English.

  • So just an example of how you can combine multiple machine

  • learning APIs to do some cool text analysis.

  • And you can go back to the slides now.

  • So that was the Translation API.

  • And the last thing I want to talk about

  • is the Video Intelligence API.

  • How many of you saw it at the keynote yesterday?

  • Looks like most of you.

  • So the Video Intelligence API lets

  • you understand your video's entities at a shot, frame,

  • or video level.

  • So for video-level entities, it will tell you,

  • at a high level, what is this video about?

  • And then it'll give you more granular data

  • on what is happening in each scene of the video.
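
A hedged sketch of a label detection request with the google-cloud-videointelligence Python client library; the gs:// path is a placeholder.

```python
# A hedged sketch, assuming the google-cloud-videointelligence client library;
# the gs:// path is a placeholder.
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()
operation = client.annotate_video(
    request={
        "input_uri": "gs://my-bucket/paris-office-tour.mp4",
        "features": [videointelligence.Feature.LABEL_DETECTION],
    }
)
result = operation.result(timeout=300).annotation_results[0]

for label in result.segment_label_annotations:   # video-level: what is the video about?
    print(label.entity.description)
for label in result.shot_label_annotations:      # shot-level: what happens in each scene?
    for segment in label.segments:
        print(label.entity.description,
              segment.segment.start_time_offset,
              segment.segment.end_time_offset,
              segment.confidence)
```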

  • A company that's using this is Cantemo.

  • They're a media asset management company.

  • So a company or a user that has a lot of videos

  • would upload their videos to Cantemo,

  • and Cantemo would help them better understand those videos,

  • search their library, transcode videos.

  • And this is a quote from the VP of product development

  • at Cantemo.

  • He says, "Thanks to the Google Cloud Video Intelligence API,

  • we've been able to very quickly process and understand

  • the content of video down to the individual frame

  • with an impressively rich taxonomy."

  • So they're using the Video API to help

  • their customers better search their own video libraries.

  • And I'm going to show you the same demo that you

  • saw in the keynote, but we'll look at a different video

  • and go into a bit more detail.

  • So if we could switch back to the demo, here's the API.

  • And I've got a different video than I

  • showed in the keynote earlier.

  • This is a video that just shows you a tour of the Google Paris

  • office and a little bit of the neighborhood around it.

  • And I'll play the first bit of it.

  • It starts up by just showing some frames.

  • And then we'll get into a tour of the neighborhood

  • around the office.

  • And then we go inside.

  • It interviews some employees.

  • I won't play the whole thing.

  • But we can look at some of the labels it returned.

  • So it's able to identify this amusement ride, amusement park,

  • from the beginning.

  • We know there's a bunch of very short frames

  • in the beginning of that video.

  • It's able to see that it's a statue.

  • If we look at the fruit annotation,

  • it identifies a basket of fruit in this scene.

  • We can scroll down and look at a couple more labels--

  • landscaping and cuisine.

  • We see people getting some food.

  • And school-- here it thinks it's a school inside there.

  • So you can see we're able to get pretty granular data on what's

  • happening in each scene of the video.

  • And another thing the Video Intelligence API lets us do

  • is search a large video library.

  • So if we're a media publisher, we've

  • got petabytes of video data sitting in storage buckets.

  • It's otherwise pretty hard to search a large library

  • of video content.

  • You'd have to manually watch the videos looking

  • for a particular clip if you want to create, say,

  • like a highlight reel of specific content

  • within your library.

  • The Video Intelligence API makes this pretty easy,

  • because as you can see, all the data we get back on this video,

  • we can get that back on all the videos in our library, which

  • makes it pretty easy to just search for a specific entity

  • within our library.

  • So as I showed you in the keynote,

  • one example is let's say--

  • actually, first, let me show you the library.

  • So we've got a bunch of videos here, as you can see.

  • And let's say we'd like to search for all our baseball

  • videos.

  • We can see what we get back here.

  • And it shows us this video is almost entirely about baseball.

  • This one has fewer baseball clips.

  • And we can point to all of them specifically.

  • And then in this one, we see that moment from the--

  • not playing.

  • Let me try refreshing the page.

  • There we go.

  • We see that moment from the Year in Search video from last year

  • when the Cubs won the World Series.

  • I'm from Chicago, so I was pretty excited about that.

  • One more search that I showed you before is we

  • can find all of the beach clips in our videos.

  • So here, it's easy to, if we wanted to create a highlight

  • reel, if we were really missing the beach,

  • see all the beach clips in our videos.

  • It'd be super easy to do this using the Video Intelligence

  • API.

  • Now, since most of you saw this demo in the keynote,

  • I wanted to talk a little bit more about how I built it.

  • So you can go back to slides.

  • It was built by me and Alex Wolfe.

  • If you like the UI, you should give him a shoutout on Twitter,

  • @alexwolfe.

  • He would appreciate it.

  • So we worked on this together.

  • And this is an architecture diagram of how it works.

  • So the video API processing is being done on the back end.

  • And the way it works is you pass the Video Intelligence

  • API a Google Cloud Storage URL of your video.

  • And then it'll run the analysis on that video.

  • So I have a Cloud Storage bucket where

  • I'm storing all my videos.

  • And I've got a cloud function listening on that bucket.

  • And that cloud function will be triggered

  • anytime a new file is added to the bucket.

  • It will check if it's a video.

  • If it is, it'll send it to the Video Intelligence

  • API for processing.

  • And one cool thing about the API is

  • you can pass it the output URI of a file

  • you'd like it to write the JSON response to

  • in Google Cloud Storage.

  • So I've got a separate cloud storage

  • bucket where I'm storing all the video annotation JSON

  • responses.

  • And the API automatically writes all of my video annotations

  • to that bucket.
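
A small sketch of that output_uri option, under the same client-library assumption; both bucket paths are placeholders.

```python
# A small sketch of that option, same client-library assumption;
# both bucket paths are placeholders.
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()
client.annotate_video(
    request={
        "input_uri": "gs://my-video-bucket/new-upload.mp4",
        "output_uri": "gs://my-annotations-bucket/new-upload.json",  # JSON written here
        "features": [videointelligence.Feature.LABEL_DETECTION],
    }
)
# The long-running operation writes its annotations to the output bucket, so the
# front end can read the JSON from Cloud Storage instead of calling the API itself.
```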

  • So the front end of my application

  • doesn't have to call the video API directly.

  • It's already got all that metadata in two separate cloud

  • storage buckets.

  • And the front end of the application

  • is a Node.js application built on Google App Engine.

  • That's a little bit about how the demo works.

  • And this is more granularly what the JSON

  • response looks like from the Video Intelligence API.

  • So this is a video of a tour of the White House.

  • And at this particular point in time,

  • it identifies the label "Bird's-eye view."

  • And it's able to tell us the start time and end

  • time in microseconds of where that label appears

  • in the video.

  • And it also returns a confidence score ranging from 0 to 1,

  • which will tell us how confident is the API that it successfully

  • identified this as bird's-eye view.

  • In this case, it is 96%.

  • So it is pretty confident that this is a bird's-eye view.

  • And then one more snippet of this video,

  • a portrait of George Washington-- and it's

  • able to successfully identify that this is a portrait.

  • It tells us the start time and end

  • time of where that is in the video, along with a confidence

  • score--

  • 83%.

  • So just an example of what the JSON looks

  • like that you get back from the Video Intelligence API.

  • That wraps up my tour of the APIs.

  • And if you all want to start using them today,

  • you can go to the try-it pages.

  • So for each of the API product pages,

  • as I showed you with Vision, there's

  • an in-browser demo where you can try out all the APIs directly

  • in the browser before writing any code

  • to see if it's right for you and your application.

  • So I definitely recommend checking that out.

  • I'll let you guys take a picture of that page before I switch.

  • OK, it looks like almost everyone has got it.

  • Some other talks that I recommend that are

  • related to machine learning--

  • BigQuery and Cloud Machine Learning Engine

  • was a talk that was yesterday.

  • All the videos from yesterday have already been posted.

  • So if there's any talks that you wanted to see that you missed,

  • you can go watch them on YouTube.

  • Another talk, Introduction to Video Intelligence--

  • so if you want to see a deep-dive on the Video

  • Intelligence API, [INAUDIBLE], who's

  • the product manager on that, is going to be giving that talk

  • today at 4:00 PM.

  • Highly recommend checking that talk out.

  • And then if you're more interested in the side

  • of building and training your own machine learning model,

  • there's a session tomorrow at 11:20

  • on the lifecycle of a machine learning model.

  • I definitely recommend checking out all three

  • of these sessions.

  • And if you can't make it, you can always

  • watch the videos on YouTube after the session.

  • So thank you.

  • That's all I've got.

  • [APPLAUSE]

[MUSIC PLAYING]
