JEFF DEAN: I'm really excited to be here. I think it was almost four years ago to the day that we were about 20 people sitting in a small conference room in one of the Google buildings. We'd woken up early because we wanted to time this for an early East Coast launch, where we were turning on the TensorFlow.org website and releasing the first version of TensorFlow as an open source project. And I'm really, really excited to see what it's become. It's just remarkable to see the growth and all the different ways in which people have used this system for all kinds of interesting things around the world.

So one thing that's interesting is that the growth in the use of TensorFlow mirrors the growth in interest in machine learning and machine learning research generally around the world. This is a graph showing the number of machine learning arXiv papers posted over the last 10 years or so, and you can see it's growing quite rapidly, much more quickly than you might expect. That lower red line is the nice doubling-every-couple-of-years exponential growth rate we got used to in computing power, due to Moore's law, for so many years. That's now slowed down, but you can see that the machine learning research community is generating research ideas faster than that rate, which is pretty remarkable. We've replaced computational growth with growth of ideas, and we'll see that both together will be important.

And really, the excitement about machine learning is because we can now do things we couldn't do before. As little as five or six years ago, computers really couldn't see that well. Starting in about 2012, 2013, people began to use deep neural networks to tackle computer vision problems: image classification, object detection, things like that.
And so now, using deep learning and deep neural networks, you can feed in the raw pixels of an image and fairly reliably get a prediction of what kind of object is in that image. Feed in the pixels there, the red, green, and blue values at a bunch of different coordinates, and you get out the prediction "leopard."

This works for speech as well. You can feed in audio waveforms, and by training on lots of audio waveforms and transcripts of what's being said in those waveforms, we can take a completely new recording, tell you what is being said, and emit a transcript: "Bonjour, comment allez-vous?" ("Hello, how are you?")

You can even combine these ideas and have models that take in pixels, and instead of just predicting a classification of what object is in the image, can actually write a short sentence, a short caption, that a human might write about the image: "a cheetah lying on top of a car." That's one of my vacation photos, which was kind of cool.

And just to show the progress in computer vision: Stanford hosts an ImageNet contest every year to see how well computer vision systems can predict which of 1,000 categories is in a full color image. You get about a million images to train on, then you get a bunch of test images your model has never seen before, and you need to make a prediction. In 2011, the winning entrant got 26% error. So you can kind of make out what an image is, but it's pretty hard to tell. We know from human experiments that the error of a well-trained human, someone who has practiced at this particular task and really understands the 1,000 categories, is about 5%. So this is not a trivial task. And in 2016, the winning entrant got 3% error. Just look at that tremendous progress in the ability of computers to resolve and understand imagery, to have computer vision that actually works.
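That "pixels in, prediction out" pipeline can be sketched schematically. This is a minimal illustration, not the model from the talk: a real ImageNet classifier is a deep convolutional network trained on roughly a million labeled images, whereas this toy uses a single untrained softmax layer and a made-up three-label vocabulary, just to show the input/output contract.

```python
import numpy as np

# Schematic of "raw pixels in, class label out". A real classifier is a
# deep CNN with millions of trained parameters; the random weights here
# only stand in for what training would learn.
rng = np.random.default_rng(0)

LABELS = ["leopard", "aircraft carrier", "car"]  # stand-in for 1,000 classes

def softmax(z):
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# A fake 224x224 image: red, green, and blue values at each coordinate.
image = rng.random((224, 224, 3))

# Untrained stand-in parameters (a real model learns these from data).
W = rng.normal(scale=0.01, size=(224 * 224 * 3, len(LABELS)))
b = np.zeros(len(LABELS))

logits = image.reshape(-1) @ W + b   # flatten pixels, apply linear layer
probs = softmax(logits)              # probabilities over the label set
prediction = LABELS[int(np.argmax(probs))]

print(prediction)
```

With trained weights instead of random ones, `prediction` would be the "leopard"-style output the talk describes.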
This is remarkably important in the world, because now we have systems that can perceive the world around us, and we can do all kinds of really interesting things with that. We've seen similar progress in speech recognition and language translation and things like that.

So for the rest of the talk, I'd like to structure it around this nice list of 14 challenges that the US National Academy of Engineering put out, things it felt were important for the science and engineering communities to work on for the next 100 years. They put this list out in 2008, after some deliberation. And I think you'll agree that these are pretty good, large, challenging problems: if we actually make progress on them, we'll have a lot of progress in the world. We'll be healthier, we'll be able to learn things better, we'll be able to develop better medicines, and we'll have all kinds of interesting energy solutions. So I'm going to talk about a few of these.

The first one I'll talk about is restoring and improving urban infrastructure. We're on the cusp of the widespread commercialization of a really interesting new technology that is going to change how we think about transportation: autonomous vehicles. This is a problem that has been worked on for quite a while, but it's now starting to look like it's actually possible and commercially viable to produce these things. And a lot of the reason is that we now have computer vision and machine learning techniques that can take in the raw forms of data that the sensors on these cars collect. They have spinning LIDARs on the top that give them 3D point cloud data. They have cameras pointing in lots of different directions. They have radar in the front bumper and the rear bumper.
And they can take all this raw information in and, with a deep neural network, fuse it together to build a high-level understanding of what is going on around the car: oh, there's another car to my side, there's a pedestrian up here to the left, there's a light post over there that I don't really need to worry about moving. That helps the car understand the environment in which it's operating, and then what actions it can take in the world that are legal and safe, obey all the traffic laws, and get it from A to B.

And this is not some distant, far-off dream. Alphabet's Waymo subsidiary has been running tests in Phoenix, Arizona. Normally when they run tests, they have a safety driver in the front seat, ready to take over if the car does something unexpected. But for the last year or so, they've been running tests around suburban Phoenix with real passengers in the backseat and no safety drivers in the front seat. Now, suburban Phoenix is a slightly easier training ground than, say, downtown Manhattan or San Francisco, but this is still something that is not really far off; it's actually happening. And it's really possible because of things like machine learning and the use of TensorFlow in these systems.

Another one that I'm really, really excited about is advancing health informatics. This is a really broad area, and I think there are lots and lots of ways that machine learning and health data can be used to make better health care decisions for people. I'll talk about one of them. Really, I think the potential here is that we can use machine learning to bring the wisdom of experts, through a machine learning model, anywhere in the world. And that's a huge, huge opportunity. So let's look at this through one problem we've been working on for a while: diabetic retinopathy. Diabetic retinopathy is the fastest growing cause of preventable blindness in the world.
If you're at risk for this, if you have diabetes or early symptoms that make it likely you might develop diabetes, you should really get screened every year. So there are 400 million people around the world who should be screened every year. But the screening is really specialized. General doctors can't do it; you really need ophthalmologist-level training in order to do this effectively. And the impact of the shortage is significant. In India, for example, there's a shortage of 127,000 eye doctors to do this sort of screening, and as a result, 45% of patients with this disease suffer either full or partial vision loss before they're actually diagnosed and treated. And this is completely tragic, because this disease, if you catch it in time, is completely treatable. There's a very simple, 99% effective treatment. We just need to make sure that the right people get treated at the right time.

So what can you do? It turns out diabetic retinopathy screening is also a computer vision problem, and the progress we've made on general computer vision problems, where you want to take a picture and tell if it's a leopard or an aircraft carrier or a car, also works for diabetic retinopathy. You can take a retinal image, the raw data that comes off the screening camera, and feed that into a model that predicts 1, 2, 3, 4, or 5. That's how these things are graded: 1 being no diabetic retinopathy, 5 being proliferative diabetic retinopathy, and the other numbers being in between.

So you can get a collection of retinal images and have ophthalmologists label them. It turns out that if you ask two ophthalmologists to label the same image, they agree with each other on the number 1, 2, 3, 4, or 5 only 60% of the time.
But perhaps slightly scarier, if you ask the same ophthalmologist to grade the same image a few hours apart, they agree with themselves only 65% of the time. You can fix this by getting each image labeled by a lot of ophthalmologists, say seven of them. If five say it's a 2 and two say it's a 3, it's probably more like a 2 than a 3. Eventually, you have a nice, high quality data set you can train on. Like many machine learning problems, high quality data is the right raw ingredient. Then you can apply basically an off-the-shelf computer vision model trained on this data set, and you get a model that is on par with, or perhaps slightly better than, the average board-certified ophthalmologist in the US, which is pretty amazing.

It turns out you can actually do better than that. If you get the data labeled by retinal specialists, people who have more training in retinal disease, and change the protocol by which you label things, so that three retinal specialists look at an image, discuss it amongst themselves, and come up with what's called an adjudicated assessment, a single number, then you can train a model that is on par with retinal specialists, which is the gold standard of care in this area. And that's something you can now take and distribute widely around the world.

One issue, particularly with health care problems, is that you want explainable models. You want to be able to explain to a clinician why we think this person has moderate diabetic retinopathy. So you can take a retinal image like this, and one of the things that really helps is if you can show, in the model's assessment, why this is a 2 and not a 3.
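The talk doesn't say which attribution technique is used to show why an image is a 2 and not a 3, but one common, easy-to-sketch approach is occlusion sensitivity: gray out one patch of the image at a time and measure how much the model's score for the predicted grade drops. Everything below (the tiny image size, the patch size, and the toy linear scorer standing in for a trained grading model) is a made-up illustration, not the system described in the talk.

```python
import numpy as np

# Occlusion-sensitivity sketch: patches whose removal hurts the score
# most are the regions the model is "looking at" when it assigns a grade.
rng = np.random.default_rng(0)

H = W_ = 32                          # tiny image for illustration
image = rng.random((H, W_))          # fake grayscale retinal image
weights = rng.normal(size=(H, W_))   # toy stand-in for a trained model

def grade_score(img):
    """Toy score for one DR grade (a real model outputs five of these)."""
    return float((img * weights).sum())

base = grade_score(image)
patch = 8
heatmap = np.zeros((H // patch, W_ // patch))

for i in range(0, H, patch):
    for j in range(0, W_, patch):
        occluded = image.copy()
        occluded[i:i + patch, j:j + patch] = 0.0   # gray out one patch
        # Score drop caused by hiding this patch:
        heatmap[i // patch, j // patch] = base - grade_score(occluded)

# The patch with the largest score drop is the most influential region,
# which could be highlighted on the image shown to a clinician.
hot = np.unravel_index(np.argmax(heatmap), heatmap.shape)
print(hot)
```

Gradient-based methods (saliency maps, integrated gradients) are the other common family for this kind of explanation; the occlusion version is just the simplest to show end to end.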