[MUSIC PLAYING] LILY PENG: Hi everybody. My name is Lily Peng. I'm a physician by training, and I work on the Google medical-- well, the Google AI health-care team. I am a product manager. And today we're going to talk to you about a couple of projects that we have been working on in our group. So first off, I think you'll get a lot of this, so I'm not going to go over this too much. But because we apply deep learning to medical information, I wanted to define a few terms that get used quite a bit but are somewhat poorly defined. So first off, artificial intelligence-- this is a pretty broad term, and it encompasses the grand project of building a nonhuman intelligence. Machine learning is a particular type of artificial intelligence that teaches machines to be smarter. And deep learning is a particular type of machine learning, which you guys have probably heard about quite a bit and will hear about quite a bit more. So what is deep learning? It's a modern reincarnation of artificial neural networks, which were actually invented in the 1960s. It's a collection of simple trainable units, organized in layers, that work together to solve or model complicated tasks. In general, with smaller data sets and limited compute, which is what we had in the 1980s and '90s, other approaches generally work better. But with larger data sets, larger model sizes, and more compute power, we find that neural networks work much better. So there are just two takeaways that I want you to get from this slide. One is that deep learning trains algorithms that are very accurate when given enough data. And two, deep learning can do this without feature engineering-- that is, without explicitly writing the rules. So what do I mean by that? Well, in traditional computer vision, we spend a lot of time writing the rules that a machine should follow to perform a certain prediction task.
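The description above-- simple trainable units organized in layers, working together on a task-- can be sketched in a few lines of NumPy. This is purely illustrative: the weights are random and untrained, and the layer sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # A simple nonlinearity applied after each layer of units.
    return np.maximum(x, 0.0)

# Two layers of "simple trainable units": each layer is just a weight
# matrix plus a bias vector, followed by a nonlinearity.
W1, b1 = rng.normal(size=(64, 16)), np.zeros(16)   # 64 inputs -> 16 hidden units
W2, b2 = rng.normal(size=(16, 5)), np.zeros(5)     # 16 hidden -> 5 class scores

def forward(x):
    h = relu(x @ W1 + b1)  # hidden layer
    return h @ W2 + b2     # output layer: one score per class

scores = forward(rng.normal(size=(3, 64)))  # a batch of 3 inputs
print(scores.shape)  # (3, 5)
```

Training would then adjust `W1`, `b1`, `W2`, `b2` by numerical optimization, which is where, as noted above, most of the effort actually goes.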
In convolutional neural networks, we actually spend very little time in feature engineering and writing these rules. Most of the time we spend in data preparation and numerical optimization and model architecture. So I get this question quite a bit. And the question is, how much data is enough data for a deep neural network? Well in general, more is better. But there are diminishing returns beyond a certain point. And a general rule of thumb is that we like to have about 5,000 positives per class. But the key thing is good and relevant data-- so garbage in, garbage out. The model will predict very well what you ask it to predict. So when you think about where machine learning, and especially deep learning, can make the biggest impact, it's really in places where there's lots of data to look through. One of our directors, Greg Corrado, puts it best. Deep learning is really good for tasks that you've done 10,000 times, and on the 10,001st time, you're just sick of it and you don't want to do it anymore. So this is really great for health care in screening applications where you see a lot of patients that are potentially normal. It's also great where expertise is limited. So here on the right you see a graph of the shortage of radiologists kind of worldwide. And this is also true for other medical specialties, but radiologists are sort of here. And we basically see a worldwide shortage of medical expertise. So one of the screening applications that our group has worked on is with diabetic retinopathy. We call it DR because it's easier to say than diabetic retinopathy. And it's the fastest growing cause of preventable blindness. All 450 million people with diabetes are at risk and need to be screened once a year. This is done by taking a picture of the back of the eye with a special camera, as you see here. And the picture looks a little bit like that. 
And so what a doctor does when they get an image like this is they grade it on a scale of one to five, from no disease, so healthy, to proliferative disease, which is the end stage. And when they do grading, they look for sometimes very subtle findings-- little things called microaneurysms, which are outpouchings in the blood vessels of the eye. And that indicates how badly your diabetes is affecting your vision. So unfortunately, in many parts of the world, there are just not enough eye doctors to do this task. For one of our partners in India-- or actually a couple of our partners in India-- there is a shortage of 127,000 eye doctors in the nation. And as a result, about 45% of patients suffer some sort of vision loss before the disease is detected. Now as you recall, I said that this disease was completely preventable. So again, this is something that should not be happening. So what we decided to do was partner with a couple of hospitals in India, as well as a screening provider in the US. And we got about 130,000 images for this first go-around. We hired 54 ophthalmologists and built a labeling tool. And the 54 ophthalmologists graded these images on this scale, from no DR to proliferative. The interesting thing was that there was actually a bit of variability in how doctors call the images. And so we got about 880,000 diagnoses in all. And with this labeled data set, we put it through a fairly well-known convolutional neural net called Inception. I think a lot of you may be familiar with it. It's generally used to classify cats and dogs for our photo app or for some other search apps. And we just repurposed it to do fundus images.
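The talk doesn't say exactly how the roughly 880,000 overlapping grades were consolidated into one label per image, but one simple illustrative scheme is to take the median grade across the graders who saw each image. The image names and grades below are made up for illustration.

```python
from statistics import median

# Hypothetical per-image grades (0 = no DR ... 4 = proliferative) from
# several ophthalmologists; each image was seen by a different subset.
grades = {
    "image_001": [1, 1, 2],
    "image_002": [0, 0, 0, 1],
    "image_003": [3, 4, 3],
}

# Resolve grader variability by taking the median grade per image.
labels = {img: int(median(g)) for img, g in grades.items()}
print(labels)  # {'image_001': 1, 'image_002': 0, 'image_003': 3}
```

The median is robust to a single outlying grade, which matters when graders disagree; the team's actual consolidation procedure may well have differed.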
So the other thing that we learned while we were doing this work was that while it was really useful to have this five-point diagnosis, it was also incredibly useful to give doctors feedback on housekeeping predictions like image quality, whether this is a left or right eye, or which part of the retina this is. So we added that to the network as well. So how well does it do? This is the first version of our model, which we published in a medical journal in 2016, I believe. And here on the left is a chart of the performance of the model in aggregate over about 10,000 images. Sensitivity is on the y-axis, and 1 minus specificity is on the x-axis. So sensitivity is the percentage of patients who have the disease that the model correctly calls as having it. And specificity is the proportion of patients who don't have the disease that the model, or the doctor, correctly calls disease-free. You want something with high sensitivity and high specificity, so up and to the right-- or rather, up and to the left-- is good. And you can see here on the chart that the little dots are the doctors who were grading the same set. So we get pretty close to the doctors. And these are board-certified US physicians-- general ophthalmologists by training. In fact, if you look at the F score, which is a combined measure of both sensitivity and specificity, we're just a little better than the median ophthalmologist in this particular study. Since then we've improved the model. Last year, around December 2016, we were roughly on par with generalists. And then this year-- this is a new paper that we published-- we actually used retinal specialists to grade the images. So they're specialists, and we also had them argue when they disagreed about what the diagnosis was. And you can see that when we train the model using that as the ground truth, the model predicts it quite well as well.
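As a quick illustration of the two definitions above, sensitivity and specificity can be computed directly from paired labels and predictions. The toy data here is invented, not from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """y_true/y_pred are parallel lists: 1 = disease, 0 = no disease."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn)  # fraction of diseased patients correctly called
    spec = tn / (tn + fp)  # fraction of healthy patients correctly cleared
    return sens, spec

# Toy example: 4 diseased patients (3 caught), 6 healthy (5 correctly cleared).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(round(sens, 2), round(spec, 2))  # 0.75 0.83
```

Sweeping the model's decision threshold and re-computing this pair at each setting is what traces out the sensitivity versus 1-minus-specificity curve described on the slide.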
So this year we're roughly on par with the retina specialists. And this weighted kappa number is just agreement at the five-class level. And you can see that, essentially, we're in between the ophthalmologists and the retina specialists-- in fact, right among the retinal specialists. Another thing that we've been working on, beyond improving the models, is actually trying to have the networks explain how they're making a prediction. So again, taking a play out of the playbook from the consumer world, we started using this technique called "show me where." This is where, given an image, we generate a heat map of where the relevant pixels are for a particular prediction. So here you can see a picture of a Pomeranian, and the heat map shows you that there is something in the face of the Pomeranian that makes it look Pomeranian-y. And on the right here, you have an Afghan hound, and the network's highlighting the Afghan hound. So using this very similar technique, we applied it to the fundus images and said, show me where. So this is a case of mild disease. And I can tell it's mild disease because-- well, it looks completely normal to me. I can't tell that there is any disease there. But a highly trained doctor would be able to pick out little things called microaneurysms, where the green spots are. Here's a picture of moderate disease. And this is a little worse because you can see some bleeding at the ends here-- and actually, I don't know if I can point, but there's bleeding there. And here's the heat map. You can see that it picks up the bleeding. But there are two artifacts in this image: a dust spot, just a little dark spot, and a little reflection in the middle of the image. And you can tell that the model essentially just ignores them. So what's next? We trained a model. We showed that it's somewhat explainable. We think it's doing the right thing. What's next?
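The five-class agreement measure mentioned above is typically Cohen's kappa with quadratic weights, which penalizes a grade that is off by two steps more heavily than one off by a single step. Here's a minimal sketch; the quadratic weighting is an assumption (it's the common choice for ordinal grades like DR severity), since the talk doesn't specify the exact variant.

```python
from collections import Counter

def quadratic_weighted_kappa(a, b, n_classes=5):
    """Quadratic-weighted Cohen's kappa between two graders' labels (0..n_classes-1)."""
    n = len(a)
    obs = Counter(zip(a, b))  # observed joint counts
    hist_a, hist_b = Counter(a), Counter(b)  # marginal counts per grader
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(n_classes):
        for j in range(n_classes):
            w = (i - j) ** 2 / (n_classes - 1) ** 2  # quadratic penalty
            num += w * obs.get((i, j), 0) / n
            den += w * (hist_a.get(i, 0) / n) * (hist_b.get(j, 0) / n)
    return 1.0 - num / den if den else 1.0

# One disagreement, off by a single step: agreement stays close to 1.
print(round(quadratic_weighted_kappa([0, 1, 2, 3, 4, 2],
                                     [0, 1, 2, 3, 4, 3]), 3))  # 0.952
```

A kappa of 1 means perfect agreement and 0 means chance-level agreement, so comparing the model's kappa against graders to the graders' kappa against each other is what supports the "in between the ophthalmologists and the retina specialists" reading.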
Well, we actually have to deploy this into health-care systems. And we're partnering with health-care providers and companies to bring this to patients. And actually, Dr. Jess Mega, who is going to speak after me, will have a few more details about this effort. So I've given the screening application, and here's an application in diagnosis that we're working on. In this particular example, we're talking about breast cancer-- specifically, metastases of breast cancer into nearby lymph nodes. So when a patient is diagnosed with breast cancer and the primary breast cancer is removed, the surgeon spends some time taking out what we call lymph nodes, so that we can examine whether or not the breast cancer has metastasized to those nodes. And that has an impact on how you treat the patient. Now, reading these lymph nodes is actually not an easy task. In fact, in about 24% of biopsies, when pathologists went back to look at them, there was a change in nodal status-- which means that if it was read positive, it became negative, and if negative, positive. So that's a really big deal. It's one in four.