  • [MUSIC PLAYING]

  • LILY PENG: Hi everybody.

  • My name is Lily Peng.

  • I'm a physician by training and I work on the Google medical--

  • well, Google AI health-care team.

  • I am a product manager.

  • And today we're going to talk to you about a couple of projects

  • that we have been working on in our group.

  • So first off, I think you'll get a lot of this,

  • so I'm not going to go over this too much.

  • But because we apply deep learning

  • to medical information, I kind of wanted

  • to just define a few terms that get used quite a bit

  • but are somewhat poorly defined.

  • So first off, artificial intelligence-- this

  • is a pretty broad term and it encompasses that grand project

  • to build a nonhuman intelligence.

  • Machine learning is a particular type

  • of artificial intelligence, I suppose,

  • that teaches machines to be smarter.

  • And deep learning is a particular type

  • of machine learning which you guys have probably

  • heard about quite a bit and will hear about quite a bit more.

  • So first of all, what is deep learning?

  • So it's a modern reincarnation of artificial neural networks,

  • which were actually invented in the 1960s.

  • It's a collection of simple trainable units, organized

  • in layers.

  • And they work together to solve or model complicated tasks.
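
As a minimal sketch of that idea, assuming Keras (the talk itself shows no code): a handful of layers of simple trainable units, stacked into one trainable model.

```python
# Hypothetical illustration of "simple trainable units, organized in layers".
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(64,)),  # one layer of units
    tf.keras.layers.Dense(64, activation="relu"),                      # another layer
    tf.keras.layers.Dense(5, activation="softmax"),                    # e.g. a 5-class output
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```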

  • So in general, with smaller data sets and limited compute,

  • which is what we had in the 1980s and '90s,

  • other approaches generally work better.

  • But with larger data sets and larger model sizes

  • and more compute power, we find that neural networks

  • work much better.

  • So there's actually just two takeaways

  • that I want you guys to get from this slide.

  • One is that deep learning trains algorithms

  • that are very accurate when given enough data.

  • And two, that deep learning can do this

  • without feature engineering.

  • And that means without explicitly writing the rules.

  • So what do I mean by that?

  • Well in traditional computer vision,

  • we spend a lot of time writing the rules

  • that a machine should follow to perform a certain prediction task.

  • In convolutional neural networks,

  • we actually spend very little time in feature

  • engineering and writing these rules.

  • We spend most of the time on data preparation,

  • numerical optimization, and model architecture.
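
To make the contrast concrete, here is a hypothetical convolutional stack: the layers learn their own visual features from pixels, and nobody writes detection rules by hand. Sizes are illustrative, not from the talk.

```python
# No hand-engineered features: the conv layers learn them from raw pixels.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(224, 224, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),  # the prediction task
])
```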

  • So I get this question quite a bit.

  • And the question is, how much data is enough data

  • for a deep neural network?

  • Well in general, more is better.

  • But there are diminishing returns beyond a certain point.

  • And a general rule of thumb is that we

  • like to have about 5,000 positives per class.

  • But the key thing is good and relevant data--

  • so garbage in, garbage out.

  • The model will predict very well what you ask it to predict.

  • So when you think about where machine learning,

  • and especially deep learning, can make the biggest impact,

  • it's really in places where there's

  • lots of data to look through.

  • One of our directors, Greg Corrado, puts it best.

  • Deep learning is really good for tasks that you've done 10,000

  • times, and on the 10,001st time, you're just sick of it and you

  • don't want to do it anymore.

  • So this is really great for health care in screening

  • applications where you see a lot of patients

  • that are potentially normal.

  • It's also great where expertise is limited.

  • So here on the right you see a graph

  • of the shortage of radiologists kind of worldwide.

  • And this is also true for other medical specialties,

  • but radiologists are sort of here.

  • And we basically see a worldwide shortage of medical expertise.

  • So one of the screening applications

  • that our group has worked on is with diabetic retinopathy.

  • We call it DR because it's easier

  • to say than diabetic retinopathy.

  • And it's the fastest growing cause of preventable blindness.

  • All 450 million people with diabetes are at risk and need

  • to be screened once a year.

  • This is done by taking a picture of the back

  • of the eye with a special camera, as you see here.

  • And the picture looks a little bit like that.

  • And so what a doctor does when they get an image like this

  • is they grade it on a scale of one to five from no disease,

  • so healthy, to proliferative disease,

  • which is the end stage.

  • And when they do grading, they look for sometimes very subtle

  • findings, little things called microaneurysms

  • that are outpouchings in the blood vessels of the eye.

  • And that indicates how badly your diabetes

  • is affecting your vision.

  • So unfortunately in many parts of the world,

  • there are just not enough eye doctors to do this task.

  • So with one of our partners in India,

  • or actually a couple of our partners in India,

  • there is a shortage of 127,000 eye doctors in the nation.

  • And as a result, about 45% of patients

  • suffer some sort of vision loss before the disease is detected.

  • Now as you recall, I said that this disease

  • was completely preventable.

  • So again, this is something that should not be happening.

  • So what we decided to do was we partnered

  • with a couple of hospitals in India,

  • as well as a screening provider in the US.

  • And we got about 130,000 images for this first go around.

  • We hired 54 ophthalmologists and built a labeling tool.

  • And then the 54 ophthalmologists actually

  • graded these images on this scale,

  • from no DR to proliferative.

  • The interesting thing was that there was actually

  • a little bit of variability in how doctors call the images.

  • And so we actually got about 880,000 diagnoses in all.

  • And with this labelled data set, we put it through a fairly well

  • known convolutional neural net.

  • This is called Inception.

  • I think a lot of you guys may be familiar with it.

  • It's generally used to classify cats and dogs for our photo app

  • or for some other search apps.

  • And we just repurposed it to do fundus images.
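
"Repurposing" here is essentially transfer learning. A sketch of the usual recipe with TensorFlow's stock InceptionV3, not the team's actual training code:

```python
# Swap Inception's ImageNet head for a 5-point DR-grade head.
import tensorflow as tf

base = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet", input_shape=(299, 299, 3))

x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
outputs = tf.keras.layers.Dense(5, activation="softmax")(x)  # no DR ... proliferative

model = tf.keras.Model(base.input, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```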

  • So the other thing that we learned

  • while we were doing this work was

  • that while it was really useful to have

  • this five-point diagnosis, it was also

  • incredibly useful to give doctors

  • feedback on housekeeping predictions like image quality,

  • whether this is a left or right eye,

  • or which part of the retina this is.

  • So we added that to the network as well.
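
The talk doesn't detail how those predictions were wired in; a common pattern is extra output heads sharing one backbone, roughly like this (head names invented):

```python
# One shared backbone, several output heads.
import tensorflow as tf

base = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet", input_shape=(299, 299, 3))
features = tf.keras.layers.GlobalAveragePooling2D()(base.output)

grade = tf.keras.layers.Dense(5, activation="softmax", name="dr_grade")(features)
quality = tf.keras.layers.Dense(1, activation="sigmoid", name="image_quality")(features)
eye = tf.keras.layers.Dense(1, activation="sigmoid", name="left_or_right_eye")(features)

model = tf.keras.Model(base.input, [grade, quality, eye])
model.compile(
    optimizer="adam",
    loss={"dr_grade": "sparse_categorical_crossentropy",
          "image_quality": "binary_crossentropy",
          "left_or_right_eye": "binary_crossentropy"})
```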

  • So how well does it do?

  • So this is the first version of our model

  • that we published in a medical journal in 2016, I believe.

  • And right here on the left is a chart

  • of the performance of the model in aggregate

  • over about 10,000 images.

  • Sensitivity is on the y-axis, and then 1 minus specificity

  • is on the x-axis.

  • So sensitivity is the proportion of patients

  • that have the disease that the model

  • correctly calls as diseased.

  • And then specificity is the proportion

  • of patients that don't have the disease that the model

  • or the doctor got right.

  • And you can see you want something

  • with high sensitivity and high specificity.

  • And so up and to the right--

  • or up and to the left is good.
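
Both numbers fall out of a confusion matrix. A worked toy example with invented labels, using scikit-learn:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = patient has the disease
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # the model's calls

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # diseased patients called correctly
specificity = tn / (tn + fp)   # healthy patients called correctly
print(sensitivity, specificity)  # 0.75 0.75
```

Sweeping the model's decision threshold and recomputing these two numbers traces out the curve shown on the slide.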

  • And you can see here on the chart

  • that the little dots are the doctors that

  • were grading the same set.

  • So we get pretty close to the doctor.

  • And these are board-certified US physicians.

  • And these are ophthalmologists, general ophthalmologists

  • by training.

  • In fact if you look at the F score, which

  • is a combined measure of sensitivity and precision,

  • we're just a little better than the median ophthalmologist

  • in this particular study.
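
Concretely, the standard F1 score is the harmonic mean of sensitivity (recall) and precision (positive predictive value). On the same invented labels as above:

```python
from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
print(f1_score(y_true, y_pred))  # 2*p*r / (p + r) = 0.75 here
```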

  • So since then we've improved the model.

  • So last year, around December 2016, we were sort of on par

  • with generalists.

  • And then this year--

  • this is a new paper that we published--

  • we actually used retinal specialists

  • to grade the images.

  • So they're specialists.

  • We also had them argue when they disagreed

  • about what the diagnosis was.

  • And you can see that when we trained the model using

  • that as the ground truth, the model predicted that quite well

  • as well.

  • So this year we're sort of on par

  • with the retina specialists.

  • And this weighted kappa thing is just

  • agreement on the five-class level.

  • And you can see that, essentially, we're

  • sort of in between the ophthalmologists and the retina

  • specialists, in fact kind of in among

  • the retinal specialists themselves.
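
Quadratic-weighted kappa is chance-corrected agreement on an ordinal scale, so a grade-5-versus-grade-1 disagreement costs more than a 5-versus-4 one. A toy computation with invented grades:

```python
from sklearn.metrics import cohen_kappa_score

model_grades  = [0, 1, 2, 4, 3, 2, 1, 0]   # hypothetical 5-point grades
doctor_grades = [0, 1, 2, 3, 3, 2, 0, 0]
print(cohen_kappa_score(model_grades, doctor_grades, weights="quadratic"))
```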

  • Another thing that we've been working on

  • beyond improving the models is actually

  • trying to have the network explain

  • how it's making a prediction.

  • So again, taking a play out of the playbook

  • of the consumer world,

  • we started using this technique called "show me where."

  • And this is where, given an image,

  • we actually generate a heat map of where

  • the relevant pixels are for this particular prediction.

  • So here you can see a picture of a Pomeranian.

  • And the heat map shows you that there

  • is something in the face of the Pomeranian

  • that makes it look Pomeranian-y.

  • And on the right here, you kind of have an Afghan hound,

  • and the network's highlighting the Afghan hound.
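
One common way to produce such heat maps is plain gradient saliency: measure how much each input pixel moves the score of the predicted class. A minimal sketch; the talk doesn't say which attribution method the team actually used.

```python
import tensorflow as tf

def saliency_map(model, image, class_index):
    """image: float32 tensor shaped (1, H, W, 3); returns a (1, H, W) heat map."""
    image = tf.convert_to_tensor(image)
    with tf.GradientTape() as tape:
        tape.watch(image)                       # track gradients w.r.t. pixels
        score = model(image)[0, class_index]    # score of the class of interest
    grads = tape.gradient(score, image)         # d(score) / d(pixels)
    return tf.reduce_max(tf.abs(grads), axis=-1)  # collapse RGB into one channel
```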

  • So using this very similar technique,

  • we applied it to the fundus images

  • and we said, show me where.

  • So this is a case of mild disease.

  • And I can tell it's mild disease because--

  • well, it looks completely normal to me.

  • I can't tell that there is any disease there.

  • But a highly trained doctor would

  • be able to pick out little things called microaneurysms

  • where the green spots are.

  • Here's a picture of moderate disease.

  • And this is a little worse because you can see

  • some bleeding at the ends here.

  • And actually I don't know if I can signal,

  • but there's bleeding there.

  • And the heat map--

  • so here's a heat map.

  • You can see that it picks up the bleeding.

  • But there are two artifacts in this image.

  • So there is a dust spot, just like a little dark spot.

  • And then there is this little reflection

  • in the middle of the image.

  • And you could tell that the model just

  • ignores them, essentially.

  • So what's next?

  • We trained a model.

  • We showed that it's somewhat explainable.

  • We think it's doing the right thing.

  • What's next?

  • Well, we actually have to deploy this into health-care systems.

  • And we're partnering with health-care providers

  • and companies to bring this to patients.

  • And actually Dr. Jess Mega, who is going to speak after me,

  • is going to go into a little more detail

  • about this effort.

  • So I've shown you the screening application.

  • And here's an application in diagnosis

  • that we're working on.

  • So in this particular example, we're talking about a disease--

  • well, we're talking about breast cancer,

  • but we're talking about metastases of breast cancer

  • into nearby lymph nodes.

  • So when a patient is diagnosed with breast cancer

  • and the primary breast cancer is removed,

  • the surgeon spends some time taking out

  • what we call lymph nodes so that we can examine them

  • to see whether or not the breast cancer has metastasized

  • to those nodes.

  • And that has an impact on how you treat the patient.

  • So reading these lymph nodes is actually not an easy task.

  • And in fact, in about 24% of biopsies, when they went back

  • to look at them, there was a change in nodal status.

  • Which means that if it was positive, it was read negative,

  • and if it was negative, it was read positive.

  • So that's a really big deal.

  • It's one in four.