
  • Most of us think of motion as a very visual thing.

  • If I walk across this stage or gesture with my hands while I speak,

  • that motion is something that you can see.

  • But there's a world of important motion that's too subtle for the human eye,

  • and over the past few years,

  • we've started to find that cameras

  • can often see this motion even when humans can't.

  • So let me show you what I mean.

  • On the left here, you see video of a person's wrist,

  • and on the right, you see video of a sleeping infant,

  • but if I didn't tell you that these were videos,

  • you might assume that you were looking at two regular images,

  • because in both cases,

  • these videos appear to be almost completely still.

  • But there's actually a lot of subtle motion going on here,

  • and if you were to touch the wrist on the left,

  • you would feel a pulse,

  • and if you were to hold the infant on the right,

  • you would feel the rise and fall of her chest

  • as she took each breath.

  • And these motions carry a lot of significance,

  • but they're usually too subtle for us to see,

  • so instead, we have to observe them

  • through direct contact, through touch.

  • But a few years ago,

  • my colleagues at MIT developed what they call a motion microscope,

  • which is software that finds these subtle motions in video

  • and amplifies them so that they become large enough for us to see.

  • And so, if we use their software on the left video,

  • it lets us see the pulse in this wrist,

  • and if we were to count that pulse,

  • we could even figure out this person's heart rate.

  • And if we used the same software on the right video,

  • it lets us see each breath that this infant takes,

  • and we can use this as a contact-free way to monitor her breathing.

  • And so this technology is really powerful because it takes these phenomena

  • that we normally have to experience through touch

  • and it lets us capture them visually and non-invasively.
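The motion microscope described above works, roughly, by filtering each pixel's intensity over time and amplifying the band where the subtle motion lives. The sketch below is an illustrative toy in that spirit, not the actual MIT implementation: it band-passes every pixel's temporal signal around a chosen frequency range (e.g. around 1 Hz for a pulse) and adds the amplified band back into the video. The function name, the amplification factor `alpha`, and the band edges are all assumptions for illustration.

```python
import numpy as np

def magnify_motion(frames, alpha=20.0, lo=0.5, hi=3.0, fps=30.0):
    """Toy Eulerian-style magnification sketch: temporally band-pass
    each pixel's intensity, scale that band by alpha, add it back."""
    frames = np.asarray(frames, dtype=float)     # shape: (time, height, width)
    spectrum = np.fft.rfft(frames, axis=0)       # per-pixel temporal FFT
    freqs = np.fft.rfftfreq(frames.shape[0], d=1.0 / fps)
    band = (freqs >= lo) & (freqs <= hi)         # keep e.g. a ~1 Hz pulse
    bandpassed = np.fft.irfft(spectrum * band[:, None, None],
                              n=frames.shape[0], axis=0)
    return frames + alpha * bandpassed           # motion made visible
```

A real system operates on spatially decomposed video rather than raw intensities, but the core move is the same: isolate a temporal band, amplify it, and recombine.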

  • So a couple years ago, I started working with the folks that created that software,

  • and we decided to pursue a crazy idea.

  • We thought, it's cool that we can use software

  • to visualize tiny motions like this,

  • and you can almost think of it as a way to extend our sense of touch.

  • But what if we could do the same thing with our ability to hear?

  • What if we could use video to capture the vibrations of sound,

  • which are just another kind of motion,

  • and turn everything that we see into a microphone?

  • Now, this is a bit of a strange idea,

  • so let me try to put it in perspective for you.

  • Traditional microphones work by converting the motion

  • of an internal diaphragm into an electrical signal,

  • and that diaphragm is designed to move readily with sound

  • so that its motion can be recorded and interpreted as audio.

  • But sound causes all objects to vibrate.

  • Those vibrations are just usually too subtle and too fast for us to see.

  • So what if we record them with a high-speed camera

  • and then use software to extract tiny motions

  • from our high-speed video,

  • and analyze those motions to figure out what sounds created them?

  • This would let us turn visible objects into visual microphones from a distance.

  • And so we tried this out,

  • and here's one of our experiments,

  • where we took this potted plant that you see on the right

  • and we filmed it with a high-speed camera

  • while a nearby loudspeaker played this sound.

  • (Music: "Mary Had a Little Lamb")

  • And so here's the video that we recorded,

  • and we recorded it at thousands of frames per second,

  • but even if you look very closely,

  • all you'll see are some leaves

  • that are pretty much just sitting there doing nothing,

  • because our sound only moved those leaves by about a micrometer.

  • That's one ten-thousandth of a centimeter,

  • which spans somewhere between a hundredth and a thousandth

  • of a pixel in this image.

  • So you can squint all you want,

  • but motion that small is pretty much perceptually invisible.

  • But it turns out that something can be perceptually invisible

  • and still be numerically significant,

  • because with the right algorithms,

  • we can take this silent, seemingly still video

  • and we can recover this sound.

  • (Music: "Mary Had a Little Lamb")

  • (Applause)

  • So how is this possible?

  • How can we get so much information out of so little motion?

  • Well, let's say that those leaves move by just a single micrometer,

  • and let's say that that shifts our image by just a thousandth of a pixel.

  • That may not seem like much,

  • but a single frame of video

  • may have hundreds of thousands of pixels in it,

  • and so if we combine all of the tiny motions that we see

  • from across that entire image,

  • then suddenly a thousandth of a pixel

  • can start to add up to something pretty significant.
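The averaging argument above can be made concrete with a small numerical sketch. The numbers here are hypothetical, chosen only to illustrate the point: a true global shift of a thousandth of a pixel, measured independently at each of a few hundred thousand pixels with heavy per-pixel noise. Any single pixel's measurement is buried in that noise, but the noise shrinks roughly as one over the square root of the number of pixels when the measurements are combined.

```python
import numpy as np

rng = np.random.default_rng(0)

true_shift = 1e-3          # a thousandth of a pixel (the signal)
n_pixels = 300_000         # pixels contributing per frame (illustrative)
noise = 0.1                # per-pixel measurement noise, in pixels

# Each pixel gives a noisy estimate of the same tiny shift.
per_pixel = true_shift + noise * rng.standard_normal(n_pixels)

single = per_pixel[0]      # one pixel alone: error swamps the signal
combined = per_pixel.mean()  # all pixels: noise drops by ~1/sqrt(N)

print(abs(single - true_shift), abs(combined - true_shift))
```

With these numbers the combined estimate's error is hundreds of times smaller than a single pixel's, which is how a sub-pixel motion becomes numerically significant.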

  • On a personal note, we were pretty psyched when we figured this out.

  • (Laughter)

  • But even with the right algorithm,

  • we were still missing a pretty important piece of the puzzle.

  • You see, there are a lot of factors that affect when and how well

  • this technique will work.

  • There's the object and how far away it is;

  • there's the camera and the lens that you use;

  • how much light is shining on the object and how loud your sound is.

  • And even with the right algorithm,

  • we had to be very careful with our early experiments,

  • because if we got any of these factors wrong,

  • there was no way to tell what the problem was.

  • We would just get noise back.

  • And so a lot of our early experiments looked like this.

  • And so here I am,

  • and on the bottom left, you can kind of see our high-speed camera,

  • which is pointed at a bag of chips,

  • and the whole thing is lit by these bright lamps.

  • And like I said, we had to be very careful in these early experiments,

  • so this is how it went down.

  • (Video) Abe Davis: Three, two, one, go.

  • Mary had a little lamb! Little lamb! Little lamb!

  • (Laughter)

  • AD: So this experiment looks completely ridiculous.

  • (Laughter)

  • I mean, I'm screaming at a bag of chips --

  • (Laughter) --

  • and we're blasting it with so much light,

  • we literally melted the first bag we tried this on. (Laughter)

  • But ridiculous as this experiment looks,

  • it was actually really important,

  • because we were able to recover this sound.

  • (Audio) Mary had a little lamb! Little lamb! Little lamb!

  • (Applause)

  • AD: And this was really significant,

  • because it was the first time we recovered intelligible human speech

  • from silent video of an object.

  • And so it gave us this point of reference,

  • and gradually we could start to modify the experiment,

  • using different objects or moving the object further away,

  • using less light or quieter sounds.

  • And we analyzed all of these experiments

  • until we really understood the limits of our technique,

  • because once we understood those limits,

  • we could figure out how to push them.

  • And that led to experiments like this one,

  • where again, I'm going to speak to a bag of chips,

  • but this time we've moved our camera about 15 feet away,

  • outside, behind a soundproof window,

  • and the whole thing is lit by only natural sunlight.

  • And so here's the video that we captured.

  • And this is what things sounded like from inside, next to the bag of chips.

  • (Audio) Mary had a little lamb whose fleece was white as snow,

  • and everywhere that Mary went, that lamb was sure to go.

  • AD: And here's what we were able to recover from our silent video

  • captured outside behind that window.

  • (Audio) Mary had a little lamb whose fleece was white as snow,

  • and everywhere that Mary went, that lamb was sure to go.

  • (Applause)

  • AD: And there are other ways that we can push these limits as well.

  • So here's a quieter experiment

  • where we filmed some earphones plugged into a laptop computer,

  • and in this case, our goal was to recover the music that was playing on that laptop

  • from just silent video

  • of these two little plastic earphones,

  • and we were able to do this so well

  • that I could even Shazam our results.

  • (Laughter)

  • (Music: "Under Pressure" by Queen)

  • (Applause)

  • And we can also push things by changing the hardware that we use.

  • Because the experiments I've shown you so far

  • were done with a camera, a high-speed camera,

  • that can record video about 100 times faster

  • than most cell phones,

  • but we've also found a way to use this technique

  • with more regular cameras,

  • and we do that by taking advantage of what's called a rolling shutter.

  • You see, most cameras record images one row at a time,

  • and so if an object moves during the recording of a single image,

  • there's a slight time delay between each row,

  • and this causes slight artifacts

  • that get encoded into each frame of a video.

  • And so what we found is that by analyzing these artifacts,

  • we can actually recover sound using a modified version of our algorithm.
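The rolling-shutter idea above amounts to this: because each row of a frame is exposed slightly later than the row before it, a single frame samples the scene's motion once per row rather than once per frame. The sketch below is an idealized illustration of that sampling argument, not the paper's recovery algorithm: it assumes rows are read out back to back with no gap, and that a pure tone modulates the scene, then shows that the tone's frequency survives in the row-by-row samples even though the frame rate alone is far too low.

```python
import numpy as np

fps, rows = 60, 1000
row_dt = 1.0 / (fps * rows)        # time between consecutive row readouts

tone_hz = 440.0                    # sound we pretend is shaking the scene
n_frames = 6

samples = []
for f in range(n_frames):
    for r in range(rows):
        t = f / fps + r * row_dt   # when this particular row was captured
        samples.append(np.sin(2 * np.pi * tone_hz * t))

signal = np.array(samples)         # effectively 60,000 samples/s from 60 fps

# The 440 Hz tone is recoverable, far above the 60 Hz frame rate.
freqs = np.fft.rfftfreq(signal.size, d=row_dt)
peak = freqs[np.abs(np.fft.rfft(signal)).argmax()]
```

Real cameras have a readout gap between frames, which is one source of the distortion heard in the recovered audio.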

  • So here's an experiment we did

  • where we filmed a bag of candy

  • while a nearby loudspeaker played

  • the same "Mary Had a Little Lamb" music from before,

  • but this time, we used just a regular store-bought camera,

  • and so in a second, I'll play for you the sound that we recovered,

  • and it's going to sound distorted this time,

  • but listen and see if you can still recognize the music.

  • (Audio: "Mary Had a Little Lamb")

  • And so, again, that sounds distorted,

  • but what's really amazing here is that we were able to do this

  • with something that you could literally run out

  • and pick up at a Best Buy.

  • So at this point,

  • a lot of people see this work,

  • and they immediately think about surveillance.

  • And to be fair,

  • it's not hard to imagine how you might use this technology to spy on someone.

  • But keep in mind that there's already a lot of very mature technology

  • out there for surveillance.

  • In fact, people have been using lasers

  • to eavesdrop on objects from a distance for decades.

  • But what's really new here,

  • what's really different,

  • is that now we have a way to picture the vibrations of an object,

  • which gives us a new lens through which to look at the world,

  • and we can use that lens

  • to learn not just about forces like sound that cause an object to vibrate,

  • but also about the object itself.

  • And so I want to take a step back

  • and think about how that might change the ways that we use video,

  • because we usually use video to look at things,

  • and I've just shown you how we can use it

  • to listen to things.

  • But there's another important way that we learn about the world:

  • that's by interacting with it.

  • We push and pull and poke and prod things.

  • We shake things and see what happens.

  • And that's something that video still won't let us do,

  • at least not traditionally.

  • So I want to show you some new work,

  • and this is based on an idea I had just a few months ago,

  • so this is actually the first time I've shown it to a public audience.

  • And the basic idea is that we're going to use the vibrations in a video

  • to capture objects in a way that will let us interact with them

  • and see how they react to us.

  • So here's an object,

  • and in this case, it's a wire figure in the shape of a human,

  • and we're going to film that object with just a regular camera.

  • So there's nothing special about this camera.

  • In fact, I've actually done this with my cell phone before.

  • But we do want to see the object vibrate,

  • so to make that happen,

  • we're just going to bang a little bit on the surface where it's resting

  • while we record this video.

  • So that's it: just five seconds of regular video,

  • while we bang on this surface,

  • and we're going to use the vibrations in that video

  • to learn about the structural and material properties of our object,

  • and we're going to use that information to create something new and interactive.

  • And so here's what we've created.

  • And it looks like a regular image,

  • but this isn't an image, and it's not a video,

  • because now I can take my mouse

  • and I can start interacting with the object.

  • And so what you see here

  • is a simulation of how this object

  • would respond to new forces that we've never seen before,

  • and we created it from just five seconds of regular video.