Imagine if you could record your life -- everything you said, everything you did, available in a perfect memory store at your fingertips, so you could go back and find memorable moments and relive them, or sift through traces of time and discover patterns in your own life that previously had gone undiscovered. Well that's exactly the journey that my family began five and a half years ago. This is my wife and collaborator, Rupal. And on this day, at this moment, we walked into the house with our first child, our beautiful baby boy. And we walked into a house with a very special home video recording system. (Video) Man: Okay. Deb Roy: This moment and thousands of other moments special for us were captured in our home because in every room in the house, if you looked up, you'd see a camera and a microphone, and if you looked down, you'd get this bird's-eye view of the room. Here's our living room, the baby bedroom, kitchen, dining room and the rest of the house. And all of these fed into a disc array that was designed for a continuous capture. So here we are flying through a day in our home as we move from sunlit morning through incandescent evening and, finally, lights out for the day. Over the course of three years, we recorded eight to 10 hours a day, amassing roughly a quarter-million hours of multi-track audio and video. So you're looking at a piece of what is by far the largest home video collection ever made. (Laughter) And what this data represents for our family at a personal level, the impact has already been immense, and we're still learning its value. Countless moments of unsolicited natural moments, not posed moments, are captured there, and we're starting to learn how to discover them and find them. But there's also a scientific reason that drove this project, which was to use this natural longitudinal data to understand the process of how a child learns language -- that child being my son. 
And so with many privacy provisions put in place to protect everyone who was recorded in the data, we made elements of the data available to my trusted research team at MIT so we could start teasing apart patterns in this massive data set, trying to understand the influence of social environments on language acquisition. So we're looking here at one of the first things we started to do. This is my wife and I cooking breakfast in the kitchen, and as we move through space and through time, a very everyday pattern of life in the kitchen. In order to convert this opaque, 90,000 hours of video into something that we could start to see, we use motion analysis to pull out, as we move through space and through time, what we call space-time worms. And this has become part of our toolkit for being able to look and see where the activities are in the data, and with it, trace the pattern of, in particular, where my son moved throughout the home, so that we could focus our transcription efforts, all of the speech environment around my son -- all of the words that he heard from myself, my wife, our nanny, and over time, the words he began to produce. So with that technology and that data and the ability to, with machine assistance, transcribe speech, we've now transcribed well over seven million words of our home transcripts. And with that, let me take you now for a first tour into the data. So you've all, I'm sure, seen time-lapse videos where a flower will blossom as you accelerate time. I'd like you to now experience the blossoming of a speech form. My son, soon after his first birthday, would say "gaga" to mean water. And over the course of the next half-year, he slowly learned to approximate the proper adult form, "water." So we're going to cruise through half a year in about 40 seconds. No video here, so you can focus on the sound, the acoustics, of a new kind of trajectory: gaga to water. 
(Audio) Baby: Gagagagagaga Gaga gaga gaga guga guga guga wada gaga gaga guga gaga wader guga guga water water water water water water water water water. DR: He sure nailed it, didn't he. (Applause) So he didn't just learn water. Over the course of the 24 months, the first two years that we really focused on, this is a map of every word he learned in chronological order. And because we have full transcripts, we've identified each of the 503 words that he learned to produce by his second birthday. He was an early talker. And so we started to analyze why. Why were certain words born before others? This is one of the first results that came out of our study a little over a year ago that really surprised us. The way to interpret this apparently simple graph is, on the vertical is an indication of how complex caregiver utterances are based on the length of utterances. And the [horizontal] axis is time. And all of the data, we aligned based on the following idea: Every time my son would learn a word, we would trace back and look at all of the language he heard that contained that word. And we would plot the relative length of the utterances. And what we found was this curious phenomena, that caregiver speech would systematically dip to a minimum, making language as simple as possible, and then slowly ascend back up in complexity. And the amazing thing was that bounce, that dip, lined up almost precisely with when each word was born -- word after word, systematically. So it appears that all three primary caregivers -- myself, my wife and our nanny -- were systematically and, I would think, subconsciously restructuring our language to meet him at the birth of a word and bring him gently into more complex language. And the implications of this -- there are many, but one I just want to point out, is that there must be amazing feedback loops. Of course, my son is learning from his linguistic environment, but the environment is learning from him. 
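The alignment described above (trace back from each word's birth, collect every caregiver utterance containing that word, and look at utterance length over time) can be sketched roughly as follows. This is only an illustration, not the research team's code: the function name `aligned_length_curve`, the `(day, text)` data layout, the 30-day bins, and the toy utterances are all assumptions, and it averages raw utterance length in words rather than the relative length mentioned in the talk.

```python
from collections import defaultdict
from statistics import mean


def aligned_length_curve(utterances, birth_day, bin_days=30):
    """Average caregiver utterance length, aligned to each word's birth.

    utterances: list of (day, text) pairs of caregiver speech (hypothetical layout).
    birth_day:  dict mapping a word to the day the child first produced it.
    Returns a dict: time bin relative to the word's birth -> mean length in words.
    """
    bins = defaultdict(list)
    for day, text in utterances:
        tokens = text.lower().split()  # naive tokenizer, assumes no punctuation
        for word in set(tokens):
            if word in birth_day:
                # Negative bins are before the word was born, positive after.
                offset = (day - birth_day[word]) // bin_days
                bins[offset].append(len(tokens))
    return {offset: mean(lengths) for offset, lengths in sorted(bins.items())}


# Made-up toy data: long utterances before and after, a short one at the birth.
utterances = [
    (100, "do you want some water to drink now"),
    (130, "want water"),
    (160, "shall we get you a glass of water"),
]
birth_day = {"water": 130}
curve = aligned_length_curve(utterances, birth_day)
print(curve)
```

With real transcripts, a dip in this curve around bin 0 would correspond to the pattern described in the talk. The toy data here is constructed to show that shape and says nothing about the actual findings.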
That environment, people, are in these tight feedback loops and creating a kind of scaffolding that has not been noticed until now. But that's looking at the speech context. What about the visual context? We're not looking at -- think of this as a dollhouse cutaway of our house. We've taken those circular fish-eye lens cameras, and we've done some optical correction, and then we can bring it into three-dimensional life. So welcome to my home. This is a moment, one moment captured across multiple cameras. The reason we did this is to create the ultimate memory machine, where you can go back and interactively fly around and then breathe video-life into this system. What I'm going to do is give you an accelerated view of 30 minutes, again, of just life in the living room. That's me and my son on the floor. And there's video analytics that are tracking our movements. My son is leaving red ink. I am leaving green ink. We're now on the couch, looking out through the window at cars passing by. And finally, my son playing in a walking toy by himself. Now we freeze the action, 30 minutes, we turn time into the vertical axis, and we open up for a view of these interaction traces we've just left behind. And we see these amazing structures -- these little knots of two colors of thread we call "social hot spots." The spiral thread we call a "solo hot spot." And we think that these affect the way language is learned. What we'd like to do is start understanding the interaction between these patterns and the language that my son is exposed to to see if we can predict how the structure of when words are heard affects when they're learned -- so in other words, the relationship between words and what they're about in the world. So here's how we're approaching this. In this video, again, my son is being traced out. He's leaving red ink behind. And there's our nanny by the door. (Video) Nanny: You want water? (Baby: Aaaa.) Nanny: All right. (Baby: Aaaa.) 
DR: She offers water, and off go the two worms over to the kitchen to get water. And what we've done is use the word "water" to tag that moment, that bit of activity. And now we take the power of data and take every time my son ever heard the word water and the context he saw it in, and we use it to penetrate through the video and find every activity trace that co-occurred with an instance of water. And what this data leaves in its wake is a landscape. We call these wordscapes. This is the wordscape for the word water, and you can see most of the action is in the kitchen. That's where those big peaks are over to the left. And just for contrast, we can do this with any word. We can take the word "bye" as in "good bye." And we're now zoomed in over the entrance to the house. And we look, and we find, as you would expect, a contrast in the landscape where the word "bye" occurs much more in a structured way. So we're using these structures to start predicting the order of language acquisition, and that's ongoing work now. In my lab, which we're peering into now, at MIT -- this is at the media lab. This has become my favorite way of videographing just about any space. Three of the key people in this project, Philip DeCamp, Rony Kubat and Brandon Roy are pictured here. Philip has been a close collaborator on all the visualizations you're seeing. And Michael Fleischman was another Ph.D. student in my lab who worked with me on this home video analysis, and he made the following observation: that "just the way that we're analyzing how language connects to events which provide common ground for language, that same idea we can take out of your home, Deb, and we can apply it to the world of public media." And so our effort took an unexpected turn.
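The wordscape idea described earlier in this passage (take every time a word was heard and find the activity traces that co-occurred with it) can be sketched as a simple accumulation over a floor grid. This is a minimal illustration under stated assumptions, not the lab's method: the function name `wordscape`, the `(time, x, y)` position samples, the one-unit grid cells, the five-second window, and the toy data are all hypothetical.

```python
from collections import Counter


def wordscape(word, utterances, positions, cell=1.0, window=5.0):
    """Count tracked positions that co-occur with utterances containing `word`.

    utterances: list of (time_s, text) pairs (hypothetical layout).
    positions:  list of (time_s, x, y) tracked floor positions.
    Returns a Counter mapping grid cell (col, row) -> number of position
    samples within `window` seconds of an utterance containing the word.
    """
    times = [t for t, text in utterances if word in text.lower().split()]
    grid = Counter()
    for t, x, y in positions:
        if any(abs(t - u) <= window for u in times):
            grid[(int(x // cell), int(y // cell))] += 1
    return grid


# Made-up toy data: "water" is heard while activity clusters in one spot,
# "bye" is heard while someone is near a different spot.
utterances = [(10.0, "you want water"), (60.0, "bye bye")]
positions = [
    (9.0, 0.5, 0.5),
    (12.0, 3.2, 0.4),
    (14.0, 3.6, 0.7),
    (61.0, 8.1, 2.0),
]
water_scape = wordscape("water", utterances, positions)
bye_scape = wordscape("bye", utterances, positions)
print(water_scape, bye_scape)
```

The peaks of such a grid play the role of the peaks in the landscape: cells with high counts are places where activity tended to happen while the word was being said.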