Imagine if you could record your life -- everything you said, everything you did, available in a perfect memory store at your fingertips, so you could go back and find memorable moments and relive them, or sift through traces of time and discover patterns in your own life that previously had gone undiscovered.

Well, that's exactly the journey that my family began five and a half years ago. This is my wife and collaborator, Rupal. And on this day, at this moment, we walked into the house with our first child, our beautiful baby boy. And we walked into a house with a very special home video recording system.

(Video) Man: Okay.

Deb Roy: This moment and thousands of other moments special for us were captured in our home because in every room in the house, if you looked up, you'd see a camera and a microphone, and if you looked down, you'd get this bird's-eye view of the room. Here's our living room, the baby bedroom, kitchen, dining room and the rest of the house. And all of these fed into a disk array that was designed for continuous capture.

So here we are flying through a day in our home as we move from sunlit morning through incandescent evening and, finally, lights out for the day. Over the course of three years, we recorded eight to 10 hours a day, amassing roughly a quarter-million hours of multi-track audio and video.
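The figures in this paragraph can be cross-checked with simple arithmetic. The short Python sketch below assumes 365 recording days per year and counts each camera or microphone as its own track; the talk does not give the track count, so the script only works out how many parallel tracks the quoted totals would imply.

```python
# Back-of-the-envelope check of the recording totals quoted in the talk.
# Assumptions (not stated in the talk): 365 recording days per year, and
# every camera or microphone counts as one separate "track".
YEARS = 3
DAYS_PER_YEAR = 365
HOURS_PER_DAY_RANGE = (8, 10)   # "eight to 10 hours a day"
TOTAL_TRACK_HOURS = 250_000     # "roughly a quarter-million hours"


def hours_per_track(hours_per_day: int) -> int:
    """Hours a single track accumulates over the whole recording period."""
    return YEARS * DAYS_PER_YEAR * hours_per_day


def implied_tracks(hours_per_day: int) -> float:
    """Number of parallel tracks needed to reach the quoted total."""
    return TOTAL_TRACK_HOURS / hours_per_track(hours_per_day)


for h in HOURS_PER_DAY_RANGE:
    print(f"{h} h/day: {hours_per_track(h):,} h per track, "
          f"~{implied_tracks(h):.0f} tracks implied")
```

Under those assumptions, a quarter-million hours corresponds to roughly 23 to 29 tracks recording in parallel.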
So you're looking at a piece of what is by far the largest home video collection ever made.

(Laughter)

As for what this data represents for our family at a personal level, the impact has already been immense, and we're still learning its value. Countless unsolicited, natural moments, not posed moments, are captured there, and we're starting to learn how to discover them and find them.

But there's also a scientific reason that drove this project, which was to use this natural longitudinal data to understand the process of how a child learns language -- that child being my son. And so, with many privacy provisions put in place to protect everyone who was recorded in the data, we made elements of the data available to my trusted research team at MIT so we could start teasing apart patterns in this massive data set, trying to understand the influence of social environments on language acquisition.

So we're looking here at one of the first things we started to do. This is my wife and I cooking breakfast in the kitchen, and as we move through space and through time, you see a very everyday pattern of life in the kitchen. In order to convert these opaque 90,000 hours of video into something that we could start to see, we use motion analysis to pull out, as we move through space and through time, what we call space-time worms.
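The talk does not say which motion analysis produced the worms. As a rough illustration only, the sketch below uses simple frame differencing, an assumed technique rather than necessarily the one the team used: pixels whose brightness changes by more than a threshold between consecutive frames count as motion, and the centroid of those pixels in each frame becomes one (t, x, y) point. Chaining the points over time gives a trajectory through space and time, the basic idea behind a space-time worm. Frames are plain nested lists of grayscale values, and `motion_centroid`, `space_time_worm` and `blob_frame` are hypothetical names.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]            # grayscale image as rows of pixel values
Point = Tuple[int, float, float]   # (frame index t, x, y)


def motion_centroid(prev: Frame, curr: Frame,
                    threshold: int = 30) -> Optional[Tuple[float, float]]:
    """Centroid of pixels whose brightness changed by more than `threshold`.

    Returns None when nothing changed enough (no motion in this frame).
    """
    xs, ys = [], []
    for y, (row_prev, row_curr) in enumerate(zip(prev, curr)):
        for x, (a, b) in enumerate(zip(row_prev, row_curr)):
            if abs(b - a) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return sum(xs) / len(xs), sum(ys) / len(ys)


def space_time_worm(frames: List[Frame], threshold: int = 30) -> List[Point]:
    """Chain per-frame motion centroids into a (t, x, y) trajectory."""
    worm: List[Point] = []
    for t in range(1, len(frames)):
        c = motion_centroid(frames[t - 1], frames[t], threshold)
        if c is not None:
            worm.append((t, c[0], c[1]))
    return worm


def blob_frame(x: int, width: int = 5, height: int = 3) -> Frame:
    """Toy frame: one bright pixel at column x on the middle row."""
    frame = [[0] * width for _ in range(height)]
    frame[1][x] = 255
    return frame


frames = [blob_frame(x) for x in range(4)]   # a blob moving one pixel right per frame
print(space_time_worm(frames))
```

Because differencing marks both the spot a mover left and the spot it arrived at, each centroid lands halfway between the two positions. The sketch also assumes a single mover per view; separating several people into distinct worms would need a tracker on top of this.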
And this has become part of our toolkit for being able to look and see where the activities are in the data, and with it, trace the pattern of, in particular, where my son moved throughout the home, so that we could focus our transcription efforts on all of the speech environment around my son -- all of the words that he heard from myself, my wife and our nanny, and, over time, the words he began to produce. So with that technology and that data and the ability to transcribe speech with machine assistance, we've now transcribed well over seven million words of our home transcripts.

And with that, let me take you now for a first tour into the data. So you've all, I'm sure, seen time-lapse videos where a flower will blossom as you accelerate time. I'd like you to now experience the blossoming of a speech form. My son, soon after his first birthday, would say "gaga" to mean water. And over the course of the next half-year, he slowly learned to approximate the proper adult form, "water." So we're going to cruise through half a year in about 40 seconds. No video here, so you can focus on the sound, the acoustics, of a new kind of trajectory: gaga to water.

(Audio) Baby: Gagagagagaga
Gaga gaga gaga
guga guga guga
wada gaga gaga guga gaga
wader guga guga
water water water
water water water
water water
water.

DR: He sure nailed it, didn't he?

(Applause)

So he didn't just learn water. Over the course of the 24 months, the first two years that we really focused on, this is a map of every word he learned in chronological order. And because we have full transcripts, we've identified each of the 503 words that he learned to produce by his second birthday. He was an early talker. And so we started to analyze why. Why were certain words born before others?

This is one of the first results that came out of our study a little over a year ago that really surprised us. The way to interpret this apparently simple graph is this: the vertical axis is an indication of how complex caregiver utterances are, based on the length of utterances, and the [horizontal] axis is time. We aligned all of the data based on the following idea: every time my son would learn a word, we would trace back and look at all of the language he heard that contained that word, and we would plot the relative length of the utterances. And what we found was this curious phenomenon: caregiver speech would systematically dip to a minimum, making language as simple as possible, and then slowly ascend back up in complexity.
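The alignment just described can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions, not the study's code: each utterance is a (time, speaker, text) record, a word's "birth" is taken to be the first time the child produces it, length is counted in whitespace-separated words, time is bucketed into fixed-width bins relative to the birth, and each word's curve is divided by its own average so that words heard in long or short sentences can be pooled ("relative length"). The names `Utterance`, `word_births`, `relative_length_curve`, `aligned_curve` and the toy transcript are all hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, NamedTuple


class Utterance(NamedTuple):
    time: float   # assumed unit: days since recording began
    speaker: str  # "child" for the learner; anything else is a caregiver
    text: str     # transcript text; naive lowercase whitespace tokenization below


def word_births(utterances: List[Utterance]) -> Dict[str, float]:
    """Time at which each word first appears in the child's own speech."""
    births: Dict[str, float] = {}
    for u in sorted(utterances, key=lambda u: u.time):
        if u.speaker == "child":
            for w in u.text.lower().split():
                births.setdefault(w, u.time)
    return births


def relative_length_curve(utterances: List[Utterance], word: str,
                          birth: float, bin_width: float) -> Dict[int, float]:
    """Mean length of caregiver utterances containing `word`, per time bin
    relative to the word's birth (bin 0 starts at the birth), divided by
    that word's average so curves for different words are comparable."""
    bins: Dict[int, List[int]] = defaultdict(list)
    for u in utterances:
        tokens = u.text.lower().split()
        if u.speaker != "child" and word in tokens:
            offset = int((u.time - birth) // bin_width)  # negative = before birth
            bins[offset].append(len(tokens))
    if not bins:
        return {}
    per_bin = {b: mean(lengths) for b, lengths in bins.items()}
    overall = mean(list(per_bin.values()))
    return {b: m / overall for b, m in sorted(per_bin.items())}


def aligned_curve(utterances: List[Utterance],
                  bin_width: float = 7.0) -> Dict[int, float]:
    """Average the per-word curves with every word's birth aligned to bin 0."""
    pooled: Dict[int, List[float]] = defaultdict(list)
    for word, birth in word_births(utterances).items():
        for b, v in relative_length_curve(utterances, word, birth, bin_width).items():
            pooled[b].append(v)
    return {b: mean(vs) for b, vs in sorted(pooled.items())}


# Toy transcript, invented for illustration: the child first says "water" on day 14.
toy = [
    Utterance(0, "mom", "do you want some water from the cup"),
    Utterance(3, "child", "gaga"),
    Utterance(8, "dad", "here is your water"),
    Utterance(14, "nanny", "yes water"),
    Utterance(14, "child", "water"),
    Utterance(22, "mom", "put the water on the table"),
]
print(aligned_curve(toy))
```

On this toy transcript the smallest relative length falls in bin 0, the week that contains the word's birth, with longer utterances on either side. With real transcripts the curve would pool hundreds of words, and the binning and normalization choices above would need to be checked against the actual analysis.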
And the amazing thing was that bounce, that dip, lined up almost precisely with when each word was born -- word after word, systematically. So it appears that all three primary caregivers -- myself, my wife and our nanny -- were systematically and, I would think, subconsciously restructuring our language to meet him at the birth of a word and bring him gently into more complex language.

And the implications of this -- there are many, but one I just want to point out is that there must be amazing feedback loops. Of course, my son is learning from his linguistic environment, but the environment is learning from him. That environment, the people, are in these tight feedback loops and creating a kind of scaffolding that has not been noticed until now.

But that's looking at the speech context. What about the visual context? We're now looking at -- think of this as a dollhouse cutaway of our house. We've taken those circular fish-eye lens cameras, and we've done some optical correction, and then we can bring it into three-dimensional life. So welcome to my home. This is a moment, one moment captured across multiple cameras. The reason we did this is to create the ultimate memory machine, where you can go back and interactively fly around and then breathe video-life into this system.

What I'm going to do is give you an accelerated view of 30 minutes, again, of just life in the living room. That's me and my son on the floor. And there are video analytics tracking our movements. My son is leaving red ink. I am leaving green ink. We're now on the couch, looking out through the window at cars passing by. And finally, there's my son playing in a walking toy by himself.

Now we freeze the action, 30 minutes, we turn time into the vertical axis, and we open up for a view of these interaction traces we've just left behind. And we see these amazing structures -- these little knots of two colors of thread we call "social hot spots." The spiral thread we call a "solo hot spot." And we think that these affect the way language is learned.
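The hot spots are described here only visually, as knots of two-colored thread and as a single spiraling thread. One way to make the idea concrete, purely as an assumption for illustration, is to label each time step from two position traces: "social" when the two people are within some distance of each other, and "solo" when the child stays within a small radius of where he was a few steps earlier while the adult is elsewhere. Runs of consecutive labeled steps then become candidate hot spots. The thresholds, the window, the toy positions and the names `label_steps` and `hot_spots` are hypothetical; Python 3.8+ is assumed for `math.dist`.

```python
from itertools import groupby
from math import dist
from typing import List, Tuple

Pos = Tuple[float, float]  # (x, y) floor position, assumed to be in meters


def label_steps(child: List[Pos], adult: List[Pos], near: float = 1.0,
                linger: float = 0.5, window: int = 3) -> List[str]:
    """Label each time step 'social', 'solo' or 'none' using hypothetical rules."""
    labels = []
    for t, (c, a) in enumerate(zip(child, adult)):
        if dist(c, a) <= near:
            labels.append("social")  # the two threads are knotted together
        elif t >= window and dist(c, child[t - window]) <= linger:
            labels.append("solo")    # the child's thread coils in place on its own
        else:
            labels.append("none")
    return labels


def hot_spots(labels: List[str], min_len: int = 2) -> List[Tuple[str, int, int]]:
    """Merge consecutive identical labels into (kind, first_step, last_step) runs."""
    spots, t = [], 0
    for kind, run in groupby(labels):
        n = len(list(run))
        if kind != "none" and n >= min_len:
            spots.append((kind, t, t + n - 1))
        t += n
    return spots


# Toy traces: child and adult play together, then the child moves off and lingers alone.
child = [(0, 0), (0, 0), (0.2, 0), (0.2, 0.1), (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1), (5, 5)]
adult = [(0.5, 0), (0.5, 0.2), (0.5, 0), (0.4, 0.1), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0)]
print(hot_spots(label_steps(child, adult)))
```

The solo rule compares the child's position with where he was `window` steps earlier, so the first few steps after he moves are not counted as lingering; a spatial rule of this kind is only one of many ways the visual "knots" and "spirals" could be operationalized.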
What we'd like to do is start understanding the interaction between these patterns and the language that my son is exposed to, to see if we can predict how the structure of when words are heard affects when they're learned -- so in other words, the relationship between words and what they're about in the world.

So here's how we're approaching this. In this video, again, my son is being traced out. He's leaving red ink behind. And there's our nanny by the door.

(Video) Nanny: You want water? (Baby: Aaaa.)
Nanny: All right. (Baby: Aaaa.)

DR: She offers water, and off go the two worms over to the kitchen to get water. And what we've done is use the word "water" to tag that moment, that bit of activity. And now we take the power of data and take every time my son ever heard the word "water" and the context he saw it in, and we use it to penetrate through the video and find every activity trace that co-occurred with an instance of "water." And what this data leaves in its wake is a landscape. We call these wordscapes. This is the wordscape for the word "water," and you can see most of the action is in the kitchen. That's where those big peaks are over to the left.
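A wordscape, as described here, amounts to a spatial histogram: for every time the word was heard, collect the positions of activity traces around that moment and accumulate them on a grid laid over the floor plan, so the places where the word tends to be heard pile up into peaks. The sketch below is a minimal illustration under assumed data formats (a list of times per word, traces as lists of (time, x, y) points, a fixed time window and grid cell size); `wordscape`, its parameters and the toy data are hypothetical, not the lab's implementation.

```python
from collections import Counter
from typing import Dict, List, Tuple

TracePoint = Tuple[float, float, float]  # (time in seconds, x, y in meters); assumed format


def wordscape(word: str, word_times: Dict[str, List[float]],
              traces: List[List[TracePoint]], window: float = 5.0,
              cell: float = 0.5) -> Counter:
    """Height map for `word`: how many trace points fall in each floor-grid cell
    within `window` seconds of any moment the word was heard."""
    heights: Counter = Counter()
    times = word_times.get(word, [])
    for trace in traces:
        for t, x, y in trace:
            if any(abs(t - w) <= window for w in times):
                heights[(int(x // cell), int(y // cell))] += 1  # (column, row) cell
    return heights


# Toy data, invented for illustration: "water" heard near the kitchen, "bye" at the door.
word_times = {"water": [10.0, 100.0], "bye": [50.0]}
traces = [
    [(9.0, 1.2, 0.3), (11.0, 1.3, 0.4), (50.0, 6.0, 4.0), (101.0, 1.1, 0.2)],
    [(51.0, 6.1, 4.1), (30.0, 3.0, 3.0)],
]
print(wordscape("water", word_times, traces))
print(wordscape("bye", word_times, traces))
```

Running the same function for two words on the same grid is what makes the comparison between their landscapes possible. A real system would likely also smooth the counts and account for how much time is spent in each room, which this sketch does not attempt.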
And just for contrast, we can do this with any word. We can take the word "bye," as in "goodbye." And we're now zoomed in over the entrance to the house. And we look, and we find, as you would expect, a contrast in the landscape, where the word "bye" occurs in a much more structured way. So we're using these structures to start predicting the order of language acquisition, and that's ongoing work now.

We're now peering into my lab at MIT, at the Media Lab. This has become my favorite way of videographing just about any space. Three of the key people in this project, Philip DeCamp, Rony Kubat and Brandon Roy, are pictured here. Philip has been a close collaborator on all the visualizations you're seeing. And Michael Fleischman was another Ph.D. student in my lab who worked with me on this home video analysis, and he made the following observation: "Just the way that we're analyzing how language connects to events, which provide common ground for language, that same idea we can take out of your home, Deb, and we can apply it to the world of public media."