Imagine if you could record your life --
everything you said, everything you did,
available in a perfect memory store at your fingertips,
so you could go back
and find memorable moments and relive them,
or sift through traces of time
and discover patterns in your own life
that previously had gone undiscovered.
Well that's exactly the journey
that my family began
five and a half years ago.
This is my wife and collaborator, Rupal.
And on this day, at this moment,
we walked into the house with our first child,
our beautiful baby boy.
And we walked into a house
with a very special home video recording system.
(Video) Man: Okay.
Deb Roy: This moment
and thousands of other moments special for us
were captured in our home
because in every room in the house,
if you looked up, you'd see a camera and a microphone,
and if you looked down,
you'd get this bird's-eye view of the room.
Here's our living room,
the baby bedroom,
kitchen, dining room
and the rest of the house.
And all of these fed into a disc array
that was designed for a continuous capture.
So here we are flying through a day in our home
as we move from sunlit morning
through incandescent evening
and, finally, lights out for the day.
Over the course of three years,
we recorded eight to 10 hours a day,
amassing roughly a quarter-million hours
of multi-track audio and video.
So you're looking at a piece of what is by far
the largest home video collection ever made.
(Laughter)
And what this data represents
for our family at a personal level,
the impact has already been immense,
and we're still learning its value.
Countless moments
of unsolicited natural moments, not posed moments,
are captured there,
and we're starting to learn how to discover them and find them.
But there's also a scientific reason that drove this project,
which was to use this natural longitudinal data
to understand the process
of how a child learns language --
that child being my son.
And so with many privacy provisions put in place
to protect everyone who was recorded in the data,
we made elements of the data available
to my trusted research team at MIT
so we could start teasing apart patterns
in this massive data set,
trying to understand the influence of social environments
on language acquisition.
So we're looking here
at one of the first things we started to do.
This is my wife and I cooking breakfast in the kitchen,
and as we move through space and through time,
a very everyday pattern of life in the kitchen.
In order to convert
this opaque, 90,000 hours of video
into something that we could start to see,
we use motion analysis to pull out,
as we move through space and through time,
what we call space-time worms.
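
[Editor's sketch] The talk does not say which motion-analysis method produced the space-time worms. As a rough illustration only, the following assumes simple frame differencing: pixels that change between consecutive frames are treated as motion, and their centroid is recorded as a point (t, x, y). Chained together over time, such points trace a path through space and time. The names `motion_centroid`, `space_time_track`, and `make_frame`, the threshold, and the tiny synthetic frames are all hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # grayscale image as rows of pixel intensities


def motion_centroid(prev: Frame, curr: Frame,
                    threshold: int = 30) -> Optional[Tuple[float, float]]:
    """Return the (x, y) centroid of pixels that changed by more than
    `threshold` between two frames, or None if nothing moved."""
    xs, ys = [], []
    for y, (prev_row, curr_row) in enumerate(zip(prev, curr)):
        for x, (p, c) in enumerate(zip(prev_row, curr_row)):
            if abs(c - p) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def space_time_track(frames: List[Frame],
                     threshold: int = 30) -> List[Tuple[int, float, float]]:
    """Collect a (t, x, y) point for every frame in which motion was detected."""
    track = []
    for t in range(1, len(frames)):
        centroid = motion_centroid(frames[t - 1], frames[t], threshold)
        if centroid is not None:
            track.append((t, centroid[0], centroid[1]))
    return track


def make_frame(bright_x: int, width: int = 5, height: int = 3) -> Frame:
    """Build a synthetic dark frame with one bright pixel in the middle row."""
    frame = [[0] * width for _ in range(height)]
    frame[1][bright_x] = 255
    return frame


# A bright spot moves right for two frames, then stops (assumed toy input).
frames = [make_frame(0), make_frame(1), make_frame(2), make_frame(2)]
track = space_time_track(frames)
print(track)
```

A real system would work on full video, segment individual people, and link detections across cameras. The sketch only shows how movement in a room can be reduced to a trace through space and time.
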
And this has become part of our toolkit
for being able to look and see
where the activities are in the data,
and with it, trace the pattern of, in particular,
where my son moved throughout the home,
so that we could focus our transcription efforts,
all of the speech environment around my son --
all of the words that he heard from myself, my wife, our nanny,
and over time, the words he began to produce.
So with that technology and that data
and the ability to, with machine assistance,
transcribe speech,
we've now transcribed
well over seven million words of our home transcripts.
And with that, let me take you now
for a first tour into the data.
So you've all, I'm sure,
seen time-lapse videos
where a flower will blossom as you accelerate time.
I'd like you to now experience
the blossoming of a speech form.
My son, soon after his first birthday,
would say "gaga" to mean water.
And over the course of the next half-year,
he slowly learned to approximate
the proper adult form, "water."
So we're going to cruise through half a year
in about 40 seconds.
No video here,
so you can focus on the sound, the acoustics,
of a new kind of trajectory:
gaga to water.
(Audio) Baby: Gagagagagaga
Gaga gaga gaga
guga guga guga
wada gaga gaga guga gaga
wader guga guga
water water water
water water water
water water
water.
DR: He sure nailed it, didn't he.
(Applause)
So he didn't just learn water.
Over the course of the 24 months,
the first two years that we really focused on,
this is a map of every word he learned in chronological order.
And because we have full transcripts,
we've identified each of the 503 words
that he learned to produce by his second birthday.
He was an early talker.
And so we started to analyze why.
Why were certain words born before others?
This is one of the first results
that came out of our study a little over a year ago
that really surprised us.
The way to interpret this apparently simple graph
is, on the vertical is an indication
of how complex caregiver utterances are
based on the length of utterances.
And the [horizontal] axis is time.
And all of the data,
we aligned based on the following idea:
Every time my son would learn a word,
we would trace back and look at all of the language he heard
that contained that word.
And we would plot the relative length of the utterances.
And what we found was this curious phenomenon,
that caregiver speech would systematically dip to a minimum,
making language as simple as possible,
and then slowly ascend back up in complexity.
And the amazing thing was
that bounce, that dip,
lined up almost precisely
with when each word was born --
word after word, systematically.
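
[Editor's sketch] The alignment described above can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the research team's actual pipeline: utterance length is taken as a plain word count, time is grouped into 30-day bins relative to each word's "birth" day, and the function name `aligned_mean_length` and the toy transcript are hypothetical, invented purely for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Utterance = Tuple[float, str]  # (day the utterance was spoken, transcribed text)


def aligned_mean_length(
    utterances: List[Utterance],
    word_births: Dict[str, float],
    bin_days: float = 30.0,
) -> Dict[int, float]:
    """For every learned word, find caregiver utterances containing it,
    shift their timestamps so the word's birth is at zero, and return the
    mean utterance length (in words) per time bin."""
    lengths_by_bin = defaultdict(list)
    for day, text in utterances:
        tokens = text.lower().split()
        for word, birth_day in word_births.items():
            if word in tokens:
                # Negative bins are before the word's birth, bin 0 starts at it.
                offset_bin = int((day - birth_day) // bin_days)
                lengths_by_bin[offset_bin].append(len(tokens))
    return {b: sum(v) / len(v) for b, v in sorted(lengths_by_bin.items())}


# Made-up toy transcript, not data from the study.
utterances = [
    (100.0, "would you like some water with your lunch"),
    (130.0, "here is your water"),
    (160.0, "water"),
    (190.0, "drink your water"),
    (220.0, "let us get a glass of water"),
]
word_births = {"water": 165.0}  # assumed day of first production

curve = aligned_mean_length(utterances, word_births)
for offset_bin, mean_len in curve.items():
    print(offset_bin, mean_len)
```

With this toy input the mean length falls to its minimum in the bin just before the word's birth and rises afterward, which is the shape of the dip described in the talk. Real transcripts would need proper tokenization and some normalization to obtain a "relative" length.
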
So it appears that all three primary caregivers --
myself, my wife and our nanny --
were systematically and, I would think, subconsciously
restructuring our language
to meet him at the birth of a word
and bring him gently into more complex language.
And the implications of this -- there are many,
but one I just want to point out,
is that there must be amazing feedback loops.
Of course, my son is learning
from his linguistic environment,
but the environment is learning from him.
That environment, people, are in these tight feedback loops
and creating a kind of scaffolding
that has not been noticed until now.
But that's looking at the speech context.
What about the visual context?
We're now looking at --
think of this as a dollhouse cutaway of our house.
We've taken those circular fish-eye lens cameras,
and we've done some optical correction,
and then we can bring it into three-dimensional life.
So welcome to my home.
This is a moment,
one moment captured across multiple cameras.
The reason we did this is to create the ultimate memory machine,
where you can go back and interactively fly around
and then breathe video-life into this system.