C. C. Mei Distinguished Speaker Series: Dr. George Sugihara

Welcome, everybody.
It's a great pleasure to welcome you to our CC Mei Distinguished
seminar series.
This is a series that is sponsored
by the Department of Civil and Environmental Engineering
and the CC Mei Fund, and this is our first Distinguished seminar
of the term.
It's a great pleasure to see it's a full house.
Hopefully for the people that will be late,
they will still find some seats.
And so for today's inaugural talk of the term,
we will be hearing from Professor George Sugihara,
and George Sugihara is a Professor
of Biological Oceanography at the Physical Oceanography
Research Division, Scripps Institute of Oceanography
at UC San Diego.
I'm co-hosting Professor George Sugihara with Professor Serguei
Saavedra here in CEE.
So Professor Sugihara is a data-driven theoretician
whose work focuses on developing minimalist inductive theory,
extracting information from observational data
with minimal assumptions.
He has worked across many scientific domains,
including ecology, finance, climate science, medicine,
and fisheries.
He's most known for topological models in ecology,
empirical dynamic forecasting models, research
on generic early warning signs of critical transitions,
methods of distinguishing correlation
from causal interaction in time series,
and has championed the idea that causation
can occur without correlation.
He provided one of the earliest field demonstrations of chaos
in ecology and biology.
Professor Sugihara is the inaugural holder
of the McQuown Chair of Natural Science at the Scripps
Institute of Oceanography at UCSD.
He has won many other awards and recognitions,
including being a member of the National Academies'
Board on Mathematical Sciences and Their Applications
for a few years.
And today, he will discuss understanding nature
holistically and without equations.
And that's extremely intriguing for all of us.
And so without further ado, please join me
in welcoming Professor Sugihara.
[APPLAUSE]
This is in my presenter notes, so I'm reading it
off of the screen here.
I want to make a disclaimer, however.
In the abstract, it says that these ideas are intuitive.
Are you good?
Are we good?
OK.
So the abstract says that the ideas that I'm going to present
are intuitive, but this is not entirely true.
In fact, for whatever reason, at one point,
the playwright Tom Stoppard approached me,
and he said that he was interested in writing something
about these ideas and wondered if it
would be possible to explain these to a theater audience.
And just read the dark black there.
His response was that if he tried to explain it
to a theater audience, they'd probably
be in the lobby drinking before he
got through the first sentence.
So the ideas are in fact decidedly counter-intuitive.
And this is a fact that in a sense
goes against how we usually try to understand things.
So I'll explain what that means in a second.
So we're all familiar with Berkeley's famous dictum,
but despite this warning, correlation
is very much at the core of Western science.
Untangling networks of cause and effect
is really how we try to understand nature.
It's essentially what the business of science
is all about.
And for the most part and very much
despite Berkeley's warning, correlation
is very much at the core of how we try to get a grasp on this.
It's an unspoken rule, in fact, that within science
and with how we normally operate,
correlation is a reasonable thing to do.
It's innocent until it's proven guilty.
Thus, distinguishing this intuitive correlation
from the somewhat counter-intuitive causation
is at the crux, and it's the topic of this talk today.
So I'm going to develop a discussion
for making this distinction that hinges on two main elements.
First, the fact that nature is
dynamic and the temporal sequence matters.
Meaning that nature is better understood as a movie than as
snapshots, OK?
And secondly is the fact that nature is nonlinear,
that it consists of interdependent parts that
are basically non-separable, that context really matters.
That nature can't be understood as independent pieces
but rather each piece needs to be studied
in the context surrounding it.
So let's start with a nice, simple example.
All right.
Consider these two time series.
One might be a species, or these might
be two species interacting, or one
might be an environmental driver and responding species,
or a driver and a physiological response,
or money supply and interest rates, something like that.
So if you look at 10 years of data,
you say your first hypothesis is that these things are
positively correlated.
You have this kind of working model for what's going on.
If you roll forward another dozen years,
you find your hypothesis holds, but then it
falls apart a little bit here and in the middle,
right in here.
And then it sort of flips back on here towards the end.
So out of 18 years of observations, actually
more like 22 years of observations,
we find that our hypothesis that these things are correlated
is a pretty good one.
If this was an ecology pattern, if this
was a pattern from ecology, we'd say that this
is a really good hypothesis.
So we might make an adaptive caveat here, kind of an excuse
for what happened when it became uncorrelated, but more or less,
this looks like a pretty good hypothesis.
This is, however, what we see if we roll forward
another couple of decades.
In fact, for very long periods of time,
these two variables are uncorrelated.
They're totally unrelated.
However, they appear from a statistical sense
to be unrelated, but they were actually
generated from a coupled two-species difference
equation.
So this is a simple example of nonlinear dynamics.
We see that two things can appear to be coupled
for short periods of time, uncoupled,
but for very long periods of time,
there's absolutely no correlation.
So not only does correlation not imply causation,
but with simple nonlinear dynamics, lack of correlation
does not imply lack of causation.
That's actually something that I think is fairly important.
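
A minimal sketch of the point just made, assuming a pair of coupled logistic maps as the two-species difference equation. The talk does not give the equations or parameter values, so the function names and every number below are illustrative assumptions. The sketch compares correlations measured in short windows with the correlation over the whole record.

```python
import numpy as np

def coupled_logistic(n, rx=3.8, ry=3.5, bxy=0.02, byx=0.1, x0=0.4, y0=0.2):
    """Iterate a coupled two-species difference equation.

    All parameter values are illustrative assumptions, not taken from the talk.
    bxy is the effect of Y on X, byx is the effect of X on Y.
    """
    x = np.empty(n)
    y = np.empty(n)
    x[0], y[0] = x0, y0
    for t in range(n - 1):
        x[t + 1] = x[t] * (rx - rx * x[t] - bxy * y[t])
        y[t + 1] = y[t] * (ry - ry * y[t] - byx * x[t])
    return x, y

def windowed_correlation(x, y, window):
    """Pearson correlation of x and y in consecutive non-overlapping windows."""
    n_windows = len(x) // window
    return np.array([
        np.corrcoef(x[i * window:(i + 1) * window],
                    y[i * window:(i + 1) * window])[0, 1]
        for i in range(n_windows)
    ])

x, y = coupled_logistic(2000)
short_term = windowed_correlation(x, y, window=20)
long_term = np.corrcoef(x, y)[0, 1]

print(f"short-window correlations range from {short_term.min():.2f} "
      f"to {short_term.max():.2f}")
print(f"correlation over the full record: {long_term:.2f}")
```

The two series are causally coupled by construction, so any window in which the correlation is weak or changes sign is a case of causation without correlation in the sense used here.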
In retrospect, what I just showed you,
you might think this is obvious, but apparently this
is not well known, and it contradicts a currently held
view that correlation is a necessary condition
for causation.
So this was Edward Tufte who said
that empirically observed covariation is
a necessary condition for causation.
OK.
So the activity of correlation, I think,
reflects the physiology of how we learn.
And one can argue that it's almost wired
into our cognitive apparatus.
So the basic notion behind Hebbian learning
is that cells that fire together wire together.
So the mechanism of how we learn is really
very sort of supportive of the whole notion of correlation.
So I think it's very fundamental to how we perceive things
as human beings.
OK.
The picture that emerges is not only
that correlation does not necessarily imply causation,
but that you can have causation without correlation.
OK, and this is the realm of nonlinear systems.
This is interesting, because this is also
the realm of biological systems.
So within this realm, there's a further consequence
of non-linearity that was demonstrated in the model
example, and that's this phenomenon
of mirage correlation.
So correlations that come and go and that even change sign.
So here is a nice, simple example of mirage correlation.
This is an example not from finance but from ecology.
This is a study by John McGowan, and it was an attempt
to try to explain harmful algal blooms at Scripps,
these red tides.
So these spikes here are spikes in chlorophyll
found at Scripps Pier.
And what we see in the blue at the bottom
are sea surface temperature anomalies.
And so the idea was that the spikes in chlorophyll
were really caused by the sea surface temperature anomalies.
This is about a decade's worth of observations.
They were about to publish it, but they
were kind of slow in doing so.
And in the meantime, this correlation reversed itself.
And not only did it reverse itself,
it then became completely uncorrelated.
So I think this is a classic example
of a mirage correlation.
OK.
So here's another example from Southern California.
Using data up to 1991, there's a very significant relationship
between sea surface temperature here,
and this is a measure of sardine production,
so-called recruitment.
So this was reported in '94 and was subsequently written
into state law for managing harvest.
So if you are above 17 degrees, the harvest levels are higher.
If you're below 17 degrees, they were lower.
However, if you add to this existing data
the data from '94 up to 2010, this is what you find.
The correlation seemed to disappear in both cases.
So these are two different ways of measuring productivity,
and the correlation disappeared in both of them.
So this statute that was written into state law
has now been suspended.
And this is where it now stands.
All right, so another famous example from fisheries
was this meta-analysis on 74 environment-recruitment
correlations that were reported in the literature.
So these correlations were tested
subsequent to the publication of each original paper
by adding additional data to see if they were upheld.
And only 28 out of the 74 were.
And among the 28 that were upheld
was the sardine, so we know what happened there.
OK, so relationships that we thought we understood
seemed to disappear.
This sort of thing is familiar in finance
where relationships are uncovered but often disappear
even before we try to exploit them.
OK.
So how do we address this?
The approach that I'm going to present today
is based on nonlinear state space reconstruction, which
I refer to here with a little less technical
but I think more descriptive name,
which is empirical dynamics.
So EDM, Empirical Dynamic Modeling,
is basically a holistic data-driven approach
for studying complex systems from their attractors.
It's designed to address nonlinear issues
such as mirage correlation.
I'm now going to play a brief video that I
think is going to explain all.
This is something that my son actually made for me when
I tried to explain it to him.
And he said, no, no, no, you can do this--
it doesn't take three hours to explain this to someone.
You can do this in like two minutes
with a reasonable video.
So he made this nice video for me.
The narration is by Robert May.
[VIDEO PLAYBACK]
- This animation illustrates the Lorenz attractor.
The Lorenz is an example of a coupled dynamic system
consisting of three differential equations, where each--
[END PLAYBACK]
Oh, technical difficulties.
Sorry.
Let me start it again.
Hold on.
[VIDEO PLAYBACK]
- This animation illustrates the Lorenz attractor.
The Lorenz is an example of a coupled dynamic system
consisting of three differential equations
where each component depends on the state and dynamics
of the other two components.
Think of each component, for example, as being species--
foxes, rabbits, grasses.
And each one changes depending on the state of the other two.
So these components shown here as the axes
are actually the state variables or the Cartesian coordinates
that form the state space.
Notice that when the system is in one lobe,
X and Z are positively correlated.
And when the system is in the other lobe,
X and Z are negatively correlated.
The other wing of the butterfly.
We can view a time series thus as a projection
from that manifold onto a coordinate axis of the state
space.
Here we see the projection onto axis X and the resulting time
series recording displacement of X.
This can be repeated on the other coordinate axes
to generate other simultaneous time series.
And so these time series are really
just projections of the manifold dynamics
on the coordinate axes.
Conversely, we can recreate the manifold
by projecting the individual time series back into the state
space to create the flow.
On this panel, we can see the three time series, X, Y,
and Z, each of which is really a projection
of the motion on that manifold.
And what we're doing is the opposite here.
We are taking the time series and projecting them back
into the original three-dimensional state space
to recreate the manifold.
It's a butterfly attractor.
[END PLAYBACK]
OK.
To summarize, these time series are really observations
of motion on an attractor.
Indeed, the jargon term in dynamical systems
is to call a time series an observation function.
Conversely, you can actually create attractors
by taking the appropriate time series,
plotting them in the right space,
and generating some kind of a shape.
OK, this is really the basis of this empirical dynamic
approach.
What is important, I think, to understand here
is that the attractor and the equations
are actually equivalent.
Both contain identical information,
and both represent the rules governing the relationships
among variables.
And depending on when they are viewed,
these relationships can appear to change.
And this is what can give rise to mirage correlations.
So over the short term here, there might be correlations.
But over a longer term--
so for example, if it's in this lobe--
I'm very bad with machines.
All right.
If it's in that lobe, you'll get a positive relationship.
If it's in the lobe on this side,
you'll get a negative correlation.
If you sample the system sparsely
over long periods of time, you'd find no apparent correlation
at all, OK?
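
The lobe-by-lobe picture can be checked numerically. The sketch below integrates the Lorenz equations with the standard textbook parameters (sigma = 10, rho = 28, beta = 8/3, assumed here since the video does not state them), treats the X and Z columns as the projected time series, and labels the lobes crudely by the sign of X. The step size, run length, and function names are illustrative choices.

```python
import numpy as np

def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the three coupled Lorenz differential equations."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def integrate_lorenz(n_steps, dt=0.01, start=(1.0, 1.0, 1.0)):
    """Fixed-step fourth-order Runge-Kutta; returns an (n_steps, 3) array."""
    traj = np.empty((n_steps, 3))
    state = np.array(start, dtype=float)
    for i in range(n_steps):
        k1 = lorenz_rhs(state)
        k2 = lorenz_rhs(state + 0.5 * dt * k1)
        k3 = lorenz_rhs(state + 0.5 * dt * k2)
        k4 = lorenz_rhs(state + dt * k3)
        state = state + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        traj[i] = state
    return traj

traj = integrate_lorenz(30000)[2000:]   # drop the initial transient
x, z = traj[:, 0], traj[:, 2]           # two projections of the same motion

right = x > 0                           # crude lobe label: the sign of x
corr_right = np.corrcoef(x[right], z[right])[0, 1]
corr_left = np.corrcoef(x[~right], z[~right])[0, 1]
corr_all = np.corrcoef(x, z)[0, 1]

print(f"lobe with x > 0: {corr_right:+.2f}")
print(f"lobe with x < 0: {corr_left:+.2f}")
print(f"whole record:    {corr_all:+.2f}")
```

Which sign you see depends only on where on the attractor the observations happen to fall, which is the mechanism behind mirage correlation described above.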
OK, let's look at another real example of this.
So this is an application that I was initially skeptical about,
mainly because I couldn't see how to get time series.
But luckily, I was wrong here.
These are experimental data obtained
by Gerald Pao from the Salk Institute
on expression levels of transcription factor SWI4
and cyclin CLN3.
This is in yeast.
If you view it statistically, so this is viewed statistically,
the relationship between these two variables,
there's absolutely no statistical relationship.
There's no cross-correlation.
However, if you connect these observations in time,
they're clearly inter-related.
So we see the skeleton of an attractor emerging.
So the way that they generated this data, actually, which--
so when I was originally approached about this,
and they said, well, we want to apply these methods
to gene expression.
And I said, but you can't make a time series
for gene expression.
And they said, oh, yes, we can.
And what they did in this case, because it was yeast,
they were able to shock cells, which synchronizes them
in their cell cycle, and then sample them
every 30 minutes for two days.
And so at each sample, they would
sequence several thousands of genes
and do this every 30 minutes for two days.
You can do a lot if you have post-docs and graduate
students, all right?
OK.
So we were able to get this thing to actually reflect
an attractor.
Very interesting.
Of course, if you randomize these observations in time,
you get absolutely nothing.
You still get singularities.
So you get these crossings in two dimensions.
However, if you include the cyclin CLB2,
the crossings disappear, OK?
So we have this nice cluster of three things
that, if you looked at them statistically,
appear to be uncorrelated and are essentially invisible
to bioinformatics techniques, but that are, in fact, dynamically
interacting.
So here is another short video clip
that I think presents what I consider
to be a really important basic theorem that
supports a lot of this empirical dynamics work.
[VIDEO PLAYBACK]
- There's a very powerful theorem proven by Takens.
It shows generically that one can reconstruct a shadow
version of the original manifold simply by looking at one
of its time series projections.
For example, consider the three time series shown here.
These are all copies of each other.
They are all copies of variable X.
Each is displaced by an amount tau.
So the top one is unlagged, the second one is lagged by tau,
and the blue one at the bottom is lagged by two tau.
Takens' theorem then says that we
should be able to use these three time
series as new coordinates and reconstruct
a shadow of the original butterfly manifold.
This is the reconstructed manifold produced
from lags of a single variable, and you
can see that it actually does look
very similar to the butterfly attractor.
Each point in the three-dimensional
reconstruction can be thought of as a time segment
with different points capturing different signals
of [INAUDIBLE] of variable X.
This method represents a one-to-one map
between the original manifold, butterfly attractor,
and the reconstruction, allowing us
to recover states of the original dynamic system
by using lags of just a single time series.
[END PLAYBACK]
OK.
So to recap, the attractor really
describes how the variables relate to each other
through time.
And Takens' theorem says quite powerfully
that any one variable contains information about the others.
This fact allows us to use a single variable basically
to construct a shadow manifold using
time lags as proxy coordinates that has
a one-to-one relationship with the original manifold.
So constructing attractors, again, from time series data
is the real basis of the empirical dynamic approach.
And as we see, we can do this univariately
by taking time lags of one variable.
We can do this multivariately with a set
of native coordinates, and we can also
make mixed embeddings that have some time lags as well as
some multivariate coordinates.
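
The lagged-coordinate construction can be sketched in a few lines. The helper name `delay_embed` and its argument names are assumptions made for illustration. Each row of the result is one point on the shadow manifold, built from a single series and its lags.

```python
import numpy as np

def delay_embed(series, E, tau=1):
    """Return lagged-coordinate vectors [x(t), x(t - tau), ..., x(t - (E-1)*tau)].

    E is the embedding dimension and tau the lag, both counted in samples.
    Row j corresponds to time t = (E - 1) * tau + j.
    """
    series = np.asarray(series, dtype=float)
    start = (E - 1) * tau
    if start >= len(series):
        raise ValueError("series too short for this E and tau")
    return np.column_stack(
        [series[start - k * tau: len(series) - k * tau] for k in range(E)]
    )

x = np.arange(10.0)                  # stand-in series 0, 1, ..., 9
shadow = delay_embed(x, E=3, tau=2)  # columns: x(t), x(t-2), x(t-4)
print(shadow[:2])
```

A multivariate embedding is just the native series stacked as columns, and a mixed embedding stacks native columns next to lagged ones of matching length.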
So let's look at some examples.
So this is an example of using lags with the expression time
series.
This is a mammalian model.
Mouse fibroblast production of an insulin-like growth factor
binding protein.
And again, this is the case of synchronizing and then sampling
over a number of days.
So clearly gene expression is a dynamic process, which
is quite a radical departure, I think,
from normal bioinformatics approaches,
which are essentially static.
OK.
Here we have another ecological example.
These are attractors constructed for sockeye salmon returns,
and this is for the Fraser River in Canada, which is
like the iconic salmon fishery.
And you can see for each one of these different spawning lakes,
you get an attractor that looks relatively similar.
They all look like Pringle chips, basically.
And what's interesting about this--
and I'll talk about this a little bit more later--
is that you can use these attractors
that you construct from data to make very good predictions.
And the fact that you can make predictions and make
these predictions out of sample, I think,
should give you some confidence that this is reasonable.
So again, I'm talking about a kind of modeling
where there really are almost no free parameters.
There's one in this case, right?
I'm assuming that I can't adjust the fact that I'm
observing this once a year.
So that's given.
Tau is given.
The time lag is given.
The only variable that I'm using here
that I need to kind of estimate is the number
of dimensions, so the number of embedding dimensions
that we need for this.
In this case, I'm showing it in three dimensions.
Not all of these attractors, of course,
are going to be three-dimensional.
The ones that I'll show you tend to be,
only because you can see them and they're
easy to understand what's going on.
So the basic process is really involving very few
assumptions and with only one fitted parameter,
with that fitted parameter being the embedding dimension.
OK.
So the fact that I'm able to get to using--
this is again, just using lags--
something coherent in three dimensions
means that I might be able to construct a mechanistic model
that has three variables.
So maybe sea surface temperature, river discharge,
maybe spawning, smolts going into the ocean, something
like that.
OK.
So again, one of the most compelling features, I think,
of this general set of techniques
is that it can be used to forecast.
And the fact that you could forecast
was something that originally got
me interested in this area or this set of techniques.
And it kind of led me into finance,
so I worked for like half a decade as a managing
director for Deutsche Bank.
And things like this were used to manage
on the order of $2 billion a day in notional risk.
So it's very bottom line, it's very pragmatic, and verifiable
with prediction, all of which I find--
plus it's extremely economical.
There are very few moving parts.
OK.
So I'm going to quickly show you two
basic methods for forecasting.
There are many other possibilities that exist,
but these are just two very simple ones, simplex projection
and S-maps.
So simplex projection is basically a nearest neighbor
forecasting technique.
Now you can imagine having the number of nearest neighbors
to be a tunable parameter, but the idea here is to be minimal,
and the nearest neighbors are essentially determined
by the embedding dimension.
So if you have an embedding dimension of e,
you can always--
a point in an e dimensional space
can be an interior point in e plus one dimensions,
which means you just need e plus one neighbors.
And so e plus one-- so the number of neighbors
is determined.
It's not a free variable in this, OK?
So the idea then is to take these nearest neighbors
in this space, which are analogs,
project them forward, and see where they went,
and that'll give you an idea for where the system is headed.
OK.
So again, each point on this attractor
is a history vector or a history fragment, basically.
And so here is this point that I'm trying to predict from.
And I look at the nearest neighbors, and then I--
these are points in the past, right?
And now I say, where do they go next?
And so I get a spread of points going forward,
and I take the center of mass of that spread,
the exponentially weighted center of mass,
and that gives me a prediction.
So how do you predict the future?
You do it by looking at similar points in the past.
But what do you mean by similar?
What you mean by similar is that the points
have to be in the correct dimensionality.
So for example, if I'm trying to predict the temperature
at the end of Scripps Pier tomorrow,
the sea surface temperature, and it's
a three-dimensional process, and let's say the right lag should
be a week, then I'm not just going
to look at temperatures that are similar to today's temperature.
I'm going to look at temperatures where today's
temperature, the temperature a week ago,
and the temperature two weeks ago are most similar, right?
And so knowing the dimensionality
is quite important for determining what the nearest
neighbors are, all right?
So you take the weighted average and that
becomes your prediction.
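
The steps just described (find the E + 1 nearest neighbours in the lagged space, see where they went, take an exponentially weighted average) can be sketched as follows. This is a simplified illustration, not the speaker's code. The function names are assumptions, and the test series is a logistic map chosen for convenience rather than the series shown in the talk.

```python
import numpy as np

def delay_embed(series, E, tau=1):
    """Rows are [x(t), x(t - tau), ..., x(t - (E-1)*tau)]."""
    series = np.asarray(series, dtype=float)
    start = (E - 1) * tau
    return np.column_stack(
        [series[start - k * tau: len(series) - k * tau] for k in range(E)]
    )

def simplex_forecast(library, target, E, tau=1, tp=1):
    """Forecast tp steps ahead for each embedded point of `target`,
    using analogues drawn only from `library`."""
    lib_vecs = delay_embed(library, E, tau)[:-tp]   # points whose future is known
    lib_future = np.asarray(library, dtype=float)[(E - 1) * tau + tp:]
    tgt_vecs = delay_embed(target, E, tau)[:-tp]
    observed = np.asarray(target, dtype=float)[(E - 1) * tau + tp:]
    predicted = np.empty(len(tgt_vecs))
    for i, point in enumerate(tgt_vecs):
        dist = np.linalg.norm(lib_vecs - point, axis=1)
        nn = np.argsort(dist)[:E + 1]               # E + 1 nearest neighbours
        d_min = max(dist[nn[0]], 1e-12)             # guard against an exact match
        weights = np.exp(-dist[nn] / d_min)         # closer analogues count more
        predicted[i] = np.sum(weights * lib_future[nn]) / np.sum(weights)
    return predicted, observed

def logistic_series(n, r=3.8, x0=0.3):
    """Chaotic logistic map used only as stand-in data."""
    x = np.empty(n)
    x[0] = x0
    for t in range(n - 1):
        x[t + 1] = r * x[t] * (1.0 - x[t])
    return x

series = logistic_series(600)
library, target = series[:300], series[300:]   # build on one half, predict the other
predicted, observed = simplex_forecast(library, target, E=2)
rho = np.corrcoef(predicted, observed)[0, 1]
print(f"out-of-sample forecast skill (Pearson rho): {rho:.3f}")
```

The number of neighbours is tied to E rather than tuned separately, which keeps the embedding dimension as the only fitted quantity.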
Here's an example.
This looks like white noise.
What I'm going to do is cut this data in half,
and I'm going to use the first half to build a model,
I'm going to predict on the second half.
So if I take time lag coordinates, and in this case,
again, I'm choosing on purpose three-dimensional things,
because they're easy to show.
This is like taking a fork with three prongs,
laying it down on the time series,
and calling one x, the other one y, the other one z.
So I'm going to plot all those points going forward,
and this is the shape I get.
So what looked like white noise
and totally random actually was not.
In fact, I generated it from first differences
of [INAUDIBLE], OK?
So if we now use this simple zeroth order technique
and we try to predict that second half of the time series
that looked totally noisy, you can do quite well.
This is actually predicting two points
into the future, two steps into the future.
OK.
So again, how did I know to choose three dimensions?
Basically you do this by trial and error.
You try like one, two, three.
And it peaks. So this is, again, how well you can predict.
This is the Pearson correlation coefficient.
And this is trying different embedding dimensions,
trying a two-pronged fork, a three-pronged fork, so on.
And again, so the embedding with the best predictability
is the one that best unfolds the attractor, the one that best
resolves the singularities.
And this relies basically on the Whitney embedding theorem.
So if the attractor actually was a ball of thread, OK,
and I tried to embed this ball of thread in one dimension,
that would be like shining a light down across over a line.
Then at any point, I could be going right or left.
So there's singularities everywhere.
If I shine it down on two dimensions, I now have a disk.
At any point I can go right, left, up, down, so forth.
Everywhere is a singularity.
If I now embed it in three dimensions--
so the thread is one-dimensional, right?
If I embed it in three dimensions, all of a sudden,
I can see that I have individual threads.
And if you have these individual threads,
that allows you to make better predictions, right?
So this is how you can tell how well you've
embedded the attractor, how well you
can predict with the attractor.
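
The trial-and-error search over embedding dimensions can be sketched like this. It assumes a lag of one sample and uses the x-coordinate of the Hénon map as stand-in data, since the series in the talk is not specified. `simplex_skill` is a hypothetical helper that splits the record in half and reports out-of-sample correlation.

```python
import numpy as np

def embed(series, E):
    """Lag-1 delay vectors [x(t), x(t-1), ..., x(t-E+1)], one row per time t."""
    return np.column_stack(
        [series[E - 1 - k: len(series) - k] for k in range(E)]
    )

def simplex_skill(series, E, tp=1):
    """Pearson correlation of simplex forecasts: the first half of the
    series is the library, the second half is predicted."""
    half = len(series) // 2
    lib, tgt = series[:half], series[half:]
    lib_vecs, lib_next = embed(lib, E)[:-tp], lib[E - 1 + tp:]
    tgt_vecs, tgt_next = embed(tgt, E)[:-tp], tgt[E - 1 + tp:]
    dist = np.linalg.norm(tgt_vecs[:, None, :] - lib_vecs[None, :, :], axis=2)
    nn = np.argsort(dist, axis=1)[:, :E + 1]            # E + 1 neighbours each
    nn_dist = np.take_along_axis(dist, nn, axis=1)
    weights = np.exp(-nn_dist / np.maximum(nn_dist[:, :1], 1e-12))
    pred = np.sum(weights * lib_next[nn], axis=1) / np.sum(weights, axis=1)
    return np.corrcoef(pred, tgt_next)[0, 1]

def henon_x(n, a=1.4, b=0.3):
    """x-coordinate of the Henon map, used only as stand-in data."""
    x = np.zeros(n + 2)                                 # x[0] = x[1] = 0
    for t in range(1, n + 1):
        x[t + 1] = 1.0 - a * x[t] ** 2 + b * x[t - 1]
    return x[2:]

series = henon_x(1100)[100:]                            # drop a short transient
skills = {E: simplex_skill(series, E) for E in range(1, 7)}
best_E = max(skills, key=skills.get)
for E, rho in skills.items():
    print(E, round(rho, 3))
print("embedding dimension with the best forecast skill:", best_E)
```

Picking the E with the highest out-of-sample skill is the practical counterpart of the ball-of-thread argument: the dimension that resolves the crossings is the one that forecasts best.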
OK.
All right.
So the other-- sort of the next order of complexity
is basically a first-order map, which
is a weighted autoregressive model where you're effectively
computing a plane along the manifold along this attractor
and using the coefficients of the Jacobian matrix
that you compute for this hyperplane,
basically, to give you predictions.
But when you're computing this plane,
there's a weighting function.
It's this weighting function that we're calling theta here.
And that weighting function determines how heavily you
weight points that are nearby on the attractor versus points
that are far away, OK?
So if theta is equal to zero, then all points
are equally weighted.
That's just like fitting a standard AR model
to a cloud of points, right?
All points are equally valid.
But if the attractor really matters,
then points nearby should be weighted more heavily
than points far away, OK?
So if there's actual curvature in there,
then if you weight more heavily, you're
taking advantage of that information, OK?
So this is if you crank theta up to 0.5,
you're weighting points nearby more heavily,
so forth and so on.
OK.
This is a really simple test for non-linearity.
You can actually try increasing that theta,
the tuning parameter.
And if as you increase it the predictability goes up,
then that's an indication that you get an advantage
by acknowledging the fact that the function is
different at different parts on the attractor, which
is another way of saying the dynamics are state dependent,
which is another way of saying the manifold
has curvature to it, OK?
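
A sketch of this test, assuming the common S-map weighting exp(-theta * d / mean distance) and a locally weighted least-squares fit. The talk only says that theta controls how strongly nearby points are favoured, so that exact form is an assumption, as are the function names and the logistic-map test data.

```python
import numpy as np

def embed(series, E):
    """Lag-1 delay vectors [x(t), x(t-1), ..., x(t-E+1)], one row per time t."""
    return np.column_stack(
        [series[E - 1 - k: len(series) - k] for k in range(E)]
    )

def smap_skill(series, E, theta, tp=1):
    """Out-of-sample skill of a locally weighted linear map.

    theta = 0 weights every library point equally (a single global
    linear autoregressive model); larger theta favours nearby points.
    """
    half = len(series) // 2
    lib, tgt = series[:half], series[half:]
    lib_vecs, lib_next = embed(lib, E)[:-tp], lib[E - 1 + tp:]
    tgt_vecs, tgt_next = embed(tgt, E)[:-tp], tgt[E - 1 + tp:]
    design = np.column_stack([np.ones(len(lib_vecs)), lib_vecs])
    pred = np.empty(len(tgt_vecs))
    for i, point in enumerate(tgt_vecs):
        dist = np.linalg.norm(lib_vecs - point, axis=1)
        w = np.exp(-theta * dist / dist.mean())          # assumed weighting form
        coef, *_ = np.linalg.lstsq(design * w[:, None], lib_next * w, rcond=None)
        pred[i] = coef[0] + coef[1:] @ point             # local hyperplane
    return np.corrcoef(pred, tgt_next)[0, 1]

def logistic_series(n, r=3.8, x0=0.3):
    """Chaotic logistic map used only as stand-in data."""
    x = np.empty(n)
    x[0] = x0
    for t in range(n - 1):
        x[t + 1] = r * x[t] * (1.0 - x[t])
    return x

series = logistic_series(400)
skills = {theta: smap_skill(series, E=2, theta=theta)
          for theta in (0.0, 0.5, 1.0, 2.0, 4.0, 8.0)}
for theta, rho in skills.items():
    print(theta, round(rho, 3))
```

If skill rises as theta increases from zero, the local fits are capturing structure that the single global linear model misses, which is the signature of state-dependent dynamics described above.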
So curvature is actually ubiquitous in nature.
This is a study that my student [? Zach ?] [? Shee ?] did.
And if you look at 20th century records
for specific biological populations,
you find all of them exhibit non-linearity.
We didn't find non-linearity, actually,
for some of the physical measurements.
But again, we were just looking at the 20th century,
and it might've been too short to pick that up.
Other examples include other fish species, sheep, diatoms,
and an assortment of many other kinds of phenomena.
All show this kind of non-linearity.
It seems to be ubiquitous.
Wherever you look for it, it's actually rare
that you don't find it, OK?
So the fact that things are nonlinear is pretty important,
I think.
It affects the way that you should think about the problem
and analyze it.
And in fact, the non-linearity is a property
that I believe can be exploited.
This is an example of doing just that.
So this paper appeared last year in PRSB,
and it used S-maps, this technique that we just
saw, to show how species interactions vary
in time depending on where on the attractor they are, OK?
So it really showed how we can take
real-time measurements of the interactions that
are state dependent, OK?
And the basic idea is as follows.
So the S-map involves calculating a hyperplane
or a surface at each point as the system travels
along its attractor.
So this involves calculating the Jacobian matrix, whose elements
are partial derivatives that measure the effect of one
species on another.
So note that the embeddings here are multi-variate.
So these aren't lags of one variable,
but they're native variables, right?
So I want to know how the relationship
of each native variable affects the other variable
and how that changes through time.
So what I do is at each point, I compute a Jacobian matrix.
If this was an equilibrium system,
there would just be one point, and I
would be looking at the-- it's like
the standard linear stability analysis for an equilibrium
system.
But what I'm doing is I'm taking that analysis,
but I'm applying it successively at each point
as the system travels along the attractor.
So the coefficients are in effect
fit sequentially as the system travels along its attractor.
And they vary, therefore, according to the location
on the attractor.
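
A rough sketch of the idea, using simulated data rather than the mesocosm series. It assumes a two-species coupled logistic model with illustrative parameters and the same exp(-theta * d / mean distance) weighting as a common S-map formulation. `smap_coefficients` is a hypothetical helper that refits the local hyperplane at every observed state and returns the fitted slopes, which play the role of Jacobian elements.

```python
import numpy as np

RX, RY, BXY, BYX = 3.8, 3.7, 0.1, 0.1   # illustrative values, not from the talk

def two_species(n, x0=0.4, y0=0.2):
    """Coupled logistic maps; BXY is the effect of Y on X, BYX of X on Y."""
    x = np.empty(n)
    y = np.empty(n)
    x[0], y[0] = x0, y0
    for t in range(n - 1):
        x[t + 1] = x[t] * (RX - RX * x[t] - BXY * y[t])
        y[t + 1] = y[t] * (RY - RY * y[t] - BYX * x[t])
    return x, y

def smap_coefficients(states, target_next, theta):
    """Leave-one-out locally weighted linear fit at every state.

    Row t holds [intercept, d target / d state_1, d target / d state_2, ...]
    estimated at states[t].
    """
    n, m = states.shape
    design = np.column_stack([np.ones(n), states])
    coefs = np.empty((n, m + 1))
    for t in range(n):
        dist = np.linalg.norm(states - states[t], axis=1)
        w = np.exp(-theta * dist / dist.mean())
        w[t] = 0.0                       # leave the focal point out of its own fit
        coefs[t], *_ = np.linalg.lstsq(design * w[:, None], target_next * w,
                                       rcond=None)
    return coefs

x, y = two_species(600)
states = np.column_stack([x[:-1], y[:-1]])      # native coordinates at time t
coefs = smap_coefficients(states, y[1:], theta=4.0)

effect_of_x_on_y = coefs[:, 1]                  # estimate of dY(t+1)/dX(t)
effect_of_y_on_y = coefs[:, 2]                  # estimate of dY(t+1)/dY(t)
true_y_on_y = RY - 2.0 * RY * y[:-1] - BYX * x[:-1]   # known only because we simulated
agreement = np.corrcoef(effect_of_y_on_y, true_y_on_y)[0, 1]
print(f"spread of estimated dY(t+1)/dY(t): {effect_of_y_on_y.std():.2f}")
print(f"agreement with the model's true derivative: {agreement:.2f}")
```

With real data there is no true derivative to compare against, and weak interaction terms can be estimated only noisily, so this should be read as a demonstration that the fitted coefficients change with location on the attractor rather than as a calibrated estimator.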
So what's really nice about this is that it's something
that you can actually accomplish very easily on real data.
And here's an example.
This is data from a marine mesocosm that
was collected by Huisman, and what you want to focus on
is the competition between copepods and rotifers.
These are the two main consumers in this.
So these are both zooplanktons that eat phytoplankton.
And this is basically the partial
of how the calanoids vary with the rotifers.
And so you can see that the competition--
so this shows how the coefficients
are changing as you compute them along as the system is
traveling along its attractor.
So what's the interesting thing, what
I think is interesting here is that I was totally surprised.
Competition is not a fairly smooth and long-term
relationship, right?
In classical ecology, it's regarded as a constant.
So two species compete, you compute their alpha ij,
and that's the constant.
In fact, it's very episodic.
It seems to only occur like in these little bottlenecks, which
I think is-- so I mean, this is nature.
This is not my model.
This is what nature is telling me,
that you get competition in these little bottlenecks.
So that fact I found fairly surprising.
But what's even more interesting is
to ask the question, what is it about the system when
it does occur that causes this competition?
And it turns out that what you can do
is make a graph basically of how that coefficient--
this is terrible.
I think I got this when I talked at Stanford last fall.
OK.
All right.
All right, it's broken.
So you can make a plot of what the competition coefficient--
how the competition coefficient varies as a function of food
abundance.
And the obvious thing that you get here
is that when do you get competition?
When food is scarce.
I mean, duh.
That seems like it should be obvious.
But what wasn't clear before is how episodic this all is.
It's not sort of a gradual constant affair.
It's something that happens in these sudden bottlenecks.
So what we have then is a pretty good tool for probing changing
interactions.
And I can see other potential for this
in terms of looking for--
you can compute the matrix and maybe
compute something like an eigenvalue for the matrix
as it changes to look for changes where--
to look for instances where you were
about to enter a critical transition.
So this stuff really hasn't been written up yet.
You should go ahead and do it.
But I see a lot of potential for just
using this fairly simple approach, which again,
is very empirical, and it allows the data to tell you
what's actually happening.
OK.
So let's see how EDM deals with causation.
OK.
This is the formal statement of Granger causality.
So basically he's saying, I'm going
to try to predict Y2 from the universe
of all possible variables.
And this is the variance, my uncertainty in my prediction.
And it says that if however I remove Y1
and I'm trying to predict Y2, and this variance is greater,
then I know that Y1 was causal.
So it says if I exclude a variable
and I don't do as well at predicting, then
that variable was causal.
That's the formal definition of Granger causality.
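
In symbols, the criterion as described here can be written as follows. The notation is an assumption made for this note, not read off the slide: U stands for the universe of all possible variables, and sigma squared for the variance of the error in predicting Y2 from the given information set. Y1 is said to Granger-cause Y2 when

```latex
\sigma^{2}\!\left(Y_2 \mid U\right) \;<\; \sigma^{2}\!\left(Y_2 \mid U \setminus \{Y_1\}\right)
```

that is, when leaving Y1 out of the information set makes the prediction of Y2 strictly worse.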
The problem, however, is that this seems
to contradict Takens' theorem.
So Takens' theorem says the information
about other variables in the system
is contained in each other variable, OK?
So how can you remove a variable if that variable's information
is contained in the others?
So there is a little bit of a problem.
What's interesting is if you look at Granger's '68 paper
where he describes this, he says explicitly,
this may not work for dynamic systems.
So--
[LAUGHTER]
He was covered.
OK.
So I think this is a useful criterion
sort of as a kind of a rule of thumb, practical rule of thumb.
But it really is intended more for stochastic systems rather
than dynamic systems.
OK.
So in dynamic systems, time series variables
are causally related again if they're coupled and belong
to the same dynamic system.
If X causes Y, then information about X
must be encoded in this shadow manifold of Y.
And this is something that you can test with cross-mapping.
This was the paper that was published at the end of 2012
that describes the idea.
And I have one final video clip.
It's not narrated by Bob May.
I had my student Hao Ye do the narration
on this one.
But it'll explain it.
[VIDEO PLAYBACK]
- Takens' theorem gives us a one-to-one mapping between
the original manifold and reconstructed shadow manifolds.
Here we will explain how this important aspect of attractor
reconstruction can be used to [INAUDIBLE] two time series
variables belong to the same dynamic system
and are thus causally related.
This particular reconstruction is based on lags of variable x.
If we now do the same for variable y,
we find something similar.
Here we see the original manifold M, as well as
the shadow manifolds, Mx and My, created from lags of x and y
respectively.
Because both Mx and My map one-to-one
to the original manifold M, they also
map one-to-one to each other.
This implies that the points that are nearby
on the manifold My correspond to points that are also nearby
on Mx.
We can demonstrate this principle
by finding the nearest neighbors in My
and using their time indices to find
the corresponding points in Mx.
These points will be nearest neighbors on Mx
only if x and y are causally related.
Thus, we can use nearby points on My
to identify nearby points on Mx.
This allows us to use the historical record of y
to estimate the states of x and vice versa,
a technique we call cross-mapping.
With longer time series, the reconstructed manifolds
are denser, nearest neighbors are closer,
and cross-map estimates increase in precision.
We call this phenomenon convergent cross-mapping
and use this convergence as a practical criterion
for detecting causation.
[END PLAYBACK]
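The procedure narrated in the video can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the speaker's implementation: the coupled logistic maps, the function names (`coupled_logistic`, `delay_embed`, `cross_map_skill`), the coupling parameter `alpha`, the choice of E + 1 neighbors, and the exponential distance weighting are all assumptions made for the example.

```python
import numpy as np

def coupled_logistic(n, alpha):
    """Toy system (assumed for illustration): x drives y with strength alpha."""
    x = np.empty(n)
    y = np.empty(n)
    x[0], y[0] = 0.4, 0.2
    for t in range(n - 1):
        x[t + 1] = x[t] * (3.8 - 3.8 * x[t])
        y[t + 1] = y[t] * (3.5 - 3.5 * y[t] - alpha * x[t])
    return x, y

def delay_embed(series, E, tau=1):
    """Shadow manifold: rows are (s[t], s[t-tau], ..., s[t-(E-1)*tau])."""
    start = (E - 1) * tau
    cols = [series[start - i * tau: len(series) - i * tau] for i in range(E)]
    return np.column_stack(cols)

def cross_map_skill(source, target, E=2, tau=1, lib_size=None):
    """Estimate `target` from the shadow manifold of `source`.

    Returns the Pearson correlation between estimates and true values.
    """
    M = delay_embed(source, E, tau)
    tgt = target[(E - 1) * tau:]
    if lib_size is not None:
        M, tgt = M[:lib_size], tgt[:lib_size]
    # Pairwise distances between points on the shadow manifold.
    dist = np.linalg.norm(M[:, None, :] - M[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a point may not be its own neighbor
    nn = np.argsort(dist, axis=1)[:, :E + 1]  # time indices of nearest neighbors
    d = np.take_along_axis(dist, nn, axis=1)
    # Closer neighbors get exponentially larger weights.
    w = np.exp(-d / np.maximum(d[:, :1], 1e-12))
    w /= w.sum(axis=1, keepdims=True)
    # Use the neighbors' time indices to read off values of the target.
    est = (w * tgt[nn]).sum(axis=1)
    return np.corrcoef(est, tgt)[0, 1]

x, y = coupled_logistic(1200, alpha=0.1)
x, y = x[200:], y[200:]  # drop the transient
for L in (50, 200, 1000):
    # Recover the driver x from the shadow manifold of the driven variable y.
    print(L, round(cross_map_skill(y, x, lib_size=L), 3))
```

If x really does drive y, the printed skill should rise as the library length L grows, which is the convergence the video describes.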
OK.
So with convergent cross-mapping,
what we're trying to do is we're trying to recover states
of the affected variable--
we're trying to recover states of the causal variable
from the affected variable.
And so this is basic.
Let's see.
The idea is that instead of looking specifically
at the cause, we're looking at the effect
to try to infer what the cause was.
So basically from the victim, we can find something
about the aggressor or the perpetrator, right?
OK.
This little piece, I think, will give you
a little bit of intuition.
So these two time series are what you get if alpha is zero.
So this is y is red and x is blue.
And you can see that with alpha equal to zero,
they're independent.
If I crank up alpha, then this is what I get.
So again, you can see that the blue time series is not
altered, but the red one, y, actually is.
And it's in this alteration of the time series
that I'm able, from the reconstructed manifold,
to backtrack the values of the blue time series.
And so that shows that x was causal on y.
OK.
A necessary condition for a cross-map estimate for--
a necessary condition for a convergence
is to show that the cross-map estimate improves
with data length.
And so that's basically what we see here.
So as points get closer in the attractor,
your estimates should get better,
and so predictions should get better.
So let's look at some examples.
This is a classic predator/prey experiment
that Gause made famous.
So Didinium is the rotifer predator,
Paramecium is the prey.
And you can see, you can get cross-mapping
in both directions, sure.
The predator is affecting the prey,
the prey is affecting the predator.
This sort of looks like maybe the predator
is affecting the prey more than the prey is
affecting the predator.
But if you look at this in a time lag way,
so this is looking at different prediction lags
for doing the cross-mapping, you find
that the effect of the predator on the prey
is almost instantaneous, which you kind of expect.
These are rotifers eating paramecia.
But the effect of the paramecia itself on the predator
is delayed, and it looks like it's delayed
by about a day or so.
So you get sort of a sensible time delay here.
OK.
This is a field example.
These are sardines and anchovies that
have been sort of a mystery for quite a while.
They show reciprocal abundance patterns.
And it was thought that maybe they compete.
These are data for Southern California.
It may well be that they are competitive in other areas,
maybe the Japan sea.
But not in Southern California.
There's absolutely no evidence for a mutual effect between sardines
and anchovies there.
However, if you look at sea surface temperatures,
you find that they're both influenced
by sea surface temperature, but probably
in slightly opposite ways.
So that's kind of a nice result for that problem.
OK, now the final ecological example is these red tides.
Episodic red tides are a classic example
that no one has been able to predict.
They've been thought to be regime-like,
and the mechanism for this rapid transition
has remained a mystery for over a century.
So despite about a dozen or so Scripps theses
all showing by experiment that certain factors should
be important, none of them show a correlation.
So if you look at the field data,
you actually don't see the correlation
that you would expect if you had done the experiments.
So this was exactly the case that we saw, for example,
with sea surface temperature anomaly and chlorophyll.
So you get these little temporal correlations
that then disappear.
So the absence of environmental correlations
suggests that these events can't be
explained by linear dynamics.
And you can confirm this by doing an S-map test.
You find, in fact, chlorophyll is very nonlinear.
If you increase theta, it improves.
But the most convincing thing is that you can actually
find very good predictability using
a simple manifold construction and forecasting using an S-map.
So the univariate construction, because you're just
looking at the one variable, is really
summarizing the internal dynamics,
the intrinsic dynamics.
And so if you just focus now on the red tides,
the correlation goes down.
So we actually can't predict these red tides quite as well
from just the simple internal dynamics, which
suggests that there may be stochastic variables coming in
to force the system, OK?
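The S-map nonlinearity test mentioned above can be sketched as follows. This is a hedged illustration, not the analysis from the talk: the logistic-map test series, the function name `smap_skill`, and the specific kernel `exp(-theta * d / mean(d))` are assumptions for the example. With theta = 0 every point gets equal weight and the model is a single global linear fit; larger theta makes the fit more local, so skill that rises with theta points to nonlinear dynamics.

```python
import numpy as np

def smap_skill(series, E=2, theta=0.0, tp=1):
    """Leave-one-out S-map forecast skill (Pearson rho) for one time series."""
    n = len(series) - (E - 1) - tp
    # Lagged coordinates (s[t], s[t-1], ...) and the value tp steps ahead.
    X = np.column_stack([series[E - 1 - i: E - 1 - i + n] for i in range(E)])
    y = series[E - 1 + tp: E - 1 + tp + n]
    A = np.column_stack([np.ones(n), X])  # intercept plus lag coordinates
    preds = np.empty(n)
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        w = np.exp(-theta * d / d.mean())  # theta = 0 gives equal weights
        w[i] = 0.0  # leave the point being predicted out of the fit
        coef, *_ = np.linalg.lstsq(A * w[:, None], y * w, rcond=None)
        preds[i] = A[i] @ coef
    return np.corrcoef(preds, y)[0, 1]

# Assumed test series: a chaotic logistic map.
x = np.empty(300)
x[0] = 0.4
for t in range(299):
    x[t + 1] = 3.8 * x[t] * (1.0 - x[t])

for theta in (0.0, 1.0, 4.0):
    print(theta, round(smap_skill(x, theta=theta), 3))
```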
So we then did the obvious thing,
which was to apply CCM to these environmental variables that
were thought to be important but that showed no correlation.
And so that's what we did.
And these candidate variables fall
into two groups, those that describe nutrient history
and those that describe stratification.
If you look at correlation, you actually
find very little correlation in there at all.
But if you do cross-mapping, just about all of them
show a good cross-map skill, OK?
So just about all of them contained some information.
And so this was very encouraging.
This is actually a class project,
and there were eight of us involved.
And we had data.
We did all this analysis, and we had data going up to 2010.
The data from 2010 onward had not yet been analyzed.
We had all the samples, but they hadn't been analyzed.
And so we came up with our set of models,
and then we were able to process the data, and we all sort of--
there were 16 fingers being crossed.
And we did the out-of-sample test,
and this is the result, which was very good.
We actually found very good predictability
with a correlation coefficient of about 0.6.
So this is a really nice example of out-of-sample forecasting
using these methods.
So we've learned something about the mechanism,
that the mechanism has something to do
with water stratification, stability of the water column.
And on top of that, we're actually
able to forecast these red tides with some accuracy.
All right, so this is potentially
the most exciting application of these ideas,
and this is the last big piece that I want to talk about.
So this experiment, this is the experimental work
being done in the Verma Lab at the Salk Institute.
And this is the attractor that we saw earlier.
Remember, these things were all mutually
uncorrelated in a statistical sense,
but we found were causally related.
And so if you make an attractor using all three,
this is what you get.
These things are also uncorrelated, but very strongly
causally linked to the transcription regulator WHI5.
So this suggested that one could do an experiment with WHI5
to see how well this method of CCM
actually does at identifying uncorrelated causal links.
So this is an example showing the uncorrelated linkage.
You can see that WHI5 and SWI4 are completely uncorrelated.
Those are the original time series.
But if you do cross-mapping, you find in fact there's
a significant signal.
You can recover values of WHI5 from values of SWI4
on the attractor.
OK, so the experiment that this suggests
is if you alter the value of WHI5
artificially, if you experimentally
enhance WHI5, because it's causally related,
it should produce an effect on the dynamics
of these other genes.
And so that's what we did.
And so the black is wild type, and the purple dotted one
is the manipulation.
So the manipulation clearly deformed the attractor.
And this is something that you can actually
quantify pretty easily.
OK, so if you repeat this procedure
for other genes showing a low correlation with WHI5--
this is the money panel right here--
you can find that 82% of the genes that were identified
by CCM to be causal--
these are all uncorrelated-- to be causal,
were actually verified by experiment
to be causal, which is really good,
because the industry standard for [INAUDIBLE] is 3%.
So this is better.
This is actually better.
The other thing that I think makes
this interesting is that these non-correlated genes that
are also causal are thought to be signal integrators,
and signal integrators may be really, really important
for gene expression.
So we'll see how this all goes.
So I think that this could have immediate practical importance,
because the networks that you generate
this way can provide good guidance for the experiments
that need to be done.
So you have 25,000 genes, and so you can do 25,000 [INAUDIBLE]
experiments.
That's just too much.
And so you need something to kind of narrow down
what to focus on, and this may be a reasonable thing.
All right.
So this is a mammalian example of the same sort of thing.
This is a group of genes that Verma has
studied for about 30 years--
so a very well studied group--
that have to do with the immune response.
And this is the network that you would get if you just
looked at cross correlations.
But this network turns out to be entirely wrong.
There's a very well known bi-directional feedback
between IκBα and RelA.
And this is the network that you get with CCM.
So what's interesting is that this CCM network actually
identifies another link that looks
interesting between RelA and Jun that was not previously known.
And so this link, because you have
this bi-directional feedback, should produce some kind
of limit cycle-like behavior.
And so if you make a phase portrait of these two,
you should see something that looks kind of limit cycle-like.
The same should be true here, OK?
I'm almost done.
All right.
So if you do this, this is the known link,
and we get something that looks kind of limit cycle-like.
This was the previously unknown link, OK,
and you do get this behavior.
And this was actually the incorrect link that
was suggested by correlation.
So kind of interesting.
All right.
So there are a bunch of recent studies
that have looked at this.
I'll just go through them really fast.
This one was focused on forecasting.
OK, hold on.
Let me go--
OK.
So this one had to do with the incidence of cosmic rays
in the 20th century that's been used to suggest that climate warming
is natural and not due to man.
And what we did that was interesting
is that we found that if you look over the 20th century,
there is no causal relationship between cosmic rays
and global warming.
However, if you look at the time scale year-to-year,
you find a causal signal.
So in fact, it does have a very short-term effect
on inter-year dynamics, but it doesn't explain the trend.
OK.
So this was a study on the Vostok ice core
to see if there is a direct observational--
if we get direct observational evidence for the effect
of greenhouse gases on warming.
And we found it, of course.
But the other thing that we found
that was kind of interesting was you actually
have a link in the other direction as well,
but it's delayed.
And so this is a more immediate effect.
This one takes hundreds of years to occur, OK?
And then this one focused on forecasting.
It was a great success story, because the models that we were
able to produce got some interest in the Canadian press,
and we made forecasts for 2014, 2015, and 2016
that were all pretty good.
So, so far, so good.
I don't know what's going to happen in 2017.
So this is a nice example of out-of-sample
forecasting.
The classical models, if you actually
try to include environmental variables,
do worse.
With these models, it does better.
OK, and then this one appeared last fall.
It was an application of these ideas to look at flu epidemics.
And what's interesting here is that we were actually
able to find a particular temperature threshold, 75
degrees, below which, absolute humidity
has a negative effect on flu incidence, above which,
absolute humidity has a positive influence.
And I think the hypothesized mechanism is below 75 degrees,
the main environmental stressor is viral envelope disruption
due to excess water, right?
Above 75 degrees, desiccation becomes
the main environmental stressor.
And so higher humidity actually helps
flu incidence at higher temperatures,
but it inhibits flu incidence at lower temperatures.
And so, of course there are many other factors
than absolute humidity, but this was one that came out.
And it may actually be that the proximal driver is
relative humidity, but you're asking, what's the--
but relative humidity varies depending on whether you're
inside or outside.
Absolute humidity is much more robust.
Absolute humidity outside is going to be about the same
as it is inside.
So yeah, an interesting nuance.
All right.
This paper won the 2016 William James Prize in Consciousness.
We'll stop at that.
All right.
So I'm just going to stop there.
All right.
And so these are my tasteless thematic closing slides.
This is a little politically incorrect.
In fact, all of them are politically incorrect.
My wife told me this was cute, so you can blame her.
All right.
And so this is with particular reference
to the fisheries models that are built
on assumptions of equilibrium.
There we go.
Yeah.
And then, as we all know, this is true.
Thank you.
[APPLAUSE]
All right, thank you very much for a great talk.
Questions.
Thank you for the nice talk.
I would like to ask you, what is your feeling
about the applicability of data-driven methods in general
in systems with high intrinsic dimensionality?
Let's say fluid flows, climate models.
In this case, how do you choose the variables to model?
And what is the effect of sparse data in phase space?
Have you considered such issues?
Yeah.
Well, I think that--
well, I've chosen examples of opportunity.
So the things that I've chosen have all
shown attractors that are relatively low dimensional.
They were taken from problems that you may not necessarily
have thought in advance should be low-dimensional,
so like gene expression I figured
should be very high-dimensional, but it turns out
that there are facets of it that certainly look relatively
low-dimensional.
So this is kind of a copout answer,
but really you just have to try.
You have to see if you have data.
So the place to start is just with data.
You say, well, maybe I don't have the perfect observations.
That's fine.
But you need to start with some set of observations,
and then you can build from there.
And you might find that there are maybe two or three
variables that are very tightly related
and something interesting might come out of that.
So it really is, it's kind of like following your nose.
I mean, you don't necessarily have the whole plan in advance,
but what it requires is an initial set
of data, the willingness to actually give it a try.
So again, the dimensionality that we're
getting in these models is not an absolute number.
So surely any one of these things, even
the fisheries models, is probably really, really
high-dimensional in principle.
However, for all practical purposes,
you can do quite well, and you can
measure how much variance you can explain by how
much predictability you have.
You can do quite well with about four dimensions.
Four is not a bad number of dimensions
to use in some of these salmon returns models.
So you can think of a problem hypothetically
as being very high-dimensional, but if you have data,
and that data actually shows that in maybe six or seven
dimensions you can get some degree of predictability,
then I think you have a little bit of insight
into that problem.
You've gained a little bit of advantage on that problem.
Yeah.
OK.
So it looks as if there is a bit of an assumption underlying
some of this where you kind of have to assume the underlying
manifold remains stable while you're
collecting these data to populate this shadow manifold.
So are you working on methods for detecting shifts
in that underlying attractor, whether you're
on a non-stationary, nonlinear regime?
Yeah, I mean, that's a great question.
So whether something is stationary or not
can depend on how long you observe it.
So for example, if you have motion
in one lobe of the attractor and then it
flips to the other lobe, do you say this
is a non-stationary process?
No.
It just depends on how long you've looked at it.
You're asking a really important question,
and it's something that you can answer practically pretty
much by windowing, sort values in the past
to see if you're getting essentially the same dynamics
as you go forward.
The danger with that, though, is that you
can have problems where the essential dynamics-- let's
say it's an annual cycle of an epidemic,
for example, where the essential dynamics,
say, during the outbreak are five-dimensional,
but as the thing recovers, it collapses down
to zero-dimensional, becomes stable for a period of time.
And so what you're asking, I mean,
it really is an important question,
but I believe there are some practical ways of addressing it,
but there is no simple universal way of doing it.
So maybe windowing is one way of doing it.
But again, you have to be careful that by windowing you
haven't artificially created what
looks like a non-stationary process that actually
is stationary.
And in the end, the way that you judge how well you've done
this is how well you can predict.
So if your predictions actually start
to degrade as you're going forward,
then you have some reason to be suspicious.
Yeah, yeah.
Oh, thanks for a nice talk.
I have a question.
Have you ever tried that the methods fail in some cases,
like if you use some mathematical principles,
like in what kind of system this method will be successful,
and then in what kind of system this method will not
be successful?
Can you describe it using some mathematical patterns?
Yeah.
So one kind of system where they may not be successful
is where you don't really--
a system that's really fundamentally stochastic,
right, where, in fact, there are no deterministic rules
of any kind.
But those are systems that as scientists we
like to stay away from, right, because what is there
to explain?
So my answer to this would be that I tend to like problems
where I start looking at them, and they're
giving me little sugar tablets of results.
And so I keep going in that direction.
And personally, that's how I operate.
So I would stay away from a problem.
And maybe that's why I'm not encountering as many problems
that are totally intractable is that they haven't--
I'm like an ant following a gradient of sugar.
They haven't kind of led me in that direction.
But it's a good question.
I don't think this is going to work for everything obviously,
right?
But I've just had pretty good luck so far sort of.
But it's not just luck, because I'm actually
following a gradient.
So I'm attracted to the problems where it seems to be working.
Yeah.
So the gene expression problem, I had basically written off.
So when I was initially approached and I thought,
this could not possibly work, they walked away,
and I thought, oh, OK.
I won't see these people again.
But then they came back with data,
and they showed that it did work.
And then we have this really good collaboration
going right now.
Yeah.
Just along those lines, it's pretty obvious
that it won't work in cases where your observations simply
aren't sufficient to fully describe the system.
So yeah.
It was also on the tip of my tongue to ask if you were
working at Deutsche Bank in 2008, but I won't.
What?
If you were working at Deutsche Bank in 2008,
but I won't ask that.
Oh, no, no, no, no.
No.
No.
So yeah.
No, I was there from '96 to 2002.
OK, that was safe, then.
Got out in time.
Last question then is, has any of the stuff
been extended to [INAUDIBLE] categorical types of data?
I think that it's possible.
We are working on something right now
that we're calling static cross-mapping, which is trying
to do that sort of thing.
We have some initial results that look pretty good.
But no, I think that's a really important area.
So we don't always have time series.
And it's much harder to get that kind of data
than it is to get--
[INAUDIBLE]
Exactly.
But there's another kind of ordering, of course,
that you can put on this data.
And I think that like in ecology,
it's much harder to find time series
than it is to find cross-sectional studies where
you have lots of samples of lots of species.
And there is a method that was in the end
that I had to flip through and not show
that basically allows you to--
if you've observed a system for a short period of time,
so if you're limited by the length of your time series,
but if there are many interacting variables,
you have an advantage, and that advantage grows factorially
as you have more variables, which is strange,
because it goes counter to our idea
that complex systems should be a problem.
So the curse of complexity we can actually exploit.
So the fact that these things are interconnected basically
means that each variable provides
another view of the system.
And so if you have many interconnected variables,
you have many different views of the system.
[INAUDIBLE]
Well, yeah.
What this is saying is that this is
kind of a way to counter the problem of high dimensionality,
that if you have a lot of interacting dimensions,
you have the potential for many alternative views
of the problem.
So if you did an embedding using--
[INAUDIBLE] embedding using each dimension,
each one gives you another view of the problem.
You can then actually do these mixed embeddings that
combine to take lags plus combine other dimensions,
and you end up then with a factorial number
of representations of the system.
And so this is actually a good way to reduce noise in systems.
There's a paper that came out last summer in Science,
it's called "Multi-View Embedding,"
that tries to exploit this.
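How quickly the number of mixed embeddings grows can be illustrated with a small count. This is a sketch under assumptions: the function name `count_views` is hypothetical, and the rule that every embedding must contain at least one unlagged coordinate is an assumption made here to keep the views distinct, not something stated in the talk.

```python
from itertools import combinations

def count_views(n_vars, n_lags, E):
    """Count E-dimensional embeddings built from (variable, lag) coordinates.

    Assumption: each embedding must include at least one coordinate with lag 0.
    """
    coords = [(v, lag) for v in range(n_vars) for lag in range(n_lags)]
    views = [c for c in combinations(coords, E)
             if any(lag == 0 for _, lag in c)]
    return len(views)

# Adding variables multiplies the number of available views of the system.
for n_vars in (1, 2, 3, 4, 5):
    print(n_vars, count_views(n_vars, n_lags=3, E=3))
```

Under these assumptions the count grows combinatorially with the number of variables, which is the sense in which more interacting variables give more views of a short time series.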
Yeah.
So a couple times during your talk,
you alluded to having found the right lag,
i.e. the right value of tau.
Are there values of tau that perform better than others
in practice, and why is this?
Yeah.
So there no doubt are.
In ecology, we never do--
we rarely have the luxury of having oversampled data.
And so by default, the lag is typically
just one, whatever the sampling interval was.
So in the limit of continuous data, the tau shouldn't matter?
Oh, no, no, no, the tau will matter.
So if you are a physicist recording
something almost continuously in time, then the tau does matter.
So now you have two free variables.
You have to fit tau and you have to fit E.
And what you're doing, you want to choose
a tau that allows you to unfold the attractor maximally.
And the way that you can determine that maximal
unfolding is by prediction, simple prediction.
Thank you.
Yeah.
OK, one last question if anybody has one.
No?
So this might be a bit of a naive question,
but where can one learn more about this?
Because it seems like it's relatively new.
There's advancements all the time
in our understanding of the world with this tool.
What realm is it under?
Is it statistics or biology?
Or what are the places that are doing research with this?
Yeah, so it is relatively new.
My lab has produced a bunch of papers dealing with this.
There is a software package now, it's
called rEDM that's on CRAN that has a tutorial,
and it discusses some of this.
But no, I need to write a review paper or a book or something
that puts it all in one place, and that hasn't been done yet.
So yeah.
But the software package is good.
My student put it together.
So I had all this horrible research software
that really was not intended for human consumption.
It was like for me in my pajamas at home.
But he rewrote it in R, and we put
in a very nice sort of tutorial with it
to kind of explain some of the basics.
But it's amazingly easy to use, and the ideas
are actually quite intuitive.
And it's something actually that I
think a number-- it is gaining.
It seems to be accelerating in usefulness.
And the citations, for example, are just like doing that.
So I think, again, having something that looks good
or that sounds good or that seems interesting
is very different from having something that actually works.
My lab is pretty pragmatic.
I say, these things actually have to work.
To make it easy for people to understand how they work,
we have to provide R Markdown so that everything
can be exactly reproduced, and the code has to be there.
I would encourage you to check out the rEDM code.
Yeah.
Yeah.
All right.
Thank you very much.
Please join me in thanking our speaker.
[APPLAUSE]