
  • [MUSIC]

  • It's my pleasure to introduce to you Professor Remco van der Hofstad.

  • Actually, I first met Remco in Brazil.

  • That's how we met the first time.

  • But I had heard about much of his great work.

  • Remco is a professor of probability at Eindhoven University of Technology.

  • He's doing a lot of work on stochastic networks.

  • And you'll hear something today.

  • Remco has won many beautiful prizes.

  • Among others, the Prix Henri Poincaré in 2003 and the Rollo Davidson Prize in 2007.

  • He's also the director of EURANDOM, but

  • today he will be talking about the structure of complex networks:

  • small worlds and scale-free random graphs.

  • Remco, please. >> Thank you very much, and thank you for

  • the kind introduction.

  • So I'm a probabilist, working in the math department.

  • I'm interacting as well with engineers with various different backgrounds.

  • I'll be giving a talk that hopefully is at a relatively high level.

  • If there's anything that I'm saying that is unclear to you,

  • do ask because I'm giving this talk for you guys, and not for myself.

  • I'm supposed to know what I'm talking about.

  • Let's see whether that's indeed true.

  • And what I'll be talking about is structures of complex networks.

  • And this is a very long line of research that we've been seeing all over the world.

  • Basically starting slightly before 2000, and

  • all of the empirical observations that have been found,

  • we'll discuss some of those as well, and then all of the efforts in modeling and

  • deriving consequences of the observed phenomena.

  • And that's what I'll be talking about today, all right.

  • Complex networks.

  • Here are two examples, certainly the one on the left

  • is a very famous one from a paper of Barabási and others.

  • And this is a picture about

  • the interactions that exist between proteins in a yeast cell.

  • So first of all, there are quite a few proteins, as you can see.

  • And there's quite a few interactions between them as well.

  • And the interactions come in different forms.

  • It could be that two proteins together form a third protein.

  • But it could also be that certain proteins act as catalysts for

  • reactions that others are being involved in.

  • So the edges here can really signify lots of different biological interactions.

  • And you can imagine that producing a picture like this

  • is an enormous amount of work in biology, because you've gotta figure out what all

  • of the different reactions are that are taking place in one of these yeast cells.

  • And one of the things we see is that it's a fairly complex picture, and

  • that's my very informal definition of what a complex network is.

  • It consists of many entities,

  • there are many connections between them, and it's pretty complex.

  • I don't think that there's any better definition of what a complex network is,

  • even though lots of people are using this terminology.

  • Now on the right, we see an artistic impression of

  • what the Internet topology looked like in 2001.

  • I don't think that the artistic pictures are going to be changing a lot over time.

  • But again what you see is that, like here on the outskirts,

  • you have things that are very loosely connected to the inside,

  • with the inside being sort of a very highly connected bit,

  • sometimes called the core, certainly in the Internet.

  • You see something very similar here.

  • Here, the connections are so dense that you can't even visualize them.

  • Whereas on the outskirts here, you see lots of routers,

  • as we're talking about Internet here, that are only connected through one or

  • two links to the rest of the network.

  • Now, these networks come from completely different backgrounds.

  • And you can imagine that there's many more backgrounds to be observed.

  • Social networks, acquaintances, but also sexual relations, collaborations, etc.

  • Information networks, technological networks and biological networks.

  • So these networks come from various different backgrounds, so

  • it's actually quite surprising that they share something.

  • And that actually was one of the most profound empirical observations.

  • And many of these networks have things in common.

  • And that of course raises the question of whether these things that they

  • have in common are typical in such networks,

  • whether almost all of those networks have those properties, or

  • whether it is something very special, and if it's something very special,

  • what are the underlying principles that actually give rise to these properties?

  • So these are very broad questions in science.

  • I'm a humble mathematician, so we can certainly not answer all of those

  • questions for all of these different disciplines at the same time, but

  • it's just to indicate that network science is a very interdisciplinary field in which

  • lots of different communities look at their networks in their own different ways and

  • give rise to very interesting questions that are on the interplay with

  • mathematics.

  • And mathematicians can really say something sensible here and

  • we help these practitioners from different fields in answering their questions.

  • All right, so very basic.

  • What are we talking about when we're talking about a network?

  • Well, in general, we're just talking about graphs, and

  • graphs consist of entities that are connected to one another.

  • So the entities are called vertices.

  • Sometimes nodes or sites, depending on the precise field that you're talking about.

  • And then you have connections between them.

  • And these are called edges, sometimes bonds, and

  • there's probably many more names for them.

  • And you should really think of these edges as being the building blocks of

  • the network.

  • They really indicate which vertices are interacting with which other vertices.

  • And you can think of an edge as being the building block of relational data.

  • Really think of an edge as indicating that there is some relation,

  • whatever the relation means, between the two vertices on the sides of the edge.

  • So that's what we are talking about here.

  • Now there's a lot of confusion about Internet and the World Wide Web, and

  • many people treat these two words as being interchangeable.

  • But they're really not.

  • They're really very different.

  • So when we're talking about the Internet,

  • we're really talking about the physical Internet.

  • So this means that there are routers, and

  • these routers are connected to one another by physical cables.

  • And that actually allows us to send emails, but

  • also to do our searches on the World Wide Web.

  • So that's what the Internet is.

  • Something physical, and therefore it's large, but it's not humongously large.

  • When we're talking about the World Wide Web, we're talking about something

  • which is much bigger, because the vertices in the World Wide Web are web pages, and

  • you connect two vertices with one another when there's a hyperlink between the two.

  • Now, we can build as many web pages as we wish, and

  • the estimates of how many web pages there are range in the trillions.

  • So this is a virtual object.

  • And it's much larger than the Internet as a physical system.

  • All right, but they're both manmade.

  • Now, what are some of the properties?

  • Well, if you think about the Internet, it's very large, it's chaotic.

  • Again, this word of a complex network, it's fairly complex.

  • Yet it's fairly homogeneous.

  • And there are studies about that.

  • So for example, if you look at the amount of connectivity,

  • let's say, within a continent.

  • And how many hops do you need in order to send a message between two sources

  • within a continent.

  • You don't really get a much different result compared to

  • the same question in the entire network.

  • So that's what I mean with it's fairly homogeneous.

  • Of course, it's also not quite homogeneous in the sense that

  • some of the routers that are there, and certainly some of the cables that are there,

  • of course transport much more information than others.

  • So in that sense, it's not homogeneous at all.

  • It is connected by default, because if the Internet were not connected you would

  • not be able to communicate on it.

  • Now for the world wide web it's directed,

  • that makes a difference, it's much larger than the Internet by itself.

  • It's extremely hard to measure.

  • Just how the hell are we going to find, for example,

  • a uniform web page out of the collection of all web pages?

  • Well, even Google does not know all the web pages that are there.

  • The estimates are that Google probably knows about 50%,

  • maybe 60% of the web pages.

  • And the rest, it doesn't know.

  • I'll just give you an example of a part that probably not many people know about,

  • the dark web is an example.

  • And there's good reasons why people wanna keep the dark web dark, right?

  • So they don't wanna have Google accessing it.

  • So it's very difficult to measure.

  • There's no reason why it should be connected, and it isn't.

  • So it really is a completely different world between the two

  • networks, that are playing such a profound role in our lives.

  • So here's a very simplistic picture of what the web looks like.

  • So there is something called the strongly connected component.

  • So this consists of all the vertices for

  • which you can travel both ways between pairs of vertices.

  • Then there's all the stuff from which you can go to the strongly connected component

  • but not back.

  • There's all the stuff that you can reach from the strongly connected component.

  • But you cannot go back, and then there's all sorts of stuff in between.

  • So this is very simplistic sort of cartoon picture of what

  • the world wide web looks like.

  • And this picture is due to a study that is already quite a bit older,

  • which is also why these numbers are not that humongously large.
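
To make this bow-tie picture concrete, here is a minimal sketch, assuming Python with the networkx library and a tiny hypothetical directed "web", of how one can extract the strongly connected component, the part that can reach it ("IN"), and the part reachable from it ("OUT"):

```python
import networkx as nx

# A tiny hypothetical directed "web": nodes are pages, edges are hyperlinks.
G = nx.DiGraph()
G.add_edges_from([
    (1, 2), (2, 3), (3, 1),   # a small strongly connected core
    (4, 1),                   # 4 can reach the core ("IN")
    (3, 5),                   # 5 is reachable from the core ("OUT")
    (6, 7),                   # a disconnected tendril
])

# Largest strongly connected component: mutual reachability between its vertices.
scc = max(nx.strongly_connected_components(G), key=len)
core = next(iter(scc))

# IN: vertices that can reach the core but are not in it.
in_part = {v for v in G if v not in scc and nx.has_path(G, v, core)}
# OUT: vertices reachable from the core but not in it.
out_part = {v for v in G if v not in scc and nx.has_path(G, core, v)}

print("SCC:", scc, "IN:", in_part, "OUT:", out_part)
```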

  • All right, so

  • I was already talking about the common features that these networks have.

  • And what are those?

  • Well the first is what physicists are calling the scale free property.

  • So scale free basically means that there's no typical scale.

  • Let's look at two different statistics which really are very different.

  • So if I look at the height of people in the Netherlands,

  • then this is pretty well concentrated.

  • So men on average are, I think, 185 centimeters tall in the Netherlands.

  • I have no idea how much that is in feet and inches.

  • And if you add, let's say,

  • 20 centimeters on either side, you have the majority of people.

  • Now there are people who are quite a bit taller, like up to 210 and so on, but

  • that's really rare.

  • If you would add 40 centimeters you basically have everybody.

  • So you could say that 180 or 185 is the typical height of a Dutch man.

  • Now on the other hand if you look at incomes that's a completely different

  • order of magnitude.

  • So I've understood that the modal income in the Netherlands is 35,000 euros.

  • That's $45,000 US.

  • But you have tons of people who actually earn ten times as much.

  • You have some people who earn 100 times as much.

  • You have few people who actually earn 1000 times as much.

  • That would be like having a Dutch man who is 180 meters tall.

  • Right, so you can immediately feel that these are two different sets of data

  • that are just not comparable, that are completely different.

  • So you have the normal world where actually normal approximations work

  • pretty well.

  • And then you have another world where you see heavy-tailed behavior.

  • You see values that are much larger than you would expect.

  • Now, that's the kind of property that we see also for the degrees in networks.

  • We see a huge amount of variability.

  • Even though the average degree may be something like three or four or ten.

  • It could be that the maximum degree is 10,000 or even 100,000,

  • orders of magnitude larger.

  • Now physicists are saying that this is related to

  • a power law in the degree distribution.

  • There's a lot of debate about this.

  • There is also a recent paper by Clauset and

  • one of his students that actually disputes that this is true.

  • So, there it is, this is a very interesting discussion,

  • which probably is going to have all sorts of effects.

  • But nobody disputes the fact that you have a huge amount of variability in

  • the degree distributions as they are.

  • So, this is the Internet Movie Database.

  • It's a network that, at the time that we were

  • investigating it, had size roughly a million.

  • And what you see is that there are individuals who have degree ten

  • to the power of five.

  • So that's pretty humongous.

  • And here we're talking about the Internet.

  • This picture looks much smoother,

  • that's because some sort of binning has taken place here.

  • That is consistent with it being a power law.

  • This is a picture by Krioukov from Northeastern University.

  • And again here you see lots of variability.

  • Because this picture is on a log-log scale, as you can see,

  • the majority of the vertices is actually here.

  • Let's say 90% is here,

  • and here as well, 90% of the data is here.

  • But then on the right here you have very rare

  • individuals in this case actors that have a humongous degree.

  • And here you have on the right, very rare routers that have a humongous degree.

  • So that's what is called scale free.

  • It's not entirely obvious why it's called scale free.

  • You could say that it's called scale free because there is a power law.

  • But you could also somehow more informally argue that,

  • actually, if you were to somehow group vertices together in blocks of five.

  • Then actually the degree distribution wouldn't change all that much,

  • at least this part wouldn't.

  • So there's also some sort of self-similarity or scale free-ness there.

  • Now we are plotting this in a log-log scale, and

  • that actually means that a straight line is very telling, for example, like here.

  • And a straight line is related to what is called a power law,

  • meaning that the proportion of vertices of degree k actually scales like an inverse

  • power of k.

  • And that actually means that, of course, this drops down to zero.

  • But it drops down relatively slowly.

  • And it actually means that you are bound to see very large values.

  • You should be thinking of the largest value in such a network as corresponding to

  • the network size raised to a positive power,

  • 1/(tau - 1), so that's actually pretty large.

  • Here is a more informal but

  • math-y explanation of why you see a straight line.

  • Let's look at the degree sequence of my graph.

  • This N_k is the number of elements in my network

  • that have degree k, that have precisely k friends.

  • And if we were to have a relation like this,

  • then we take the log, which actually is precisely what a log-log plot does.

  • Taking the log means that we actually get a constant term that does not depend on k,

  • and then there is a linear term in log k.

  • So there's a linear relation between the log of N_k,

  • at least an asymptotic linear relation between the log of N_k and the log of k.

  • And that actually is precisely what we see here,

  • a linear relation between the log of p_k, or N_k, and the log of the degree involved.

  • So power laws mean straight lines on a log-log plot.
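
Written out as a small sketch (with N_k the number of vertices of degree k, c a constant, and tau the power-law exponent, as above):

```latex
N_k \approx c\, k^{-\tau}
\quad\Longrightarrow\quad
\log N_k \approx \log c - \tau \log k ,
```

so on a log-log plot the points (log k, log N_k) fall approximately on a straight line with slope -tau.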

  • All right, and that's what is being observed in many of these networks.

  • So the pictures that I'm drawing, just two examples.

  • But you actually have way more examples that look very, very similar.

  • So that's the first property that many of these networks seem to have in common.

  • The second thing is that almost all these networks turn out to be small worlds.

  • And again this is something that is not quite well defined, but

  • it sort of means that, even though the network is humongous, with millions of vertices,

  • the distances within the network are not very large.

  • So that means that if I were to pick

  • a pair of individuals in my network, assuming that the network is connected.

  • And if I would look at the number of edges in the shortest path between

  • these individuals, I would get a number that is not very large.

  • And for social networks, this is related to six degrees of separation, it basically

  • says that the shortest paths are on average not more than six edges, okay?

  • Now here are two examples.

  • This is distances in the strongly connected component in the world wide web.

  • This is the distance in the Internet movie

  • database of which I just showed the degree distribution in the loglog plot.

  • And what you see is that these distances indeed are not very large.

  • Here the maximum value appears to be 9.

  • Here the maximum value appears to be 7.

  • I say appears to be because it could be that there are very rare pairs here,

  • which we don't really see, that are creating the diameter.

  • The diameter might be much larger, right?

  • But on average, the distances are not very big.

  • So that's what is called the small-world phenomenon.

  • And as I said,

  • this is of course very closely related to six degrees of separation.

  • There is a quote where the lead actor in the play and movie

  • Six Degrees of Separation, says the following thing.

  • Everybody on this planet is separated only by six other people.

  • Six degrees of separation.

  • Between us and everybody else on this planet.

  • The president of the United States.

  • A gondolier in Venice.

  • It's not just the big names.

  • It's anyone.

  • A native in the rain forest.

  • An Eskimo.

  • I am bound to everyone on this planet by a trail of six people.

  • It's a profound thought.

  • So this quote was from a Hungarian writer and he wrote this in the beginning

  • of the 20th century, so already then people were believing this to be true.

  • But you know the world is getting smaller and smaller, so

  • this should be even more true now.

  • And if you want to understand where this originates from,

  • it actually is helpful to think in sort of social hierarchies.

  • So if I want to connect to somebody whom I don't know at all in the Netherlands,

  • how should I do that?

  • Well, there's one thing that I could do.

  • I could try to connect up to somebody very famous.

  • And if that other person can also connect up to that person that is very famous,

  • then we connect up to each other in relatively few steps.

  • So in the Netherlands, I would say take the king.

  • Well, for me, it works a little bit better to take the previous

  • queen who now no longer is queen because she abdicated.

  • But she was at the 50th birthday party of our university.

  • And there she actually shook hands with our Rector at the time whom I've shaken

  • hands to, so we're at distance two; I'm at distance two from the previous Queen.

  • So if you find somebody else who was also at distance, let's say at most three or

  • four from the Queen then I'm already there.

  • Now you might argue that maybe I'm not typical, well,

  • I would say that probably I am, but never mind.

  • But everybody in the Netherlands knows somebody who is famous, or

  • who moves in higher social circles than the person him- or herself.

  • It could be the director of your high school.

  • It could be the director of your university.

  • It could be the mayor of your town if you happen to know this person.

  • But it could be anybody.

  • And these people tend to have a larger social network.

  • I'll get to you.

  • They have a larger social network and from them, you can again make this step up.

  • And by that you sort of move through the ranks of your social system and

  • in a few hops you can be at the top, whatever the top is.

  • I'm not saying that there's a social order here.

  • But that's basically the explanation of why distances are very short.

  • And actually we see that coming back in many of the,

  • sorta the mathematical models that we build for these networks.

  • And you can somehow move up very quickly from vertices with very low degree,

  • to vertices with very high degree in just a few steps.

  • Yes? >> Would you say with the invention of

  • Google and the Internet, now all of this.

  • My basic problem is that I find the term 'know' to be really,

  • really subjective and. >> Yes I agree.

  • >> So now if I just Google the Queen of

  • Holland, >> Yeah.

  • >> And that's it,

  • do I already know her?

  • >> Yeah. >> And

  • [INAUDIBLE]- >> So

  • I was implicitly using a slightly more concrete way of knowing,

  • namely having shaken hands. That makes it extremely concrete.

  • You either have shaken hands with a person or you have not.

  • Cuz otherwise you run into all sorts of problems which are a bit more social

  • science.

  • I mean, having a friend is not such a clear notion.

  • You might change your mind over time, and in fact, you might think somebody to be

  • a friend, whereas that person does not think you to be a friend.

  • So let's not go there.

  • But I mean, you could think about Facebook for example.

  • Then again, it's very clearly defined what it means to be a friend.

  • Even though it's perfectly obvious that our Facebook friends are not our friends

  • necessarily, right?

  • So we have to be very careful.

  • But this is a matter of definition I would say.

  • Does that answer your question?

  • Yeah, so indeed, Facebook is a very good example.

  • And Facebook, of course, is under scrutiny for all sorts of privacy issues.

  • But that's not what I'm going to be talking about.

  • It's a beautiful example of a network of which, well,

  • at least somebody has all the data.

  • If you're thinking about social networks,

  • actually mapping the social network is a big problem.

  • Where do you get your data from?

  • You cannot go and ask people on the street who their friends are.

  • You're very unlikely to actually be mapping a social network at all.

  • Because how likely are you going to be walking around on the streets

  • to actually find the friends of the persons that you've already asked?

  • So this is very difficult.

  • How do you do it?

  • So some people do this in a very small scale in high schools or whatever.

  • But Facebook or other virtual friendship networks

  • give you a possibility of doing it with data and the data is all there.

  • Now there has been a big study of Facebook data around 2011.

  • At the time, it contained 721 million active users.

  • That's actually pretty big.

  • Messing around and doing computations with network data that is that big

  • is a nontrivial issue, which is why these were computer scientists.

  • Actually, one of the big questions was how to determine what the diameter of

  • this network is.

  • Turns out to be 59, that's pretty big, much larger than 6, right?

  • It's difficult also because there are 69 billion friendship links, so

  • algorithms are being used here, clever algorithms.

  • But, of course, if you want to just know what

  • typical distances behave like, you can just take a statistics approach, right?

  • Just take pairs uniformly at random and bin them into a histogram.

  • If you have enough pairs, then you will probably be pretty close to the truth, but

  • you're not going to see these outliers, right?

  • It seems to stop here at 7.

  • Well, we now know that it actually goes all the way to 59, but you don't see that.
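
Here is a minimal sketch of that sampling approach, assuming Python with networkx and a stand-in random graph; on the real network you would plug in the actual graph and many more sample pairs:

```python
import random
import networkx as nx

# Hypothetical stand-in for a real social network.
G = nx.erdos_renyi_graph(n=2000, p=0.005, seed=1)
G = G.subgraph(max(nx.connected_components(G), key=len))  # largest connected component
nodes = list(G.nodes)

# Sample pairs uniformly at random and bin their graph distances.
histogram = {}
for _ in range(1000):
    u, v = random.sample(nodes, 2)
    d = nx.shortest_path_length(G, u, v)
    histogram[d] = histogram.get(d, 0) + 1

print(dict(sorted(histogram.items())))  # typical distances; rare extreme pairs are missed
```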

  • This is the plot, actually, distances are way smaller on Facebook.

  • It's on average something like 4.7 at the time that this was measured,

  • and that may be a sign of the fact that distances are indeed shrinking.

  • Also, it turns out to be relatively homogeneous.

  • If you look at the distance distribution over different parts of the network,

  • Italy, Sweden, and

  • the US were taken, then actually the plots look relatively similar to this.

  • So again, it has the same feature that the Internet has, it's fairly homogeneous.

  • If you look at the smaller part of it, and

  • actually it seems relatively representative of the entire thing.

  • All right, now let me give one example as to why network

  • science may actually be useful, because you may wonder about that.

  • And this is the friendship paradox.

  • This is well known, at least I know this very well.

  • If I look at my friends on, let's say, Facebook,

  • I always see that they have many more friends than I do.

  • Now that may actually mean that I'm socially not very capable,

  • that could actually very well be.

  • Let's not go there, but actually it turns out that this is in general true.

  • And this is interesting because it gives a mathematical explanation for

  • a feature that many of us have observed in the real world.

  • So, how do we go about showing this?

  • Well, first, it's actually useful to look at what the average

  • degree, the average number of friendships is of a random individual in my network.

  • So, what do I do?

  • I look at the proportion of individuals that have degree k, multiply by k and sum

  • over k, and I get the expected value.

  • I can do this arithmetic, and

  • you just get twice the number of edges divided by the number of vertices in your network.

  • That's what it is.

  • But that's not quite sort of a random friendship.

  • So what does Wikipedia say about that?

  • Well, it's a modelling issue.

  • Basically, it says that the average number of friends that a typical friend has

  • can be modelled by choosing, uniformly at random, an edge of the graph and

  • an endpoint of that edge.

  • And again calculating the degree of the selected endpoint.

  • You should say the average degree; I'm a probabilist, so the expectation, right?

  • If you do that, and if you call the degree chosen here

  • by D star which is a random variable, you can compute the expectation, and

  • it turns out to be the expected degree, the same thing as here,

  • plus a little bit,

  • something proportional to the variance of the degree divided by the expectation.

  • So unless you're in a situation where everybody in the network has the same degree,

  • in which case of course the two quantities have the same distribution and

  • therefore the same average, it will always be the case that this typical friend

  • has more friends than you do. Okay, this is just a mathematical computation.
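
In formulas, a small sketch of the computation just described (with p_k the proportion of degree-k vertices, |E| the number of edges, D the degree of a uniformly chosen vertex, and D* the degree of a uniformly chosen endpoint of a uniformly chosen edge):

```latex
\mathbb{E}[D] = \sum_k k\, p_k = \frac{2|E|}{n},
\qquad
\mathbb{P}(D^{\star} = k) = \frac{k\, p_k}{\mathbb{E}[D]},
\qquad
\mathbb{E}[D^{\star}] = \frac{\mathbb{E}[D^2]}{\mathbb{E}[D]}
= \mathbb{E}[D] + \frac{\operatorname{Var}(D)}{\mathbb{E}[D]} \;\ge\; \mathbb{E}[D],
```

with strict inequality unless all degrees are equal.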

  • Now I wasn't particularly convinced by this because you know,

  • I think of myself here as being a uniform individual in this network.

  • But then if I look at one of my friends then it should be a uniform friend

  • of mine.

  • That's not what this is doing, right?

  • It's taking a uniform edge, which does not give the same distribution. But it turns

  • out that you can do precisely the same computation and the answer is the same.

  • Distribution is different, but the expectation is still going up.

  • So, if I take a random individual from my network,

  • for any network irrespective of the precise topology of the network.

  • If I take a random individual, look at the expected degree that it has, and

  • compare that to the expected degree of one of its random neighbors,

  • that random neighbor on average is going to have higher degree.

  • So, your friends now have more friends than you do.

  • So, just to give an example that somehow this abstract notion and

  • the abstract setup of graph theory can actually help you

  • understand the phenomena in the real world.

  • [COUGH] Now, of course,

  • we do network science for a reason.

  • I mean, we do that because we would like to look at these networks and

  • try to understand how they've originated or what their properties are. Or,

  • in a social network, you tend to see more clustering.

  • And your friends tend to know one another a bit more than on average.

  • How do you quantify that?

  • So that's what you use network statistics for.

  • If you work with empirical data, you collect lots of network statistics.

  • So the average degree, for example, or the degree distribution,

  • these are all network statistics of some sort.

  • There are a few more that have been investigated as well.

  • So as I said clustering.

  • And that's just three times the number of triangles

  • divided by the number of connected triplets.

  • And you could say that this is the proportion of wedges in your network for

  • which the people on the top of the wedge actually also know each other.

  • So if this is pretty large then actually you have lots of triangles

  • in your network.

  • That's typically what we see in social networks.

  • So if you were to compare this to any network model,

  • you can say my data is actually more clustered than I would expect on

  • the basis of whatever model you would be taking for your network.

  • That could be a useful conclusion.
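
As a small illustration, here is how the clustering coefficient just described (three times the number of triangles divided by the number of connected triples, or wedges) can be computed, assuming Python with networkx and a toy graph:

```python
import networkx as nx

# Toy graph: one triangle plus a pendant edge.
G = nx.Graph([(1, 2), (2, 3), (3, 1), (3, 4)])

triangles = sum(nx.triangles(G).values()) // 3           # each triangle is counted at 3 vertices
wedges = sum(d * (d - 1) // 2 for _, d in G.degree())     # connected triples (paths of length 2)

clustering = 3 * triangles / wedges
print(clustering, nx.transitivity(G))  # both give the global clustering coefficient (0.6 here)
```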

  • Another thing that people have been looking at is

  • degree-degree correlations, or assortativity.

  • So do the people with high degrees in your network, they are special,

  • typically connect up to people that also have a high degree, or

  • rather to people with a lower degree?

  • It's claimed that in technological networks,

  • it's typically of the second kind.

  • The high degree nodes are typically connected to low degree nodes.

  • Whereas in social networks,

  • it tends to be more assortative in the sense that the vertices with high

  • degree are connected to vertices with typically also high degree.

  • Okay, so these are distinctions in the network topology

  • that probably have a reason.

  • And one way of measuring this is using the assortativity coefficient,

  • which is just the correlation over the edges

  • between the degrees on either side of an edge. So for every edge,

  • you write down what the degree of the vertex is on either of the sides, and

  • you compute the correlation coefficient between those.

  • So that's what Newman proposed as the coefficient.
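
A minimal sketch of this coefficient, assuming Python with networkx, which implements Newman's degree assortativity as the correlation between the degrees at the two ends of each edge:

```python
import networkx as nx

# A small toy graph: a star, where the hub has high degree and the leaves have degree 1,
# so high-degree vertices connect to low-degree ones (disassortative).
G = nx.star_graph(10)

r = nx.degree_assortativity_coefficient(G)
print(r)  # close to -1 for a star: strongly disassortative
```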

  • You have to be a little bit careful with this.

  • Because if you're doing correlation studies,

  • correlations only make sense in a world where we have finite variance.

  • But here, we're dealing with degrees, and degrees have infinite moments.

  • So we have to be very careful.

  • So we constructed some nice counter-examples that actually produce

  • rubbish results, obviously, in networks.

  • Still, this measure is used a lot.

  • And the paper has over 2,000 citations.

  • You have to be careful. All right, now another thing that a network

  • possibly allows you to do is to quantify or investigate who's important in it.

  • But then when I say important your immediate reaction should be,

  • important for what?

  • That's the whole point, right?

  • And it makes a big difference what you're important for.

  • So that actually has given rise to research

  • on what are called centrality measures, who is central in your network.

  • And there are different notions.

  • Closeness centrality basically means that you're on average close to

  • everybody in the network.

  • And you can quantify that by the sum of distances between you and

  • everybody else in the network.

  • That's closeness centrality.

  • So vertices with low closeness centrality are really, in terms of graph distances,

  • very central to the network.

  • There's also something else, which is called Betweenness centrality.

  • And the people with high Betweenness centrality tend to be the people

  • who are actually connecting different communities.

  • They're also important.

  • But important in a different way, okay.

  • So, Betweenness is large for vertices that form bottlenecks.

  • If you have two communities that only share one individual, then actually that

  • individual that is on the boundary between the two is going to have a very high betweenness.
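
As a small illustration of these two notions, a sketch with Python and networkx on a hypothetical toy graph (two triangles joined at a single bridge vertex); note that networkx's closeness is the reciprocal of the average distance, so large values mean central, whereas the raw sum described above is small for central vertices:

```python
import networkx as nx

# Two communities (triangles) joined through vertex 3, the bottleneck.
G = nx.Graph([(1, 2), (2, 3), (3, 1),
              (3, 4), (4, 5), (5, 3)])

# Closeness: how near a vertex is, on average, to everybody else.
print(nx.closeness_centrality(G))      # vertex 3 scores highest

# Betweenness: fraction of shortest paths passing through a vertex.
print(nx.betweenness_centrality(G))    # vertex 3, the bridge, again scores highest
```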

  • Now the last version of a centrality measure is page rank, of course,

  • a lot of research has been devoted to this on computational issues but

  • also on interpretation issues.

  • And page rank basically measures the extent

  • to which a vertex is visited by a certain random walk.

  • But it's a random walk with restarts, so this is sometimes called a bored surfer.
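
A minimal power-iteration sketch of this bored-surfer idea (plain Python, a hypothetical four-page web; networkx also ships nx.pagerank, which does essentially the same thing):

```python
# Hypothetical tiny web: page -> pages it links to.
links = {1: [2, 3], 2: [3], 3: [1], 4: [3]}
alpha, n = 0.85, len(links)
rank = {p: 1.0 / n for p in links}

for _ in range(100):                                 # iterate until (approximately) stationary
    new = {p: (1 - alpha) / n for p in links}        # restart: jump to a uniform page
    for p, out in links.items():
        for q in out:                                # otherwise follow a uniform outgoing link
            new[q] += alpha * rank[p] / len(out)
    rank = new

print(rank)  # page 3 collects the most visits of the bored surfer
```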

  • And this actually got Google started.

  • So it's not entirely clear whether this page stands for web page or it stands for

  • Larry Page, one of the founders of Google.

  • I certainly don't know the answer to that.

  • Whether Google still uses it is not entirely clear, Google does not disclose

  • what algorithms it's currently using in showing you the results of your searches.

  • But certainly, Google has had a very important impact in sort of how to do

  • searches, because it really meant a change in philosophy on how to search.

  • So previous searches were more like looking up

  • a telephone number in a telephone book.

  • Whereas Google was ordering things in terms of somehow their relative importance,

  • based on this theory.

  • It's a completely different way of searching.

  • Now, I've done searches before the Google era,

  • that may not apply to everybody here, but it was really a world of difference.

  • So when Google just started, the top few

  • items on the list were really the kind of things that you were looking for.

  • Whereas previously, you got all sorts of rubbish.

  • And you just had to look, scroll your way through all the answers

  • finding the thing that was actually relevant for you.

  • So a world of difference.

  • You see what algorithms can do for you, all right.

  • Now I'm a probabilist and a mathematician.

  • And one of the things I do is think about how to model such networks.

  • Because that allows you to actually study some of the properties that we were just

  • looking at in toy models for the real world.

  • Now, I will say something about two specific models.

  • These are all random graph models but

  • somehow the connections between vertices, they appear randomly.

  • Probability theory is a very good way of modeling complexity, and these networks

  • are very complex, so it's pretty reasonable to look at random graph models.

  • And these random graphs give rise to random connections between the vertices.

  • And you may wanna fix some parts of this network, for example

  • the degree distribution or something else, to let your random graph actually look

  • like the network that you're currently investigating.

  • So how to do that?

  • Now, there's two approaches here, two different modeling perspectives.

  • You can look at a network at a given time, Facebook now, and devise a

  • model for that.

  • That's a static model, the network does not evolve.

  • But you can also look at dynamic models, basically trying to model how the network

  • got to where it currently is, probably by growth.

  • These networks are pretty big, okay.

  • So that's a dynamical model.

  • So I'll describe one model here, actually two, and

  • one pretty famous model, the preferential attachment model, as well.

  • Now, if we are modeling things with different models,

  • we are of course immediately raising the question whether the conclusions we're

  • drawing in the different models are the same.

  • It had better be, because if we have two ways of viewing reality and

  • they give different predictions then we don't know what to trust.

  • Now that's sometimes in physics called universality.

  • You want models with similar features to actually also behave similarly, and

  • if they don't you just don't know, which one to believe.

  • So that's an important line of research to compare properties of random graph

  • models that are being proposed to model the real world and

  • compare how they do, compare how they behave.

  • So the very simplest model that you can imagine is the Erdős–Rényi random graph.

  • So this is a static model; I'm not saying that it's an easy model, but that it

  • has easy properties I should say, and it certainly is an easy model to formulate.

  • So what do we do?

  • We start with a vertex set that is fixed, 1 up to n, and

  • then we just draw an edge between any pair of vertices with a given probability p,

  • independently of one another.

  • Now, most of my models are sparse, so you need to let p tend

  • to zero as a function of n, but that's okay, okay?
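
A minimal sketch of this binomial model, assuming Python with networkx; to keep the graph sparse, p is chosen as lambda/n for a fixed target average degree lambda:

```python
import networkx as nx

n, lam = 10_000, 4.0      # n vertices, target average degree lambda
p = lam / n               # sparse regime: p tends to zero as n grows

G = nx.gnp_random_graph(n, p, seed=42)   # each edge is present independently with probability p

avg_degree = 2 * G.number_of_edges() / n
print(avg_degree)         # close to lambda, and the degrees concentrate around it
```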

  • This has been investigated for a very long time.

  • It was introduced actually by Gilbert in 1959.

  • Erdős and

  • Rényi proposed a model that was very similar, where you fix the number of edges.

  • Here we're doing it randomly, so this is sometimes called the binomial model.

  • And it was actually invented for a technique,

  • which is called the probabilistic method.

  • So suppose we want to prove something deterministic,

  • we want to count something or we wanna prove that certain graphs exist.

  • Now the easiest approach to do this is to show that, well,

  • the Erdős–Rényi graph has that property with positive probability,

  • and then there must be graphs that actually have that property as well.

  • That's a very simple example of what is called the probabilistic method.

  • And a little later we'll probably see one more.

  • This model has one downside, in the sense that it's completely egalitarian.

  • Everybody plays the same role.

  • And that's not true in the real world.

  • We see huge fluctuations in the number of connections that people make.

  • That's not going to be true in this model, so it's not a good model for

  • scale-free networks, so you have to change it.

  • And the easiest way to change the model is by,

  • just fixing the degrees to be whatever you want them to be, okay?

  • So this called the configuration model.

  • It was already introduced in the 80's, but

  • it was really popularized by Newman, Strogatz and Watts in 2001.

  • Again, the same number of vertices; it's a static setting.

  • And now for every

  • vertex, we prescribe what the degree is exactly.

  • So vertex one has five friends, vertex two has ten friends, etc, etc.

  • We know that example.

  • And sometimes we'll take these to be random variables, that's perfectly fine.

  • And of course, because of the scale free nature of many these networks, we often

  • restrict ourselves to power laws, but this basically sets the stage for the model.

  • And I haven't yet told you how the connections are taking place.

  • And I find that this always works better when visualizing this.

  • So this is an endeavor that we've been doing through the NETWORKS program.

  • This is called the Network Pages, and there we have some blogs and

  • articles that are meant for a broad audience interested in networks.

  • But we also have lots of demos.

  • That's the algorithm, very appropriate, that was from Eindhoven,

  • and lots of others: the Erdős–Rényi graph with distances,

  • the configuration model, etc., just simulation tools.

  • So this is what the simulation tool looks like for the

  • configuration model.

  • Negative number of edges does not make sense, it's perfectly right about that.

  • All right, so what you can see here, if you look very carefully,

  • is you see here a number of little stubs coming out.

  • These are half edges, two half edges need to be paired,

  • and then they form an edge, very simple.

  • How do we do that?

  • Uniformly at random.

  • So we take a half edge and

  • we connect it to any of the other half edges that is there.

  • Pair it, remove it, repeat, okay?
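
Here is a minimal sketch of that half-edge pairing in plain Python, assuming a given degree sequence with even sum (networkx's nx.configuration_model implements the same construction and returns a multigraph):

```python
import random

def configuration_model(degrees, seed=None):
    """Pair half-edges (stubs) uniformly at random; may create self-loops and multi-edges."""
    rng = random.Random(seed)
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]  # d copies of vertex v
    assert len(stubs) % 2 == 0, "the degree sum must be even"
    rng.shuffle(stubs)                          # a uniform shuffle gives a uniform pairing
    return list(zip(stubs[::2], stubs[1::2]))   # consecutive stubs form the edges

edges = configuration_model([3, 2, 2, 2, 1], seed=7)  # a hypothetical degree sequence
print(edges)
```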

  • And then something like this occurs, it's almost done, I think.

  • There you go.

  • That's a network.

  • And this is how it would be normally drawn.

  • So it does it completely uniformly at random, yes?

  • >> [INAUDIBLE] could result in two

  • vertices [INAUDIBLE] more than one edge? >> There you go.

  • Absolutely true.

  • Yes, even worse, it could result, that hasn't happened here,

  • which is pure chance, it could result in vertices being connected to themselves,

  • which is also typically not what we see in real world networks, okay?

  • We know that this is going to occur, but

  • these multiple edges and self-loops are pretty rare.

  • So we're just going to ignore them.

  • Alternatively speaking,

  • what you could also do is say, I'm just going to draw this.

  • And if there are any multiple edges or

  • self loops, I just toss away the result, and then I do it again.

  • It could be that you have to do it very often, but theory says that you don't,

  • because the probability of simplicity is actually strictly positive.

  • That's nice.

  • And then the second nice thing is that if you were to condition on simplicity,

  • no self loops,

  • no multiple edges, actually the result is completely uniform at random.

  • That's a mathematical statement.

  • So this is a very interesting and very simple algorithm that will allow you to

  • draw a uniform random graph with prescribed degrees. And that

  • actually is not so simple, because we don't even know exactly how many there are.

  • Well, of course, when it's a network of size ten, you can just compute them, but

  • what if the network has size a million?

  • How many graphs are there of size a million where every vertex has

  • degree five?

  • Well, that's not so simple, right?

  • And then drawing one uniform at random and

  • you don't even know how many there are, that's not so simple either.

  • So this is another example of the probabilistic method.

  • With this configuration model,

  • you can actually compute asymptotically how many there are approximately.

  • >> [INAUDIBLE]

  • >> We do know it

  • approximately. >> What

  • is- >> For many,

  • >> [LAUGH]

  • >> Chapter seven, it's there.

  • >> [LAUGH]

  • >> Now, we do know it approximately, and

  • there are many ways to do it.

  • This is an example of how you can do it in a setting where the degrees are sparse.

  • But there are many other methods as well that allow you to do it when the degrees

  • are not sparse.

  • So for example, this started out with

  • regular graphs, random regular graphs where all the degrees are the same.

  • But similar methodology will work for degrees that are different.

  • So you can compute it approximately.

  • All right, so that was a demo.

  • So that's the model.

  • And this is the model in words.

  • Now you may be worrying about the fact that we're actually

  • choosing this order in one way or another.

  • Now, if you're a probabilist,

  • it turns out that this matching is completely exchangeable.

  • So the order doesn't matter.

  • The order of course does matter in the result that you're going to be getting,

  • but not in its law, not in its distribution.

  • That's actually pretty nice.

  • All right, so that's the configuration model.

  • Now I was talking about two separate aspects of complex networks,

  • namely scale free and small worlds.

  • Now here I'm fixing the degrees, so

  • I can no longer ask the question whether it's scale free.

  • It's scale free when the degrees that I put in myself are scale free.

  • And otherwise it's not, so that's not an interesting question anymore.

  • But you can ask the question about graph distances because there is no

  • reason whatsoever why this model would be a small world.

  • Now this is what you would get if you were

  • to put in IID degrees from a power-law distribution, and

  • then plot the occurrences of the different vertices of different degrees.

  • It looks exactly like what we see in real complex networks.

  • So that's a good sign.

  • So how about graph distances?

  • Well, it turns out that in this world,

  • there are two different domains depending on the precise degree distributions.

  • So there's one setting in which

  • we're always assuming that the average degree here is bounded, but

  • it turns out to depend on the second moment of the degrees.

  • If the second moment of the degrees is bounded, then we have one reality.

  • If the second moment of the degrees turns out to be unbounded,

  • we have another reality.

  • So when the second moment of the degrees is bounded,

  • distances grow logarithmically.

  • Now, of course, the log of a large number is a much smaller number.

  • So you could interpret this as being a sign of the small world phenomenon.

  • But in the case where the second moment is infinite,

  • that means that there are vertices that have humongous degree that should,

  • of course, make distances shorter.

  • You would believe.

  • And indeed, that is the case,

  • then it turns out that graph distances are of the order of log log n.

  • Okay?

  • So it's either log n or it's log log n.

  • Compare this to settings that are more common to us, for

  • example you take a large torus.

  • If I want this to have n vertices,

  • then the sides of the torus need to be square root of n.

  • For two vertices chosen uniformly at random,

  • the typical distance will be of the order square root of n.

  • Now networks are very large, so

  • the square root of something very large is still extremely large.

  • Here it's the logarithm that's much smaller.

  • So, small world, much smaller than it would be in a geometric world,

  • for example.

  • And this is a picture of that, this is a picture of log x

  • for sizes up to, let's say, the size of the world population.

  • This is a picture of log log x.

  • Now if I go and stand over here I can see that it's not a constant.

  • But you'll be hard pressed to see this, well, except for the beginning, of course,

  • where you could see something happening.

  • So for all practical purposes, log log n is a constant.
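
To put rough numbers on this, a tiny computation comparing the three growth rates for a network of roughly the size of the world population:

```python
import math

n = 8e9                       # roughly the world population
print(math.sqrt(n))           # ~ 89,000  (torus-like, geometric distances)
print(math.log(n))            # ~ 22.8    (finite second moment of the degrees: log n)
print(math.log(math.log(n)))  # ~ 3.1     (infinite second moment: log log n, ultra-small world)
```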

  • So small world.

  • All right, that was one model.

  • A second model is a more dynamic model.

  • It's called the preferential attachment model.

  • Now why is this interesting?

  • Well, it's interesting from a philosophical point of view.

  • Because we're seeing all of these complex networks around us.

  • And many of them empirically seem to have power-law degrees, why?

  • Well, we don't know.

  • So there's certainly demand for a mechanism that will explain,

  • let's say, from first principles, as physicists would say,

  • a very simple macroscopic rule that will immediately produce power laws.

  • That's what preferential attachment does.

  • So what does it do?

  • So this is from a paper of Albert and Barabási,

  • which by now has almost 31,000 citations, so that's humongous.

  • So this has been an extremely influential paper,

  • even though they only did simulations.

  • And they didn't even define the model properly.

  • But, okay, it was extremely influential and

  • it also contained a very interesting idea.

  • This is the idea, suppose that you have a network that grows in time.

  • It's dynamic, so that means that at time n a vertex comes in, at time n + 1

  • the next vertex comes in, etc., etc.

  • And we'll keep it very simple, we're just going to be assuming that

  • every vertex comes in with a fixed number of edges to give.

  • So it comes in, and it has 3 connections to give.

  • It gives them to vertices that are already there; how does it choose?

  • Well, it chooses in a way that is preferential to vertices that have high

  • degree.

  • In fact, it chooses proportionally to the degrees of the vertices plus a constant.

  • That's what it does here, okay?

  • So I have my nth vertex coming in, it has m edges.

  • And it gives them with probability proportional to the degree of the vertex

  • that I'm interested in, plus a constant.
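
A naive sketch of this growth rule in plain Python, with m edges per new vertex and the extra constant delta in the attachment weights (delta = 0 is the Albert–Barabási rule; the initialization and multi-edge handling here are simplified assumptions, not the exact model from the talk):

```python
import random

def preferential_attachment(n, m=2, delta=0.0, seed=None):
    """Naive O(n^2) sketch: each new vertex sends m edges, targets chosen ~ (degree + delta)."""
    rng = random.Random(seed)
    degree = [m] * m                       # hypothetical start: m initial vertices of degree m
    for v in range(m, n):
        weights = [degree[u] + delta for u in range(v)]
        targets = rng.choices(range(v), weights=weights, k=m)  # repeats give multi-edges
        for u in targets:
            degree[u] += 1
        degree.append(m)                   # the newcomer has degree m from its own edges
    return degree

degree = preferential_attachment(5_000, m=2, delta=1.0, seed=0)
print(sum(degree) / len(degree), sorted(degree)[-3:])  # average ~2m, but a heavy upper tail
```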

  • Now Albert and Barabási didn't have this constant.

  • And actually the constant is very, very crucial, why?

  • Well, it turns out that this model with these very simple rules produces

  • random graphs with power law degree sequences.

  • This is a plot of what happens when m = 2.

  • This parameter delta = 0, so that's the Albert–Barabási model.

  • And n = 10 to the power of 6; again, a log-log plot of the degrees.

  • What you see is something which very strongly resembles a straight line.

  • And you have some stochastic fluctuations here,

  • just like in these real world networks.

  • We can prove mathematically that in this model the degree distribution satisfies

  • an asymptotic power law with an exponent that turns out to depend on all

  • the parameters of the model.

  • It's 3 + delta / m, and now you see why the delta is relevant.

  • It allows us to change this power-law exponent.

  • And that's relevant because the empirical data suggest we get networks

  • with various different exponents.

  • Yes?

  • >> [INAUDIBLE]

  • >> There is tons of different models.

  • There are models where this, indeed, occurs, here, it doesn't.

  • If m is greater than or equal to 2 with high probability after a long time,

  • the network will be connected.

  • So there is no phase transition like in the Erdős–Rényi graph or even the configuration model.

  • There are also models where somehow there's more randomness in how you

  • connect up your edges.

  • So rather than saying that you have precisely m edges you connected up

  • with this probability, you connect up to every vertex

  • with a probability that is this one, normalized.

  • And then you do see, sort of, occurrence of a giant component and

  • phase transitions and the like.

  • Does that answer your question?

  • Okay, otherwise you can ask me again, later on.

  • Now, again, we have a model now that, as for

  • the configuration model with power law degrees, produces power laws.

  • So the question is,

  • does this behave in a similar way as the configuration model with similar degrees?

  • Now physicists were saying that this is absolutely true, it has to be true.

  • But that's not mathematical proof.

  • So the rigorous results are of the following kind.

  • They're actually a bit weaker than the rigorous results for

  • the configuration model.

  • That's because this model is just inherently harder.

  • You have this nice, sort of,

  • uniform attachments probabilities in the configuration model.

  • Here, you don't really have that.

  • You really have to take the dynamics into account.

  • That just makes the model much more difficult.

  • But the results are very alike, the results in the configuration model.

  • Forget about the first theorem, look at the second theorem.

  • It says that if m is greater than or equal to 2 and tau is in between 3 and infinity,

  • that means that you have finite second moment of the degrees,

  • which is the same thing as saying that delta is strictly positive.

  • Actually, typical distances are of order log n.

  • We don't know the exact constant.

  • But it's still of the same order of magnitude as in the configuration model,

  • which we see here, so that's this case.

  • Whereas, if delta is negative, meaning that tau is in between 2 and 3,

  • graph distances grow like log log with a constant that we can compute

  • that looks a lot like the constant that we had here.

  • Except that we have a 2 here and a 4 here.

  • Distances are twice as large, roughly.

  • And the reason is very simple.

  • If I take a vertex of very big degree, I was already describing this example of

  • how you can connect up by connecting up to the queen or whatever,

  • somebody who has very high degree.

  • And if you do that, what you typically do is,

  • you start from a vertex that has very high degree, somewhere along your path.

  • And you're going to be connecting that to a vertex that has even higher degree.

  • And that you can do in the configuration model in this regime.

  • In the preferential attachment model it doesn't work that way.

  • If you have a vertex of very high degree,

  • it will not be connected directly to another vertex of much higher degree.

  • But it will be indirectly, so you need an intermediary vertex

  • in order to go from a vertex of high degree to a vertex of a degree that is much higher.

  • That actually makes distances twice as large.

  • And that's precisely the effect that we see here.

  • So if you look at these results, they're certainly very similar to the ones for

  • the configuration model, and that's good news.

  • And actually there are tons of other results saying that, if you start with

  • random graph models, you see different behavior when the degrees have infinite

  • variance compared to the setting where you have finite variance.

  • Finite variance:

  • typically logarithmic distances; infinite variance: typically log-log distances. So

  • that's the kind of universality we are after in this business.

  • Okay, there are many more models.

  • Certainly the models that I am talking about are not good.

  • Why not?

  • They have low clustering, they have very few short cycles.

  • That's not true in the real world.

  • They don't have any communities,

  • whereas we know that most complex networks do have communities.

  • And they're just too simple, we're recording nothing.

  • Just connections, right?

  • Whereas in a social network, certainly social scientists are interested in

  • all sorts of attributes, gender, age, maybe where people live, etc., etc.

  • So these models really are a caricature of reality,

  • yet we can learn something from them.

  • That's in a sense what applied math is about, learning something from models.

  • This I won't do.

  • Conclusions.

  • So I hope to have convinced you of the fact that networks

  • are an interesting example of where we can apply mathematical thought.

  • And network theory can be useful in order to interpret real-world phenomena.

  • So I was talking about centrality, for example, but

  • also the friendship paradox and you could go way further.

  • These are just examples.

  • It turns out that many of these networks,

  • certainly in the real-world, have unexpected commonality.

  • They're almost all small worlds, and they seem to have a high amount of

  • variability in the degree structure of the networks.

  • We can use random graph models in order to explain some of these properties, or

  • certainly study some of these properties.

  • And that immediately raises the question whether there's universality.

  • An example that we've seen is that distances in two such models,

  • maybe the most popular models, behave in a very similar way,

  • qualitatively:

  • the logarithmic regimes are the same and the log-log regimes are the same.

  • That's the kind of results we would like to prove, in general.

  • Now, of course I've been talking here at a relatively high level on

  • research in network science, you could say, but of course it doesn't stop here.

  • I've only looked at the topology.

  • But I haven't quite said anything about what

  • the consequences are of certain topological features.

  • And that's actually what a lot of research is focusing on.

  • So not just thinking about networks as they are and

  • their topological issues, but actually their functionality.

  • Now, one way of looking at this is that you can say that we can model

  • the complex networks using a random graph.

  • And oftentimes we can model their functionality using stochastic

  • processes on them.

  • And these stochastic processes can be anything.

  • So if you're interested in rumor spread, well,

  • you are going to have some sort of information diffusion process.

  • Which is typically modeled as a stochastic process.

  • If you're interested in information diffusion or searching things or

  • page rank or whatever, there's a very natural link to random walks.

  • So very often we're interested in the behavior of stochastic processes

  • on the network and a network is then modelled as a random graph.

  • Mathematically this is quite interesting because you have a stochastic

  • process on a random object.

  • So double randomness, how does this behave?

  • Examples: the spread of a disease, an epidemic,

  • when is there going to be a pandemic?

  • Can we predict that on the basis of network topology?

  • How does information spread?

  • How do rumors spread through the network?

  • How is consensus reached within a network?

  • Is it reached?

  • That's another question.

  • How do objects synchronize on a network?

  • For example, this is an example that is very important, for example, in the brain,

  • where synchronization does take place.

  • How robust is our network to random failures,

  • either of the vertices in your network or of the edges in your network?

  • Now again you could say that is a stochastic process,

  • because links fall out by chance.

  • By chance you could say, but still.

  • How does one retrieve information, for example,

  • using page rank algorithms or the like?

  • So I believe that this is going to be a prominent example of applied math

  • that is going to keep us busy for the coming decades.

  • So, inspiration: I was already showing the book, here's a picture.

  • The book is available as a PDF on my web page.

  • It's not the Cambridge University Press book, but

  • it is the latest version, in fact the students have been going over and going

  • over it then making the final changes that we made in the editorial process.

  • So the two texts should be identical, so you can just download it from there.

  • That's volume 1 if you can read this here.

  • There's a volume 2 that I'm currently working on,

  • but that will take some more time to finish.

  • But it can also be found on the same HTML page.

  • And I was already showing you the network pages, which is an endeavor to

  • reach a broader audience that is interested in network science.

  • If you want to contribute to this endeavor, please let me know.

  • We're very happy if you would.

  • If you just wanna have a look, then by all means go ahead.

  • Thank you very much for your attention. >> [APPLAUSE]

  • >> I have forgotten to mention in

  • the introduction that Remco is a very good speaker.

  • But by now you've figured it out I guess. >> [LAUGH]

[MUSIC]
