## Subtitles

The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: So today, we're going to talk about the probability that a random variable deviates by a certain amount from its expectation. Now, we've seen examples where a random variable is very unlikely to deviate much from its expectation. For example, if you flip 100 mutually independent fair coins, you're very likely to wind up with close to 50 heads, and very unlikely to wind up with 25 or fewer.

We've also seen examples of distributions where you are very likely to be far from your expectation -- for example, that problem where we had the communications channel, and we were measuring the latency of a packet crossing the channel. There, most of the time, your latency would be 10 milliseconds. But the expected latency was infinite. So you're very likely to deviate a lot from your expectation in that case.
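The channel's exact distribution isn't restated in this lecture, but a minimal sketch of a latency distribution with infinite expectation (my own illustrative choice, not the lecture's model) looks like this:

```python
import random

# Hypothetical latency model (not the lecture's exact channel): latency
# is 10 * 2^(k-1) ms with probability 2^-k for k = 1, 2, 3, ...
# Each term contributes 10 * 2^(k-1) * 2^-k = 5 to the expectation,
# so E[latency] = 5 + 5 + 5 + ... = infinity, even though
# P(latency = 10 ms) = 1/2.

def sample_latency():
    k = 1
    while random.random() >= 0.5:  # keep doubling with probability 1/2
        k += 1
    return 10 * 2 ** (k - 1)

random.seed(1)
draws = [sample_latency() for _ in range(100_000)]
frac_small = sum(d <= 20 for d in draws) / len(draws)  # P(L <= 20) = 3/4
print(f"fraction of draws at most 20 ms: {frac_small:.3f}")
```

So most draws are small, yet the sample mean keeps drifting upward as occasional huge latencies land -- exactly the "likely far from the (infinite) expectation" behavior described above.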

Last time, we looked at the variance. And we saw how that gave us some feel for the likelihood of being far from the expectation -- high variance meaning you're more likely to deviate from the expectation. Today, we're going to develop specific tools for bounding or limiting the probability that you deviate by a specified amount from the expectation.

And the first tool is known as Markov's theorem. Markov's theorem says that if the random variable is always non-negative, then it is unlikely to greatly exceed its expectation. In particular, if R is a non-negative random variable, then for all x bigger than 0, the probability that R is at least x is at most the expected value of R, the mean, divided by x:

Pr[R >= x] <= E[R] / x.

So in other words, if R is never negative and, say, the expected value is small, then the probability that R is large will be a small number, because I'll have a small number over a big number. So it says that you are unlikely to greatly exceed the expected value. So let's prove that.

Now, from the theorem of total expectation that you did in recitation last week, we can compute the expected value of R by looking at two cases -- the case when R is at least x, and the case when R is less than x. I take the expected value in each case times the probability of that case happening:

E[R] = E[R | R >= x] * Pr[R >= x] + E[R | R < x] * Pr[R < x].

OK, now since R is non-negative, the second term is at least 0: R can't ever be negative, so its conditional expectation can't be negative, and a probability can't be negative. And the first conditional expectation is trivially at least x, because I'm taking the expected value of R in the case when R is at least x. R is always at least x in that case, so its expected value is at least x. So that means the expected value of R is at least x times the probability that R is greater than or equal to x. And now I can get the theorem by just dividing by x: Pr[R >= x] is at most the expected value of R divided by x.

So it's a very easy theorem to prove. But it's going to have amazing consequences that we're going to build up through a series of results today. Any questions about Markov's theorem and the proof?
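As a quick numerical sanity check (my own sketch, not from the lecture), we can compare the Markov bound E[R]/x against the empirical tail probability for the 100-coin example mentioned earlier:

```python
import random

# Empirical check of Markov's theorem: for non-negative R and x > 0,
# Pr[R >= x] <= E[R] / x.  Here R = number of heads in 100 fair flips,
# so E[R] = 50.

def heads(flips=100):
    return sum(random.random() < 0.5 for _ in range(flips))

random.seed(0)
samples = [heads() for _ in range(10_000)]
mean = sum(samples) / len(samples)

for x in (60, 75, 90):
    tail = sum(s >= x for s in samples) / len(samples)
    bound = mean / x
    print(f"x={x}: empirical Pr[R >= x] = {tail:.4f}, Markov bound {bound:.3f}")
```

The bound always holds, but notice it is far from tight for this distribution: at x = 60 the true tail is a few percent while the bound is above 0.8 -- a theme that comes back with the hat check problem below.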

All right, there's a simple corollary, which is useful. Again, if R is a non-negative random variable, then for all c bigger than 0, the probability that R is at least c times its expected value is at most 1/c. So the probability you're twice your expected value is at most 1/2. And the proof is very easy. We just set x equal to c times the expected value of R in the theorem. So I plug in x = c times the expected value of R, and I get the expected value of R over c times the expected value of R, which is 1/c. So you just plug that value into Markov's theorem, and it comes out.

All right, let's do some examples. Let's let R be the weight of a uniformly selected random person. And I don't know what the distribution of weights is in the country. But suppose that the expected value of R, which is the average weight, is 100 pounds. So if I average over all people, their weight is 100 pounds. And suppose I want to know the probability that the random person weighs at least 200 pounds. What can I say about that probability? Do I know it exactly? I don't think so, because I don't know what the distribution of weights is. But I can still get an upper bound on this probability. What bound can I get on the probability that a random person has a weight of at least 200, given the facts here? Yeah.

AUDIENCE: [INAUDIBLE]

PROFESSOR: Yes, well, it's 100 over 200, right. It's at most the expected value, which is 100, over the x, which is 200. And that's equal to 1/2. So the probability that a random person weighs 200 pounds or more is at most 1/2. Or I could plug it into the corollary: the expected value is 100, and 200 is twice that, so c would be 2 here. So the probability of being twice the expectation is at most 1/2. Now of course, I'm using the fact that weight is never negative. That's obviously true, but it is implicitly being used here.

So what fraction of the population now can weigh at least 200 pounds? Slightly different question. Before, I asked you: if I take a random person, what's the probability they weigh at least 200 pounds? Now I'm asking: what fraction of the population can weigh at least 200 pounds if the average is 100? What is it? Yeah?

AUDIENCE: At most 1/2.

PROFESSOR: At most 1/2. In fact, it's the same answer. And why? Why can't everybody weigh at least 200 pounds, so that all the population weighs 200 pounds at least?

AUDIENCE: [INAUDIBLE]

PROFESSOR: The probability would be 1, and that can't happen. And in fact, intuitively, if everybody weighs at least 200 pounds, the average is going to be at least 200 pounds. And we said the average was 100. And this is illustrating this interesting thing, that probability implies things about averages and fractions, because it's really the same thing in disguise. The connection is, if I've got a bunch of people, say, in the country, I can convert a fraction that have some property into a probability by just selecting a random person. Yeah.

AUDIENCE: [INAUDIBLE]

PROFESSOR: No, the variance could be very big, because I might have a person that weighs a million pounds, say. So you have to get into that. But it gets a little bit more complicated. Yeah.

• Yeah.

AUDIENCE: [INAUDIBLE]

PROFESSOR: No, there's nothing being assumed about the distribution, nothing at all, OK? So that's the beauty of Markov's theorem. Well, I've assumed one thing: I assume that there are no negative values. That's it.

AUDIENCE: [INAUDIBLE]

PROFESSOR: That's correct. The weights can be distributed any way with non-negative values. But we have a fact here we've used, that the average was 100. So that does limit your distribution. In other words, you couldn't have a distribution where everybody weighs 200 pounds, because then the average would be 200, not 100. But anything else where they're all non-negative and they average 100, you know that at most half can be 200. Because if you pick a random one, the probability of getting one that's at least 200 is at most 1/2, which follows from Markov's theorem. And that's partly why it's so powerful. You didn't know anything about the distribution, really, except its expectation and that it was non-negative.
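To see that "at most half" really is achievable, here's a made-up population (illustrative numbers only, admittedly unphysical with its weightless people) that meets Markov's bound with equality:

```python
# A non-negative distribution with mean 100 where exactly half the
# population is at 200 -- Markov's bound Pr[R >= 200] <= 100/200 = 1/2
# holds with equality.  (Illustrative numbers, not data.)
population = [0] * 50 + [200] * 50

mean = sum(population) / len(population)
frac_200 = sum(w >= 200 for w in population) / len(population)

print(mean, frac_200)  # -> 100.0 0.5
```

Any mass moved off 0 while keeping the mean at 100 would force the fraction at 200 below 1/2, which is why the all-or-nothing shape is the extreme case.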

I'll give you some more examples. All right, here's another example. Is it possible on the final exam for everybody in the class to do better than the mean score? No, of course not. Because if they did, the mean would be higher -- the mean is the average.

OK, let's do another example. Remember the Chinese appetizer problem? You're at a restaurant with a big circular table, and there are n people at the table. Everybody has one appetizer in front of them. And then the joker spins the turntable in the middle of the table, so it goes around and around, and it stops in a uniformly random position. And we wanted to know: what's the expected number of people to get the right appetizer back? Does anybody remember? One. So you expect one person to get the right appetizer back.

Well, say I want to know the probability that all n people got the right appetizer back. What does Markov tell you about the probability that all n people get the right appetizer back? 1/n. The expected value is 1, and now you're asking the probability that R is at least n. So x is n, so it's 1 in n. And what is the actual probability -- in this case you know the distribution -- that everybody gets the right appetizer back, all n? 1 in n. So in the case of the Chinese appetizer problem, Markov's bound is actually the right answer, right on target, which gives you an example where you can't improve it. By itself, if you just know the expected value, there's no stronger theorem that way, because the Chinese appetizer problem is an example where the bound you get, 1/n, of all n people getting the right appetizer, is in fact the true probability.
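A small simulation (my own sketch; the function name `spin` and the choice n = 8 are mine) shows both facts at once: the expected number of matches is 1, and the probability that all n match is exactly 1/n, so Markov's bound is tight here:

```python
import random

# Chinese appetizer problem: the turntable stops at a uniformly random
# rotation.  A rotation of 0 matches everyone; any other rotation
# matches nobody.  So R = n with probability 1/n and R = 0 otherwise,
# giving E[R] = 1 and Pr[R >= n] = 1/n -- Markov's bound exactly.

def spin(n):
    shift = random.randrange(n)      # uniform rotation of the turntable
    return n if shift == 0 else 0    # all right, or none right

random.seed(2)
n, trials = 8, 100_000
results = [spin(n) for _ in range(trials)]

mean_matches = sum(results) / trials           # close to E[R] = 1
p_all = sum(r == n for r in results) / trials  # close to 1/n = 0.125
print(f"mean matches {mean_matches:.3f}, Pr[all {n} right] = {p_all:.4f}")
```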

OK, what about the hat check problem? Remember that? So n men put their hats in the coat closet, and the hats get uniformly randomly scrambled -- a random permutation applied to the hats. Now each man gets a hat back. What's the expected number of men to get the right hat back? One, same as the other one. Because you've got n men, each with a 1 in n chance, so it's 1. Markov says the probability that all n men get the right hat back is at most 1 in n, same as before. What's the actual probability that all n men get the right hat back?

AUDIENCE: [INAUDIBLE]

PROFESSOR: 1 in n factorial. So in this case, Markov is way off the mark. It says 1 in n, but in fact the real probability is much smaller. So Markov is not always tight. It's always an upper bound, but it sometimes is not the right answer. And to get the right answer, often you need to know more about the distribution.
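A matching simulation for the hat check (again my own sketch; n = 6 is an arbitrary choice) shows the gap between Markov's 1/n and the true 1/n!:

```python
import math
import random

# Hat check: a uniformly random permutation of n hats.  The expected
# number of fixed points is 1, so Markov gives Pr[all n right] <= 1/n,
# but the true probability is 1/n! -- only the identity permutation works.

def correct_hats(n):
    hats = list(range(n))
    random.shuffle(hats)                              # uniform permutation
    return sum(man == hat for man, hat in enumerate(hats))

random.seed(3)
n, trials = 6, 200_000
results = [correct_hats(n) for _ in range(trials)]

mean_correct = sum(results) / trials            # close to 1
p_all = sum(r == n for r in results) / trials   # close to 1/720
print(f"mean {mean_correct:.3f}, Pr[all right] = {p_all:.5f}, "
      f"1/n! = {1 / math.factorial(n):.5f}, Markov bound 1/n = {1 / n:.5f}")
```

Same expectation as the appetizer problem, wildly different tail -- which is exactly why knowing only the expectation can't give you anything stronger than Markov.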

OK, what if R can be negative? Is it possible that Markov's theorem holds there? Because I use the assumption in the theorem. Can anybody give me an example where it doesn't