Placeholder Image

Subtitles section Play video

  • The following content is provided under a Creative

  • Commons license.

  • Your support will help MIT OpenCourseWare

  • continue to offer high quality educational resources for free.

  • To make a donation or view additional materials

  • from hundreds of MIT courses, visit MIT OpenCourseWare

  • at ocw.mit.edu.

  • PROFESSOR: So today, we're going to talk about the probability

  • that a random variable deviates by a certain amount

  • from its expectation.

  • Now, we've seen examples where a random variable

  • is very unlikely to deviate much from its expectation.

  • For example, if you flip 100 mutually

  • independent fair coins, you're very

  • likely to wind up with close to 50

  • heads, very unlikely to wind up with 25 or fewer

  • heads, for example.

  • We've also seen examples of distributions

  • where you are very likely to be far from your expectation,

  • for example, that problem when we had the communications

  • channel, and we were measuring the latency of a packet

  • crossing the channel.

  • There, most of the time, your latency

  • would be 10 milliseconds.

  • But the expected latency was infinite.

  • So you're very likely to deviate a lot from your expectation

  • in that case.

  • Last time, we looked at the variance.

  • And we saw how that gave us some feel for the likelihood

  • of being far from the expectation--

  • high variance meaning you're more likely to deviate

  • from the expectation.

  • Today, we're going to develop specific tools

  • for bounding or limiting the probability you

  • deviate by a specified amount from the expectation.

  • And the first tool is known as Markov's theorem.

  • Markov's theorem says that if the random variable is always

  • non-negative, then it is unlikely to greatly exceed

  • its expectation.

  • In particular, if R is a non-negative random variable,

  • then for all x bigger than 0, the probability

  • that R is at least x is at most the expected value of R,

  • the mean, divided by x.

  • So in other words, if R is never negative-- for example,

  • say the expected value is smaller.

  • Then the probability R is large will be a small number.

  • Because I'll have a small number over a big number.

  • So it says that you are unlikely to greatly exceed

  • the expected value.

  • So let's prove that.

  • Now, from the theorem of total expectation

  • that you did in recitation last week,

  • we can compute the expected value of R

  • by looking at two cases-- the case when R is at least x,

  • and the case when R is less than x.

  • That's from the theorem of total expectation.

  • I look at two cases.

  • R is bigger than x.

  • Take the expected value there times the probability

  • of this case happening plus the case when R is less than x.

  • OK, now since R is non-negative, this is at least 0.

  • R can't ever be negative.

  • So the expectation can't be negative.

  • A probability can't be negative.

  • So this is at least 0.

  • And this is trivially at least x.

  • Because I'm taking the expected value of R

  • in the case when R is at least x.

  • So R is always at least x in this case.

  • So its expected value is at least x.

  • So that means that the expected value of R

  • is at least x times the probability

  • R is greater than x, R is greater or equal to x.

  • And now I can get the theorem by just dividing by x.

  • I'm less than or equal to the expected value

  • of R divided by x.

  • So it's a very easy theorem to prove.

  • But it's going to have amazing consequences

  • that we're going to build up through a series of results

  • today.

  • Any questions about Markov's theorem and the proof?

  • All right, there's a simple corollary, which is useful.

  • Again, if R is a non-negative random variable,

  • then for all c bigger than 0, the probability

  • that R is at least c times its expected value is at most 1

  • and c.

  • So the probability you're twice your expected value

  • is at most 1/2.

  • And the proof is very easy.

  • We just set x to be equal to c times the expected

  • value of R in the theorem.

  • So I just plug in x is c times the expected value of R.

  • And I get expected value of R over c times the expected value

  • of R, which is 1/c.

  • So you just plug in that value in Markov's theorem,

  • and it comes out.

  • All right, let's do some examples.

  • Let's let R be the weight of a random person

  • uniformly selected.

  • And I don't know what the distribution of weights

  • is in the country.

  • But suppose that the expected value of R,

  • which is the average weight, is 100 pounds.

  • So if I average over all people, their weight is 100 pounds.

  • And suppose I want to know the probability

  • that the random person weighs at least 200 pounds.

  • What can I say about that probability?

  • Do I know it exactly?

  • I don't think so.

  • Because I don't know what the distribution of weights is.

  • But I can still get an upper bound on this probability.

  • What bound can I get on the probability

  • that a random person has a weight

  • of 200 given the facts here?

  • Yeah.

  • AUDIENCE: [INAUDIBLE]

  • PROFESSOR: Yes, well, it's 100 over 200, right.

  • It's at most the expected value, which is 100,

  • over the x, which is 200.

  • And that's equal to 1/2.

  • So the probability that a random person weighs 200 pounds

  • or more is at most 1/2.

  • Or I could plug it in here.

  • The expected value is 100.

  • 200 is twice that.

  • So c would be 2 here.

  • So the probability of being twice the expectation

  • is at most 1/2.

  • Now of course, I'm using the fact

  • that weight is never negative.

  • That's obviously true.

  • But it is implicitly being used here.

  • So what fraction of the population

  • now can weigh at least 200 pounds?

  • Slightly different question.

  • Before I asked you, if I take a random person,

  • what's the probability they weigh at least 200 pounds?

  • Now I'm asking, what fraction of the population

  • can weigh at least 200 pounds if the average is 100?

  • What is it?

  • Yeah?

  • AUDIENCE: At most 1/2.

  • PROFESSOR: At most 1/2.

  • In fact, it's the same answer.

  • And why?

  • Why can't everybody weigh 200 pounds,

  • so it would be all the population

  • weighs 200 pounds at least?

  • AUDIENCE: [INAUDIBLE]

  • PROFESSOR: Probability would be 1, and that can't happen.

  • And in fact, intuitively, if everybody

  • weighs at least 200 pounds, the average

  • is going to be at least 200 pounds.

  • And we said the average was 100.

  • And this is illustrating this interesting thing

  • that probability implies things about averages and fractions.

  • Because it's really the same thing in disguise.

  • The connection is, if I've got a bunch of people, say,

  • in the country, I can convert a fraction

  • that have some property into a probability

  • by just selecting a random person.

  • Yeah.

  • AUDIENCE: [INAUDIBLE]

  • PROFESSOR: No, the variance could be very big.

  • Because I might have a person that weighs a million pounds,

  • say.

  • So you have to get into that.

  • But it gets a little bit more complicated.

  • Yeah.

  • AUDIENCE: [INAUDIBLE]

  • PROFESSOR: No, there's nothing being

  • assumed about the distribution, nothing at all, OK?

  • So that's the beauty of Markov's theorem.

  • Well, I've assumed one thing.

  • I assume that there is no negative values.

  • That's it.

  • AUDIENCE: [INAUDIBLE]

  • PROFESSOR: That's correct.

  • They can distribute it any way with positive values.

  • But we have a fact here we've used, that the average was 100.

  • So that does limit your distribution.

  • In other words, you couldn't have a distribution where

  • everybody weighs 200 pounds.

  • Because then the average would be 200, not 100.

  • But anything else where they're all positive

  • and they average 100, you know that at most half can be 200.

  • Because if you pick a random one,

  • the probability of getting one that's 200 is at most 1/2,

  • which follows from Markov's theorem.

  • And that's partly why it's so powerful.

  • You didn't know anything about the distribution, really,

  • except its expectation and that it was non-negative.

  • Any other questions about this?

  • I'll give you some more examples.

  • All right, here's another example.

  • Is it possible on the final exam for everybody in the class

  • to do better than the mean score?

  • No, of course not.

  • Because if they did, the mean would be higher.

  • Because the mean is the average.

  • OK, let's do another example.

  • Remember the Chinese appetizer problem?

  • You're at the restaurant, big circular table.

  • There's n people at the table.

  • Everybody has one appetizer in front of them.

  • And then the joker spins the thing

  • in the middle of the table.

  • So it goes around and around.

  • And it stops in a random uniform position.

  • And we wanted to know, what's the expected number of people

  • to get the right appetizer back?

  • What was the answer?

  • Does anybody remember?

  • One.

  • So you expect one person to get the right appetizer back.

  • Well, say I want to know the probability that all n people

  • got the right appetizer back.

  • What does Markov tell you about the probability

  • that all n people get the right appetizer back?

  • 1/n.

  • The expected value is 1.

  • And now you're asking the probability

  • that you get R is at least n.

  • So x is n.

  • So it's 1 in n.

  • And what was the probability, or what is the actual probability?

  • In this case, you know the distribution,

  • that everybody gets the right appetizer back, all n.

  • 1 in n.

  • So in the case of the Chinese appetizer problem,

  • Markov's bound is actually the right answer, right on target,

  • which gives you an example where you can't improve it.

  • By itself, if you just know the expected value,

  • there's no stronger theorem that way.

  • Because Chinese appetizer is an example where the bound

  • you get, 1/n, of n people getting the right appetizer

  • is in fact the true probability.

  • OK, what about the hat check problem?

  • Remember that?

  • So there's n men put the hats in the coat closet.

  • They get uniformly randomly scrambled.

  • So it's a random permutation applied to the hats.

  • Now each man gets a hat back.

  • What's the expected number of men to get the right hat back?

  • One, same as the other one.

  • Because you've got n men each with a 1 in n chance,

  • so it's 1.

  • Markov says the probability that n men get the right hat back is

  • at most 1 in n, same as before.

  • What's the actual probability that all n

  • men get the right hat back?

  • AUDIENCE: [INAUDIBLE]

  • PROFESSOR: 1 in n factorial.

  • So in this case, Markov is way off the mark.

  • It says 1 in n.

  • But in fact the real bound is much smaller.

  • So Markov is not always tight.

  • It's always an upper bound.

  • But it sometimes is not the right answer.

  • And to get the right answer, often you

  • need to know more about the distribution.

  • OK, what if R can be negative?

  • Is it possible that Markov's theorem holds there?

  • Because I use the assumption in the theorem.

  • Can anybody give me an example where it doesn't