Placeholder Image

Subtitles section Play video

  • Welcome back to Mining of Massive Datasets.

  • In today's lecture we're going to look at a new topic, Online Algorithms.

  • And we're going to look at a particular application of Online Algorithms,

  • which is Performance-based Advertising.

  • The kind of advertising that you see on the web or on mobile.

  • Now, in the classic model of algorithms, which we're all familiar with,

  • you get to see the entire input and then you compute some function of it.

  • For example, you may be given a set of data points and

  • you might want to compute clusters on those data points.

  • In this context, we called a classic model of algorithms an offline algorithm.

  • An offline algorithm has access to its enti, entire input dataset.

  • The kind of algorithm that we're going to look at in today lect, today's lecture,

  • called Online Algorithms, don't get to see the entire data set at one one time.

  • They get to see the input one piece at a time.

  • They don't see all the input and yet, as they see a partial input they have to

  • make some kind of irrevocable deficient along the way.

  • They produce part of the output even when they've seen only part of the output.

  • And it's kind of similar to the data stream model but it's not identical.

  • There are certain subtle differences that'll become apparent as we go along.

  • We're going to start our discussion of Online Algorithms by looking at

  • a very simple problem called Bipartite Matching.

  • So, the Bipartite Matching problem, starts with the bipartite graph.

  • So, here we have a bipartite graph with, with two sets of nodes or vertices.

  • We've labelled one set of nodes Boys, and the other set of nodes Girls.

  • We could have equally well labeled them something else.

  • But it just so happens to be convenient to be, to label them Boys and

  • Girls in this context.

  • And in this case, let's say the edges between between the the vertices

  • signify compatibility between Boys and Girls.

  • Right?

  • And the goal of the Bipartite Matching problem is to

  • match as many compatible pairs of Boys and Girls as possible.

  • That is, find as many pairs of nodes as possible, subject to

  • the constraint that those pair, those pairs of nodes are connected by an edge.

  • For example, here we've found three pairs of nodes 1,a, 2,b, and 3,d.

  • And in each case, this is a compatible pair,

  • where it's an edge between between each pair of nodes that we have picked.

  • And notice that once you have picked 1,a, 2,b, and 3,d will be

  • unable to do anything with Boy 4 or Girl c because there is no edge between them.

  • And there are no other Boys or Girls that are available.

  • The only edge that Boy 4 has is to Girl a but Girl a is already paired with Boy 1.

  • So, it, so you can't pair them up.

  • So in this case we produced a matching of cardinality 3 because we

  • found 3 pairs then that it could match up.

  • So in the the cardinality of the matching in this case there's, is 3.

  • Now with the same graph, it turns out that we can actually, do better.

  • We can find a matching of cardinality 4 if we are a little bit more clever.

  • And here is an example of a matching of cardinality 4.

  • We've matched 1,c, 2,b, 3,d, and 4,a.

  • And once we do that, we've matched, every Boy with some Girl, and

  • every Girl with some Boy.

  • And this is what's called a perfect matching because it

  • matches every node to some other node in the in the graph.

  • To clarify our terminology perfect matching is a matching that

  • matches every vertex in the graph, or every node in the graph.

  • And maximum matching is a matching that is as large as possible for the given graph.

  • For example, in this case the matching M is both a perfect matching and

  • a maximum matching.

  • Every perfect matching has to be maximum because you can't,

  • obviously, find a matching that's larger than perfect.

  • however, there could be graphs where there is no perfect matching, but you,

  • but there's always a maximum matching which is the best that you can do for for

  • that given graph.

  • So in this case, the maximum matching has cardinality 4.

  • Whereas the earlier matching that we found had just cardinality 3,

  • and that's not a maximum matching.

  • Now, the goal of the bipartite matching problem is to find the maximum matching or

  • find a maximum matching, since the maximum matching is not necessarily unique.

  • The goal is to find a maximum matching for

  • a given bipartite graph, and if-if-if it's, if the max,

  • maximum matching happens to be a perfect one, we'll end up finding a perfect one.

  • If not, will just be a maximum matching.

  • Now, the maximum, the,

  • the matching problem is a well-studied problem in the field of algorithms.

  • And there's a, there is a very well known

  • an elegant polynomial-time offline algorithm that, you know, remember

  • an offline algorithm is when you have the entire graph available to the algorithm.

  • And it's based on this idea of augmenting path.

  • It's a very classic algorithm from 1973 by Hopcroft and Karp.

  • And if, you know, if you're curious you can look it up on,

  • on Wikipedia following that the link that I've given there.

  • Now however, suppose you don't know the entire graph in advance.

  • You only have a piece of the graph available to you at any point.

  • In the online version of the graph matching problem,

  • we're initially given the sets boys and girls, the two sets of nodes.

  • But you're not given all the edges between the nodes.

  • In fact, the edges are revealed one round at a time.

  • In each round one girl's choices are revealed.

  • That is, the edges between that girl and the boys are revealed.

  • Now, in each round, the algorithm has to make an irrevocable decision.

  • And the and that decision is to either pair the girl with some boy, right?

  • So we know all the edges from the girl, from that girl to, to the boy set.

  • So, we know all the compatible boys for that girl.

  • So we can either pair the girl with a boy.

  • Then we'll have to pick a boy who has not already been paired because that's,

  • that's part of the rules of the game.

  • If there is a boy among the girl's compatible set who has not already been

  • paired, we can choose to pair the girl with that boy or

  • we can decide to not pair the girl with any boy, and move on to the next girl.

  • But once, we've made the choice, we cannot go back.

  • We can't go back and pair the girl if he had not pair her at that point.

  • And we cannot you know, go back and

  • change the pairing of that girl, if you've already paired her.

  • So, the choice is irrevocable and made at that point.

  • For an example application that, you know, that is,

  • is kind of along these lines, and, and

  • motivates looking at this algorithm is is assigning tasks to servers.

  • Let's say, we have a large number of servers of different kinds and

  • tasks come along.

  • And each task can only be assigned to certain servers.

  • Then the goal is to assign those tasks to each,

  • each task to a server that can handle that task.

  • And sometimes we may just choose to reject the task,

  • because we have an overload situation or something like that.

  • So so the the online graph matching problem is in some sense similar to,

  • to the problem of assigning tasks to servers.

  • So here is an example of the of the Online Graph Matching Problem.

  • We start with the with the set of Boys.

  • And here's a here's a first Girl a.

  • And her choices are revealed.

  • So in this case a's choices you know a's compatible with 1 and and 4.

  • And at this point, if you have to make a choice either we decide to pair a with

  • 1 or 4, or we decide not to pair a with anyone.

  • Let's say we decided to pair a with 1 and the output 1,a as our first pair.

  • Now the second Girl node comes in and notice that

  • that b is compatible with nodes 2 and 3.

  • And once again, we have the same choice, and let's say we we choose to pair 2,b.

  • And now let's say c comes in.

  • And we have just the one edge, 1,c.

  • Now, we cannot pair c with c with 1, because 1 is already paired with a.

  • So our hand is forced and we don't output anything in this round.

  • And them d comes in and d there's only one edge, 3,d.

  • And so

  • we can output that that edge because in this case we decided to pair 3 and d.

  • Right so, this is an example of the Online Graph Matching scenario.

  • Now we what you want to do is we want to design an algorithm in this

  • online setting that makes a choice at every step.

  • That finds as large a matching as possible by the time everything is revealed,

  • all the edges are revealed, right?

  • and, how do we go about, doing something like that?

  • Now, it's a very complicated problem because we

  • don't know what's going to come up.

  • So, any choice that we make at the moment might paint us into a corner.

  • For example, that you know,

  • because we output 1,a when c came along, we were unable to match c.

  • Had we instead you know debated and not are our output eh,

  • a,4 then we could, inst, you know, we couldn't, then c came along.

  • We could have output 1,c but we sort of painted ourselves into a corner and

  • denied ourselves that choice by making an irrevocable decision, along the way.

  • So as you can see this is very very tough problem to solve in an optimal way and

  • we have no hope, really, of finding an optimal solution on maximum matching.

  • What we can hope instead, is to come up with some kind of heuristic

  • that gets us you know a fair distance of the way there.

  • A heuristic that gives us a good matching but not necessarily, the best one.

  • And something heuristic, is called a greedy heuristic.

  • And the Greedy Algorithm is a very, very simple algorithm.

  • And the Greedy Algorithm says, well, when you when a girl charges for

  • a video don't ever choose not to pay the girl if possible.

  • If there's any eligible boy,

  • if there's any boy with whom the girl is compatible, where there's an edge.

  • And that boy's not already paired then just pick a boy a bit early,

  • pick such, such a boy a bit early and pair that girl with that boy.

  • And if there is no eligible boy then don't pair a girl,

  • obviously because we can't do anything there.

  • So it's a very, very simple algorithm the greedy algorithm and

  • let's say we go for this algorithm the, the very interesting question is.

  • How good is a greedy algorithm?

  • Or how bad can it be in practice, how good can it be in practice?

  • How can we even analyze how good an algorithm is in this setting, right?

  • To sort of analyze how good an algorithm is in, in the online setting,

  • we've sort of come up with a new notion that's called the competitive ratio.

  • Where we compare the greedy algorithm, or

  • any algorithm in general to an offline algorithm.

  • So the way we analyze online algorithms is by comparing them with

  • the best offline algorithm.

  • So to define this formally suppose we have an input I, and remember

  • the offline algorithm feeds the whole input I and it can make an optimal choice.

  • In this case it'd be an optimal matching, and

  • let's say that optimal matching is M optimum.

  • Right?

  • whereas, the greedy algorithm sees only one input at a time, and so

  • it ends up picking a matching that's M greedy.

  • Now we're going to sort of in, in the most likely scenario we

  • know that the cardinality of M greedy is going to be less than our, less than or

  • equal to M optimal.

  • It can only be greater than M optimal,

  • because M optimal is the optimal matching and we can't do better than that.

  • The greedy algorithm is not always going to match the optimal matching, but

  • it's you know, going to be you know, less than the optimal matching or

  • in the best case scenario, equal to it.

  • Now, the competitive ratio of the greedy algorithm is defined to be

  • the ratio of the cardinality of M greedy to the cardinality of M optimal

  • over all possible inputs I, we take the minimum.

  • Right? So, it's kind of the worst case

  • performance of the greedy algorithm.

  • Right? We take the worst possible input I for

  • the greedy algorithm and we compute the, the ratio of the cardinality

  • of the greedy output to the optimal output in that in that scenario.

  • So this is what we mean by the, the competitive ratio of the greedy algorithm,

  • or in general, the competitive ratio of any online algorithm,

  • who's defined in a similar manner.

  • So let's see whether we can figure out what the competitive ratio of

  • the greedy algorithm is.

  • Right?

  • Now let's suppose the greedy algorithm doesn't actually produce

  • the optimum matching.

  • Now if the greedy algorithm actually produces the optimum matching,

  • then we know the complexity ratio is 1.

  • But suppose the greedy algorithm doesn't produce the optimum matching, so

  • it's actually less than the optimal matching.

  • So here's a, here's a scenario where the the opt, is the,

  • you can see that the optimal algorithm in this case that's denoted in

  • black by the black lines here produces a matching of cardinality 4.

  • The greedy algorithm, which is denoted by these red lines here produces a matching

  • of cardinality 3 and so M greedy is not equal to M opt in this case.

  • The set of boys is on the left, and the set of girls is on the, on the right.

  • [SOUND] Now what we're going to do is we're going to consider the set of girls G

  • that is matched in the optimal matching, but is not matched in the greedy matching.

  • Okay?

  • So here, for example, in this case the girl d

  • is actually matched in the optimal matching for d match to boy 4 but

  • the greedy algorithm actually doesn't match d.

  • It matches a, b, and c but it doesn't, match G match d and so

  • the set G consists of the single node d in this case.

  • But we define the set G to be the set of girls that are matched in

  • the optimal algorithm but not in the greedy algorithm.

  • Now, we're going to define the set B

  • to be the set of boys that are adjacent to the girls in the set G.

  • So in this case, the set G just has a single node.

  • The set of boys that are adjacent.

  • Are the are, are, are 3 and, and 4.

  • And so there are 2 boys in the,

  • in the set B that are adjacent to the girls in the set G.

  • Now the first observation that we make, is that the opt,

  • the the cardinality of the optimal matching is less than or

  • equal to the cardinality of the greedy matching plus the, the size of the set G.

  • Now this is by definition.

  • Because G is just a set of girls that are matched in M optimal, but not in M greedy.

  • So if you just add in the the,

  • the unmatched girls from M optimal to M greedy, we are going to get a match,

  • we are going to get the size of the optimal matching.

  • Right? So the,

  • the difference between the optimal matching and

  • the greedy matching is the set of girls, G.

  • And therefore, the size of the optimal matching can be no more than the size of

  • the greedy matching plus the size of the set G.

  • And you can see in this case, the size of the set G is 1.

  • The size of the optimal matching is 4.

  • The size of the greedy matching is 3.

  • And what this is saying is that 4 is less than or

  • equal to 3 plus 1 in this case, which is obviously true.

  • So, let's call this equation equation 1.

  • Now, the second observation that we make is that every boy,

  • B in the set B that are adjacent to the girls in G is already matched in M greedy.

  • Now, why is, why, why must this be the case?

  • Suppose there is a boy B adjacent to the girls in

  • G that is not matched in the in the, in the greedy matching.

  • now, if it's not, if, if, if, if there is there is a boy B in the, in the set

  • in the set a boys that is adjacent to the girls in G that's not matched.

  • For example let's say boy 3 were actually,

  • not were actually not matched in the in the greedy algorithm.

  • Then we could just output,

  • the greedy algorithm would have just output the, the 3,d in this case.

  • We would would have paired that boy with with that girl in G.

  • Because the rule of the greedy algorithm is,

  • if it is possible to make a match output it.

  • And be, because the greedy algorithm did not pair the girls in G with any boy at

  • all the only reason why that must be the case is that every boy B, that, you know,

  • was eligible to be matched with a girl in G was in fact already matched.

  • And therefore, you couldn't output that that boy.

  • And therefore it follows that every boy B that's adjacent to

  • girls G it's only matched M greedy.

  • So what they sa, what, what this really means is that the cardinality of

  • the greedy algorithm must be at least equal to B.

  • Because, since each of those boys is matched there's at least B pairs.

  • Cardinality B pairs in the greedy algorithm.

  • So, the cardinality of the greedy algorithm is greater than or

  • equal to the set B.

  • Let's call this equation equation two.

  • Moving on. So what do we have so far?

  • So far we, we, we have two two equations that we derived.

  • The first is that the M optimal is less than or equal to M greedy plus G and

  • the second is that M greedy is greater than or equal to B.

  • Now let's make another observation.

  • The optimal algorithm matches all the girls in G to boys in B, right?

  • This is by the definition of G.

  • We said G is a set of girls that were not matched in the greedy algorithm but

  • were matched in the optimal algorithm.

  • And therefore, the optimum algor, algorithm ma, matches all these girls.

  • and, clearly they can only be matched to boys in B,

  • because these, that's the only set of boys that are adjacent to the girls G.

  • Since the optimal algorithm matches all the girls in G to boys in B,

  • it follows that the cardinality of the set G should be less than or

  • equal to the cardinality of the set B,

  • because otherwise, we wouldn't match all the girls in G to boys in B.

  • Therefore they, they could be they should be a boys in B after our girls in G

  • otherwise we couldn't find the matching.

  • So for example in this case, notice that this set G has 1 girl but

  • the set B has 2 boys so this equation is saying that 1 is less than or

  • equal to 2, which is obviously true.

  • So, let's call that equation 3.

  • Now, when you combine equations 2 and 3, equation 2 says,

  • M greedy is greater than or equal to B well equation 3 says,

  • G is less than or equal to B, or in other words, B is greater than or equal to G.

  • So when you combine any equation 2 and 3 we get equation 4,

  • which says that where G is less than or equal to B.

  • But you know, that B is less or not equal to M greedy.

  • And so, we have G less than or equal to B, which is less than or equal to M greedy.

  • Right?

  • Let's call that equation 4.

  • So, so far, we have equation 1,

  • which says that M optimal is less than or equal to M greedy plus G.

  • And we just derived equation 4, which says that G is less than or

  • equal to B, which is less than or equal to M greedy.

  • Okay?

  • So.

  • When we combine 1 and 4,

  • we're just going to we're going to take 4 and substitute it in 1.

  • We have M optimal less than our M greedy plus G, but we know that G is less than

  • equal than M greedy, so M optimal is less than or equal to M greedy plus M greedy.

  • Which is just 2 times M greedy.

  • So, if you just take the ratio, we end up with the with this final equation here,

  • which says that M greedy divided by M optimal is greater than ha,

  • greater than or equal to a half.

  • And remember the competitive ratio is just a ratio,

  • of M greedy to M optimal in the worst case scenario.

  • And we've made no assumptions at all about the scenario, so this might, this is

  • any scenario and then in particular it could be the worst case scenario as well.

  • So in the worst case scenario the cardinality of the greedy algorithm

  • divided by the cardinality of the optimal algorithm cannot be worse than a half.

  • So, we proved that the competitive ratio of the greedy algorithm is at

  • least a half.

  • Right? It could be greater than a half, but

  • we've proved that the greedy algorithm does at least half,

  • as well as the optimal algorithm.

  • Now that's pretty good, because the greedy algorithm is a very,

  • very simple algorithm.

  • It doesn't even, have all the information available to it.

  • It's looking at one input at a time, and it's doing something really,

  • really simple, if you're able to show that it does at

  • least half as well as the optimal algorithm.

  • Now, we established a lower bound on the competitive ratio of the greedy algorithm.

  • Its competitive ratio is at least a half.

  • Now it turns out, that we can actually create worse-case scenario with you know,

  • where it performs half, [NOISE] as well as optimal.

  • So here's here's an example of a simple worse-case scenario.

  • Let's say you know girl a comes in and

  • the compatible boys are 1 and 4 and the output 1,a.

  • And then an old b comes in and the output 2,b.

  • And then c comes in and

  • we are unable to output anything because 1 is already paired.

  • D comes in, and they're unable to output anything because, 2 is already paired.

  • And so, in this case the greedy algorithm produces a matching of

  • finality 2 which is 1,a, 2,b.

  • But if you observe carefully, there is in fact a matching

  • a maximum matching of cardinality 4 in this in this set.

  • So for example, the, the optimal matching

  • in the set would be would be

  • 1,c 2,d 3,b, and 4,a.

  • Now that is a matching of cardinality 4,

  • where the greedy just produced a matching of cardinality 2.

  • so, in this case the greedy has done half, as well as optimal.

  • So what we've shown is that the that the upper bound

  • on the competitive ratio of the greedy algorithm is a half.

  • Now, we just showed in the, the prior analysis that, that the greedy,

  • that the lower bound on the greedy algorithm competitive ratio is a half.

  • Since, he's shown that both the upper bond and

  • the low, lower bond are half, it follows,

  • in fact, that the Greedy Algorithm has a competitive ratio of exactly a half.

Welcome back to Mining of Massive Datasets.

Subtitles and vocabulary

Click the word to look it up Click the word to find further inforamtion about it