Subtitles section Play video Print subtitles Welcome back to Mining of Massive Datasets. In today's lecture we're going to look at a new topic, Online Algorithms. And we're going to look at a particular application of Online Algorithms, which is Performance-based Advertising. The kind of advertising that you see on the web or on mobile. Now, in the classic model of algorithms, which we're all familiar with, you get to see the entire input and then you compute some function of it. For example, you may be given a set of data points and you might want to compute clusters on those data points. In this context, we called a classic model of algorithms an offline algorithm. An offline algorithm has access to its enti, entire input dataset. The kind of algorithm that we're going to look at in today lect, today's lecture, called Online Algorithms, don't get to see the entire data set at one one time. They get to see the input one piece at a time. They don't see all the input and yet, as they see a partial input they have to make some kind of irrevocable deficient along the way. They produce part of the output even when they've seen only part of the output. And it's kind of similar to the data stream model but it's not identical. There are certain subtle differences that'll become apparent as we go along. We're going to start our discussion of Online Algorithms by looking at a very simple problem called Bipartite Matching. So, the Bipartite Matching problem, starts with the bipartite graph. So, here we have a bipartite graph with, with two sets of nodes or vertices. We've labelled one set of nodes Boys, and the other set of nodes Girls. We could have equally well labeled them something else. But it just so happens to be convenient to be, to label them Boys and Girls in this context. And in this case, let's say the edges between between the the vertices signify compatibility between Boys and Girls. Right? And the goal of the Bipartite Matching problem is to match as many compatible pairs of Boys and Girls as possible. That is, find as many pairs of nodes as possible, subject to the constraint that those pair, those pairs of nodes are connected by an edge. For example, here we've found three pairs of nodes 1,a, 2,b, and 3,d. And in each case, this is a compatible pair, where it's an edge between between each pair of nodes that we have picked. And notice that once you have picked 1,a, 2,b, and 3,d will be unable to do anything with Boy 4 or Girl c because there is no edge between them. And there are no other Boys or Girls that are available. The only edge that Boy 4 has is to Girl a but Girl a is already paired with Boy 1. So, it, so you can't pair them up. So in this case we produced a matching of cardinality 3 because we found 3 pairs then that it could match up. So in the the cardinality of the matching in this case there's, is 3. Now with the same graph, it turns out that we can actually, do better. We can find a matching of cardinality 4 if we are a little bit more clever. And here is an example of a matching of cardinality 4. We've matched 1,c, 2,b, 3,d, and 4,a. And once we do that, we've matched, every Boy with some Girl, and every Girl with some Boy. And this is what's called a perfect matching because it matches every node to some other node in the in the graph. To clarify our terminology perfect matching is a matching that matches every vertex in the graph, or every node in the graph. And maximum matching is a matching that is as large as possible for the given graph. For example, in this case the matching M is both a perfect matching and a maximum matching. Every perfect matching has to be maximum because you can't, obviously, find a matching that's larger than perfect. however, there could be graphs where there is no perfect matching, but you, but there's always a maximum matching which is the best that you can do for for that given graph. So in this case, the maximum matching has cardinality 4. Whereas the earlier matching that we found had just cardinality 3, and that's not a maximum matching. Now, the goal of the bipartite matching problem is to find the maximum matching or find a maximum matching, since the maximum matching is not necessarily unique. The goal is to find a maximum matching for a given bipartite graph, and if-if-if it's, if the max, maximum matching happens to be a perfect one, we'll end up finding a perfect one. If not, will just be a maximum matching. Now, the maximum, the, the matching problem is a well-studied problem in the field of algorithms. And there's a, there is a very well known an elegant polynomial-time offline algorithm that, you know, remember an offline algorithm is when you have the entire graph available to the algorithm. And it's based on this idea of augmenting path. It's a very classic algorithm from 1973 by Hopcroft and Karp. And if, you know, if you're curious you can look it up on, on Wikipedia following that the link that I've given there. Now however, suppose you don't know the entire graph in advance. You only have a piece of the graph available to you at any point. In the online version of the graph matching problem, we're initially given the sets boys and girls, the two sets of nodes. But you're not given all the edges between the nodes. In fact, the edges are revealed one round at a time. In each round one girl's choices are revealed. That is, the edges between that girl and the boys are revealed. Now, in each round, the algorithm has to make an irrevocable decision. And the and that decision is to either pair the girl with some boy, right? So we know all the edges from the girl, from that girl to, to the boy set. So, we know all the compatible boys for that girl. So we can either pair the girl with a boy. Then we'll have to pick a boy who has not already been paired because that's, that's part of the rules of the game. If there is a boy among the girl's compatible set who has not already been paired, we can choose to pair the girl with that boy or we can decide to not pair the girl with any boy, and move on to the next girl. But once, we've made the choice, we cannot go back. We can't go back and pair the girl if he had not pair her at that point. And we cannot you know, go back and change the pairing of that girl, if you've already paired her. So, the choice is irrevocable and made at that point. For an example application that, you know, that is, is kind of along these lines, and, and motivates looking at this algorithm is is assigning tasks to servers. Let's say, we have a large number of servers of different kinds and tasks come along. And each task can only be assigned to certain servers. Then the goal is to assign those tasks to each, each task to a server that can handle that task. And sometimes we may just choose to reject the task, because we have an overload situation or something like that. So so the the online graph matching problem is in some sense similar to, to the problem of assigning tasks to servers. So here is an example of the of the Online Graph Matching Problem. We start with the with the set of Boys. And here's a here's a first Girl a. And her choices are revealed. So in this case a's choices you know a's compatible with 1 and and 4. And at this point, if you have to make a choice either we decide to pair a with 1 or 4, or we decide not to pair a with anyone. Let's say we decided to pair a with 1 and the output 1,a as our first pair. Now the second Girl node comes in and notice that that b is compatible with nodes 2 and 3. And once again, we have the same choice, and let's say we we choose to pair 2,b. And now let's say c comes in. And we have just the one edge, 1,c. Now, we cannot pair c with c with 1, because 1 is already paired with a. So our hand is forced and we don't output anything in this round. And them d comes in and d there's only one edge, 3,d. And so we can output that that edge because in this case we decided to pair 3 and d. Right so, this is an example of the Online Graph Matching scenario. Now we what you want to do is we want to design an algorithm in this online setting that makes a choice at every step. That finds as large a matching as possible by the time everything is revealed, all the edges are revealed, right? and, how do we go about, doing something like that? Now, it's a very complicated problem because we don't know what's going to come up. So, any choice that we make at the moment might paint us into a corner. For example, that you know, because we output 1,a when c came along, we were unable to match c. Had we instead you know debated and not are our output eh, a,4 then we could, inst, you know, we couldn't, then c came along. We could have output 1,c but we sort of painted ourselves into a corner and denied ourselves that choice by making an irrevocable decision, along the way. So as you can see this is very very tough problem to solve in an optimal way and we have no hope, really, of finding an optimal solution on maximum matching. What we can hope instead, is to come up with some kind of heuristic that gets us you know a fair distance of the way there. A heuristic that gives us a good matching but not necessarily, the best one. And something heuristic, is called a greedy heuristic. And the Greedy Algorithm is a very, very simple algorithm. And the Greedy Algorithm says, well, when you when a girl charges for a video don't ever choose not to pay the girl if possible. If there's any eligible boy, if there's any boy with whom the girl is compatible, where there's an edge. And that boy's not already paired then just pick a boy a bit early, pick such, such a boy a bit early and pair that girl with that boy. And if there is no eligible boy then don't pair a girl, obviously because we can't do anything there. So it's a very, very simple algorithm the greedy algorithm and let's say we go for this algorithm the, the very interesting question is. How good is a greedy algorithm? Or how bad can it be in practice, how good can it be in practice? How can we even analyze how good an algorithm is in this setting, right? To sort of analyze how good an algorithm is in, in the online setting, we've sort of come up with a new notion that's called the competitive ratio. Where we compare the greedy algorithm, or any algorithm in general to an offline algorithm. So the way we analyze online algorithms is by comparing them with the best offline algorithm. So to define this formally suppose we have an input I, and remember the offline algorithm feeds the whole input I and it can make an optimal choice. In this case it'd be an optimal matching, and let's say that optimal matching is M optimum. Right? whereas, the greedy algorithm sees only one input at a time, and so it ends up picking a matching that's M greedy. Now we're going to sort of in, in the most likely scenario we know that the cardinality of M greedy is going to be less than our, less than or equal to M optimal. It can only be greater than M optimal, because M optimal is the optimal matching and we can't do better than that. The greedy algorithm is not always going to match the optimal matching, but it's you know, going to be you know, less than the optimal matching or in the best case scenario, equal to it. Now, the competitive ratio of the greedy algorithm is defined to be the ratio of the cardinality of M greedy to the cardinality of M optimal over all possible inputs I, we take the minimum. Right? So, it's kind of the worst case performance of the greedy algorithm. Right? We take the worst possible input I for the greedy algorithm and we compute the, the ratio of the cardinality of the greedy output to the optimal output in that in that scenario. So this is what we mean by the, the competitive ratio of the greedy algorithm, or in general, the competitive ratio of any online algorithm, who's defined in a similar manner. So let's see whether we can figure out what the competitive ratio of the greedy algorithm is. Right? Now let's suppose the greedy algorithm doesn't actually produce the optimum matching. Now if the greedy algorithm actually produces the optimum matching, then we know the complexity ratio is 1. But suppose the greedy algorithm doesn't produce the optimum matching, so it's actually less than the optimal matching. So here's a, here's a scenario where the the opt, is the, you can see that the optimal algorithm in this case that's denoted in black by the black lines here produces a matching of cardinality 4. The greedy algorithm, which is denoted by these red lines here produces a matching of cardinality 3 and so M greedy is not equal to M opt in this case. The set of boys is on the left, and the set of girls is on the, on the right. [SOUND] Now what we're going to do is we're going to consider the set of girls G that is matched in the optimal matching, but is not matched in the greedy matching. Okay? So here, for example, in this case the girl d is actually matched in the optimal matching for d match to boy 4 but the greedy algorithm actually doesn't match d. It matches a, b, and c but it doesn't, match G match d and so the set G consists of the single node d in this case. But we define the set G to be the set of girls that are matched in the optimal algorithm but not in the greedy algorithm. Now, we're going to define the set B to be the set of boys that are adjacent to the girls in the set G. So in this case, the set G just has a single node. The set of boys that are adjacent. Are the are, are, are 3 and, and 4. And so there are 2 boys in the, in the set B that are adjacent to the girls in the set G. Now the first observation that we make, is that the opt, the the cardinality of the optimal matching is less than or equal to the cardinality of the greedy matching plus the, the size of the set G. Now this is by definition. Because G is just a set of girls that are matched in M optimal, but not in M greedy.