 ## Subtitles section Play video

• Welcome back to Mining of Massive Datasets.

• In today's lecture we're going to look at a new topic, Online Algorithms.

• And we're going to look at a particular application of Online Algorithms,

• which is Performance-based Advertising.

• The kind of advertising that you see on the web or on mobile.

• Now, in the classic model of algorithms, which we're all familiar with,

• you get to see the entire input and then you compute some function of it.

• For example, you may be given a set of data points and

• you might want to compute clusters on those data points.

• In this context, we called a classic model of algorithms an offline algorithm.

• An offline algorithm has access to its enti, entire input dataset.

• The kind of algorithm that we're going to look at in today lect, today's lecture,

• called Online Algorithms, don't get to see the entire data set at one one time.

• They get to see the input one piece at a time.

• They don't see all the input and yet, as they see a partial input they have to

• make some kind of irrevocable deficient along the way.

• They produce part of the output even when they've seen only part of the output.

• And it's kind of similar to the data stream model but it's not identical.

• There are certain subtle differences that'll become apparent as we go along.

• We're going to start our discussion of Online Algorithms by looking at

• a very simple problem called Bipartite Matching.

• So, the Bipartite Matching problem, starts with the bipartite graph.

• So, here we have a bipartite graph with, with two sets of nodes or vertices.

• We've labelled one set of nodes Boys, and the other set of nodes Girls.

• We could have equally well labeled them something else.

• But it just so happens to be convenient to be, to label them Boys and

• Girls in this context.

• And in this case, let's say the edges between between the the vertices

• signify compatibility between Boys and Girls.

• Right?

• And the goal of the Bipartite Matching problem is to

• match as many compatible pairs of Boys and Girls as possible.

• That is, find as many pairs of nodes as possible, subject to

• the constraint that those pair, those pairs of nodes are connected by an edge.

• For example, here we've found three pairs of nodes 1,a, 2,b, and 3,d.

• And in each case, this is a compatible pair,

• where it's an edge between between each pair of nodes that we have picked.

• And notice that once you have picked 1,a, 2,b, and 3,d will be

• unable to do anything with Boy 4 or Girl c because there is no edge between them.

• And there are no other Boys or Girls that are available.

• The only edge that Boy 4 has is to Girl a but Girl a is already paired with Boy 1.

• So, it, so you can't pair them up.

• So in this case we produced a matching of cardinality 3 because we

• found 3 pairs then that it could match up.

• So in the the cardinality of the matching in this case there's, is 3.

• Now with the same graph, it turns out that we can actually, do better.

• We can find a matching of cardinality 4 if we are a little bit more clever.

• And here is an example of a matching of cardinality 4.

• We've matched 1,c, 2,b, 3,d, and 4,a.

• And once we do that, we've matched, every Boy with some Girl, and

• every Girl with some Boy.

• And this is what's called a perfect matching because it

• matches every node to some other node in the in the graph.

• To clarify our terminology perfect matching is a matching that

• matches every vertex in the graph, or every node in the graph.

• And maximum matching is a matching that is as large as possible for the given graph.

• For example, in this case the matching M is both a perfect matching and

• a maximum matching.

• Every perfect matching has to be maximum because you can't,

• obviously, find a matching that's larger than perfect.

• however, there could be graphs where there is no perfect matching, but you,

• but there's always a maximum matching which is the best that you can do for for

• that given graph.

• So in this case, the maximum matching has cardinality 4.

• Whereas the earlier matching that we found had just cardinality 3,

• and that's not a maximum matching.

• Now, the goal of the bipartite matching problem is to find the maximum matching or

• find a maximum matching, since the maximum matching is not necessarily unique.

• The goal is to find a maximum matching for

• a given bipartite graph, and if-if-if it's, if the max,

• maximum matching happens to be a perfect one, we'll end up finding a perfect one.

• If not, will just be a maximum matching.

• Now, the maximum, the,

• the matching problem is a well-studied problem in the field of algorithms.

• And there's a, there is a very well known

• an elegant polynomial-time offline algorithm that, you know, remember

• an offline algorithm is when you have the entire graph available to the algorithm.

• And it's based on this idea of augmenting path.

• It's a very classic algorithm from 1973 by Hopcroft and Karp.

• And if, you know, if you're curious you can look it up on,

• on Wikipedia following that the link that I've given there.

• Now however, suppose you don't know the entire graph in advance.

• You only have a piece of the graph available to you at any point.

• In the online version of the graph matching problem,

• we're initially given the sets boys and girls, the two sets of nodes.

• But you're not given all the edges between the nodes.

• In fact, the edges are revealed one round at a time.

• In each round one girl's choices are revealed.

• That is, the edges between that girl and the boys are revealed.

• Now, in each round, the algorithm has to make an irrevocable decision.

• And the and that decision is to either pair the girl with some boy, right?

• So we know all the edges from the girl, from that girl to, to the boy set.

• So, we know all the compatible boys for that girl.

• So we can either pair the girl with a boy.

• Then we'll have to pick a boy who has not already been paired because that's,

• that's part of the rules of the game.

• If there is a boy among the girl's compatible set who has not already been

• paired, we can choose to pair the girl with that boy or

• we can decide to not pair the girl with any boy, and move on to the next girl.

• But once, we've made the choice, we cannot go back.

• We can't go back and pair the girl if he had not pair her at that point.

• And we cannot you know, go back and

• change the pairing of that girl, if you've already paired her.

• So, the choice is irrevocable and made at that point.

• For an example application that, you know, that is,

• is kind of along these lines, and, and

• motivates looking at this algorithm is is assigning tasks to servers.

• Let's say, we have a large number of servers of different kinds and

• tasks come along.

• And each task can only be assigned to certain servers.

• Then the goal is to assign those tasks to each,

• each task to a server that can handle that task.

• And sometimes we may just choose to reject the task,

• because we have an overload situation or something like that.

• So so the the online graph matching problem is in some sense similar to,

• to the problem of assigning tasks to servers.

• So here is an example of the of the Online Graph Matching Problem.

• We start with the with the set of Boys.

• And here's a here's a first Girl a.

• And her choices are revealed.

• So in this case a's choices you know a's compatible with 1 and and 4.

• And at this point, if you have to make a choice either we decide to pair a with

• 1 or 4, or we decide not to pair a with anyone.

• Let's say we decided to pair a with 1 and the output 1,a as our first pair.

• Now the second Girl node comes in and notice that

• that b is compatible with nodes 2 and 3.

• And once again, we have the same choice, and let's say we we choose to pair 2,b.

• And now let's say c comes in.

• And we have just the one edge, 1,c.

• Now, we cannot pair c with c with 1, because 1 is already paired with a.

• So our hand is forced and we don't output anything in this round.

• And them d comes in and d there's only one edge, 3,d.

• And so

• we can output that that edge because in this case we decided to pair 3 and d.

• Right so, this is an example of the Online Graph Matching scenario.

• Now we what you want to do is we want to design an algorithm in this

• online setting that makes a choice at every step.

• That finds as large a matching as possible by the time everything is revealed,

• all the edges are revealed, right?

• and, how do we go about, doing something like that?

• Now, it's a very complicated problem because we

• don't know what's going to come up.

• So, any choice that we make at the moment might paint us into a corner.

• For example, that you know,

• because we output 1,a when c came along, we were unable to match c.

• Had we instead you know debated and not are our output eh,

• a,4 then we could, inst, you know, we couldn't, then c came along.

• We could have output 1,c but we sort of painted ourselves into a corner and

• denied ourselves that choice by making an irrevocable decision, along the way.

• So as you can see this is very very tough problem to solve in an optimal way and

• we have no hope, really, of finding an optimal solution on maximum matching.

• What we can hope instead, is to come up with some kind of heuristic

• that gets us you know a fair distance of the way there.

• A heuristic that gives us a good matching but not necessarily, the best one.

• And something heuristic, is called a greedy heuristic.

• And the Greedy Algorithm is a very, very simple algorithm.

• And the Greedy Algorithm says, well, when you when a girl charges for

• a video don't ever choose not to pay the girl if possible.

• If there's any eligible boy,

• if there's any boy with whom the girl is compatible, where there's an edge.

• And that boy's not already paired then just pick a boy a bit early,

• pick such, such a boy a bit early and pair that girl with that boy.

• And if there is no eligible boy then don't pair a girl,

• obviously because we can't do anything there.

• So it's a very, very simple algorithm the greedy algorithm and

• let's say we go for this algorithm the, the very interesting question is.

• How good is a greedy algorithm?

• Or how bad can it be in practice, how good can it be in practice?

• How can we even analyze how good an algorithm is in this setting, right?

• To sort of analyze how good an algorithm is in, in the online setting,

• we've sort of come up with a new notion that's called the competitive ratio.

• Where we compare the greedy algorithm, or

• any algorithm in general to an offline algorithm.

• So the way we analyze online algorithms is by comparing them with

• the best offline algorithm.

• So to define this formally suppose we have an input I, and remember

• the offline algorithm feeds the whole input I and it can make an optimal choice.

• In this case it'd be an optimal matching, and

• let's say that optimal matching is M optimum.

• Right?

• whereas, the greedy algorithm sees only one input at a time, and so

• it ends up picking a matching that's M greedy.

• Now we're going to sort of in, in the most likely scenario we

• know that the cardinality of M greedy is going to be less than our, less than or

• equal to M optimal.

• It can only be greater than M optimal,

• because M optimal is the optimal matching and we can't do better than that.

• The greedy algorithm is not always going to match the optimal matching, but

• it's you know, going to be you know, less than the optimal matching or

• in the best case scenario, equal to it.

• Now, the competitive ratio of the greedy algorithm is defined to be

• the ratio of the cardinality of M greedy to the cardinality of M optimal

• over all possible inputs I, we take the minimum.

• Right? So, it's kind of the worst case

• performance of the greedy algorithm.

• Right? We take the worst possible input I for

• the greedy algorithm and we compute the, the ratio of the cardinality

• of the greedy output to the optimal output in that in that scenario.

• So this is what we mean by the, the competitive ratio of the greedy algorithm,

• or in general, the competitive ratio of any online algorithm,

• who's defined in a similar manner.

• So let's see whether we can figure out what the competitive ratio of

• the greedy algorithm is.

• Right?

• Now let's suppose the greedy algorithm doesn't actually produce

• the optimum matching.

• Now if the greedy algorithm actually produces the optimum matching,

• then we know the complexity ratio is 1.

• But suppose the greedy algorithm doesn't produce the optimum matching, so

• it's actually less than the optimal matching.

• So here's a, here's a scenario where the the opt, is the,

• you can see that the optimal algorithm in this case that's denoted in

• black by the black lines here produces a matching of cardinality 4.

• The greedy algorithm, which is denoted by these red lines here produces a matching

• of cardinality 3 and so M greedy is not equal to M opt in this case.

• The set of boys is on the left, and the set of girls is on the, on the right.

• [SOUND] Now what we're going to do is we're going to consider the set of girls G

• that is matched in the optimal matching, but is not matched in the greedy matching.

• Okay?

• So here, for example, in this case the girl d

• is actually matched in the optimal matching for d match to boy 4 but

• the greedy algorithm actually doesn't match d.

• It matches a, b, and c but it doesn't, match G match d and so

• the set G consists of the single node d in this case.

• But we define the set G to be the set of girls that are matched in

• the optimal algorithm but not in the greedy algorithm.

• Now, we're going to define the set B

• to be the set of boys that are adjacent to the girls in the set G.

• So in this case, the set G just has a single node.

• The set of boys that are adjacent.

• Are the are, are, are 3 and, and 4.

• And so there are 2 boys in the,

• in the set B that are adjacent to the girls in the set G.

• Now the first observation that we make, is that the opt,

• the the cardinality of the optimal matching is less than or

• equal to the cardinality of the greedy matching plus the, the size of the set G.

• Now this is by definition.

• Because G is just a set of girls that are matched in M optimal, but not in M greedy.