Placeholder Image

Subtitles section Play video

  • Welcome back to Mining of Massive Datasets.

  • In today's lecture we're going to look at a new topic, Online Algorithms.

  • And we're going to look at a particular application of Online Algorithms,

  • which is Performance-based Advertising.

  • The kind of advertising that you see on the web or on mobile.

  • Now, in the classic model of algorithms, which we're all familiar with,

  • you get to see the entire input and then you compute some function of it.

  • For example, you may be given a set of data points and

  • you might want to compute clusters on those data points.

  • In this context, we called a classic model of algorithms an offline algorithm.

  • An offline algorithm has access to its enti, entire input dataset.

  • The kind of algorithm that we're going to look at in today lect, today's lecture,

  • called Online Algorithms, don't get to see the entire data set at one one time.

  • They get to see the input one piece at a time.

  • They don't see all the input and yet, as they see a partial input they have to

  • make some kind of irrevocable deficient along the way.

  • They produce part of the output even when they've seen only part of the output.

  • And it's kind of similar to the data stream model but it's not identical.

  • There are certain subtle differences that'll become apparent as we go along.

  • We're going to start our discussion of Online Algorithms by looking at

  • a very simple problem called Bipartite Matching.

  • So, the Bipartite Matching problem, starts with the bipartite graph.

  • So, here we have a bipartite graph with, with two sets of nodes or vertices.

  • We've labelled one set of nodes Boys, and the other set of nodes Girls.

  • We could have equally well labeled them something else.

  • But it just so happens to be convenient to be, to label them Boys and

  • Girls in this context.

  • And in this case, let's say the edges between between the the vertices

  • signify compatibility between Boys and Girls.

  • Right?

  • And the goal of the Bipartite Matching problem is to

  • match as many compatible pairs of Boys and Girls as possible.

  • That is, find as many pairs of nodes as possible, subject to

  • the constraint that those pair, those pairs of nodes are connected by an edge.

  • For example, here we've found three pairs of nodes 1,a, 2,b, and 3,d.

  • And in each case, this is a compatible pair,

  • where it's an edge between between each pair of nodes that we have picked.

  • And notice that once you have picked 1,a, 2,b, and 3,d will be

  • unable to do anything with Boy 4 or Girl c because there is no edge between them.

  • And there are no other Boys or Girls that are available.

  • The only edge that Boy 4 has is to Girl a but Girl a is already paired with Boy 1.

  • So, it, so you can't pair them up.

  • So in this case we produced a matching of cardinality 3 because we

  • found 3 pairs then that it could match up.

  • So in the the cardinality of the matching in this case there's, is 3.

  • Now with the same graph, it turns out that we can actually, do better.

  • We can find a matching of cardinality 4 if we are a little bit more clever.

  • And here is an example of a matching of cardinality 4.

  • We've matched 1,c, 2,b, 3,d, and 4,a.

  • And once we do that, we've matched, every Boy with some Girl, and

  • every Girl with some Boy.

  • And this is what's called a perfect matching because it

  • matches every node to some other node in the in the graph.

  • To clarify our terminology perfect matching is a matching that

  • matches every vertex in the graph, or every node in the graph.

  • And maximum matching is a matching that is as large as possible for the given graph.

  • For example, in this case the matching M is both a perfect matching and

  • a maximum matching.

  • Every perfect matching has to be maximum because you can't,

  • obviously, find a matching that's larger than perfect.

  • however, there could be graphs where there is no perfect matching, but you,

  • but there's always a maximum matching which is the best that you can do for for

  • that given graph.

  • So in this case, the maximum matching has cardinality 4.

  • Whereas the earlier matching that we found had just cardinality 3,

  • and that's not a maximum matching.

  • Now, the goal of the bipartite matching problem is to find the maximum matching or

  • find a maximum matching, since the maximum matching is not necessarily unique.

  • The goal is to find a maximum matching for

  • a given bipartite graph, and if-if-if it's, if the max,

  • maximum matching happens to be a perfect one, we'll end up finding a perfect one.

  • If not, will just be a maximum matching.

  • Now, the maximum, the,

  • the matching problem is a well-studied problem in the field of algorithms.

  • And there's a, there is a very well known

  • an elegant polynomial-time offline algorithm that, you know, remember

  • an offline algorithm is when you have the entire graph available to the algorithm.

  • And it's based on this idea of augmenting path.

  • It's a very classic algorithm from 1973 by Hopcroft and Karp.

  • And if, you know, if you're curious you can look it up on,

  • on Wikipedia following that the link that I've given there.

  • Now however, suppose you don't know the entire graph in advance.

  • You only have a piece of the graph available to you at any point.

  • In the online version of the graph matching problem,

  • we're initially given the sets boys and girls, the two sets of nodes.

  • But you're not given all the edges between the nodes.

  • In fact, the edges are revealed one round at a time.

  • In each round one girl's choices are revealed.

  • That is, the edges between that girl and the boys are revealed.

  • Now, in each round, the algorithm has to make an irrevocable decision.

  • And the and that decision is to either pair the girl with some boy, right?

  • So we know all the edges from the girl, from that girl to, to the boy set.

  • So, we know all the compatible boys for that girl.

  • So we can either pair the girl with a boy.

  • Then we'll have to pick a boy who has not already been paired because that's,

  • that's part of the rules of the game.

  • If there is a boy among the girl's compatible set who has not already been

  • paired, we can choose to pair the girl with that boy or

  • we can decide to not pair the girl with any boy, and move on to the next girl.

  • But once, we've made the choice, we cannot go back.

  • We can't go back and pair the girl if he had not pair her at that point.

  • And we cannot you know, go back and

  • change the pairing of that girl, if you've already paired her.

  • So, the choice is irrevocable and made at that point.

  • For an example application that, you know, that is,

  • is kind of along these lines, and, and

  • motivates looking at this algorithm is is assigning tasks to servers.

  • Let's say, we have a large number of servers of different kinds and

  • tasks come along.

  • And each task can only be assigned to certain servers.

  • Then the goal is to assign those tasks to each,

  • each task to a server that can handle that task.

  • And sometimes we may just choose to reject the task,

  • because we have an overload situation or something like that.

  • So so the the online graph matching problem is in some sense similar to,

  • to the problem of assigning tasks to servers.

  • So here is an example of the of the Online Graph Matching Problem.

  • We start with the with the set of Boys.

  • And here's a here's a first Girl a.

  • And her choices are revealed.

  • So in this case a's choices you know a's compatible with 1 and and 4.

  • And at this point, if you have to make a choice either we decide to pair a with

  • 1 or 4, or we decide not to pair a with anyone.

  • Let's say we decided to pair a with 1 and the output 1,a as our first pair.

  • Now the second Girl node comes in and notice that

  • that b is compatible with nodes 2 and 3.

  • And once again, we have the same choice, and let's say we we choose to pair 2,b.

  • And now let's say c comes in.

  • And we have just the one edge, 1,c.

  • Now, we cannot pair c with c with 1, because 1 is already paired with a.

  • So our hand is forced and we don't output anything in this round.

  • And them d comes in and d there's only one edge, 3,d.

  • And so

  • we can output that that edge because in this case we decided to pair 3 and d.

  • Right so, this is an example of the Online Graph Matching scenario.

  • Now we what you want to do is we want to design an algorithm in this

  • online setting that makes a choice at every step.

  • That finds as large a matching as possible by the time everything is revealed,

  • all the edges are revealed, right?

  • and, how do we go about, doing something like that?

  • Now, it's a very complicated problem because we

  • don't know what's going to come up.

  • So, any choice that we make at the moment might paint us into a corner.

  • For example, that you know,

  • because we output 1,a when c came along, we were unable to match c.

  • Had we instead you know debated and not are our output eh,

  • a,4 then we could, inst, you know, we couldn't, then c came along.

  • We could have output 1,c but we sort of painted ourselves into a corner and

  • denied ourselves that choice by making an irrevocable decision, along the way.

  • So as you can see this is very very tough problem to solve in an optimal way and

  • we have no hope, really, of finding an optimal solution on maximum matching.

  • What we can hope instead, is to come up with some kind of heuristic

  • that gets us you know a fair distance of the way there.

  • A heuristic that gives us a good matching but not necessarily, the best one.

  • And something heuristic, is called a greedy heuristic.

  • And the Greedy Algorithm is a very, very simple algorithm.

  • And the Greedy Algorithm says, well, when you when a girl charges for

  • a video don't ever choose not to pay the girl if possible.

  • If there's any eligible boy,

  • if there's any boy with whom the girl is compatible, where there's an edge.

  • And that boy's not already paired then just pick a boy a bit early,

  • pick such, such a boy a bit early and pair that girl with that boy.

  • And if there is no eligible boy then don't pair a girl,

  • obviously because we can't do anything there.

  • So it's a very, very simple algorithm the greedy algorithm and

  • let's say we go for this algorithm the, the very interesting question is.

  • How good is a greedy algorithm?

  • Or how bad can it be in practice, how good can it be in practice?

  • How can we even analyze how good an algorithm is in this setting, right?

  • To sort of analyze how good an algorithm is in, in the online setting,

  • we've sort of come up with a new notion that's called the competitive ratio.

  • Where we compare the greedy algorithm, or

  • any algorithm in general to an offline algorithm.

  • So the way we analyze online algorithms is by comparing them with

  • the best offline algorithm.

  • So to define this formally suppose we have an input I, and remember

  • the offline algorithm feeds the whole input I and it can make an optimal choice.

  • In this case it'd be an optimal matching, and

  • let's say that optimal matching is M optimum.

  • Right?

  • whereas, the greedy algorithm sees only one input at a time, and so

  • it ends up picking a matching that's M greedy.

  • Now we're going to sort of in, in the most likely scenario we

  • know that the cardinality of M greedy is going to be less than our, less than or

  • equal to M optimal.

  • It can only be greater than M optimal,

  • because M optimal is the optimal matching and we can't do better than that.

  • The greedy algorithm is not always going to match the optimal matching, but

  • it's you know, going to be you know, less than the optimal matching or

  • in the best case scenario, equal to it.

  • Now, the competitive ratio of the greedy algorithm is defined to be

  • the ratio of the cardinality of M greedy to the cardinality of M optimal

  • over all possible inputs I, we take the minimum.

  • Right? So, it's kind of the worst case

  • performance of the greedy algorithm.

  • Right? We take the worst possible input I for

  • the greedy algorithm and we compute the, the ratio of the cardinality

  • of the greedy output to the optimal output in that in that scenario.

  • So this is what we mean by the, the competitive ratio of the greedy algorithm,

  • or in general, the competitive ratio of any online algorithm,

  • who's defined in a similar manner.

  • So let's see whether we can figure out what the competitive ratio of

  • the greedy algorithm is.

  • Right?

  • Now let's suppose the greedy algorithm doesn't actually produce

  • the optimum matching.

  • Now if the greedy algorithm actually produces the optimum matching,

  • then we know the complexity ratio is 1.

  • But suppose the greedy algorithm doesn't produce the optimum matching, so

  • it's actually less than the optimal matching.

  • So here's a, here's a scenario where the the opt, is the,

  • you can see that the optimal algorithm in this case that's denoted in

  • black by the black lines here produces a matching of cardinality 4.

  • The greedy algorithm, which is denoted by these red lines here produces a matching

  • of cardinality 3 and so M greedy is not equal to M opt in this case.

  • The set of boys is on the left, and the set of girls is on the, on the right.

  • [SOUND] Now what we're going to do is we're going to consider the set of girls G

  • that is matched in the optimal matching, but is not matched in the greedy matching.

  • Okay?

  • So here, for example, in this case the girl d

  • is actually matched in the optimal matching for d match to boy 4 but

  • the greedy algorithm actually doesn't match d.

  • It matches a, b, and c but it doesn't, match G match d and so

  • the set G consists of the single node d in this case.

  • But we define the set G to be the set of girls that are matched in

  • the optimal algorithm but not in the greedy algorithm.

  • Now, we're going to define the set B

  • to be the set of boys that are adjacent to the girls in the set G.

  • So in this case, the set G just has a single node.

  • The set of boys that are adjacent.

  • Are the are, are, are 3 and, and 4.

  • And so there are 2 boys in the,

  • in the set B that are adjacent to the girls in the set G.

  • Now the first observation that we make, is that the opt,

  • the the cardinality of the optimal matching is less than or

  • equal to the cardinality of the greedy matching plus the, the size of the set G.

  • Now this is by definition.

  • Because G is just a set of girls that are matched in M optimal, but not in M greedy.