(upbeat ambient music) - I'm hoping that I'm gonna tell you something that's interesting, and of course I have this very biased view, which is that I look at things through my computational lens. Are there any computer scientists in the room? I was anticipating not, but okay, there's one, so maybe every now and then I'll ask you a question. No, no, I'm just kidding. So my goal here is basically to give you a flavor of what machine learning is, since this is my expertise. Actually, again, to get a sense of who's in the room: raise your hand if, were I to pick on someone here, you'd be able to answer that question, "What is machine learning?" Okay, a handful, no, actually one or two. Great. So I just want to give you a sense of that, and most of this is gonna be pretty intuitive. I'll try to make little bits of it concrete where I think that will be helpful, and then I'll tell you how we use machine learning to improve guide designs, specifically for knockdown experiments. I think a lot of it is probably useful for more than that, but we haven't gone down that route, so I can't say very much about it. And please interrupt me if something doesn't make sense or you have a question. I'd rather do that so everybody can stay on board, rather than it making less and less sense the longer I go.
Alright, so machine learning. Actually, during my PhD, one of the big flagship conferences was peaking at around 700 attendees, and when I go now it's actually capped, it sells out at 8,000, like, months in advance, 'cause this field has just taken off. Basically it's now lucrative for companies, and it's become a really central part of Google, Microsoft, Facebook, and all the big tech companies, so this field has changed a lot. And kind of similar to CRISPR, there's an incredible amount of hype and buzz and ridiculous media coverage, so it's a little bit funny, in fact, that I'm now working at these two very hyped-up areas. But anyway, in the mainstream press now you're always hearing about artificial intelligence and deep neural networks, so let me place these: I would say machine learning is a sub-branch of artificial intelligence, and a deep neural network is an instance of machine learning. So what really is this thing? It sometimes overlaps with traditional statistics in terms of the machinery, but the goals are very different. Really, the core, fundamental concept here is that we're gonna posit some model, so think of linear regression as a super simple model, and you can expose it to data. It has some parameters, right, the weights, and then we essentially want to fit those weights, and that fitting is the training, that's literally the machine learning. So I'm sorry if that sounds super simple and not God-like, with machine learning working magically, but that really is what it is. And let me just drive home that point.
So we're gonna posit some sort of model, and here I'm giving you the simplest example, because I think most people here work with linear regression at some point in their life. You can think of this as a predictive model in the sense that if I give it a bunch of examples of Y and X, and I learn the parameter beta, then for future examples where I don't have Y but only X, I can just compute X times beta and I get a prediction of Y. That's the sense in which I call this a predictive model, and that's very much how machine learning people tend to think of it, whereas statisticians are often very focused on what beta is, what the confidence intervals around beta are, and things like this. So that's the sense in which there's a lot of overlap, but the goals are quite different. We want to use real data and make predictions; here it's gonna be predictions about guides, about which guides are effective at cutting and at knockout. Right, and so it has these free parameters, and we call the things that we put in here features, and in the case of guide design the question is gonna be: what features are we gonna put in there that allow us to make these kinds of predictions? I'm gonna get into that in a little bit, but just as an example to make this concrete, it might be how many GCs are in this 30mer guide, or guide plus context. And like I said, we're gonna give it some data, and in this case the data for guide design is gonna be data from (mumbles) that's now publicly available, where there are examples, for instance, of what the guide was and how effective the knockout was, or what the cutting frequency was.
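To make the predictive-model idea concrete, here's a minimal sketch in code. Everything here is made up for illustration: one hypothetical feature (the GC count of a 30mer guide-plus-context), a fabricated "true" weight, and synthetic efficacy values. The point is just the workflow: see (X, Y) examples, learn beta, then predict Y for a new guide where you only have X.

```python
import numpy as np

# Hypothetical data: predict guide efficacy Y from one feature X,
# e.g. the GC count of the 30mer guide-plus-context.
rng = np.random.default_rng(0)
X = rng.integers(5, 25, size=(50, 1)).astype(float)   # made-up GC counts
beta_true = 0.03                                      # made-up "true" weight
Y = X[:, 0] * beta_true + rng.normal(0, 0.01, size=50)

# "Training" = finding the beta that best fits the (X, Y) examples.
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Prediction for a future guide where we only know X, not Y.
x_new = np.array([18.0])
y_pred = x_new @ beta_hat
```

Note the machine-learning emphasis: we mostly care that `y_pred` is good on new guides, not about confidence intervals around `beta_hat`.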
So I get a bunch of these examples, and that's gonna enable me to somehow find a good beta. Of course, we do sometimes use linear regression, but I'll tell you a little bit more about more complex and richer models that let us do a lot more. Then the goal is going to be to fit this beta in a good way, and I'm not gonna do some deep dive on that here, but the one way that you're probably familiar with is just mean squared error: when you find the beta that minimizes this for your training data, you get some estimate of beta, and you hope that on unseen examples, when you do X times beta, it gives you a good prediction. So does that make it somewhat concrete, what I mean by a predictive model, how you could view linear regression as a predictive model, and how you might use this for guide design? Okay, and obviously I'll tell you a lot more. Right, but linear regression is just the simplest possible example, and in our work we actually use, some of the time, what are called classification or regression trees. In contrast to here, where you might have a bunch of these features, right, like how many GCs were in my guide, and then another feature might be, was there an A in position three, and you can put in as many as you want, and then you get all these betas estimated. It's very simple, because in that case none of these features can interact with each other, right, you just add X one times beta one plus X two times beta two, so we call this a linear additive model.
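The fitting step just described, finding the beta that minimizes mean squared error on the training examples and then hoping it predicts well on unseen ones, can be sketched like this. The three "guide features" and all the numbers are synthetic, purely for illustration:

```python
import numpy as np

def mse(beta, X, Y):
    """Mean squared error of the predictions X @ beta against targets Y."""
    return np.mean((X @ beta - Y) ** 2)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))            # three made-up guide features
beta_true = np.array([0.5, -0.2, 0.1])   # made-up "true" weights
Y = X @ beta_true + rng.normal(0, 0.05, size=100)

# The closed-form least-squares solution minimizes MSE on the training set.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# The hope: on unseen examples, X @ beta_hat is still a good prediction.
X_new = rng.normal(size=(20, 3))
Y_new = X_new @ beta_true + rng.normal(0, 0.05, size=20)
```

Because the model is linear and additive, each feature contributes its own term and no two features can interact, which is exactly the limitation the trees below address.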
In contrast, these trees allow very deep interactions among the features. So this might be how many GCs, though of course this figure isn't suited to the features I just described, so this might be some feature like, I don't know, proportion of GCs, 'cause now it's fractional. And then this algorithm, which is gonna train the parameters, find good values, through a procedure whose details I'm not gonna go into for all these models, is going to somehow look at the data and determine that it should first split on the second feature at this value, and then keep going down. It says, "Now partition the examples in my training data like this," and then on the second feature in this way, until you end up at the leaves of this tree, and these leaves are the predictions. When you do it for the training data, whichever training examples end up at a given leaf, you basically take their mean, and that's now the prediction for that leaf; and if you take a new example, you just pipe it through these rules and you end up with that kind of prediction. This is simplified, but I think it's a good conceptual view. And here's another way of thinking about it: if you only had two features and you drew them one against the other, then effectively, every time you make a branch here, you're cutting up this space. So that's also just another way to think about it. And then, this is now all over the press nowadays, and whenever I give these talks there's a bunch of young, hungry grad students who say, "Did you do deep neural networks?", 'cause that's what everybody wants to do now. So deep neural networks, they're kind of like a really fancy linear regression.
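The tree mechanics above can be sketched with the smallest possible case: a depth-one tree (a "stump") on one feature. It tries every split point, partitions the training examples, and uses the mean of each side's targets as the leaf prediction; a new example is just piped through the rule. The GC-fraction feature and the numbers are hypothetical:

```python
import numpy as np

def fit_stump(X, y, feature):
    """Depth-1 regression tree: try every threshold on one feature,
    keep the split minimizing total squared error, and use the mean
    of each side's training targets as that leaf's prediction."""
    best = None
    for t in np.unique(X[:, feature]):
        left, right = y[X[:, feature] <= t], y[X[:, feature] > t]
        if len(left) == 0 or len(right) == 0:
            continue  # a split must leave examples on both sides
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, t, left.mean(), right.mean())
    return best[1:]  # (threshold, left-leaf mean, right-leaf mean)

def predict_stump(stump, x, feature):
    t, left_mean, right_mean = stump
    return left_mean if x[feature] <= t else right_mean

# Hypothetical data: guides with higher GC fraction (feature 0) cut better.
X = np.array([[0.2], [0.3], [0.4], [0.6], [0.7], [0.8]])
y = np.array([0.1, 0.15, 0.1, 0.8, 0.9, 0.85])
stump = fit_stump(X, y, feature=0)
```

A real regression tree just applies this splitting recursively on the partitions, possibly on different features at each level, which is what lets features interact.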
So you could think of these as the Xs in linear regression, and you can think of this, imagine there's only one thing out here, I should have used a different picture, as just Y. So this again is a mapping where you give it the Xs, you do a bunch of stuff, and out you get a Y, except linear regression is this very simple thing, and now you can see there are all these others in between, we call these hidden nodes, and so there's this complicated mess of parameters, beta. Again, I'm not gonna go into it; I just want to give you a sense that linear regression is this very simple thing, and there are a lot of other models that let you do much, much better prediction, and these are typically the kinds of models that we use, because they're more powerful if we care about prediction. But the flip side is they're actually very hard to interpret, so if you want to ask a question like, was it the GC feature that was most important in your guide design, which is a question I always get, we can do our best to poke and prod at this machine, but it's always a little bit ad hoc. The more complicated the model, the better we might predict and the less interpretable it is, so there's always this kind of tension. So right, what are some of the challenges?
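A tiny sketch of the "fancy linear regression" point: each hidden node is a linear combination of the input Xs passed through a nonlinearity, and the output Y is a linear combination of the hidden nodes. The weights below are random, just to show the structure; note in the comments that if you removed the nonlinearity, the whole thing would collapse back to ordinary linear regression.

```python
import numpy as np

rng = np.random.default_rng(2)
W1 = rng.normal(size=(3, 4))   # 3 input features -> 4 hidden nodes
W2 = rng.normal(size=(4,))     # 4 hidden nodes  -> single output Y

def forward(x):
    hidden = np.tanh(x @ W1)   # the "hidden nodes": linear map + nonlinearity
    return hidden @ W2         # without the tanh this would collapse to
                               # plain linear regression with beta = W1 @ W2

x = rng.normal(size=3)         # one made-up example
y = forward(x)
```

The extra layers and nonlinearities are what buy the prediction power, and also what make it hard to point at any one weight and say what it means.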
So I've shown you an increasing amount of complexity in some models, and one of the big difficulties is, if I posit a very complex model with a lot of parameters, then I need a lot of data in order to actually fit those parameters. If I don't have enough data, what's gonna happen is you're gonna find this very specific setting of the parameters that effectively memorizes the training data, and the problem then is you give it a new example that you really care about, you say, I want to knock out that gene, and it's never seen that gene. Because it's sort of memorized, we say it's overfit to the data: it doesn't actually generalize well to these unseen examples.
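The memorization point can be demonstrated with toy data. Assume the true relationship is a simple line, y = 2x, plus noise. A model with as many parameters as training points (a degree-9 polynomial on 10 points) can fit the training data essentially perfectly, noise and all, yet it predicts unseen points worse than the simple 2-parameter line does. All numbers here are fabricated for the demonstration:

```python
import numpy as np

# 10 training points from a truly linear relationship, with alternating noise.
x_train = np.linspace(-1, 1, 10)
noise = 0.3 * np.array([1, -1, 1, -1, 1, -1, 1, -1, 1, -1.0])
y_train = 2 * x_train + noise

simple = np.polyfit(x_train, y_train, 1)     # 2 parameters: a line
complex_ = np.polyfit(x_train, y_train, 9)   # 10 parameters: can memorize

x_test = np.linspace(-1, 1, 200)             # unseen examples
y_test = 2 * x_test                          # the noise-free truth

def mse(coeffs, x, y):
    """Mean squared error of the polynomial's predictions on (x, y)."""
    return np.mean((np.polyval(coeffs, x) - y) ** 2)
```

The complex model drives training error to nearly zero by threading through every noisy point, but between and beyond those points it oscillates wildly, so its error on the unseen test points is far larger than the line's. That gap between training and test performance is exactly what "overfitting" means.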