
  • Chris Cotsapas: I’d like to thank the organizers for the

  • opportunity to come talk to you guys about what we’ve been thinking about in my lab.

  • So, what I’m going to talk about primarily is stuff that’s going on. So, all of this

  • is unpublished. Feel free to think about it, share it, whatever. But it’s very much work

  • in progress. Some of it is hot off the press. So, do take it with a pinch of salt. So, what

  • we think about a lot is autoimmune diseases in my lab. And we kind of want to think about

  • which genes go wrong in disease, and we think about these regulatory genes. But actually

  • what we’re interested in are the causal genes. And my pointer doesn’t work. I can

  • use this pointer. It’s all coming up Chris today.

  • So, we’re thinking more about causality than anything else. So, when we say dysregulation,

  • we’re interested in pathogenesis, right? That’s ultimately what we’re after. And

  • so, just a 30,000-foot view of the immune system. If you remember, you start with a

  • stem cell. You have two major lineages in the immune system, the lymphoid and the

  • myeloid lineages. So, things like macrophages are all the way down here. And your T cells

  • and B cells are all the way down here. You can think of them as adaptive

  • versus innate. And what happens is every now and then, this goes wrong. So, the immune

  • system’s primary function is to protect the body from things that are foreign. And

  • so it’s got this amazing capacity to tell the difference between your cells and the

  • rest of the world. And it’s really good at this, but occasionally it screws up. And it kind

  • of -- what happens is that it starts attacking certain tissues.

  • So, if it doesn’t like myelin, you get multiple sclerosis. The immune system manages to go

  • into the brain and attack the myelin sheath [spelled phonetically] very specifically around

  • neurons, chew it up, and you get lesions in your brain. You can get things like skin attacks

  • which give you Sjogren’s syndrome, scleroderma, you can get type 1 diabetes, which we now

  • know is an immune disease. If it doesn’t like aspects of the GI tract, you wind up

  • with Crohn’s disease, ulcerative colitis, or celiac disease if it doesn’t like the

  • epithelia. Joint-specific dislikes, should we say, give you rheumatoid arthritis

  • or ankylosing spondylitis. And if it just doesn’t like DNA, if it doesn’t like nucleic

  • acid, it attacks everything, then you wind up with something called lupus, right? What’s

  • really interesting is that these are very, very specific dislikes. So, MS is not rheumatoid

  • arthritis. It’s a very specific attack against myelin. It’s not a specific attack against

  • anything else.

  • And what we really want to understand is what these diseases are. So, something’s going

  • wrong with the immune system. We don’t really understand what it is. What we do know is

  • that all of these diseases are common. They’re complex genetic diseases. There’s a large

  • portion of heritability. They track in families. But they’re not Mendelian. It’s not one

  • catastrophic mutation, right? And, of course, as GWAS came along, I’m going to talk about

  • multiple sclerosis, which is something that I work on. But you can take this as read for

  • any immune disease. As GWAS came along, we hadn’t really gotten a lot of traction on

  • the genetics of these diseases. And then, sort of we barely managed to identify two

  • loci in the genome in one of the first GWAS studies. Then a little while later, we managed

  • to get another one. A meta-analysis of these two sets of studies from international consortia

  • kind of gave six new hits, and we’re starting to climb this power curve of discovery.

  • Then a further meta-analysis with more markers and a few more samples gave us an additional

  • three new hits. Even more samples gave us another 25 new hits. The immunochip gave us

  • 47. That took us up to 100. And our current studies, which are about 16,000 cases, 26,000

  • controls and replication in another 36,000 samples, we’ve got another 100-odd new hits.

  • So, we’re standing at around 200 loci right now in GWAS, right? That explains -- including

  • the HLA -- it explains about 55 percent of the heritability. We estimate that in the

  • common space there’s probably another 600 to 800 loci that we don’t know about yet.

  • We kind of do know about them. They’re not genome-wide significant yet. But we know they’re

  • there. And we know the approximate complexity of the disease is about 1,000 independent

  • variants.

  • And so, when ENCODE came along and we did -- we were a very small part of this paper

  • from John Stam sort of showing that in Crohn’s disease and in multiple sclerosis, there is

  • strong enrichment of the risk SNPs on regulatory regions active in very specific subsets of

  • the immune cells. And in multiple sclerosis in particular, you can see CD3 cells, CD19s,

  • B lymphocytes, and CD14s, which is interesting. There’s a lot of pathogenesis coming out

  • of T cells as well. But these are more B cell like. And so, dysregulation in multiple sets

  • of immune cells seems to be an issue here. But this kind of sends us chasing down this

  • idea that is now extremely common. And this is one of the great right, right? So, 10 years

  • ago GWAS wasn’t going to work. And five years ago, everyone was asking why we haven’t

  • solved disease yet. Five years ago, everything was coding. And now, everything is regulatory.

  • And it seems really obvious. But even two, three years ago, this was not that obvious.

  • And so, this chases us down -- starts us chasing down this rabbit hole of which genes are getting

  • dysregulated and how does that cause disease. And so, that’s what we are going to talk

  • about today -- further evidence that in specific immune cells, you get dysregulation that maps

  • into specific transcription factor binding sites. This is from Kyle Farh and Brad Bernstein

  • showing that the MS SNPs are particularly enriched for NF-kappa B transcription factor

  • ChIP-seq peaks for instance. And so, there’s something that’s fairly specific dysregulation

  • in immune cells, which is great in bulk, hard when you actually want to identify specific

  • effects on specific genes in specific cells. And so, that’s the task at hand. And so,

  • when you look at some of the loci, you know, you put up a GWAS locus. Here’s a classic

  • locus in MS. Well, there’s NF-kappa B one and mannose-binding protein A. And you could

  • sort of make a case for mannose-binding protein A, but really everyone’s going to assume

  • that NF-kappa B one is the appropriate gene. And it turns out that that’s right

  • for various reasons. And so you can start working on that because you kind of are reasonably

  • sure that’s the gene.

  • When you look at another locus of course, that gets a lot more difficult. You’ve got

  • this big association peak. There’s a bunch of genes in here, and the problem isn’t

  • that they’re not good candidates. There’s a bunch of good candidates in here. ORMDL3

  • is here. IKZF3, which is Helios, which is a transcription factor that controls T regulatory

  • cell differentiation is there. A bunch of other immune cells. And so, you’re kind

  • of going, “What’s going on here?” So, we kind of thought, “Okay. If there is regulation,

  • and we have SNPs, how do we unite the genetics with the epigenomics?” And a lot of people

  • are thinking about this. You’re going to hear a lot more stories about this. You’ve

  • already heard some. Here’s how we’ve been thinking about it.

  • So, we’re kind of amateur math geeks, and so we’re thinking about how we can transfer some

  • of this probability and do some functional fine mapping. So, you have a set of SNPs in

  • the genome. We’re going to talk about hypersensitive sites now. But instead of DHS, you can think

  • of any regulatory mark. We’ve been working a lot with hypersensitive sites because we

  • like them. They’re stable. They’re nice. They tell you a lot. We’re going to expand

  • this to the other sets. But think about DHS for now. And you’ve a gene in the locus.

  • So, this is my like tiered view of a locus.

  • So, each of these guys is associated to disease. And -- oh, this is going to chop off my -- thanks.

  • Oh well. So, what that says is posterior probability of association or PPA, okay? So, when you

  • do a GWAS for each of these SNPs, you get a P value of whether it’s associated to

  • disease or not. You can convert that simple P value into basically a posterior probability

  • which tells you, what is the likelihood that this SNP is the one driving the signal, okay?

  • We’re not going to talk about the math magic that underlies that. I’ll bore you with

  • it in person over a coffee if you like. But basically, for each of these SNPs, you can

  • do a magical transformation and get the probability that that’s the SNP that’s driving signal.

  • If it’s very associated, and nothing else is associated, it’s going to be really probable

  • to drive the signal. If there’s a whole bunch of SNPs that are equally associated,

  • you’re going to have to spread the probability that it’s causal all over all of those guys,

  • right? That’s the intuition here. So, of course, some of these SNPs are actually on

  • DHSs. And so, you can transfer that probability. I can’t even talk anymore, sorry. That probability

  • to the DHS. You could also do something fancy like say this guy is about this far away

  • from this DHS, so I’m going to give it some proportion here. That’s -- we’re not doing

  • that right now. But basically, what I can do is come up with a way to score every regulatory

  • region for its probability of explaining the association in that region, right?

  • And if I sum every one of those -- of course not every SNP is on those -- but if I sum

  • all of these posteriors, that gives me the global probability that, in this locus, association

  • is mediated by these regulatory regions. Doesn’t have to be all of it. But if most of the signal

  • is on DHSs, then you’re going to get a high percentage, right? It’s going to be close

  • to one. If it doesn’t look like it’s being mediated by regulatory regions, you’re going

  • to get a low proportion.
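The transfer just described can be sketched in a few lines. This is only an illustration under stated assumptions: the talk does not give the math behind the PPA, so the sketch assumes each SNP already has a Bayes factor (a hypothetical input) and that exactly one SNP in the locus is causal, which lets the PPAs be obtained by simple normalization. The SNP and DHS names are made up.

```python
def posterior_probabilities(bayes_factors):
    """Normalize per-SNP Bayes factors into PPAs that sum to 1 within a locus.

    Assumes a single causal SNP per locus (an assumption of this sketch).
    """
    total = sum(bayes_factors.values())
    return {snp: bf / total for snp, bf in bayes_factors.items()}


def dhs_posteriors(ppa, snp_to_dhs):
    """Transfer each SNP's PPA to the DHS it sits on; SNPs on no DHS are skipped."""
    scores = {}
    for snp, p in ppa.items():
        dhs = snp_to_dhs.get(snp)
        if dhs is not None:
            scores[dhs] = scores.get(dhs, 0.0) + p
    return scores


# Hypothetical locus: three SNPs, two of them on the same DHS.
bayes_factors = {"rs1": 60.0, "rs2": 30.0, "rs3": 10.0}
snp_to_dhs = {"rs1": "dhs_A", "rs2": "dhs_A", "rs3": None}

ppa = posterior_probabilities(bayes_factors)
dhs_scores = dhs_posteriors(ppa, snp_to_dhs)

# Fraction of the locus signal that sits on regulatory regions.
regulatory_fraction = sum(dhs_scores.values())
print(dhs_scores, regulatory_fraction)
```

A regulatory fraction near one corresponds to the "most of the signal is on DHSs" case; a value near zero corresponds to a locus that does not look regulatory.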

  • So much is easy. What’s cool is you can think about how you correlate these guys

  • to the genes they control. So, if I had a magic way of saying, “Well, this DHS is

  • correlated to this gene this much, then I can weight how much of the posterior of association

  • gets transferred into this gene, right?” So, if this guy’s perfectly correlated -- if

  • this is what determines whether this gene is expressed -- then if this explains all

  • of the association to a trait, then presumably, it’s active on this gene. Because the DHS

  • isn’t just a DHS. It’s regulating something, right? So, that’s the intuition. And you

  • partition this all this way. And what it says here is CP times PPA, okay? So, that’s just

  • the correlation posterior between this DHS and this gene times how much weight you’ve

  • given it from the association data. And that way, you wind up building this model of this

  • gene posterior. So, if I sum all of these, all of the contribution of each DHS from the

  • SNPs going into this gene, I can get a sense of what the probability that this gene is

  • driving association in this region is. And I can do that for any gene.
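As a rough illustration of the CP times PPA sum, the sketch below computes a per-gene score from two hypothetical inputs: a PPA already transferred to each DHS, and a correlation posterior (CP) for each DHS and gene pair. The names and numbers are invented for the example and are not results from the talk.

```python
def gene_scores(dhs_ppa, correlation_posterior):
    """Sum CP * PPA over every DHS linked to a gene.

    dhs_ppa: {dhs: PPA transferred from the SNPs on that DHS}
    correlation_posterior: {(dhs, gene): CP between that DHS and that gene}
    """
    scores = {}
    for (dhs, gene), cp in correlation_posterior.items():
        scores[gene] = scores.get(gene, 0.0) + cp * dhs_ppa.get(dhs, 0.0)
    return scores


# Hypothetical locus with two DHSs and two genes.
dhs_ppa = {"dhs_A": 0.7, "dhs_B": 0.2}
correlation_posterior = {
    ("dhs_A", "gene_1"): 0.8,
    ("dhs_A", "gene_2"): 0.1,
    ("dhs_B", "gene_2"): 0.5,
}

scores = gene_scores(dhs_ppa, correlation_posterior)
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
print(ranked)
```

Ranking the genes by this score gives the kind of prioritized list discussed later in the talk, with the caveat that the score only means something when the DHSs carry a large share of the locus signal.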

  • So, I now derive a score basically for how likely this gene is to be pathogenic, if that

  • pathogenesis is mediated by DHS regions. And we know they’re enriched, so that’s a

  • reasonable hypothesis, okay? It’s not the only way to do it, but it’s one way to think

  • about this. And so, you have to solve a couple of technical problems to do this. One is,

  • you’ve got to correlate your DHSs to your genes. And so, that’s really simple. You

  • just observe if there’s a peak, and what the level of expression of a gene is, and

  • then you do a correlation, on-off versus level of expression of a gene. And you do that for

  • each DHS you find.
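One simple way to correlate an on-off peak call with an expression level is a Pearson correlation with a binary variable (the point-biserial correlation). The talk does not say which statistic is used, so this is an assumed choice, and the sample values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation; with a 0/1 vector x this is the point-biserial correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation in the peak calls or in expression
    return sxy / math.sqrt(sxx * syy)


# Hypothetical matched samples: peak present (1) or absent (0), and expression level.
peak_present = [1, 1, 0, 0, 1, 0]
expression = [8.1, 7.9, 2.0, 2.4, 8.5, 1.8]

r = pearson(peak_present, expression)
print(round(r, 3))
```

This would be repeated for every DHS and gene pair in a locus.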

  • Two issues. First of all, you’ve got to decide what the same DHS is. And secondly,

  • you need measurements where you’ve measured both DHS and gene expression, okay? So, to

  • do this thing, we use an alignment approach. This is what real DHS data looks like out

  • of hotspot. These are peaks. This is an arbitrary part of the genome and your job is to figure

  • out which ones of these represent the same element across samples. We’re not terribly good

  • at that as human beings. Fortunately, computers are a lot better at this than we are.

  • So, you can put it in a clustering approach and kind of decide that these look the same

  • that are a little jittered, but they kind of look similar. And then these guys are kind

  • of the same, but you’re maybe a little less confident because there’s more spread.

  • And these guys are kind of the same as well, but there’s even more spread, okay? And

  • the way we do this is with Markov clustering. It’s a way to cluster stuff. There are other

  • ways to do it. It works reasonably well. And the way you think about this -- oh, and that’s

  • gotten chopped up as well. That’s brilliant. Okay. So, one way you might want to do this

  • is to say, is this detectable? And so, you go into the Roadmap data, and fortunately

  • there are replicates.

  • And here’s my assertion. If I see a peak here in replicate one of a tissue, then I should

  • expect to see that peak in replicate two of a tissue as well, right? Biological replication,

  • just as we do in any other experiment. Really simple. And so once I decide this is my cluster,

  • that’s what comes out of the algorithm, you don’t just go and apply that mindlessly

  • to data. That’s not how you do analysis, right? You check and you see what you can

  • detect. And of course, the wider and the sloppier this peak is, the less likely it is to be

  • true. And so you can do a statistical test. And so, once you’ve decided what the cluster

  • is, if there’s a peak anywhere in that cluster, you mark that sample as a one. And if there’s

  • no peak, you mark it as a zero. If you have replicates where the labels somewhere over

  • here on that wall, you can then say, “Okay, if I see ones in both replicates,

  • I’m going to score that tissue as a two. I’m going to score it as a one if there’s

  • only one replicate,” so if it’s discordant, “and I’m going to score it as

  • a zero if there’s none there.” And then you can do a test.
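The 0, 1, 2 scoring just described can be sketched as follows. The interval representation (half-open start and end coordinates), the overlap rule, and the tissue names are assumptions made for the example.

```python
def has_peak(cluster, peaks):
    """Return 1 if any peak overlaps the cluster interval, else 0.

    Intervals are (start, end) pairs, treated as half-open (an assumption).
    """
    c_start, c_end = cluster
    return int(any(start < c_end and end > c_start for start, end in peaks))


def tissue_score(cluster, replicate_peaks):
    """Sum the per-replicate calls: 2 = both replicates, 1 = discordant, 0 = neither."""
    return sum(has_peak(cluster, peaks) for peaks in replicate_peaks)


# Hypothetical cluster and peak calls for two replicates of three tissues.
cluster = (1000, 1300)
tissues = {
    "tissue_a": [[(1050, 1200)], [(1100, 1250)]],
    "tissue_b": [[(1020, 1180)], [(5000, 5200)]],
    "tissue_c": [[], [(9000, 9100)]],
}

scores = {name: tissue_score(cluster, reps) for name, reps in tissues.items()}
print(scores)
```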

  • So, I’ve done this without knowing about replicates. And then I add the information about

  • what goes with what and I ask, “Are they consistent?” So, if I get things like, “Look,

  • in cell type one, I get a one. And in two, I get a one. I get all ones.” That suggests

  • this isn’t consistent. It’s not replicating. And if I get a lot of twos a lot of zeros

  • and very few ones, that looks consistent. So, it’s replicating. It’s either not

  • there or it’s there. And so I can do a statistical test. It’s not terribly important what the

  • test is. It’s a simple chi-square approximation. We do this over 57 tissue replicates. So,

  • from Roadmap. And we find that just feeding this in when we cluster, we can get about

  • a million out of 1.99 million. So, about 54 percent of our clusters pass our fairly stringent

  • threshold -- a fairly lenient threshold. And that’s because very often these things are

  • kind of diffuse. The clusters don’t really look good. And so, we’re probably not doing

  • great at the clustering, and it’s unreliable, right? There’s also a bunch of singletons

  • in these data that get thrown out because they don’t replicate. But most of this is

  • actually the clustering.
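The talk only says the consistency test is "a simple chi-square approximation", so the following is one possible version, not the test actually used. It assumes that if replicates were unrelated, each would show a peak independently with the same frequency p, giving expected proportions (1-p)^2, 2p(1-p), and p^2 for scores 0, 1, and 2. A goodness-of-fit statistic with one degree of freedom is then compared against that expectation, and the direction is checked separately, because an excess of ones deviates just as strongly as a deficit.

```python
import math


def replicate_consistency(scores):
    """Chi-square goodness-of-fit of tissue scores (0/1/2) against independent replicates.

    Returns (statistic, p_value, fewer_discordant). A small p_value together with
    fewer_discordant=True suggests the cluster replicates. This is an assumed test.
    """
    n = len(scores)
    p = sum(scores) / (2 * n)  # estimated per-replicate peak frequency
    if p in (0.0, 1.0):
        return 0.0, 1.0, False  # no variation, nothing to test
    expected = [n * (1 - p) ** 2, n * 2 * p * (1 - p), n * p ** 2]
    observed = [scores.count(k) for k in (0, 1, 2)]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square survival function, 1 df
    return stat, p_value, observed[1] < expected[1]


concordant = [2] * 10 + [0] * 10   # peak is either there in both replicates or absent
discordant = [1] * 20              # peak never replicates

print(replicate_consistency(concordant))
print(replicate_consistency(discordant))
```

In this toy example both inputs give the same statistic, and only the direction flag separates "lots of twos and zeros" from "all ones".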

  • So, we can get about a million features about the genome. And we don’t worry about recovering

  • more stuff and improving the clustering. Right now, we’re just working with these million.

  • So, the other thing is, you’ve got relatively low power. And so, what’s nice about this

  • is this -- what you can clearly read here -- what you can do is estimate how much of the

  • heritability you’re still explaining. So, this is just a sanity check. If you use all

  • of these clusters, it’s about 14 percent of the genome, and it explains a proportion

  • of heritability. And what I want to know is if I reduce this to the half of the clusters

  • that I’m using now, what proportion of heritability am I still explaining? And to a first approximation,

  • what you can see here is in red is all the peaks and in blue is just the clusters that

  • we define. Pretty much we’re capturing all of the signal. It varies as wiggle room. There’s

  • a little bit of error on these things, but we’re capturing just about all of the heritability.

  • But we’ve gone from 14 percent of the genome to 8 percent of the genome.

  • So, rather than do the 500 base pair either side, which is what most of the previous heritability

  • estimates have done, which a lot of the summary papers have kind of shown, “Oh, there’s

  • enrichment in DHS or in regulatory regions or whatever.” But they actually bracket

  • each feature by 500 bases. And so, they cover 50 percent of the genome. So yes, all of the

  • heritability is explained by 50 percent of the genome. I’m telling you that a lot of

  • the heritability’s explained by eight percent of the genome. So, it’s a lot more specific.

  • And so, the second challenge is to now correlate these guys, now that we’ve decided what

  • clusters are, to correlate them to gene expression. So, you need matched data. We use 22 sets

  • of matched DHS and exon array data from Roadmap again. And the problem is, there’s massive

  • inflation because gene expression data of course is highly correlated. And so you just

  • get this massive inflation in the expected distribution of these tests. And we can correct

  • this. We just go through and normalize it and basically, you kind of start off with

  • this massive inflation. I’m showing you lambda here. It’s supposed to be a nice

  • straight line here. And we can correct all of that out.
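The normalization used in the talk is not specified. As one common way to picture what "correcting out" inflation means, the sketch below computes the inflation factor lambda as the median of the observed chi-square statistics divided by the median of a one-degree-of-freedom chi-square distribution (about 0.4549), then rescales the statistics by lambda, in the style of genomic control. The input statistics are invented.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(chi2_stats):
    """Lambda: observed median test statistic over the expected null median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control(chi2_stats):
    """Rescale every statistic by lambda so the corrected median matches the null."""
    lam = inflation_factor(chi2_stats)
    return [s / lam for s in chi2_stats]


# Hypothetical inflated statistics from DHS-to-expression correlation tests.
inflated = [0.5, 1.2, 2.0, 3.1, 9.4]

lam_before = inflation_factor(inflated)
corrected = genomic_control(inflated)
lam_after = inflation_factor(corrected)
print(round(lam_before, 3), round(lam_after, 3))
```

A lambda near one after correction corresponds to the straight line the speaker refers to.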

  • So, now that we have all of these statistics, we can go back and do our little approach.

  • So, now we have this part. We already have this part from credible interval set mapping,

  • and posterior estimation. And we can now estimate gene-wide scores. And so, big red exclamation

  • point here you can see means this is really fresh, as in last Friday’s results. Hot

  • off the presses. Here is a region. It -- we’re talking about MS GWAS. This is actually the

  • immunochip data from 2013. Chromosome six. One megabase region. And I’m doing this

  • for all of the genes in the region. DHSs explain 94.5 percent of the signal in the region.

  • So, whatever it is, it is really, really likely to be acting through a DHS, right? MDN1, which

  • is one of the genes in the region, explained 55.5 percent, not of this 94 percent, but

  • of the total signal in the region. That’s how it feeds through, right? So, that’s

  • what I’m doing. So, BACH2 is 16 percent. Between these two genes, you’d be hard-pressed

  • to say that any of these other genes are really sort of pushing the signal, but it’s probably

  • this one. So, this is a way to prioritize genes based on regulatory potential. Now,

  • it’s really important to look at this number as well. If this number is low, you kind of

  • think, “Well, it’s not really likely to be regulatory in the first place.” If it

  • is, it’s going to be one of these guys, but it’s not likely to be. In this case,

  • it’s really likely to be, right?

  • So, if you look at another region, this region that I showed you before, the IKZF3, ORMDL3.

  • Ah, brilliant. They’ve chopped off. Okay, that read 0.029, 0.022, and they’re ranked

  • and it goes down from there, okay? So, you’ll see that in this region, about 30 percent

  • of the association signal is explained by DHS clusters as we’ve defined them, okay?

  • So, it’s not a lot of it. That 30 percent is now basically smeared over a whole bunch

  • of genes. There’s no one gene that explains that signal. So, even if you accept that I’m

  • willing to take this 30 percent as a gamble, there’s no one obvious gene you look at

  • it. And the reason for that is actually that we suspect that what’s going on in this

  • region is there's an entire element -- sort of something like an accessibility element.

  • Some people call them super enhancers. They can mean different things to different folks.

  • What we suspect is going on is there’s an element that sets whether the locus is accessible

  • or not accessible, and that affects the transcription of multiple genes.

  • And so, what the effect may be is actually that you’re changing whether this entire

  • locus is available or not available. And there’s a whole bunch of genes in there that then

  • do different things and set a risk state or multiple risk states. And so, sadly it’s

  • not always one locus one gene, but these are probably going to be really interesting. It’s

  • unclear whether we can solve such loci. But they’re going to be really interesting.

  • So, you’re going to get examples like this. And it doesn’t work all the time. This approach

  • won’t work all the time because not all loci are simple in the one gene thing. So

  • we’re going to have to think harder about these.

  • So, I’m going to switch gears in the dying moments and just give you another flavor of

  • how we’re thinking about the other way around of epigenomics. So, so far we’ve talked

  • about how to analyze these data and make inferences so we can then go and work on certain genes.

  • But what we really want to know at some stage, if changes to gene regulation are what is

  • creating disease states in the immune system -- well, you’re not born with an immune

  • disease. Most of these diseases occur in the third, even fourth decade of life. So, what’s

  • the risk state in immune states that predisposes you to disease? That’s a hard question.

  • That’s a really hard question.

  • So, I told you before that you can see in multiple sclerosis a fairly strong enrichment in

  • NF-kappa B binding sites that are near associated SNPs, okay? So, there seems to be something

  • about NF-kappa B. And I also told you there’s an NF-kappa B one locus that harbors a lot

  • of -- a very strong association. So, when you look at MS patients versus controls, if

  • you look at CD4 cells, you find that in response to stimulus, in response to TNF-alpha, ex

  • vivo CD4 cells actually signal much more strongly through NF-kappa B. And this is a measure of

  • phosphorylation of P65, which is one of the NF-kappa B subunits, okay? If you look at

  • how inducible CD4 cells are, how easy it is to activate CD4 cells from MS patients versus

  • controls, you find that these are controls. The black circles, the filled ones are MS

  • patients. You find that in general the CD4 cells are easier to activate through NF-kappa

  • B. You can just hit them, and they’ll go. Correlation’s not causation. This could

  • be an epiphenomenon of disease state.

  • And so, what we did is we took this NF-kappa B one and we stratified people by genotype

  • there. There’s no implication of causality for the SNP we used. It’s actually one of

  • these really haplotypes that identical. We just used it to stratify risk, non-risk. And

  • we’re looking at opposite homozygotes. And so, when you look at the three genotype classes

  • without stimulation, this is your baseline -- I’m sorry it’s chopped off again -- but

  • this is your baseline I-kappa B degradation. So, I-kappa B gets degraded when NF-kappa B

  • signaling starts. And what you see is a baseline that’s 100 percent. And by genotype, you

  • find that there’s a difference in how strong NF-kappa B -- I-kappa B degradation is, suggesting

  • that there’s different amounts of signaling going on in these cells.

  • If you do the obverse and look at the phosphorylation of the P65 subunit, again you see the same

  • sort of thing, that this GG which is the risk state over-phosphorylates compared to the

  • other genotypes suggesting that there’s more signaling through NF-kappa B happening

  • for unit activation. That’s kind of interesting, but actually if you look at the expression

  • by western -- so, this is protein expression -- if you try and quantitate how much P50,

  • which is an NF-kappa B subunit you’re seeing, you see like a 20-fold increase in how much

  • P50 exists. Just at baseline in these cells. What’s really interesting is that after

  • activation, if you measure nuclear localization of phosphorylated NF-kappa B, you see that

  • there’s about a threefold change between, with the GG risk homozygote putting a lot

  • of phosphorylated NF-kappa B into the nucleus following stimulus in CD4 cells, compared

  • to the A.

  • And so, what it looks like, for a given dose of stimulus, if you have the risk genotype,

  • you signal a lot more, which probably does two things. It probably decreases the activation

  • threshold to kick these cells over into an activated state. And it may also smear the

  • phenotype that you see, because there’s so much transcription factor going into the

  • nucleus that it’s activating everything, right? And we’ll talk about that in a second.

  • This is not quite as simple as just a single effect at the NF-kappa B one locus. If you

  • look at the TNF receptors, there are two subunits. There’s a variant in the first subunit,

  • TNFR 1, now called TNFRSF1A. There’s a coding variant where, if you hit cells with TNF-alpha,

  • you get different amounts of signaling through the TNF receptor which leads to different

  • amounts of phosphorylation of NF-kappa B. Again, you’re getting different amounts of signaling.

  • I won’t belabor this. It also turns out there’s a whole bunch of other genes in

  • the MS risk loci that are directly related to the NF-kappa B signaling.

  • And so, I suspect one of the things that’s happening here is you’re getting this global

  • effect on NF-kappa B signaling. It’s not as simple as just a linear effect. But there

  • are multiple things that feed into NF-kappa B signaling at least in CD4 cells maybe in

  • multiple other subunits that are kind of really setting the rheostat of how the immune system

  • responds. And maybe that’s how -- partly how risk is determined. And so -- oh great,

  • these are chopped off as well. So, here’s the model, right? Sort of with external stimulus,

  • you get phosphorylation of NF-kappa B. NF-kappa B translocates to the nucleus and it does

  • what it does best. It activates a bunch of its targets. It activates transcription

  • and that leads to activation, proliferation, and survival of these cells.

  • Here’s what happens when you change this. If you increase phosphorylation, you’re

  • going to get more NF-kappa B going into the nucleus. That’s probably going to activate

  • its targets more easily. There’s probably spillover, right? NF-kappa B only activates

  • a subset of its targets in any one given cell, or cell type. It’s got a bunch of other

  • targets, which it doesn’t activate because the cofactors aren’t there, right? Transcriptional

  • activation is a multi-cofactor process. If you’ve got enough excess, even though the

  • kinetics of these promoters are bad, there’s going to be shuttling on those promoters,

  • and you’re going to get leaky transcription. And so, this I believe. This is an assertion

  • at this stage, right? But I think you’re going to get context-inappropriate gene

  • activation as a result of just putting a lot more of NF-kappa B into the nucleus. I showed

  • you before, or right at the beginning, that there’s also risk variation that localizes

  • close to NF-kappa B binding sites in the genome. So promoters where those variants exist, you’re

  • going to get differential activation of those promoters in a way that’s probably unrelated

  • to the total amount of NF-kappa B. But that’s an additional modulation.

  • And so, here’s what you can do. You can take a bunch of cells, take people who are

  • risk variant homozygotes, and people who are non-risk homozygotes. So, people who will

  • have different amounts of NF-kappa B in the nucleus. Hit them with TNF. In 15 minutes,

  • you get signaling, so you measure how much phosphorylation you’re getting. In 30 minutes,

  • you get translocation, so you can actually do ChIP-seq, and see where NF-kappa B is going

  • into the nucleus. Within two hours, you get gene activation in CD4 cells, so you can do

  • enhancer mapping on RNA-seq, and see what’s changing in the regulation between these two

  • groups. And within three days, you can get cell phenotype by producing the full activation

  • stimulus and you can measure that by flow and actually see what these cells are doing.

  • So, these are the level of experiments that you really need to do. And this is what we’re

  • doing right now to actually see what the differential risk states are in various T cells.

  • And I’m way over time, so I’m going to stop. And I will just acknowledge a bunch

  • of my colleagues at the IMSGC, the International Multiple Sclerosis Genetics Consortium, a lot of partners including

  • Brad and John, where we do a lot of these genomics things and people in my lab. Most

  • of the causal mapping is from Parisa, a post-doc in my lab. All of the immunology is from Will

  • Housley who is a fellow with David Hafler and with me. And I will stop and take a couple

  • of questions if I’m allowed. Thank you very much.

  • [applause]

  • Male Speaker: Great work. So quick question with regards

  • to -- maybe you’ve seen Peter Schetury [spelled phonetically] describe MeVs a couple of years ago.

  • So, multiple enhancer variants, where there’s multiple DHS or whatever --

  • Chris Cotsapas: Right, right.

  • Male Speaker: -- measurement. And there’s a wide range

  • there. You can have MeVs with only two DHS, with three, with four, with five. So, I’m

  • just wondering how you take that into account in your pipeline. Are the genes that

  • you find at the top of your list biased toward risk loci -- or risk locus that have high

  • MeV nature as opposed to those that have a lower MeV nature? And the second thing, there’s

  • a large collection of risk loci that actually are singletons. They will only have like one

  • DHS site in them. How are you taking those into account?

  • Chris Cotsapas: Right, so this is not about a single peak.

  • This is about whether the peak is consistent across cell types, right? If there is only

  • one peak in the entire collection of samples, and you don’t see it in the replicate, we

  • throw it out. It’s a singleton in one sample. If I have two CD4s, and I only see a peak

  • in one CD4 and never in any other cell type, I’m throwing that one out.

  • Male Speaker: Right. So, that’s going to be true for one

  • DHS site. But -- an MeV would have two of those or three of those.

  • Chris Cotsapas: Sure. So, what we’re not doing is a combinatorics

  • of the clusters yet. We’re basically not thinking about MeVs. But the correlation should

  • still be there. It should be multiple of them correlating. So that gets naturally taken

  • into the correlation towards the gene, because all three of those DHSs should be correlated.

  • Male Speaker: Right. They might not go to the same genes.

  • So, you might have one risk locus, multiple genes regulated differently by different subsets

  • of their DHS.

  • Chris Cotsapas: Right. But if they’re regulating different

  • genes, then the risks that they’re imparting should only go to the genes that they regulate.

  • Male Speaker: Yes.

  • Chris Cotsapas: Does that make sense?

  • Male Speaker: Yes.

  • Chris Cotsapas: Because you’re trying to figure out which

  • gene is being altered by whatever the risk effect is. And so, if DHS one is correlated

  • to gene three, I don’t care about transmitting its risk quotient to gene two. Because it’s

  • not -- there’s no evidence that it’s controlling it.

  • Male Speaker: Great.

  • [applause]

  • [end of transcript]
