  • The following content is provided

  • under a Creative Commons license.

  • Your support will help MIT OpenCourseWare continue

  • to offer high quality educational resources for free.

  • To make a donation or view additional materials

  • from hundreds of MIT courses, visit MIT OpenCourseWare

  • at ocw.mit.edu.

  • PROFESSOR: So as you recall last time

  • we talked about chromatin structure and chromatin

  • regulation.

  • And now we're going to move on to genetic analysis.

  • But before we did that, I want us to touch on two points

  • that we talked about briefly last time.

  • One was 5C analysis.

  • Who was it that brought up-- who was the 5C expert here?

  • Anybody?

  • No?

  • Nobody wants to own 5C.

  • OK.

  • But as you recall, we talked about ChIA-PET

  • as one way of analyzing any-to-any interactions in the way

  • that the genome folds up and enhancers talk to promoters.

  • And 5C is a very similar technique.

  • I just wanted to show you the flow

  • chart for how the protocol goes.

  • There is a cross-linking step,

  • a digestion with a restriction enzyme,

  • followed by a proximity ligation step,

  • which gives you molecules that had been brought together

  • by an enhancer, promoter complex, or any other kind

  • of distal protein-protein interaction.

  • And then, what happens is that you design specific primers

  • to detect those ligation events.

  • And you sequence the result of what

  • is known as ligation mediated amplification.

  • So those primers are only going to ligate

  • if they're brought together at a particular junction, which

  • is defined by the restriction sites lining up.

  • So, 5C is a method of looking at which regions of the genome

  • interact and can produce these sorts of results,

  • showing which parts of the genome

  • interact with one another.

  • The key difference, I think, between ChIA-PET and 5C

  • is that you actually have to have these primers designed

  • and pick the particular locations you want to query.

  • So the primers that you design represent query locations

  • and you can then either apply the results to a microarray,

  • or to high throughput sequencing to detect these interactions.

  • But the essential idea is the same.

  • Where you do proximity based ligation

  • to form molecules that contain components

  • of two different pieces of the genome

  • that have been brought together for some functional reason.

  • The next thing I want to touch upon

  • was this idea of the CpG dinucleotides

  • that are connected by a phosphate bond.

  • And you recall that I talked about the idea

  • that they were symmetric.

  • So you could have methyl groups on the cytosines in such a way

  • that, because they could mirror one another,

  • they could be transferred from one strand of DNA

  • to the other strand of DNA, during cell replication

  • by DNA methyltransferase.

  • So it forms a more stable kind of mark. And as you recall,

  • DNA methylation was something that occurred

  • in lowly expressed genes. And typically, in regions

  • of the genome that are methylated,

  • other histone marks are not present

  • and the genes are turned off.

  • OK.

  • So those were the points I wanted

  • to touch upon from last lecture.

  • Now we're going to embark upon an adventure,

  • looking for the answer to, where is missing heritability found?

  • So it's a big open question now in genetics.

  • In human genetics, which is that we really

  • can't find all the heritability.

  • And as a point of introduction, the narrative

  • arc for today's lecture is that, generally speaking,

  • you're more like your relatives than random people

  • on the planet.

  • And why is this?

  • Well obviously you contain components of your mom

  • and dad's genomes.

  • And they are providing you with components of your traits.

  • And the heritability of a trait is

  • defined by the fraction of phenotypic variance

  • that can be explained by genetics.

  • And we're going to talk today about computational models that

  • can predict phenotype from genotype.

  • And this is very important, obviously,

  • for understanding the sources of various traits and phenotypes.

  • As well as fields such as pharmacogenomics

  • that try and predict the best therapy for a disease

  • based upon your genetic makeup.

  • So, individual loci in the genome

  • that contribute to quantitative traits

  • are called quantitative trait loci, or QTLs.

  • So we're going to talk about how to discover them

  • and how to build models of quantitative traits using QTLs.

  • And finally, as I said at the outset,

  • our models are insufficient today.

  • They really can't find all of the heritability.

  • So we're going to go searching for this missing heritability

  • and see where it might be found.

  • Computationally, we're going to apply a variety of techniques

  • to these problems.

  • A preview is, we're going to build

  • linear models of phenotype and we're

  • going to use stepwise regression to learn these models using

  • a forward feature selection.

  • And I'll talk about what that is when

  • we get to that point of the lecture.

  • We're going to derive test statistics for discovering

  • which QTLs are significant and which QTLs are not,

  • to include in our model.

  • And finally, we're going to talk about how

  • to measure narrow sense heritability and broad sense

  • heritability, and environmental variance.

  • OK.

  • So, one great resource for traits that are fairly simple--

  • that primarily are the result of a single gene mutation,

  • or where a single gene mutation plays a dominant role,

  • is something called Online Mendelian Inheritance in Man.

  • And it's a resource.

  • It has about 21,000 genes in it right now.

  • And it's a great way to explore how human genes function

  • in various diseases.

  • And you could query by disease.

  • You can query by gene.

  • And it is a very carefully annotated and maintained

  • collection that is worthy of study,

  • if you're interested in particular disease genes.

  • We're going to be looking at more complex analyses today.

  • The analyses we're going to look at

  • are where there are many genes that

  • influence a particular trait.

  • And we would like to come up with general methods

  • for discovering how we can de novo from experimental data--

  • discover all the different genes that participate.

  • Now just as a quick review of statistics,

  • I think that we've talked before about means in class

  • and variances.

  • We're also going to talk a little bit

  • about covariances today.

  • But these are terms that you should

  • be familiar with as we're looking today

  • at some of our metrics for understanding heritability.

  • Are there any question about any of the statistical metrics that

  • are up here?

  • OK.

  • So, a broad overview of genotype to phenotype.

  • So, we're primarily going to be working

  • with complete genome sequences today,

  • which will reveal all of the variants that

  • are present in the genome.

  • And it's also the case that you can subsample a genome

  • and only observe certain variants.

  • Typically that's done with microarrays

  • that have probes that are specific to particular markers.

  • The way those arrays are manufactured

  • is that whole genome sequencing is done at the outset, and then

  • high prevalence variants, at least

  • common variants, which typically are

  • at a frequency of at least 5% in the population

  • are queried by using a microarray.

  • But today we'll talk about complete genome sequence.

  • An individual's phenotype, we'll say

  • is defined by one or more traits.

  • And a non-quantitative trait is something perhaps as simple as

  • whether or not something is dead or alive.

  • Or whether or not it can survive in a particular condition.

  • Or its ability to produce a particular substance.

  • A quantitative trait, on the other hand,

  • is a continuous variable.

  • Height, for example, of an individual

  • is a quantitative trait.

  • As is growth rate, expression of a particular gene,

  • and so forth.

  • So we'll be focusing today on estimating quantitative traits.

  • And as I said, a quantitative trait locus, or QTL,

  • is a marker that's associated with a quantitative trait

  • and could be used to predict it.

  • And you can sometimes hear about eQTLs,

  • which are expression quantitative trait loci.

  • And they're loci that are related to gene expression.

  • So, let's begin then, with a very simple genetic model.

  • It's going to be haploid, which means, of course,

  • there's only one copy of each chromosome.

  • Yeast is the model organism we're

  • going to be talking about today.

  • It's a haploid organism.

  • And we have mom and dad up there.

  • Mom on the left, dad on the right in two different colors.

  • And you can see that mom and dad in this particular example,

  • have n different genes.

  • They're going to contribute to the F1 generation, to junior.

  • And the relative colors, white for mom, black for dad,

  • are going to be used to describe the alleles,

  • or the allelic variants that are inherited

  • by the child, the F1 generation.

  • And as I said, a specific phenotype

  • might be alive or dead in a specific environment.

  • And note that I have drawn the chromosomes to be disconnected.

  • Which means that each one of those genes

  • is going to be independently inherited.

  • So the probability in the F1 generation

  • that you're going to get one of those from mom or dad

  • is going to be a coin flip.

  • We're going to assume that they're

  • far enough away that the probability of crossing over

  • during meiosis is 0.5.

  • And so we get a random assortment

  • of alleles from mom and dad.

  • OK?

  • So let us say that you go off and do an experiment.

  • And you have 32 individuals that you produce out of a cross.

  • And you test them, OK.

  • And two of them are resistant to a particular substance.

  • How many genes do you think are involved in that resistance?

  • Let's assume that mom is resistant and dad is not.

  • OK.

  • If you had two that were resistant out of 32,

  • how many different genes do you think were involved?

  • How do you estimate that?

  • Any ideas?

  • Yes?

  • AUDIENCE: If you had 32 individuals

  • and say half of them got it?

  • PROFESSOR: Two, let's say.

  • One out of 16 is resistant.

  • And mom is resistant.

  • AUDIENCE: Because I was thinking that if it was half of them

  • were resistant, then you would maybe guess one gene,

  • or something like that.

  • PROFESSOR: Very good.

  • AUDIENCE: So then if only eight were

  • resistant you might guess two genes, or something like that?

  • PROFESSOR: Yeah.

  • What you say is, that if mom's resistant, then

  • we're going to assume that you need

  • to get the right number of genes from mom to be resistant.

  • Right?

  • And so, let's say that you had to get four genes from mom.

  • What's the chance of getting four genes from mom?

  • AUDIENCE: Half to the power of four.

  • PROFESSOR: Yeah, which is one out of 16, right?

  • So, if you, for example had two that were resistant out of 32,

  • the chances are one in 16.

  • Right?

  • So you would naively think, and properly so,

  • that you had to get four genes from mom to be resistant.

  • So the way to think about these sorts

  • of non-quantitative traits is that you

  • can estimate the number of genes involved.

  • It's simply log base 2 of the number

  • of F1s tested over the number of F1s with the phenotype.

  • It tells you roughly how many genes

  • are involved in providing a particular trait,

  • assuming that the genes are unlinked.

  • It's a coin flip, whether you get them or not.
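As a rough illustration of that calculation, here is a minimal sketch (the function name is just for illustration):

```python
import math

# A minimal sketch of the estimate described above: if all the required
# genes must come from mom (an AND model) and each unlinked gene is
# inherited with probability 1/2, then the number of genes is roughly
# log2(number of F1s tested / number of F1s showing the phenotype).
def estimate_gene_count(n_tested, n_with_phenotype):
    return math.log2(n_tested / n_with_phenotype)

# The lecture's example: 2 resistant individuals out of 32 F1s.
print(estimate_gene_count(32, 2))  # 4.0
```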

  • Does everybody see that?

  • Yes?

  • Any questions at all about that?

  • About the details?

  • OK.

  • Let's talk now about quantitative traits then.

  • We'll go back to our model and imagine

  • that we have the same set-- actually

  • it's going to a different set of n genes.

  • We're going to have a coin flip as to

  • whether or not you're getting a mom gene or a dad gene.

  • OK.

  • And each gene in dad has an effect size of 1 over n.

  • Yes?

  • AUDIENCE: I just wanted to check.

  • We're assuming that the parents are homozygous for the trait?

  • Is that correct?

  • PROFESSOR: Remember these are haploid.

  • AUDIENCE: Oh, these are haploid.

  • PROFESSOR: Right.

  • So they only have one copy of all these genes.

  • All right.

  • Yes?

  • AUDIENCE: [INAUDIBLE] resistant and they're [INAUDIBLE].

  • That could still mean that dad has

  • three of the four genes in principle.

  • PROFESSOR: The previous slide?

  • Is that where what you're talking about?

  • AUDIENCE: [INAUDIBLE] knew about it.

  • So really what you mean is that dad does not

  • have any of the genes that are involved with resistance.

  • PROFESSOR: That's correct.

  • I was saying that dad has to have all of gene--

  • that the child has to have all of the genes that

  • are operative to create resistance.

  • We're going to assume an AND model.

  • He must have all the genes from mom.

  • They're involved in the resistance pathway.

  • And since only one out of 16 progeny

  • has all those genes from mom, right, it

  • appears that given the chance of inheriting something from mom

  • is 1/2, that it's four genes you have to inherit from mom.

  • Because the chance of inheriting all four is one out of 16.

  • AUDIENCE: [INAUDIBLE] in which case--

  • PROFESSOR: No, I'm assuming the dad doesn't have any of those.

  • But here we're asking, what is the difference

  • in the number of genes between mom and dad?

  • So you're right, that the number we're computing

  • is the relative number of genes different between mom and dad

  • you require.

  • And so it might be that dad's a reference

  • and we're asking how many additional genes mom brought

  • to the table to provide with that resistance.

  • But that's a good point.

  • OK.

  • OK.

  • So, now let's look at this quantitative model.

  • Let's assume that mom has a bunch of genes that contribute

  • zero to an effect size and dad-- each gene

  • that dad has produces an effect of 1 over n.

  • So the total effect size here for dad is 1.

  • So the effect of mom on this particular quantitative trait

  • might be zero.

  • It might be the amount of ethanol produced

  • or some other quantitative value.

  • And dad, on the other hand, since he has n genes,

  • is going to produce one, because each gene contributes

  • a little bit to this quantitative phenotype.

  • Is everybody clear on that?

  • So, the child is going to inherit genes

  • according to a coin flip between mom and dad, right.

  • So the first fundamental question

  • is, how many different levels are there

  • in our quantitative phenotype in our trait?

  • How many different levels can you have?

  • AUDIENCE: N + 1?

  • PROFESSOR: N + 1, right, because you can either inherit

  • zero, or up to n genes from dad.

  • And it gets you n plus 1 different levels.

  • OK.

  • So, what's the probability then-- well,

  • I'll ask a different question.

  • What's the expected value of the quantitative phenotype

  • of a child?

  • Just looking at this.

  • If dad's one and mom's zero, and you have a collection of genes

  • and you do a coin flip each time,

  • you're going to get half your genes from mom

  • and half your genes from dad.

  • Right.

  • And so the expected trait value is 0.5.

  • So for these additive traits, you're

  • going to be at the midpoint between mom and dad.

  • Right.

  • And what is the probability that you

  • inherit x copies of dad's genes?

  • Well, that's n choose x, times 0.5 to the x, times 1 minus 0.5

  • to the n minus x.

  • A simple binomial.

  • Right.

  • So if you look at this, the probability

  • of the distribution for the children

  • is going to look something like this,

  • where this is the mean, 0.5.

  • And the number of distinct values is going to be n plus 1.

  • Right.

  • So the expected value of x is 0.5 and turns out

  • that the expected value, or the variance of x minus 0.5, which

  • is the mean squared, is going to be 0.25 over n.

  • So I can show you this on the next slide.

  • So you can see, this could be ethanol production,

  • it could be growth rate, what have you.

  • And you can see that the number of genes that you're

  • going to get from dad follows this binomial distribution

  • and gives you a spread of different phenotypes

  • in the child's generation, depending

  • upon how many copies of dad's genes that you inherit.
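Here is a minimal simulation sketch of that binomial picture; the number of genes and the population size are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes = 20          # hypothetical number of contributing genes
n_offspring = 10000   # hypothetical number of F1s

# Each offspring inherits each gene from dad with probability 1/2;
# each dad allele contributes 1/n to the trait, each mom allele contributes 0.
dad_copies = rng.binomial(n_genes, 0.5, size=n_offspring)
trait = dad_copies / n_genes

print(trait.mean())      # ~0.5, the midpoint between mom (0) and dad (1)
print(trait.var())       # ~0.25 / n_genes
print(0.25 / n_genes)    # the theoretical variance quoted in the lecture
```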

  • But does this make sense to everybody?

  • Now would be a great time to ask any questions

  • about the details of this.

  • Yes?

  • AUDIENCE: Can you clarify what x is?

  • Is x the fraction of genes inherited--

  • PROFESSOR: The number of genes you inherit from dad.

  • The number of genes.

  • So it would zero, one, two, up to n.

  • AUDIENCE: Shouldn't the expectation of n [INAUDIBLE]

  • x be n/2?

  • PROFESSOR: I'm sorry.

  • It is supposed to be n/2.

  • But the last two expectations are

  • some of the number of genes you've inherited from dad.

  • Right, that's correct.

  • Yeah, this slide's wrong.

  • Any other questions?

  • OK.

  • So this is a very simple model but it tells us

  • a couple of things, right.

  • Which is that as n gets to be very large,

  • the effect of each gene gets to be quite small.

  • So something could be completely heritable,

  • but if it's spread over, say 1,000 genes,

  • then it will be very difficult to detect,

  • because the effect of each gene would be quite small.

  • And furthermore, the variance that you see in the offspring

  • will be quite small as well, right,

  • in terms of the phenotype.

  • Because it's going to be 0.25/n.

  • So as n gets larger, as the number of genes that

  • contribute to that phenotype increases,

  • the variance is going to go down linearly.

  • OK.

  • So we should just keep this in mind

  • as we're looking at discovering these sort of traits

  • and the underlying QTLs that can be used to predict them.

  • And finally, I'd like to point out one other detail which

  • is that, if genes are linked, that is,

  • if they're in close proximity to one another in the genome

  • and it makes it very unlikely there's

  • going to be crossing over between them,

  • then they're going to act as a unit.

  • And if they act as a unit, then we'll get marker correlation.

  • And you can also see, effectively,

  • that the effect size of those two genes

  • is going to be larger.

  • And in more complicated models, we obviously

  • wouldn't have the same effect size for each gene.

  • The effect size might be quite large for some genes,

  • might be quite small for some genes.

  • And we'll see the effects of marker correlation

  • in a little bit.

  • So the way we're going to model this is we're going to-- this

  • is a definition of the variables that we're

  • going to be talking about today.

  • And the essential idea is quite simple.

  • So the phenotype of an individual-- so p sub

  • i is the phenotype of an individual,

  • is going to be equal to some function of their genotype

  • plus an environmental component.

  • This function is the critical thing that we want to discover.

  • This function, f, is mapping from the genotype

  • of an individual to its phenotype.

  • And the environmental component could

  • be how well something is fed, how much sunlight it gets,

  • things that can greatly influence things like growth

  • but they're not described by genetics.

  • But this function is going to encapsulate

  • what we know about how the genetics

  • of a particular individual influences a trait.

  • And thus, if we consider a population of individuals,

  • the phenotypic variance is going to be

  • equal to the genotypic variance plus the environmental variance

  • plus two times the covariance between the genotype

  • and the environment.

  • And we're going to assume, as most studies do,

  • that there is no correlation between genotype

  • and environment.

  • So this term disappears.

  • So what we're left with is that the observed phenotypic

  • variance is equal to the genotypic variance

  • plus the environmental variance.
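Written out, the decomposition being described is:

```latex
V_P = V_G + V_E + 2\,\mathrm{Cov}(G, E)
% assuming genotype and environment are independent, Cov(G, E) = 0, so
V_P = V_G + V_E
```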

  • And what we would like to do is to come up with a function

  • f, that best predicts the genotypic component

  • of this equation.

  • There's nothing we can do about environmental variance.

  • Right.

  • But we can measure it.

  • Does anybody have any ideas how we

  • could measure environmental variance?

  • Yes?

  • AUDIENCE: Study populations in which

  • there's some kind of controlled environment.

  • So you study populations that one population

  • is one with a homogeneous.

  • And another one was a completely different one.

  • PROFESSOR: Right.

  • So what we could do is we could use controls.

  • So typically what we could do is we could study in environments

  • where we try and control the environment exactly

  • to eliminate this as much as we possibly can, for example.

  • As we'll see that we also can do things

  • like study clones, where individuals

  • have exactly the same genotype.

  • And then, all of the variance that we observe--

  • if this term vanishes because the genotypes are identical,

  • it is due to the environment.

  • So typically, if you're doing things

  • like studying humans, since cloning humans isn't really

  • a good idea to actually measure environmental variance,

  • right, what you could do is you can look at identical twins.

  • And identical twins give you a way

  • to get at the question of how much environment variance

  • there is for a particular phenotype.

  • So in sum, this replicates what I have here

  • on the left-hand side of the board.

  • And note that today we'll be talking

  • about the idea of discovering this function,

  • f, and how well we can discover f,

  • which is really important, right.

  • It's fundamental to be able to predict phenotype

  • from genotype.

  • It's an extraordinarily central question in genetics.

  • And when we do the prediction, there are two kinds of-- oh,

  • there's a question?

  • AUDIENCE: Could you please explain again

  • why the co-variance drops out or it goes away.

  • PROFESSOR: Yeah, the co-variance drops out

  • because we're going to assume that genotype

  • and environment are independent.

  • Now if they're not independent, it won't drop out.

  • But making that assumption-- and of course, for human studies

  • you can't really make that assumption completely, right?

  • And one of the problems in doing these sorts of studies

  • is that it's very, very easy to get confounded.

  • Because when you're trying to decompose

  • the observed variance in height, for example.

  • You know, there's what mom and dad provided to an individual

  • in terms of their height, and there's also

  • how much junior ate, right.

  • And whether he went to McDonald's a lot, or you know,

  • was going to Whole Foods a lot.

  • You know, who knows, right?

  • But this component and this component,

  • it's easy to get confounded between them

  • and sometimes you can imagine that genotype

  • is related to place of origin in the world.

  • And that has a lot to do with environment.

  • And so this term wouldn't necessarily disappear.

  • OK.

  • So there are two kinds of heritability

  • I'd like to touch upon today.

  • And it's important that you remember there are two kinds

  • and one is extraordinarily difficult to recover

  • and the other one is in some sense, a more constrained

  • problem, because we're much better at building models

  • for that kind of heritability estimate.

  • The first is broad-sense heritability,

  • which describes the upper bound for phenotypic prediction given

  • an arbitrary model.

  • So it's the total contribution to phenotypic variance

  • from genetic causes.

  • And we can estimate that, right.

  • And we'll see how we can estimate it in a moment.

  • And narrow-sense heritability is defined as,

  • how much of the heritability can we describe

  • when we restrict f to be a linear model.

  • So when f is simply linear, as the sum of terms,

  • that describes the maximum narrow-sense heritability we

  • can recover in terms of the fraction of phenotypic

  • variance we can capture in f.

  • And it's very useful because it turns out

  • that we can compute both broad-sense and narrow-sense

  • heritability from first principles-- I

  • mean from experiment.

  • And the difference between them is part of our quest today.

  • Our quest is, to answer the question,

  • where is the missing heritability?

  • Why can't we build an Oracle f that perfectly

  • predicts phenotype from genotype?

  • So on that line-- I just want to give you some caveats.

  • One is that we're always talking about populations when we're

  • talking about heritability because it's

  • how we're going to estimate it.

  • And when you hear people talk about heritability,

  • oftentimes they won't qualify it in terms

  • of whether it's broad-sense or narrow-sense.

  • And so you should ask them if you're

  • engaged in a scientific discussion with them.

  • And as we've already discussed, sometimes estimation

  • is difficult because of matching environment and eliminating

  • this term, the environmental term

  • can be a challenge when you're out of the laboratory.

  • Like when you're dealing with humans.

  • So, let's talk about broad-sense heritability.

  • Imagine that we measure environmental variance simply

  • by looking at identical twins or clones, right.

  • Because if we, for example, take a bunch of yeast

  • that are genotypically identical.

  • And we grow them up separately, and we

  • measure a trait like how well they respond

  • to a particular chemical or their growth rate,

  • then the variance we see from each individual to individual

  • is simply environmental, because they're genetically identical.

  • So

  • we can, in that particular case, exactly

  • quantify the environmental variance

  • given that every individual is genetically identical.

  • We simply measure all the growth rates

  • and we compute the variance.

  • And that's the environmental variance.

  • OK?

  • As I said for humans, the best we can do is identical twins.

  • Monozygotic twins.

  • You can go out and for pairs of twins that are identical,

  • you can measure height or any other trait that you like

  • and compute the variance.

  • And then that is an estimate of the environmental component

  • of that, because they should be genetically identical.

  • And big H squared-- broad-sense is always

  • capital H squared and narrow-sense is always

  • little h squared.

  • Big H squared, which is broad-sense

  • heritability is very simple then.

  • It's the phenotypic variance, minus the environmental

  • variance, over the phenotypic variance.

  • So it's the fraction of phenotypic variance

  • that can be explained from genetic causes.

  • Is that clear to everybody?

  • Any questions at all about this?

  • OK.

  • So, for example, on the right-hand hand side

  • here, those three purplish squares

  • have three different populations,

  • which are genotypically identical.

  • They have two alleles: little a, little a; big A, little a;

  • and big A, big A. And each one has a variance of 1.0.

  • So since they are genetically identical,

  • we know that the environmental variance has to be 1.0.

  • On the left-hand side, you see the genotypic variance.

  • And that reminds us of where we started today.

  • It depends on the number of alleles you get of big A,

  • as to what the value is.

  • And when you put all of that together,

  • you get a total variance of 3.

  • And so big H squared is simply the genotypic variance,

  • which is 2, over the total phenotypic variance, which

  • is 3.

  • So big H squared is 2/3.
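Plugging in the slide's numbers, a minimal sketch of that computation:

```python
# Broad-sense heritability, H^2 = (V_P - V_E) / V_P, using the slide's numbers.
v_e = 1.0   # environmental variance, from the genetically identical populations
v_p = 3.0   # total phenotypic variance across all individuals
H2 = (v_p - v_e) / v_p
print(H2)   # 0.666..., i.e. H^2 = 2/3
```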

  • And so that is a way of computing

  • broad-sense heritability.

  • Now, if we think about our models,

  • we can see that narrow-sense heritability

  • has some very nice properties.

  • Right.

  • That is, if we build an additive model of phenotype,

  • to get at narrow-sense heritability.

  • So if we were to constrain f here to be linear,

  • it's simply going to be a very simple linear model.

  • For each particular QTL that we discover,

  • we assign an effect size beta to it,

  • or a coefficient that describes its deviation

  • from the mean for that particular trait.

  • And we have an offset, beta zero.

  • So our simple linear model is going to take all the discovered

  • QTLs that we have-- take each QTL

  • and determine which allelic form it's in.

  • Typically it's coded as either the zero or one form.

  • And then add a beta j, where beta j is that particular QTL's

  • deviation from the mean value.

  • Add them all together to compute the phenotype.
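A minimal sketch of that additive model; the genotype coding and effect sizes below are hypothetical numbers for illustration:

```python
import numpy as np

# p_i = beta_0 + sum_j beta_j * g_ij, where g_ij is 0 or 1 depending on
# which allelic form individual i carries at QTL j.
def predict_phenotype(genotype, beta, beta0):
    return beta0 + genotype @ beta

genotype = np.array([[0, 1],      # 3 hypothetical individuals, 2 QTLs
                     [1, 1],
                     [0, 0]])
beta = np.array([0.3, 0.1])       # hypothetical per-QTL effect sizes
print(predict_phenotype(genotype, beta, beta0=0.5))  # [0.6, 0.9, 0.5]
```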

  • OK.

  • So, this is a very simple additive model

  • and a consequence of this model is

  • that if you think about an F1 or a child of two parents,

  • as we said earlier, a child is going to inherit roughly half

  • of the alleles from mom and half of the alleles from dad.

  • And so for additive models like this,

  • the expected value of the child's trait value

  • is going to be the midpoint of mom and dad.

  • And that can be derived directly from the equation

  • above, because you're getting half of the QTLs

  • from mom and half of the QTLs from dad.

  • So this was observed a long time ago, right,

  • because if you did studies and you looked at the deviation

  • from the midpoint of parents for human height.

  • You can see that the children fall pretty

  • close to mid-parent line, where the y-axis here

  • is the height in inches and that suggests

  • that much of human height can be modeled by a narrow-sense based

  • heritability model.

  • Now, once again, narrow-sense heritability

  • is the fraction of phenotypic variance explained

  • by an additive model.

  • And we've talked before about the model itself.

  • And little h squared is simply going

  • to be the amount of variance explained

  • by the additive model over the total phenotypic variance.

  • And the additive variance is shown on the right-hand side.

  • That equation boils down to, you take the phenotypic variance

  • and you subtract off the variance that's environmental

  • and that cannot be explained by the additive variance,

  • and what you're left with is the additive variance.
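In symbols, the definition being described is:

```latex
h^2 = \frac{V_A}{V_P}
% where V_A, the additive variance, is what remains of V_P after removing the
% environmental variance and the genetic variance the additive model cannot capture.
```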

  • And once again, coming back to the question

  • of missing heritability, if we observe

  • that what we can estimate for little h squared

  • is below what we expect, that gap

  • has to be explained somehow.

  • Some typical values for theoretical h squared.

  • So this is not measured h squared

  • in terms of building a model and testing it like this.

  • But what we can do is we can theoretically

  • estimate what h squared should be,

  • by looking at the fraction of identity between individuals.

  • Morphological traits tend to have

  • higher h squared than fitness traits.

  • So human height has a little h square of about 0.8.

  • And for those ranchers out there in the audience,

  • you'll be happy to know that cattle yearly weight has

  • heritability of about 0.35.

  • Now, things like life history which are fitness traits

  • are less heritable.

  • Which would suggest that looking at how long your parents lived

  • and trying to estimate how long you're going to live

  • is not as productive as looking at how tall you

  • are compared to your parents.

  • And there's a complete table that I've

  • included in the slides for you to look at,

  • but it's too small to read on the screen.

  • OK, so now we're going to turn to computational models

  • and how we can discover a model that figures out

  • where the QTLs are, and then assigns that function f to them

  • so we can predict phenotype from genotype.

  • And we're going to be taking our example from this paper

  • by Bloom, et al, which I posted on the Stellar site.

  • And it came out last year and it's

  • wonderful study in QTL analysis.

  • And the setup for this study is quite simple.

  • What they did was, is they took two different strains

  • of yeast, RM and BY, and they crossed them

  • and produced roughly 1,000 F1s.

  • And RM and BY are very similar.

  • There are, I think, about 35,000 SNPs

  • between them.

  • Only about 0.5% of their genomes are different.

  • So they're really close.

  • Just for point of reference, you know, the distance between me

  • and you is something like one base for every thousand?

  • Something like that.

  • And then they assayed all those F1s.

  • They genotyped them all.

  • So to genotype them, what you do is

  • you know what the parental genotypes are

  • because they sequence both parents.

  • The mom and dad, so to speak, at 50x coverage.

  • So they knew the genome sequence is completely

  • for both mom and dad.

  • And then for each one of the 1,000 F1s

  • they put them on a microarray and what

  • is shown on the very bottom left is

  • a result of genotyping an individual

  • where they can see each chromosome

  • and whether it came from mom or from dad.

  • And you can't see it here, but there

  • are 16 different chromosomes and the alternating purple and

  • yellow colors show whether that particular part of the genome

  • came from mom or from dad.

  • So they know for each individual, its source.

  • From the left or the right strain.

  • OK.

  • And they have a thousand different genetic makeups.

  • And then they asked, for each one of those individuals,

  • how well could they grow in 46 different conditions?

  • So they exposed them to different sugars,

  • to different unfavorable environments and so forth.

  • And they measured growth rate as shown on the right-hand side.

  • Or right in the middle, that little thing

  • that looks like a bunch of little dots of various sizes.

  • By measuring colony size, they could

  • measure how well the yeast were growing.

  • And so they had two different things, right.

  • They had the exact genotype of each individual,

  • and they also had how well it was

  • growing in a particular condition.

  • And so for each condition, they wanted

  • to associate the genotype of the individual

  • to how well it was growing.

  • To its phenotype.

  • Now, one fair question is, of these different conditions,

  • how many of them were really independent?

  • And so to analyze that, they looked

  • at the correlation between growth rates

  • across conditions to try and figure out whether or not

  • they actually had 46 different traits they were measuring.

  • So this is a correlation matrix that

  • is too small to read on the screen.

  • The colors are somewhat visible, where the blue colors

  • are perfect correlation and the red colors

  • are perfect anti-correlation.

  • And you can see that in certain areas of this grid,

  • things are more correlated, like what

  • sugars the yeast liked to eat.

  • But suffice to say, they had a large collection

  • of traits they wanted to estimate.

  • So, now we want to build a computational model.

  • So our next step is figuring out how

  • to find those places in the genome that

  • allows us to predict, how well, given a trait,

  • the yeast would grow.

  • The actual growth rate.

  • So the key idea is this-- you have genetic markers, which

  • are SNPs down the genome and you're

  • going to test a particular marker.

  • And if this is a particular trait,

  • one possibility is that-- let's say

  • that this marker could be either 0 or 1.

  • Without loss of generality, it could

  • be that here are all the individuals where

  • the marker is zero.

  • And here are all the individuals where the marker is 1.

  • And really, fundamentally, whether an individual

  • has a 0 or a 1 marker, it doesn't really

  • change its growth rate very much.

  • OK?

  • It's more or less identical.

  • It's also possible that this is best

  • modeled by two different means for a given trait.

  • That when the marker is 1, you're growing-- actually

  • this is going to be the growth rate on the x-axis.

  • The y-axis is the density.

  • That you're growing much better when you have a 1

  • in that marker position than a zero.

  • And so we need to distinguish between these two cases

  • when the marker is predictive of growth rate

  • and when the marker is not predictive of growth rate.

  • And we've talked about lod likelihood tests before

  • and you can see one on the very top.

  • And you can see there's an additional degree of freedom

  • that we have in the top prediction versus the bottom

  • because we're using two different means that

  • are conditioned upon the genotypic value

  • at a particular marker.
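One common way to write that single-marker test, sketched under the usual normal-error assumptions (not necessarily the exact statistic the authors used):

```python
import numpy as np

def lod_score(trait, marker):
    # Compare a one-mean model (marker ignored) against a model with a
    # separate mean for each genotype class (marker = 0 or marker = 1).
    # Assumes both genotype classes occur for this marker.
    n = len(trait)
    rss_null = np.sum((trait - trait.mean()) ** 2)
    rss_alt = sum(np.sum((trait[marker == g] - trait[marker == g].mean()) ** 2)
                  for g in (0, 1))
    return (n / 2) * np.log10(rss_null / rss_alt)
```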

  • So we have a lot of different markers indeed.

  • So we have-- let's see here, the exact number.

  • I think it's about 13,000 markers they had in this study.

  • No.

  • 11,623 different unique markers they found.

  • That they could discover, that weren't linked together.

  • We talked about linkage earlier on.

  • So you've got over 11,000 markers.

  • You're going to do a likelihood ratio

  • test to compute this lod score.

  • Do we have to worry about multiple hypothesis correction

  • here?

  • Because you're testing over 11,000

  • markers to see whether or not they're

  • significant for one trait.

  • Right.

  • So one thing that we could do is imagine that what we did was

  • we scrambled the association between phenotypes

  • and individuals.

  • So we just randomized it and we did that a thousand times.

  • And each time we did it, we computed the distribution

  • of these lod scores.

  • Because we have broken the association between phenotype

  • and genotype, the lod scores which

  • we should be seeing if we did this randomization,

  • should correspond to essentially noise.

  • But it's what we would see at random.

  • So it's a null distribution we can look at.

  • And so what we'll see is a distribution of lod scores.

  • This is the lod.

  • This is the probability from a null, a permutation test.

  • And since we actually have done the randomization

  • over all 11,000 markers, we can directly draw a line

  • and ask what are the chances that a lod score would

  • be greater than or equal to a particular value at random?

  • And we can pick an area inside this tail,

  • let's say 0.05, because that's what

  • the authors of this particular paper used

  • and ask what value of a lod score

  • would be very unlikely to have by chance?

  • It turns out in their first iteration, it was 2.63.

  • That a lod score over 2.63 had a 0.05 chance

  • or less of occurring in randomly permuted data.

  • And since the permuted data contained all of the markers,

  • we don't have to do any multiple hypothesis correction.

  • So you can directly compare the statistic

  • that you compute against a threshold

  • and accept any marker or QTL that has a lod score greater,

  • in this case, than 2.63, and put it in your model.

  • And everything else you can reject.
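A sketch of that permutation procedure; taking the maximum lod over all markers in each permutation is one common way to get a genome-wide threshold, and `lod_score` refers to the earlier sketch:

```python
import numpy as np

def permutation_threshold(trait, markers, n_perm=1000, alpha=0.05, seed=0):
    # markers: array of shape (individuals, n_markers), entries 0 or 1.
    rng = np.random.default_rng(seed)
    max_lods = []
    for _ in range(n_perm):
        shuffled = rng.permutation(trait)   # break the genotype-phenotype association
        max_lods.append(max(lod_score(shuffled, markers[:, j])
                            for j in range(markers.shape[1])))
    return np.quantile(max_lods, 1 - alpha) # e.g. the 0.05 tail, ~2.63 in the paper
```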

  • And so you start by building a model out

  • of all of the markers that are significant

  • at this particular level.

  • You then assemble the model and you can now

  • predict phenotype from genotype.

  • But of course, you're going to make errors, right.

  • For each individual, there's going to be an error.

  • You're going to have a residual for each individual that

  • is going to be the phenotype minus the model's prediction

  • from the genotype of the individual.

  • So this is the error that you're making.

  • So what these folks did was that you first

  • look at predicting the phenotype directly,

  • and you pick all the QTLs that are significant at that level.

  • And then you compute the residuals

  • and you try and predict the residuals.

  • And you try and find additional QTLs

  • that are significant after you have picked the original ones.

  • OK.

  • So why might this produce more QTLs than the original pass?

  • What do you think?

  • Why is it that trying to predict the residuals is

  • a good idea after you've tried to predict

  • the phenotype directly?

  • Any ideas about that?

  • Well, what this is telling us, is

  • that these QTLs we're going to predict now

  • were not significant enough in the original pass,

  • but when we're looking at what's left over, after we subtract

  • off the effect of all the other QTLs,

  • other things might pop up.

  • But in some sense, they were obscured by the original QTLs.

  • Once we subtract off their influence,

  • we can see things that we didn't see before.

  • And we start gathering up these additional QTLs

  • to predict the residual components.

  • And so they do this three times.

  • So they predict the original set of QTLs

  • and then they iterate three times on the residuals

  • to find and fit a linear model that predicts a given

  • trait from a collection of QTLs that they discover.
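A rough sketch of that forward, residual-based selection loop, reusing the hypothetical `lod_score` and `permutation_threshold` helpers from the earlier sketches (the real procedure in the paper has more detail):

```python
import numpy as np

def forward_qtl_selection(trait, markers, n_rounds=3, n_perm=1000, alpha=0.05):
    selected = []
    beta = np.array([trait.mean()])               # intercept-only model to start
    residual = trait - beta[0]
    for _ in range(n_rounds + 1):                 # initial pass plus residual passes
        cutoff = permutation_threshold(residual, markers, n_perm, alpha)
        hits = [j for j in range(markers.shape[1])
                if j not in selected
                and lod_score(residual, markers[:, j]) > cutoff]
        if not hits:
            break
        selected.extend(hits)
        # Refit the full additive model with everything selected so far,
        # then compute new residuals for the next scan.
        X = np.column_stack([np.ones(len(trait)), markers[:, selected]])
        beta, *_ = np.linalg.lstsq(X, trait, rcond=None)
        residual = trait - X @ beta
    return selected, beta
```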

  • Yes?

  • AUDIENCE: Sorry.

  • I'm still confused.

  • The second round? [INAUDIBLE] done three additional times?

  • Is that right?

  • So the--

  • PROFESSOR: Yes.

  • AUDIENCE: Is it done on the remainder of QTL

  • or on the original list of every--

  • PROFESSOR: Each time you expand your model

  • to include all the QTLs you've discovered up to that point.

  • So initially, you discover a set of QTLs, call that set one.

  • You then compute a model using set one

  • and you discover the residuals.

  • AUDIENCE: [INAUDIBLE].

  • PROFESSOR: Correct.

  • Well, residual [INAUDIBLE] so you use

  • set one to build a model, a phenotype.

  • So set one is used here to compute this, right.

  • And so set one is used.

  • And then you compute what's left over

  • after you've discovered the first set of QTLs.

  • Now you say, we still have this left to go.

  • Let's discover some more QTLs.

  • And now you discover set two of QTLs.

  • OK.

  • And that set two then is used to build a model that has set one

  • and set two in it.

  • Right.

  • And that residual is used to discover

  • set three and so forth.

  • So each time you're expanding the set of QTLs

  • by what you've discovered in the residuals.

  • Sort of in the trash bin so to speak.

  • Yes?

  • AUDIENCE: Each time you're doing this randomization

  • to determine lod cutoff?

  • PROFESSOR: That's correct.

  • Each time you have to redo the randomization

  • and get to the lod cutoff.

  • AUDIENCE: But does that method actually

  • work the way you expect it on the second pass, given that you

  • have some false positives from the pass

  • that you've now subtracted from your data?

  • PROFESSOR: I'm not sure I understand the question.

  • AUDIENCE: So the second time you do this randomization,

  • and you again come up with a threshold,

  • you say, oh, above here there are 5% false positives.

  • PROFESSOR: Right.

  • AUDIENCE: But could it be that that estimate is actually

  • significantly wrong based on the fact that you've subtracted off

  • false positives before you do that process?

  • PROFESSOR: I mean, in some sense, what's

  • your definition of a false positive?

  • Right.

  • I mean it gets down to that because we've

  • discovered there's an association between that QTL

  • and predicting phenotype.

  • And in this particular world it's useful for doing that.

  • So it's hard to call something a false positive in that sense,

  • right.

  • But you're right, you actually have

  • to reset your threshold every time

  • that you go through this iteration.

  • Good question.

  • Other questions?

  • OK.

  • So, let's see what happens when you do this.

  • What happens is that if you look down the genome,

  • you discover a collection.

  • For example, this is growth in E6 berbamine.

  • And you can see the significant locations

  • in the genome, the numbers 1 through 16 of the chromosomes

  • and the little red asterisks above the peaks

  • indicate that that was a significant lod score.

  • The y-axis is a lod score.

  • And you can see the locations in the genome

  • where we have found places that were associated with growth

  • rate in that particular chemical.

  • OK.

  • Now, why is it, do you think, that in many of those places

  • you see sort of a rise and fall that is somewhat gentle

  • as opposed to having an impulse function

  • right at that particular spot?

  • AUDIENCE: Nearby SNPs are linked?

  • PROFESSOR: Yeah, nearby SNPs are linked.

  • That as you come up to a place that is causal,

  • you get a lot of other things are linked to that.

  • And the closer you get, the higher the correlation is.

  • So that is for 1,000 segregants in the top.

  • And what was discovered for that particular trait,

  • was 15 different loci that explained

  • 78% of the phenotypic variance.

  • And in the bottom, the same procedure

  • was used, but was only used on 100 segregants.

  • And what you can see is that, in this particular case,

  • only two loci were discovered that explain

  • 21% of the variance.

  • So the bottom study was grossly under powered.

  • Remember we talked about the problem of finding

  • QTLs that had small effect sizes.

  • And if you don't have enough individuals

  • you're going to be under-powered and you can't actually

  • identify all of the QTLs.

  • So this is a comparison of this.

  • And of course, one of the things that you don't know

  • is the environmental variance that you're fighting against.

  • Because the number of individuals

  • you need, depends both on the number of potential loci

  • that you have.

  • The more loci you have, the more individuals you need to fight

  • against the multiple hypotheses problem,

  • which is taken care of by this permutation implicitly.

  • And the more QTLs that contribute

  • to a particular trait, the smaller they might be.

  • And there you need more individuals

  • to provide adequate power for your test.

  • And out of this model, however, if you

  • look at, for all the different traits, the predicted phenotype

  • versus the observed phenotype, you

  • can see that the model does a reasonably good job.

  • So the interesting things that came out of the study

  • were that, first of all, it was possible to look

  • at the effect sizes of each QTL.

  • Now, the effect size in terms of fraction of variance explained

  • of a particular marker, is the square of its coefficient.

  • It's the beta squared.

  • So you can see here the histogram of effect sizes,

  • and you can see that most QTLs have very small effects

  • on phenotype where phenotype is scaled between 0 and 1

  • for this study.

  • So, most traits as described here

  • have between 5 and 29 different QTL loci in the genome.

  • that are used to describe them, with a median of 12.

  • Now, the question the authors asked,

  • was if they looked at the theoretical h squared that they

  • computed for the F1s, how well did their model do?

  • And you can see that their model does very well.

  • That, in terms of looking at narrow sense heritability,

  • they can recover almost all of it, all the time.

  • However, the problem comes here.

  • Remember we talked about how to compute

  • broad-sense heritability by looking at clones

  • and computing environmental variance directly.

  • And so they were able to compute broad-sense heritability

  • and compare that to the narrow-sense heritability

  • that they were able to actually achieve in the study.

  • And you can see there are substantial gaps.

  • So what could be making up those gaps?

  • Why is it that this additive model can't explain growth rate

  • in a particular condition?

  • So, the next thing that we're going to discover

  • are some of the sources of this so-called missing heritability.

  • But before I give you some of the stock answers

  • that people in the field give, since this is part of our quest

  • today to actually look into missing heritability,

  • I'll put it to you, my panel of experts.

  • What could be causing this heritability to go missing?

  • Why can't this additive model predict growth rate accurately,

  • given it knows the genotype exactly?

  • Yes.

  • AUDIENCE: [INAUDIBLE] that you wouldn't

  • detect from looking at the DNA sequence.

  • PROFESSOR: So epigenetic factors-- are

  • you talking about protein factors or are you

  • talking about epigenetic effects?

  • AUDIENCE: More of the epigenetic marks.

  • PROFESSOR: Epigenetic marks, OK.

  • So it might be now, yeast doesn't have DNA methylation.

  • It does have chromatin modifications

  • in the form of histone marks.

  • So it might be that there's some histone marks that

  • are copied from generation to generation that are not

  • accounted for in our model.

  • Right?

  • OK, that's one possibility.

  • Great.

  • Yes.

  • AUDIENCE: There could be more complex effects

  • so two separate genes may come out, other than just adding.

  • One could turn the other off.

  • So if one's on, it could [INAUDIBLE].

  • PROFESSOR: Right.

  • So those are called epistatic effects,

  • or they're non-linear effects.

  • They're gene-gene interaction effects.

  • That's actually thought to be one

  • of the major issues in missing heritability.

  • What else could there be?

  • Yes.

  • AUDIENCE: [INAUDIBLE].

  • PROFESSOR: Right.

  • So you're saying that there could be inherent noise that

  • would cause there to be fluctuations in colony size

  • that are unrelated to the genotype.

  • And, in fact, that's a good point.

  • And that's something that we're going

  • to take care of with the environmental variance.

  • So we're going to measure how well individuals

  • grow with exactly the same genotype in a given condition.

  • And so that kind of fluctuation would

  • appear in that variance term.

  • And we're going to get rid of that.

  • But that's a good thought and I think it's important and not

  • appreciated that there can be random fluctuations

  • in that term.

  • Any other ideas?

  • So we have epistasis.

  • We have epigenetics.

  • We've got two E's so far.

  • Anything else?

  • How about if there are a lot of different loci

  • that are influencing a particular trait,

  • but the effect sizes are very small.

  • That we've captured, sort of the cream.

  • We've skimmed off the cream.

  • So we get 70% of the variance explained,

  • but the rest of the QTLs are small,

  • right, and we can't see them.

  • We can't see them because we don't have enough individuals.

  • We're underpowered, right.

  • We just-- more individuals more sequencing, right.

  • And that would be the only way to break through this

  • and be able to see these very small effects.

  • Because if the effects are small, in some sense,

  • we're hosed.

  • Right?

  • You just can't see them through the noise.

  • All those effects are going to show up down here

  • and we're going to reject them.

  • Anything else, people can think about?

  • Yes?

  • AUDIENCE: Could you content maybe the sum of some areas

  • that are-- sorry, the addition sum of those guys

  • that have low effects.

  • Or is that not detectable by any [INAUDIBLE]?

  • PROFESSOR: Well, that's certainly

  • what we're trying to do with residuals, right?

  • This multi-round thing is that we

  • take all the things we can detect

  • that have an effect with a conservative cut off

  • and we get rid of them.

  • And then we say, oh, is there anything left?

  • You know, that's hiding, sort of behind that forest, right.

  • If we cut through the first line of trees,

  • can we get to another collection of informative QTLs?

  • Yeah.

  • AUDIENCE: I was wondering if this

  • could be an overestimate also.

  • Like, for example, if, when you throw out

  • the variance for environmental conditions,

  • the environmental conditions aren't as exact as we thought

  • they were between two yeast growing in the same setup.

  • PROFESSOR: Right.

  • AUDIENCE: Then maybe you would inappropriately

  • assign a variance to the environmental condition

  • whereas some that could be, in fact-- something

  • that wouldn't be explained by.

  • PROFESSOR: And probably the other way around.

  • The other way around would be that you thought

  • you had the conditions exactly duplicated, right.

  • But when you actually did something else,

  • they weren't exactly duplicated so you see bigger variance

  • in another experiment.

  • And it appears to be heritable in some sense.

  • But, in fact, it would just be that you misestimated

  • the environmental component.

  • So, there are a variety of things

  • that we can think about, right.

  • Incorrect heritability estimates.

  • We can think about rare variants.

  • Now in this particular study we're

  • looking at everything, right.

  • Nothing is hiding.

  • We've got 50x sequencing.

  • There are no variants hiding behind the bushes.

  • They are all there for us to look at.

  • Structural variants-- well in this particular case,

  • we know structural variants aren't present,

  • but as you know, many kinds of mammalian cells

  • exhibit structural variants and other kinds

  • of bizarre behaviors with their chromosomes.

  • Many common variants of low effect.

  • We just talked about that.

  • And epistasis was brought up.

  • And this does not include epigenetics,

  • I'll have to add that to the list.

  • It's a good point.

  • OK.

  • And then we talked about this idea

  • that epistasis is the case where we have nonlinear effects.

  • So a very simple example of this is

  • when you have little a and big B, or big A and little b

  • together, either combination has an effect.

  • But little a, little b together have no effect.

  • And big A and big B together have no effect.

  • So you have a pairwise interaction

  • between these terms.

  • Right.

  • So this is sort of the exclusive OR of two terms

  • and that non-linear effect can never

  • be captured when you're looking at terms one at a time.

  • OK.

  • Because looking one at a time looks

  • like it has no effect whatsoever.

  • And these effects, of course, could be more than pairwise,

  • if you have a complicated network or pathway.
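A tiny numerical sketch of that exclusive-OR case, showing why a one-at-a-time scan misses it and why a product (interaction) term captures it:

```python
import numpy as np

a = np.array([0, 0, 1, 1])           # genotype at the first locus
b = np.array([0, 1, 0, 1])           # genotype at the second locus
trait = np.array([0., 1., 1., 0.])   # effect only when exactly one locus differs (XOR)

# One at a time: neither locus shifts the mean, so neither looks significant.
print(trait[a == 0].mean(), trait[a == 1].mean())   # 0.5 0.5
print(trait[b == 0].mean(), trait[b == 1].mean())   # 0.5 0.5

# Adding the pairwise interaction term a*b captures the effect.
X = np.column_stack([np.ones(4), a, b, a * b])
beta, *_ = np.linalg.lstsq(X, trait, rcond=None)
print(np.round(beta, 3))             # intercept, a, b, and a nonzero interaction term
```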

  • Now, what the authors did to examine this,

  • is they looked at pairwise effects.

  • So they considered all pairs of markers

  • and asked whether or not, taken two at a time now,

  • they could predict a difference in the trait mean.

  • But what's the problem with this?

  • How many markers did I say there were?

  • 13,000, something like that.

  • All pairs of markers is a lot of pairs of markers.

  • Right.

  • And what happens to your statistical power

  • when you get to that many markers?

  • You have a serious problem.

  • It goes right through the floor.

  • So you really are very under-powered to detect

  • these interactions.

  • The other thing they did was to try

  • to get things a little bit better, and they said,

  • how about this.

  • If we know that a given QTL is always important for a trait

  • because we discovered it in our additive model.

  • Well consider its pairwise interaction

  • with all the other possible variants.

  • So instead of now 13,000 squared,

  • it's only going to be like 22 different QTLs for a given

  • trait times 13,000 to reduce the space of search.

  • Obviously I didn't get this explanation completely clear.

  • So let me try one more time.

  • OK.

  • The naive way to go at looking at pairwise interactions

  • is consider all pairs and ask whether or not

  • all pairs have an influence on a particular trait value.

  • Right.

  • We've got that much?

  • OK.

  • Now let's suppose we don't want to look at all pairs.

  • How could we pick one element of the pair

  • to be interesting, but smaller in number?

  • Right.

  • So what we'll do is, for a given trait,

  • we already know which QTLs are important for it

  • because we've built our model already.

  • So let's just say, for purpose of discussion,

  • there are 20 QTLs that are important for this trait.

  • We'll take each one of those 20 QTLs

  • and we'll examine whether or not it has a pairwise interaction

  • with all of the other variants.

  • And that will reduce our search space.

  • Is that better?

  • OK, good.
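Just counting tests makes the motivation clear; a quick back-of-the-envelope sketch (the marker count is the one quoted in the lecture, the 20 QTLs are a hypothetical number for one trait):

```python
n_markers = 11623      # unique markers quoted in the lecture
n_qtls = 20            # hypothetical QTLs already in the additive model for one trait

all_pairs = n_markers * (n_markers - 1) // 2   # exhaustive pairwise search
anchored_pairs = n_qtls * n_markers            # pairs anchored at known QTLs
print(all_pairs, anchored_pairs)               # ~6.8e7 tests versus ~2.3e5 tests
```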

  • So, when they did that, they did find

  • some pairwise interactions.

  • 24 of their 46 traits had pairwise interactions

  • and here is an example.

  • And you can see the dot plot, or the upper right-hand part

  • of this slide, how when you are BY BY,

  • you have a lower phenotypic value than

  • when you have just any RM component

  • on the right-hand side.

  • So those were two different SNPs

  • on chromosome 7 and chromosome 11

  • and showing how they interact with one another

  • in a non-linear way.

  • If they were linear, then as you added either a chromosome 7

  • or a chromosome 11 contribution it would go up a little bit.

  • Here, as soon as you add either contribution from RM,

  • it goes all the way up to have a mean of zero or higher.

  • In this particular case, 71% of the gap between broad-sense

  • and narrow-sense was explained by this one pair interaction.

  • So it is the case that pairwise interactions

  • can explain some of the missing heritability.

  • Can anybody think of anything else

  • that can explain missing heritability?

  • OK.

  • What's inherited?

  • Let's make a list of everything that's

  • inherited from the parental line to the F1s.

  • OK.

  • Yes.

  • AUDIENCE: I mean, because there's

  • a lot more things inherited.

  • The protein levels are inherited.

  • PROFESSOR: OK.

  • AUDIENCE: [INAUDIBLE] are inherited as well.

  • PROFESSOR: Good.

  • I like this line of thinking.

  • AUDIENCE: [INAUDIBLE].

  • PROFESSOR: There are a lot of things

  • that are inherited, right?

  • So what's inherited?

  • Some proteins are probably inherited, right?

  • What is replicable through generation

  • to generation as a genetic material that's inherited?

  • Let's just talk about that for a moment.

  • Proteins are interesting, don't get me wrong.

  • I mean, prions and other things are very interesting.

  • But what else is inherited?

  • OK, yes?

  • AUDIENCE: [INAUDIBLE].

  • PROFESSOR: So there are other genetic molecules.

  • Let's just take a really simple one-- mitochondria.

  • OK.

  • Mitochondria are inherited.

  • And it turns out that these two strains

  • can have different mitochondria.

  • What else can be inherited?

  • Well, we were doing these experiments with our colleagues

  • over at the Whitehead and for a long time

  • we couldn't figure out what was going on.

  • Because we would do the experiments on day one

  • and they come out a particular way and on day two

  • they come out a different way.

  • Right.

  • And we're doing some very controlled conditions.

  • Until we figured out that everybody

  • uses S288C, which is the genetic nomenclature

  • for the lab strain of yeast, right.

  • It's the lab strain because it's very well behaved.

  • It's a very nice yeast.

  • It grows very well.

  • It's been selected for that, right.

  • And people always do genetic studies by taking S288C,

  • which is the lab yeast, which has been completely sequenced--

  • and so you want to use it because you can download

  • the genome-- and crossing it with a wild strain.

  • And wild strains come from the wild, right.

  • And they come either off of people

  • who have yeast infections.

  • I mean, human beings, or they come off of grape vines

  • or God knows where, right.

  • But they are not well behaved.

  • And why are they not well behaved?

  • What makes these yeast particularly rude?

  • Well, the thing that makes them particularly rude

  • is that they have things like viruses in them.

  • Oh, no.

  • OK.

  • Because what happens is that when

  • you take a yeast that has a virus in it,

  • and you cross it with a lab yeast, right.

  • All of the kids got the virus.

  • Yuck.

  • OK.

  • And it turns out that the so-called killer virus in yeast

  • interacts with various chromosomal changes.

  • And so now you have interactions--

  • genetic interactions between a viral element

  • and the chromosome.

  • And so the phenotype you get out of particular deletions

  • in the yeast genome has to do with whether or not

  • it's infected with a particular virus.

  • It also has to do with which mitochondrial content it has.

  • And people didn't appreciate this

  • until recently because most of the past yeast studies for QTLs

  • were busy crossing lab strains with wild strains

  • and whether it was ethanol tolerance or growth in heat,

  • a lot of the studies came up with a gene

  • as a significant QTL, which was MKT1.

  • And people couldn't understand why MKT1 was so popular, right.

  • MKT1, maintenance of killer toxin one.

  • Yeah.

  • That's the viral thing that enables-- the chromosomal thing

  • that enables a viral competence.

  • So, it turns out that if you look

  • at this-- in this particular case,

  • we're looking at yeast that don't

  • have the virus in the bottom little photograph there.

  • You can see they're all sort of, you know,

  • they're growing similarly.

  • And the yeast with the same genotype above-- those

  • are all in tetrads.

  • Two out of the four are growing, the other two

  • are not, because the other two have a particular deletion.

  • And if you look at the model-- a deletion only model,

  • the deletion-only model, which only looks at the chromosomal complement,

  • doesn't predict the variance very well.

  • And if you look at the deletion and whether or not

  • you have the virus, you do better.

  • But you do even better, if you allow

  • for there to be a nonlinear interaction

  • between the chromosomal modification

  • and whether or not you have a virus.

  • And then you recover almost all of the missing heritability.

  • So I'll leave you with this thought, which

  • is that genetics is complicated and QTLs are great, but don't

  • forget that there are all sorts of genetic elements.

  • And on that note, next time we'll

  • talk about human genetics.

  • Have a great weekend until then.

  • We'll see you.

  • Take care.
