  • The goal of this module is to cover the fundamentals of

  • marker-aided selection in forest trees, concentrating on the integration

  • of markers into basic quantitative genetic theory.

  • In previous modules we have discussed when and/or where, in the tree

  • breeding cycle markers may be employed to improve breeding efficiency.

  • That is, where marker-assisted selection

  • (MAS) would be beneficial.

  • In this module, the author introduces mathematical approaches to incorporating

  • marker information into the selection process.

  • This is done by carefully defining the mixed linear

  • models that capture random genetic variables and fixed

  • environmental variables influencing the trait of interest.

  • The models are expressed and solved using

  • matrix algebra. Familiarity with matrix algebra

  • will significantly enhance your understanding of the module’s content.

  • This module is organized into four sections

  • and begins with a discussion of the best linear unbiased

  • prediction (BLUP) of breeding values using a

  • numerator relationship matrix, called the A matrix.

  • This matrix is usually derived from pedigrees of

  • individuals. Multiplied by the additive genetic variance, these relationships

  • give the additive genetic covariances among individuals. The second section

  • focuses on incorporation of markers in models and the various approaches

  • to marker-aided selection we have introduced in previous modules.

  • The third section addresses a key issue

  • related to the use of markers for selection, namely, how missing

  • genotypes are inferred or imputed so that breeding values

  • can be calculated. The final section is about the

  • realized genetic relationship matrix used in G-BLUP.

  • The genetic relationship matrix based on the markers is called

  • the G matrix, and BLUP analysis based on the G matrix is

  • called G-BLUP. We will start, however, with

  • definitions for a number of terms.

  • When an individual is crossed,

  • either as a female or a male, with a number of other parents,

  • we measure the performance of the progeny of those crosses

  • and estimate the mean value for that parent.

  • This is typically done for trees used as females with many pollen donors.

  • The deviation of the parent mean (X) from

  • the population mean (μ)

  • is called the general combining ability (GCA) of that tree.

  • We can use a linear model to define the GCA of a tree

  • as $y_i = \mu + \mathrm{GCA}_i + e_i$, that is, the population mean plus the GCA plus error.

  • GCA can be defined as the expected mean value of the offspring of a given parent,

  • expressed as a deviation from the population mean, after it has been crossed with many pollen parents.

  • The GCA of a given tree is

  • half of the parental additive genetic value

  • ($\frac{1}{2}a_f$).

  • The other half, unmeasured here, comes from the pollen parents.
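
As a minimal numerical sketch of these definitions, the Python below uses hypothetical progeny means for four parents (the names P1 to P4 and all values are illustrative, not data from this module). It takes GCA as the parent mean minus the population mean, with the population mean assumed to be the unweighted average of the family means, and doubles GCA to get the breeding value because a parent passes on only half of its additive value. A real analysis would predict GCA with BLUP, which shrinks these raw deviations toward zero.

```python
# Hypothetical progeny-test summary: mean height (m) of the open-pollinated
# progeny of each female parent. Names and values are illustrative only.
progeny_means = {"P1": 12.4, "P2": 11.1, "P3": 13.0, "P4": 11.5}


def gca_and_bv(means):
    """Return {parent: (GCA, BV)} from parental progeny means.

    GCA = parent mean - population mean, where the population mean is
    assumed to be the unweighted mean of the family means.
    BV = 2 * GCA, because a parent transmits half its additive value.
    """
    mu = sum(means.values()) / len(means)
    return {parent: (x - mu, 2 * (x - mu)) for parent, x in means.items()}


for parent, (gca, bv) in gca_and_bv(progeny_means).items():
    print(f"{parent}: GCA = {gca:+.2f}, BV = {bv:+.2f}")
```

Because GCA is defined as a deviation from the mean, the GCAs of the parents in this simple calculation sum to zero.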

  • The breeding value (BV)

  • is the value of genes transmitted to progeny.

  • We can relate a tree's phenotype to breeding values with a linear model that includes the average of the

  • parental breeding values of the male ($a_m$) and the

  • female ($a_f$), the fixed effects

  • (that is, the intercept $\mu$), and the error ($e_i$): $y_i = \mu + \frac{1}{2}(a_f + a_m) + e_i$.

  • The genetic value is the value of genes to the parent tree itself.

  • It includes both the additive and non-additive

  • (dominance) effects. Dominance effects cannot

  • be passed on to progeny by breeding.

  • The difference between the genetic value and the breeding value is, therefore, largely the dominance

  • deviation (assuming epistatic effects are negligible).

  • We can use a linear model to define genetic values

  • as the average of the parental breeding values plus the Mendelian sampling term

  • ($m_i$) and the error term ($e_i$): $y_i = \mu + \frac{1}{2}(a_f + a_m) + m_i + e_i$.

  • Mendelian sampling is the deviation of an offspring's breeding value from the mid-parent breeding value.

  • In other words,

  • offspring from a cross deviate from the mid-parent breeding values because of

  • random sampling of parental genes caused by segregation and independent assortment.

  • Each offspring receives a different set of genes from its parents,

  • which affects its phenotype (its deviation from the

  • mid-parent mean).
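
Mendelian sampling can be illustrated with a small simulation. This is a sketch under stated assumptions: a purely additive trait controlled by five unlinked biallelic loci with an equal effect per allele; the parental genotypes, the `EFFECT` size of 0.5, and the random seed are all hypothetical. Each offspring receives one randomly chosen allele per locus from each parent, so individual offspring deviate from the mid-parent value, while the average Mendelian sampling term across many offspring is close to zero.

```python
import random

# Hypothetical purely additive trait: 5 unlinked biallelic loci, alleles coded
# 0/1, and each copy of allele 1 adds EFFECT to the breeding value.
EFFECT = 0.5
female = [(1, 1), (1, 0), (0, 0), (1, 0), (0, 1)]  # genotype at each locus
male = [(0, 0), (1, 1), (1, 0), (0, 0), (1, 0)]


def breeding_value(genotype):
    """Additive value: EFFECT times the number of '1' alleles."""
    return EFFECT * sum(a + b for a, b in genotype)


def gamete(genotype, rng):
    """Segregation: pick one allele per locus at random; loci are sampled
    independently (independent assortment, no linkage assumed)."""
    return [rng.choice(pair) for pair in genotype]


def offspring(f, m, rng):
    """Combine one gamete from each parent into a new genotype."""
    return list(zip(gamete(f, rng), gamete(m, rng)))


rng = random.Random(2024)  # fixed seed for reproducibility
mid_parent = 0.5 * (breeding_value(female) + breeding_value(male))
progeny = [breeding_value(offspring(female, male, rng)) for _ in range(10_000)]
# Mendelian sampling term for each offspring: m_i = a_i - (a_f + a_m) / 2
mendelian = [a_i - mid_parent for a_i in progeny]

print("mid-parent value:", mid_parent)
print("mean Mendelian sampling:", round(sum(mendelian) / len(mendelian), 3))
print("range of Mendelian sampling:", min(mendelian), "to", max(mendelian))
```

Only loci that are heterozygous in a parent contribute to Mendelian sampling here; a locus where a parent is homozygous always transmits the same allele.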

  • There are two approaches to calculating breeding values (BVs)

  • based on progeny test information that is available for a sample of parents.

  • We may fit a general combining ability model

  • (GCA, or parental model)

  • to obtain the parental GCA value and

  • multiply this estimate by two to get breeding values.

  • These models are easy to run in programs like SAS Statistical Software

  • because there are relatively few mixed model equations

  • to be solved. If we need to estimate breeding values of genetic groups, parents,

  • and progeny simultaneously,

  • individual-tree models are preferred. These models

  • are called animal models in the animal breeding literature.

  • In the individual-tree models, trees are no longer independent.

  • Progeny from the same female are correlated

  • (that is, they are genetically related).

  • The individual-tree models rely on both additive genetic variance

  • and a matrix of information that

  • describes the relationship of every tree to every other tree

  • (both parents and progeny) in the data set.

  • This is a key feature of the BLUP approach to estimating breeding values.

  • This approach, as noted in module 5, is preferred in programs

  • that have advanced beyond one or two generations of breeding.

  • So, let’s look at the details of the BLUP approach to see how general combining ability,

  • breeding value, and genetic value are calculated.

  • The traditional approach of using the BLUP procedure to predict

  • breeding values is based on individual tree phenotypes

  • and the genetic relationship matrix (A)

  • derived from the individual-tree pedigrees. The A matrix must be known,

  • although it can also be estimated if enough genetic markers are available.
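
One common way to build A from a pedigree is the tabular method, sketched below in Python under these assumptions: parents are listed before their offspring, unknown parents are given as `None`, and founders are unrelated and non-inbred. Each diagonal element is 1 plus the tree's inbreeding coefficient (half the relationship between its two parents), and each off-diagonal element is the average of one tree's relationships with the two parents of the other. The function name and the tree names are hypothetical.

```python
from fractions import Fraction


def numerator_relationship(pedigree):
    """Tabular method for the additive (numerator) relationship matrix A.

    pedigree: list of (tree, female, male) tuples ordered so that parents
    come before their offspring; unknown parents are None. Founders are
    assumed unrelated and non-inbred. Returns (ids, A), A as Fractions.
    """
    ids = [tree for tree, _, _ in pedigree]
    index = {tree: k for k, tree in enumerate(ids)}
    n = len(ids)
    A = [[Fraction(0)] * n for _ in range(n)]
    for i, (_, female, male) in enumerate(pedigree):
        f = index.get(female)  # None when the parent is unknown
        m = index.get(male)
        # Diagonal: 1 + F_i, where F_i is half the relationship of the parents.
        inbreeding = A[f][m] / 2 if f is not None and m is not None else 0
        A[i][i] = Fraction(1) + inbreeding
        for j in range(i):
            # Off-diagonal: average of tree j's relationships with i's parents.
            a_ij = Fraction(0)
            if f is not None:
                a_ij += A[j][f] / 2
            if m is not None:
                a_ij += A[j][m] / 2
            A[i][j] = A[j][i] = a_ij
    return ids, A


# Hypothetical pedigree: three unrelated parents, two full sibs (O1, O2),
# a half sib of O1 (O3), and I1 from a full-sib mating of O1 x O2.
pedigree = [
    ("P1", None, None), ("P2", None, None), ("P3", None, None),
    ("O1", "P1", "P2"), ("O2", "P1", "P2"), ("O3", "P1", "P3"),
    ("I1", "O1", "O2"),
]
ids, A = numerator_relationship(pedigree)
for tree, row in zip(ids, A):
    print(tree, " ".join(f"{str(x):>4}" for x in row))
```

In this example the full sibs O1 and O2 come out at 1/2, the half sibs O1 and O3 at 1/4, and the offspring of the full-sib mating (I1) has a diagonal of 5/4, that is, an inbreeding coefficient of 1/4.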

  • Best linear unbiased prediction (BLUP)

  • is performed using matrix algebra. While this may

  • be foreign to many of you, bear with the presentation to see if you can extract

  • the essence of the analysis. The type

  • of model used will depend on the nature of the trait being measured. We use

  • linear mixed models to predict breeding values

  • of continuous response variables, such as growth,

  • and generalized linear mixed models for binary traits, such as disease

  • incidence. The statistical model chosen

  • is the foundation of progeny test analysis and must be defined with utmost care.

  • We will now take a few moments to describe

  • each basic element in a linear mixed model.

  • y is the n by 1 column vector

  • of phenotypic observations

  • (think of a vector as a column of data for trees).

  • “n by 1” represents the dimensions of the column vector, where n is the

  • number of trees (that is, the number of rows) and 1

  • is the number of columns. X is the

  • design matrix that relates elements of the fixed effect

  • vector b to the observation vector y.

  • b is the p by 1 column vector of fixed effects

  • (for example, the intercept, sites, and

  • blocks within sites), where p is the number of rows

  • and 1 is the number of columns.

  • These are non-genetic factors that contribute to the phenotype observed.

  • Z is the design matrix that relates

  • elements of the random effect vector a to y (the residual vector e enters the model directly).

  • a is the q by 1 column vector of random (that is, genetic) effects

  • (for instance, family and family-by-site interaction effects), where

  • q is the number of levels

  • of random effects, for example, the number of trees in the data.

  • e is the n by 1 column vector of random residuals,

  • where n is the number

  • of observations (trees).
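
Putting these elements together, the model and the dimensions described above can be written as follows. The distributional assumptions on the second line are the ones commonly made for an individual-tree model in which a holds the breeding values of the trees in A (additive genetic variance $\sigma^2_a$, residual variance $\sigma^2_e$, a and e uncorrelated); they are stated here as an assumption rather than taken from the slide.

```latex
\underbrace{\mathbf{y}}_{n\times 1}
  = \underbrace{\mathbf{X}}_{n\times p}\,\underbrace{\mathbf{b}}_{p\times 1}
  + \underbrace{\mathbf{Z}}_{n\times q}\,\underbrace{\mathbf{a}}_{q\times 1}
  + \underbrace{\mathbf{e}}_{n\times 1}
\\[6pt]
\mathbf{a} \sim N(\mathbf{0},\, \mathbf{A}\sigma^2_a), \qquad
\mathbf{e} \sim N(\mathbf{0},\, \mathbf{I}\sigma^2_e), \qquad
\operatorname{Var}(\mathbf{y}) = \mathbf{Z}\mathbf{A}\mathbf{Z}'\sigma^2_a + \mathbf{I}\sigma^2_e
```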

  • Although these terms likely seem abstract to you now,

  • hopefully they will become clearer after our example on the next slide.

  • To achieve a thorough understanding of linear mixed models in the

  • context of predicting breeding values, you will likely need to

  • review this slide and the next slide several times.

  • In this slide,

  • we adapt the linear mixed model described in the previous slide

  • to develop a model for predicting breeding values.

  • Suppose that we have measured the height of five trees grown in two different locations

  • (L1 and L2).

  • It is reasonable to assume that the trees come from a large population,

  • and therefore we treat the tree effects as random.

  • The linear mixed model is shown in a standard statistical format

  • (MODEL 1) as $y_{ij} = l_i + t_j + e_{ij}$, where $y_{ij}$ is the height

  • of the j-th tree measured in the i-th location,

  • $l_i$ is the i-th location effect, $t_j$ is the j-th tree effect (breeding value),

  • and $e_{ij}$ is the error term associated with the j-th tree in location i.

  • The same model is shown in compact matrix notation as MODEL 2.

  • In practice, models often have many more fixed and random terms, and writing them in statistical format makes them much longer.

  • The other advantage of the matrix format is that it makes it easier to discuss the assumptions of the model

  • and to describe different variance-covariance

  • structures of the model.

  • The full matrix format of the same model is given in MODEL 3.

  • This is like taking an x-ray of the matrices (X and Z)

  • and vectors (y, b, a, and e).
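
For readers who want to see what such an expanded form can look like, here is one possible layout. It assumes, hypothetically (the slide's allocation may differ), that trees 1 to 3 were measured in L1 and trees 4 and 5 in L2, and that each tree has a single observation, so that Z is the 5 by 5 identity matrix.

```latex
\begin{bmatrix} y_{11}\\ y_{12}\\ y_{13}\\ y_{24}\\ y_{25} \end{bmatrix}
=
\begin{bmatrix} 1 & 0\\ 1 & 0\\ 1 & 0\\ 0 & 1\\ 0 & 1 \end{bmatrix}
\begin{bmatrix} l_1\\ l_2 \end{bmatrix}
+
\begin{bmatrix}
1 & 0 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} t_1\\ t_2\\ t_3\\ t_4\\ t_5 \end{bmatrix}
+
\begin{bmatrix} e_{11}\\ e_{12}\\ e_{13}\\ e_{24}\\ e_{25} \end{bmatrix}
```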

  • We will now describe each element of the mixed model

  • as it applies to predicting breeding values.

  • y is a 5 by 1 column vector of height observations

  • where 5 is the number of trees.

  • This is the number of rows of the vector.

  • X is the design matrix that relates the

  • fixed location effects (b) to the height observations (y).

  • For example, the first column of X is for location 1

  • and the second column is for location 2; each row has a 1 in the column of the location where that tree was grown and a 0 in the other.
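
The sketch below builds X and Z for the five-tree example and solves Henderson's mixed model equations for the BLUP of the location effects and tree breeding values. Everything specific is an assumption for illustration: the heights, the allocation of trees 1 to 3 to L1 and trees 4 and 5 to L2, unrelated trees (A = I), and a variance ratio `lam` = $\sigma^2_e/\sigma^2_a$ = 1. The helper functions are hypothetical; exact fractions keep the arithmetic transparent for such a small system.

```python
from fractions import Fraction

# Hypothetical data: trees 1-3 measured at L1, trees 4-5 at L2.
location = [0, 0, 0, 1, 1]                      # 0 = L1, 1 = L2
height = [Fraction(v) for v in ("5.2", "4.8", "5.6", "6.1", "5.9")]
lam = Fraction(1)                               # assumed sigma2_e / sigma2_a
n, p = len(height), 2


def identity(k):
    return [[1 if i == j else 0 for j in range(k)] for i in range(k)]


def transpose(M):
    return [list(row) for row in zip(*M)]


def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N))) for j in range(len(N[0]))]
            for i in range(len(M))]


def solve(M, r):
    """Gauss-Jordan elimination with exact Fractions (fine for tiny systems)."""
    k = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(v)] for row, v in zip(M, r)]
    for c in range(k):
        piv = next(i for i in range(c, k) if aug[i][c] != 0)  # first usable pivot
        aug[c], aug[piv] = aug[piv], aug[c]
        pv = aug[c][c]
        aug[c] = [x / pv for x in aug[c]]
        for i in range(k):
            if i != c and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[c])]
    return [row[-1] for row in aug]


# X: each row has a 1 in the column of the tree's location.
X = [[1 if location[i] == k else 0 for k in range(p)] for i in range(n)]
# Z: one observation per tree, so each record maps to its own breeding value.
Z = identity(n)
A_inv = identity(n)  # unrelated trees assumed: A = I, so A^-1 = I

Xt, Zt = transpose(X), transpose(Z)
XtX, XtZ, ZtX, ZtZ = matmul(Xt, X), matmul(Xt, Z), matmul(Zt, X), matmul(Zt, Z)

# Henderson's mixed model equations:
# [X'X  X'Z            ] [b]   [X'y]
# [Z'X  Z'Z + A^-1*lam ] [a] = [Z'y]
lhs = [XtX[i] + XtZ[i] for i in range(p)] + [
    ZtX[i] + [ZtZ[i][j] + lam * A_inv[i][j] for j in range(n)] for i in range(n)
]
rhs = [sum(Xt[k][i] * height[i] for i in range(n)) for k in range(p)] + [
    sum(Zt[k][i] * height[i] for i in range(n)) for k in range(n)
]

solution = solve(lhs, rhs)
b_hat, a_hat = solution[:p], solution[p:]
print("location effects:", [float(x) for x in b_hat])
print("tree BLUPs:", [float(x) for x in a_hat])
```

Under these assumptions each tree's predicted breeding value is its deviation from its location mean multiplied by $\sigma^2_a/(\sigma^2_a + \sigma^2_e)$ = 1/2; for tree 2 that is (4.8 - 5.2)/2 = -0.2. Replacing `A_inv` with the inverse of a pedigree-based A is what lets related trees share information in the prediction.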