The goal of this module is to cover the fundamentals of marker-aided selection in forest trees, concentrating on the integration of markers into basic quantitative genetic theory. In previous modules we discussed when and where in the tree breeding cycle markers may be employed to improve breeding efficiency, that is, where marker-assisted selection (MAS) would be beneficial. In this module, the author introduces mathematical approaches to incorporating marker information in the selection process. This is done by carefully defining the mixed linear models that capture the random genetic variables and fixed environmental variables influencing the trait of interest. The models are expressed and solved using matrix algebra, so familiarity with matrix algebra will significantly enhance your understanding of the module’s content.
This module is organized in four sections. It begins with a discussion of the best linear unbiased prediction (BLUP) of breeding values using a numerical relationship matrix, called the A matrix. This matrix is usually derived from the pedigrees of individuals, and the relationships it contains are sometimes called additive genetic covariances. The second section focuses on the incorporation of markers in models and on the various approaches to marker-aided selection introduced in previous modules. The third section addresses a key issue related to the use of markers for selection, namely, how missing genotypes are inferred, or imputed, so that breeding values can be calculated. The final section is about the realized genetic relationship matrix for G-BLUP. The genetic relationship matrix based on markers is called the G matrix, and BLUP analysis based on the G matrix is called G-BLUP. We will start, however, with definitions for a number of terms.
When an individual is crossed, either as a female or a male, with a number of other parents, we measure the performance of the progeny of those crosses and estimate a mean value for that parent. This is typically done for trees used as females with many pollen donors. The deviation of the parent mean (X) from the population mean (μ) is called the general combining ability (GCA) of that tree. We can use a linear model to define the GCA of a tree: y sub i is equal to μ plus GCA plus error. GCA can also be defined as the expected value of offspring from a given parent after it has been crossed with many pollen parents. The GCA of a given tree is half of the parental additive genetic value (½ a sub f). The other half, unmeasured here, comes from the pollen parents.
The breeding value (BV) is the value of the genes transmitted to progeny. We can define the breeding value of a tree in a linear model as the average of the parental breeding values of the male (a sub m) and the female (a sub f), plus the fixed effects (that is, the intercept) and the error (e sub i).
The genetic value is the value of the genes to the parent tree itself. It includes both additive and non-additive (dominance) effects. Dominance effects cannot be passed on to progeny by breeding. The difference between the genetic value and the breeding value is, therefore, largely the dominance deviation (assuming epistatic effects are negligible). We can use a linear model to define genetic values as the average of the parental breeding values, plus the Mendelian sampling term (m sub i) and the error.
Mendelian sampling is the deviation of offspring from the mid-parent breeding value. In other words, offspring from a cross deviate from the mid-parent breeding value because of the random sampling of parental genes caused by segregation and assortment. They receive different sets of genes from their parents, which affects their phenotype (the deviation from the mid-parental mean).
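To make the GCA definition concrete, here is a minimal sketch in Python. It assumes three hypothetical female parents with made-up progeny heights, and it uses raw family means rather than BLUP, so it only illustrates the arithmetic: GCA is the parent's progeny mean minus the population mean, and because GCA is half of the parent's additive value, doubling it gives a rough breeding value. The function name and all data are illustrative assumptions, not part of the module.

```python
from statistics import mean

# Hypothetical progeny heights (m), grouped by female parent.
# All values are made up for illustration only.
progeny_heights = {
    "F1": [10.0, 12.0, 11.0],
    "F2": [8.0, 9.0, 10.0],
    "F3": [12.0, 13.0, 14.0],
}


def gca_from_family_means(families):
    """Return GCA per parent as (family mean - overall mean).

    This is a raw-mean illustration of the definition, not a BLUP
    estimate: no shrinkage and no adjustment for fixed effects.
    """
    all_obs = [h for heights in families.values() for h in heights]
    mu = mean(all_obs)  # population mean
    return {parent: mean(heights) - mu for parent, heights in families.items()}


gca = gca_from_family_means(progeny_heights)

# GCA is half of the parental additive value, so BV is about 2 * GCA.
bv = {parent: 2.0 * value for parent, value in gca.items()}

for parent in progeny_heights:
    print(parent, "GCA =", gca[parent], "BV =", bv[parent])
```

With balanced families like these, the population mean is simply the mean of the family means, so the GCA values sum to zero across parents.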
There are two approaches to calculating breeding values (BVs) based on progeny test information that is available for a sample of parents. We may fit a general combining ability model (GCA, or parental model) to obtain the parental GCA value and multiply this estimate by two to get the breeding value. These models are easy to run in programs like SAS Statistical Software because there are relatively few mixed model equations to be solved. If we need to estimate breeding values of genetic groups, parents, and progeny simultaneously, individual-tree models are preferred. These models are called ‘animal models’ in the animal breeding literature.
In individual-tree models, trees are no longer independent. Progeny from the same female are correlated (that is, they are genetically related). Individual-tree models rely on both the additive genetic variance and a matrix of information that describes the relationship of every tree to every other tree (both parents and progeny) in the data set. This is a key feature of the BLUP approach to estimating breeding values. This approach, as noted in module 5, is desired in programs that have advanced beyond one or two generations of breeding.
So, let’s look at the details of the BLUP approach to see how general combining ability, breeding value, and genetic value are calculated. The traditional approach of using the BLUP procedure to predict breeding values is based on individual tree phenotypes and the genetic relationship matrix (A) derived from the individual tree pedigrees. The A matrix must be known, and it can be estimated from genetic markers if enough of them exist. Best linear unbiased prediction (BLUP) is performed using matrix algebra. While this may be foreign to many of you, bear with the presentation to see if you can extract the essence of the analysis. The type of model used will depend on the nature of the trait being measured.
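As an aside on the relationship matrix described above, the following Python sketch shows one common way to build A from a pedigree, the tabular (recursive) method. The module does not say which algorithm it uses, so treat this as an assumed illustration. The pedigree is hypothetical: trees 0 and 1 are unrelated founders, trees 2 and 3 are full sibs from the cross 0 x 1, and tree 4 is a half sib with only its female parent (tree 0) known.

```python
import numpy as np


def relationship_matrix(pedigree):
    """Build the additive relationship matrix A by the tabular method.

    pedigree: list of (female, male) index pairs, one per tree, with None
    for an unknown parent. Parents must appear before their progeny.
    """
    n = len(pedigree)
    A = np.zeros((n, n))
    for i, (f, m) in enumerate(pedigree):
        # Diagonal: 1 + inbreeding coefficient, which is half the
        # relationship between the two parents when both are known.
        if f is not None and m is not None:
            A[i, i] = 1.0 + 0.5 * A[f, m]
        else:
            A[i, i] = 1.0
        # Off-diagonal: average of tree j's relationships to i's parents.
        for j in range(i):
            a_jf = A[j, f] if f is not None else 0.0
            a_jm = A[j, m] if m is not None else 0.0
            A[i, j] = A[j, i] = 0.5 * (a_jf + a_jm)
    return A


# Hypothetical pedigree: (female, male) for trees 0..4.
pedigree = [(None, None), (None, None), (0, 1), (0, 1), (0, None)]
A = relationship_matrix(pedigree)
print(A)
```

In this small example the parent-offspring and full-sib entries are 0.5 and the half-sib entries are 0.25, which is the correlation among progeny of the same female that the individual-tree model relies on.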
We use linear mixed models to predict breeding values of continuous response variables, such as growth, and generalized linear mixed models for binary traits, such as disease incidence. The statistical model chosen is the foundation of progeny test analysis and must be defined with utmost care. We will now take a few moments to describe each basic element in a linear mixed model.
y is the n by 1 column vector of phenotypic observations (think of a vector like a column of data for trees). “n by 1” represents the dimensions of the vector, where n is the number of trees (that is, the number of rows) and 1 is the number of columns.
X is the design matrix that relates elements of the fixed effect vector b to the vector y.
b is the p by 1 column vector of fixed effects (for example, the intercept, sites, and blocks within sites), where p is the number of rows and 1 is the number of columns. These are non-genetic factors that contribute to the phenotype observed.
Z is the design matrix that relates elements of the random effect vector a to y.
a is the q by 1 column vector of random (that is, genetic) effects, for instance family and family by site interaction. q is the number of levels of the random effects, or the number of trees in the data.
e is the n by 1 column vector of random residuals, where n is the number of observations (trees).
Although these terms likely seem abstract to you now, hopefully they will become clearer after our example on the next slide. To achieve a thorough understanding of linear mixed models in the context of predicting breeding values, you will likely need to review this slide and the next slide several times.
In this slide, we adapt the linear mixed model described in the previous slide to develop a model for predicting breeding values. Suppose that we have measured the height of five trees grown in two different locations (L1 and L2).
We can assume that the trees come from a large population, which is a reasonable assumption, and therefore we will treat the trees as random. The linear mixed model is shown in a standard statistical format (MODEL 1) as (y sub ij = l sub i + t sub j + e sub ij), where y sub ij is the height of the j-th tree measured in the i-th location, l sub i is the i-th location effect, t sub j is the j-th tree effect (breeding value), and e sub ij is the error term associated with the j-th tree in location i. The same model is shown in compact matrix notation as MODEL 2. In reality we may have many more fixed and random terms, and writing linear models in statistical format makes them much longer. The other advantage of the matrix format is that it makes it easier to talk about the assumptions of the model and to describe its different variance-covariance structures. The full matrix format of the same model is given in MODEL 3. This is like taking x-ray pictures of the matrices (X and Z) and vectors (y, b, a, and e).
We will now describe each element of the mixed model as it applies to predicting breeding values. y is a 5 by 1 column vector of height observations, where 5 is the number of trees. This is the number of rows of the vector. X is the design matrix that relates the fixed location effects (b) to the height observations (y). For example, the first column of X is for location 1 and the second column is for location 2.
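The five-tree example can be sketched numerically in Python. Everything specific here is an assumption made for illustration: the heights, the assignment of trees 1 to 3 to location 1 and trees 4 and 5 to location 2, unrelated trees (A is the identity matrix), and a variance ratio of error variance to additive variance equal to 2. The sketch solves Henderson's mixed model equations, a standard way to obtain BLUP solutions; the module has not yet shown which solution method it uses.

```python
import numpy as np

# Hypothetical heights (m) for five trees; illustrative values only.
y = np.array([10.0, 12.0, 11.0, 15.0, 17.0])

# Assumed layout: trees 1-3 in location 1, trees 4-5 in location 2.
location = np.array([0, 0, 0, 1, 1])

# X: 5 x 2 design matrix, column 1 for location 1, column 2 for location 2.
X = np.zeros((5, 2))
X[np.arange(5), location] = 1.0

# Z: 5 x 5 design matrix linking each observation to its own tree.
Z = np.eye(5)

# Assumption: trees are unrelated, so A is the identity matrix.
A = np.eye(5)

# Assumed ratio of error variance to additive genetic variance.
lam = 2.0


def solve_mme(y, X, Z, A, lam):
    """Solve Henderson's mixed model equations for b (fixed) and a (random)."""
    p = X.shape[1]
    lhs = np.block([
        [X.T @ X, X.T @ Z],
        [Z.T @ X, Z.T @ Z + lam * np.linalg.inv(A)],
    ])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    solution = np.linalg.solve(lhs, rhs)
    return solution[:p], solution[p:]


b_hat, a_hat = solve_mme(y, X, Z, A, lam)
print("location effects:", b_hat)
print("predicted tree effects:", a_hat)
```

Under these particular assumptions (one record per tree and A equal to the identity), the location estimates equal the location means, and each predicted tree effect is that tree's deviation from its location mean shrunk by 1 / (1 + lam). Replacing A with a pedigree-based matrix lets information flow between relatives.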