
Volume 4, Issue 3
Original Research
Open Access

Ridge Regression and Other Kernels for Genomic Selection with R Package rrBLUP

Jeffrey B. Endelman

Corresponding Author

E-mail address: j.endelman@gmail.com

Dep. of Crop and Soil Sciences, Washington State Univ., 16650 State Route 536, Mount Vernon, WA, 98273

First published: 01 November 2011

All rights reserved. No part of this periodical may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Permission for printing and for reprinting the material contained herein has been obtained by the publisher.

Abstract

Many important traits in plant breeding are polygenic and therefore recalcitrant to traditional marker‐assisted selection. Genomic selection addresses this complexity by including all markers in the prediction model. A key method for the genomic prediction of breeding values is ridge regression (RR), which is equivalent to best linear unbiased prediction (BLUP) when the genetic covariance between lines is proportional to their similarity in genotype space. This additive model can be broadened to include epistatic effects by using other kernels, such as the Gaussian, which represent inner products in a complex feature space. To facilitate the use of RR and nonadditive kernels in plant breeding, a new software package for R called rrBLUP has been developed. At its core is a fast maximum‐likelihood algorithm for mixed models with a single variance component besides the residual error, which allows for efficient prediction with unreplicated training data. Use of the rrBLUP software is demonstrated through several examples, including the identification of optimal crosses based on superior progeny value. In cross‐validation tests, the prediction accuracy with nonadditive kernels was significantly higher than RR for wheat (Triticum aestivum L.) grain yield but equivalent for several maize (Zea mays L.) traits.

Abbreviations

  • θREML, restricted maximum likelihood solution for θ
  • BLR, Bayesian Linear Regression
  • BLUP, best linear unbiased prediction
  • EXP, exponential model
  • GAUSS, Gaussian model
  • GEBV, genomic‐estimated breeding value
  • LL, log‐likelihood
  • ML, maximum likelihood
  • REML, restricted maximum likelihood
  • RR, ridge regression
  • rpred, cross‐validation accuracy
  • rtrain, training population accuracy
  • SNP, single nucleotide polymorphism

    The ability to predict complex traits from marker data is becoming increasingly important in plant breeding (Bernardo, 2008). The earliest attempts, now over 20 years old, involved first identifying significant markers and then combining them in a multiple regression model (Lande and Thompson, 1990). The focus over the last decade has been on genomic selection methods, in which all markers are included in the prediction model (Bernardo and Yu, 2007; Heffner et al., 2009; Jannink et al., 2010).

    One of the first methods proposed for genomic selection was ridge regression (RR), which is equivalent to best linear unbiased prediction (BLUP) in the context of mixed models (Whittaker et al., 2000; Meuwissen et al., 2001). The basic RR‐BLUP model is
    $$y = WGu + \varepsilon \qquad [1]$$

    where $u \sim N(0, I\sigma_u^2)$ is a vector of marker effects, G is the genotype matrix (e.g., {aa,Aa,AA} = {–1,0,1} for biallelic single nucleotide polymorphisms (SNPs) under an additive model), and W is the design matrix relating lines to observations (y). The BLUP solution for the marker effects can be written as either $\hat{u} = G'W'(WGG'W' + \lambda I)^{-1}y$ or $\hat{u} = (Z'Z + \lambda I)^{-1}Z'y$, where Z = WG and the ridge parameter $\lambda = \sigma_e^2/\sigma_u^2$ is the ratio between the residual and marker variances (Searle et al., 2006). Compared with ordinary regression, for which the number of markers cannot exceed the number of observations, RR has no such limit and also has improved numerical stability when markers are highly correlated (Hoerl and Kennard, 2000).
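    The two expressions for $\hat{u}$ are algebraically identical. A short derivation, writing Z = WG (so that Z′ = G′W′) and using only the fact that both bracketed matrices are invertible for λ > 0:

```latex
(Z'Z + \lambda I)\,Z' = Z'ZZ' + \lambda Z' = Z'\,(ZZ' + \lambda I)
\quad\Longrightarrow\quad
Z'\,(ZZ' + \lambda I)^{-1} = (Z'Z + \lambda I)^{-1} Z'
\quad\Longrightarrow\quad
G'W'(WGG'W' + \lambda I)^{-1} y = (Z'Z + \lambda I)^{-1} Z' y
```

    The first implication left‐multiplies by $(Z'Z + \lambda I)^{-1}$ and right‐multiplies by $(ZZ' + \lambda I)^{-1}$; the second multiplies by y and substitutes Z = WG. The choice between the two forms is therefore purely computational: the first solves a system whose dimension is the number of observations, the second one whose dimension is the number of markers.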

    There is a close connection between marker‐based RR‐BLUP (Eq. [1]) and kinship‐BLUP, in which the performance of breeding lines is predicted based on their kinship to other germplasm (Bernardo, 1994; Piepho et al., 2008). The basic kinship‐BLUP model is
    $$y = Wg + \varepsilon \qquad [2]$$
    where $g \sim N(0, K\sigma_g^2)$ is a vector of genotypic values. In pedigree‐based prediction of breeding values, K is the additive relationship matrix A derived from the coefficients of coancestry (Bernardo, 2010). These coefficients reflect the average behavior of alleles undergoing Mendelian segregation, but the actual segregation can be captured with the marker‐based relationship matrix
    $$K_{RR} = GG' \qquad [3]$$

    Equation [3] has the property that, for random populations, its expected value is proportional to A plus a constant (Habier et al., 2007); for this reason it has been called the realized (additive) relationship matrix. Another key property of $K_{RR}$ is that the genomic‐estimated breeding values (GEBVs) it produces ($\hat{g}$ in Eq. [2]) are equivalent to those from the marker‐based RR‐BLUP approach ($G\hat{u}$ in Eq. [1]) (Hayes et al., 2009).
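    The equivalence can be checked directly from the BLUP formulas when there are no fixed effects (or y has been centered). Under Eq. [2] the BLUP is $\hat{g} = KW'(WKW' + \lambda I)^{-1}y$ with $\lambda = \sigma_e^2/\sigma_g^2$. Taking K = GG′ and $\sigma_g^2 = \sigma_u^2$ (because $\mathrm{Var}(Gu) = GG'\sigma_u^2$) gives

```latex
\hat{g} = GG'W'\,(WGG'W' + \lambda I)^{-1} y
        = G\left[\,G'W'\,(WGG'W' + \lambda I)^{-1} y\,\right]
        = G\hat{u}
```

    where the bracketed term is the first of the two BLUP expressions for the marker effects given after Eq. [1].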

    When using genomic selection to advance lines as varieties, it is not just the breeding (additive) value but the full genotypic value that is of interest (Piepho et al., 2008). Rather than modeling epistatic interactions directly, which is challenging because of the combinatorial complexity, an alternative approach is to capture them through an appropriate kernel function (Gianola and van Kaam, 2008; Piepho, 2009; de los Campos et al., 2010). The realized relationship model (Eq. [3]) is in fact a kernel in genotype space and can be written as $K_{RR}(i,j) = \langle G_i, G_j \rangle$, where the angle brackets denote the inner (or dot) product between genotypes i and j. In geometry the inner product measures the similarity of two vectors, so with the additive relationship model the genetic covariance between lines is proportional to their similarity in genotype space.

    This geometric formulation enables use of the so‐called kernel “trick” in machine learning, which involves replacing the inner product in the original (genotype) space with an inner product in a more complex feature space, technically called a reproducing kernel Hilbert space (Schölkopf and Smola, 2002):
    $$K(G_i, G_j) = \langle \Phi(G_i), \Phi(G_j) \rangle \qquad [4]$$

    Equation [4] means that the kernel function K, which takes the two genotypes as arguments and returns a single number, equals the inner product between the genotypes in a feature space defined by Φ. Although one can construct kernels by first specifying Φ and then applying Eq. [4], this is unnecessary as the feature space is guaranteed to exist for any positive semidefinite kernel (Schölkopf and Smola, 2002). To calculate BLUPs that include nonadditive effects, it is sufficient to solve Eq. [2] with K based on an appropriate kernel function (Gianola and van Kaam, 2008).
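    As an illustration of Eq. [4], the sketch below uses a kernel that is not one of the rrBLUP kernels: the degree‐2 polynomial kernel $K(x, z) = (1 + \langle x, z \rangle)^2$, chosen only because its feature map Φ can be written out explicitly. The function names and genotypes are hypothetical. The explicit Φ contains every marker plus every pairwise product of markers, so the inner product in this feature space includes additive × additive terms even though the kernel itself is computed from a single inner product in genotype space.

```python
import itertools
import math


def poly2_kernel(x, z):
    """Degree-2 polynomial kernel (1 + <x, z>)^2, evaluated in genotype space."""
    dot = sum(a * b for a, b in zip(x, z))
    return (1.0 + dot) ** 2


def poly2_features(x):
    """Explicit feature map Phi for the degree-2 polynomial kernel.

    Features: a constant, the markers scaled by sqrt(2), the squared markers,
    and all pairwise products x_i * x_j scaled by sqrt(2).
    """
    r2 = math.sqrt(2.0)
    feats = [1.0]
    feats += [r2 * a for a in x]
    feats += [a * a for a in x]
    feats += [r2 * x[i] * x[j] for i, j in itertools.combinations(range(len(x)), 2)]
    return feats


def inner(u, v):
    """Ordinary inner (dot) product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Two hypothetical inbred lines coded -1/+1 at four markers.
line_i = [1, -1, 1, 1]
line_j = [1, 1, 1, 1]

k_direct = poly2_kernel(line_i, line_j)                            # K(G_i, G_j)
k_feature = inner(poly2_features(line_i), poly2_features(line_j))  # <Phi(G_i), Phi(G_j)>
print(k_direct, k_feature)
```

    For the Gaussian kernel used later in this article the corresponding feature space is infinite‐dimensional, which is why the kernel function, not Φ, is what gets computed in practice.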

    The objective of the present research was to develop an R package for genomic prediction based on a maximum likelihood (ML) or restricted maximum likelihood (REML) approach to ridge regression (RR) and other kernels. The result is rrBLUP (available at http://cran.r-project.org/web/packages/rrBLUP [verified 21 Nov. 2011]), which uses a fast spectral algorithm for mixed models with a single variance component besides the residual error (Kang et al., 2008). After the features of the software are demonstrated, the accuracy of its prediction methods is compared by cross‐validation using structured populations of wheat (Triticum aestivum L.) (Crossa et al., 2010) and maize (Zea mays L.) (Yu et al., 2006).

    MATERIALS AND METHODS

    The wheat population consisted of 599 inbred lines genotyped at 1279 Diversity Array Technology (DArT) markers and was downloaded as part of the Bayesian Linear Regression (BLR) package for R, version 1.2 (Pérez et al., 2010). Single nucleotide polymorphism markers and phenotypic data for maize ear height, ear diameter, and male flowering time were downloaded from the TASSEL website (Bradbury et al., 2007). For each of the ten maize chromosomes, the diploid marker data were phased and missing alleles imputed using the software BEAGLE, version 3.3.1 (Browning and Browning, 2007). After removing monomorphic markers, 2953 remained. The population size was 279 inbred lines, but due to missing phenotypic data only 276 lines were available for flowering time and 249 for ear diameter.

    For each of the 179,101 unique crosses between the 599 wheat lines, the expected mean and standard deviation (SD) for the GEBV of the recombinant inbred progeny were calculated based on the predicted marker effects in environment 1. In the absence of a linkage map, markers were assumed to segregate independently, which is clearly an approximation. (With a linkage map the SD could be simulated more realistically.) If $p_k^+$ and $p_k^-$ denote the frequency of the +1 and –1 alleles, respectively, at locus k in the parents, then the mean GEBV of the inbred progeny is $\mu = \sum_k (p_k^+ - p_k^-)\hat{u}_k$, and the variance (neglecting uncertainty in the marker effects) is
    $$\sigma^2 = 4\sum_k p_k^+ p_k^- \hat{u}_k^2$$
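    A minimal sketch of these two formulas, assuming inbred parents coded –1/+1 at every marker (no heterozygotes), independent segregation as stated above, and hypothetical marker effects and genotypes. The function name progeny_mean_sd is illustrative and not part of rrBLUP.

```python
import math


def progeny_mean_sd(parent1, parent2, u_hat):
    """Expected mean and SD of the GEBV of inbred progeny from one cross.

    parent1, parent2: inbred parental genotypes coded -1/+1 at each marker.
    u_hat: predicted marker effects in the same marker order.
    Assumes independent segregation and ignores uncertainty in u_hat.
    """
    mean = 0.0
    var = 0.0
    for g1, g2, u in zip(parent1, parent2, u_hat):
        p_plus = ((g1 == 1) + (g2 == 1)) / 2.0   # frequency of the +1 allele in the parents
        p_minus = 1.0 - p_plus                   # frequency of the -1 allele
        mean += (p_plus - p_minus) * u
        var += 4.0 * p_plus * p_minus * u ** 2
    return mean, math.sqrt(var)


# Hypothetical marker effects and parental genotypes at four markers.
u_hat = [0.30, -0.10, 0.05, 0.20]
parent_a = [1, 1, -1, 1]
parent_b = [1, -1, 1, -1]

mu, sigma = progeny_mean_sd(parent_a, parent_b, u_hat)
print(f"progeny mean = {mu:.3f}, progeny SD = {sigma:.3f}")
```

    The mean equals the average of the two parental GEBVs, and only markers at which the parents differ contribute to σ, which is why two closely related parents yield a small progeny SD.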

    Bayesian LASSO predictions were made with the BLR package for R, version 1.2, and hyperparameters were chosen based on the guidelines of Pérez et al. (2010). For the prior distribution of the residual variance, the degrees of freedom was $df_\varepsilon = 3$ and the scale was $S_\varepsilon = (\mathrm{Var}[y]/2)(2 + df_\varepsilon)$, where Var[y] is the variance of the training data. The prior distribution for the LASSO shrinkage parameter had mode $\lambda = \sqrt{2\sum_k \overline{x_k^2}}$, where $\overline{x_k^2}$ is the average of $x_k^2$ over the training data and the sum is over markers. The rate and shape hyperparameters were $2 \times 10^{-5}$ and 0.52, respectively. A total of 10,000 iterations was used, with a burn‐in period of 2000 iterations.

    Statistical analysis of the cross‐validation results was conducted with SAS PROC GLM (SAS Institute, 1994), with partition and method as fixed effects. The REGWQ option was used to control the familywise error rate in the strong sense (the probability of at least one false discovery) at 0.05.

    RESULTS AND DISCUSSION

    Marker vs. Kinship‐Based Prediction

    At the core of the rrBLUP package is the function mixed.solve, which solves any mixed model of the form
    $$y = X\beta + Zu + \varepsilon, \qquad u \sim N(0, K\sigma_u^2), \qquad \varepsilon \sim N(0, I\sigma_e^2)$$

    where X is a full‐rank design matrix for the fixed effects β, Z is the design matrix for the random effects u, K is a positive semidefinite matrix, and the residuals are normal with constant variance. Variance components are estimated by either ML or REML (default) using the spectral decomposition algorithm of Kang et al. (2008). The R function returns the variance components, the maximized log‐likelihood (LL), the ML estimate for β, and the BLUP solution for u.
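    To make the quantities returned by mixed.solve concrete, the following Python sketch computes them for the simpler case in which the variance ratio $\lambda = \sigma_e^2/\sigma_u^2$ is treated as known. It is not the rrBLUP implementation: it skips the ML/REML estimation and the spectral algorithm of Kang et al. (2008) and uses the textbook generalized least squares and BLUP formulas with $H = ZKZ' + \lambda I$. The function names and toy data are hypothetical.

```python
def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Matrix product A B for matrices stored as lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(A, B):
    """Solve A X = B (B given as a list of rows) by Gauss-Jordan elimination."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def blup(y, X, Z, K, lam):
    """GLS estimate of beta and BLUP of u in y = X beta + Z u + e, for a known lam.

    Uses H = Z K Z' + lam I, beta = (X' H^-1 X)^-1 X' H^-1 y and
    u = K Z' H^-1 (y - X beta), where lam = sigma_e^2 / sigma_u^2.
    """
    n = len(y)
    H = matmul(matmul(Z, K), transpose(Z))
    for i in range(n):
        H[i][i] += lam
    Xt = transpose(X)
    beta = solve(matmul(Xt, solve(H, X)), matmul(Xt, solve(H, [[v] for v in y])))
    resid = [[y[i] - sum(X[i][j] * beta[j][0] for j in range(len(beta)))] for i in range(n)]
    u = matmul(matmul(K, transpose(Z)), solve(H, resid))
    return [b[0] for b in beta], [v[0] for v in u]


# Hypothetical toy data: 5 lines, an intercept, 3 markers coded -1/+1, K = I.
y = [1.2, 0.4, -0.3, 0.9, -0.8]
X = [[1], [1], [1], [1], [1]]
Z = [[1, -1, 1], [1, 1, -1], [-1, 1, 1], [1, 1, 1], [-1, -1, 1]]
K = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
lam = 2.0

beta_hat, u_hat = blup(y, X, Z, K, lam)
print(beta_hat, u_hat)
```

    For a given λ, the same $\hat{\beta}$ and $\hat{u}$ also solve Henderson's mixed model equations. What the spectral algorithm contributes is a decomposition computed once, so that the likelihood can be evaluated cheaply for many candidate values of λ.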

    It was stated in the introduction that when the realized relationship matrix GG′ is used, the marker‐based (Eq. [1]) and kinship‐based (Eq. [2]) formulations of the prediction problem give equivalent GEBV. This can be verified numerically using mixed.solve and a set of 599 wheat lines from the BLR package for R (Pérez et al., 2010). The BLR variable Y contains the two‐year average grain yield in four environments (standardized to zero mean and unit variance), and the genotype matrix is coded as {0,1} in the variable X. To be consistent with the notation in this article, the genotypes were recoded as {–1,1} in G:

    In the first call to mixed.solve the design matrix equals the genotype matrix, so the random effects are the marker effects. In this case K is an identity matrix, which the software assumes because no K variable is provided. When no design matrix for fixed effects is provided, as in this example, an intercept term is automatically included. In the second call to mixed.solve, an identity matrix is used for Z and the realized relationship matrix GG′ is used for K. In this case the random effects are the breeding values, which in the last line of code are compared with the GEBV from the marker‐based model. As shown in the comments, the correlation is exactly 1. Each of the two calls to mixed.solve took five seconds on a laptop computer with two gigabytes of memory, running R 2.13.1 (R Development Core Team, 2011).
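    The same kind of check can be sketched in Python on a small hypothetical data set, without rrBLUP. The sketch assumes one observation per line (W = I), a known λ rather than a REML estimate, and centered phenotypes so that no intercept is needed.

```python
def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Matrix product A B for matrices stored as lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(A, B):
    """Solve A X = B (B given as a list of rows) by Gauss-Jordan elimination."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def add_ridge(A, lam):
    """Return A + lam * I without modifying A."""
    return [[v + (lam if i == j else 0.0) for j, v in enumerate(row)] for i, row in enumerate(A)]


# Hypothetical inbred lines (rows) by markers (columns), coded -1/+1.
G = [[1, -1, 1, 1], [1, 1, -1, 1], [-1, 1, 1, -1],
     [1, 1, 1, 1], [-1, -1, 1, 1], [1, -1, -1, -1]]
y_raw = [1.1, 0.3, -0.6, 1.4, -0.2, -0.4]
y_bar = sum(y_raw) / len(y_raw)
y = [[v - y_bar] for v in y_raw]  # centered phenotypes, so no intercept is fitted
lam = 1.5                         # assumed known sigma_e^2 / sigma_u^2

Gt = transpose(G)

# Marker-based (Eq. [1] with W = I): u_hat = (G'G + lam I)^-1 G'y, GEBV = G u_hat.
u_hat = solve(add_ridge(matmul(Gt, G), lam), matmul(Gt, y))
gebv_marker = [row[0] for row in matmul(G, u_hat)]

# Kinship-based (Eq. [2] with W = I and K = GG'): g_hat = K (K + lam I)^-1 y.
K = matmul(G, Gt)
gebv_kinship = [row[0] for row in matmul(K, solve(add_ridge(K, lam), y))]

print(max(abs(a - b) for a, b in zip(gebv_marker, gebv_kinship)))
```

    The marker route solves an m × m system and the kinship route an n × n system, so the cheaper formulation depends on whether markers or lines are more numerous.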

    Although the two approaches are equivalent for calculating GEBV, some analyses depend on knowing the marker effects. For example, when different lines are evaluated in different environments, a full genotype × environment analysis is not possible, but one can still study marker × environment interactions (Crossa et al., 2010).

    Another application is to design crosses in a breeding program (Bernardo et al., 2006; Zhong and Jannink, 2007). The expected mean for the progeny can be calculated as the mean of the parental GEBV, but the marker effects are needed to compute the variance of the population, which is important for genetic gain. To illustrate, each circle in Fig. 1 shows the expected mean (μ) and standard deviation (σ) for the GEBV of recombinant inbred lines from one wheat cross. Results are shown for all 179,101 unique crosses between the 599 wheat lines, using the predicted marker effects in environment 1. In the upper right corner of the figure are crosses between lines with high GEBV and complementary alleles, for which high levels of transgressive segregation are expected.

    Figure 1. Analysis of line crosses. Each circle is the expected mean and standard deviation (SD) for the genomic‐estimated breeding values (GEBVs) of the recombinant inbred progeny from one wheat cross. Results are shown for all 179,101 unique crosses between the 599 wheat lines, using the predicted marker effects in environment 1. In the top right of the figure are crosses between parents with high GEBV and complementary alleles, for which high levels of transgressive segregation are expected.

    For a given selection intensity i, the mean of the selected population is $\mu_s = \mu + i\sigma$, which Zhong and Jannink (2007) called the superior progeny value. The superior progeny values for the crosses in Fig. 1 were calculated for selection intensities ranging from 1.4 (20% selected) to 2.7 (1% selected). The top nine crosses were the same across this range and are listed in Table 1, with lines identified by their GEBV rank. Every one of these crosses includes exactly one of the two highest‐GEBV lines; the 1×2 cross does not appear because these two lines share 96% of their alleles and have an expected progeny SD of only 0.07.

    Table 1. Top nine wheat crosses based on superior progeny value (SPV) in environment 1.
    Cross†   Kinship‡   SPV, 20% selected   SPV, 1% selected   Mean GEBV§   SD GEBV
    1×4      0.57       2.261               2.524              1.971        0.207
    1×5      0.57       2.260               2.522              1.970        0.207
    1×3      0.69       2.256               2.487              2.000        0.183
    2×4      0.58       2.245               2.507              1.954        0.208
    2×5      0.58       2.243               2.506              1.953        0.208
    2×3      0.69       2.236               2.466              1.982        0.181
    1×7      0.57       2.227               2.486              1.940        0.205
    1×12     0.60       2.210               2.481              1.910        0.214
    2×7      0.59       2.209               2.469              1.923        0.205
    • † Line identifier equals the GEBV rank.
    • ‡ Fraction of shared alleles (identity by state).
    • § GEBV, genomic‐estimated breeding value.
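    The superior progeny values in Table 1 can be recomputed from the tabulated means and SDs. The sketch below assumes truncation selection on a normally distributed progeny GEBV, for which the selection intensity is $i = \phi(z)/p$, where p is the fraction selected, z the corresponding standard normal truncation point, and φ the standard normal density. The helper names are hypothetical. Because the means and SDs in the table are rounded to three decimals, recomputed values can differ from the tabulated SPVs in the last digit.

```python
from statistics import NormalDist


def selection_intensity(fraction_selected):
    """Selection intensity for truncation selection of the top fraction of a normal trait."""
    z = NormalDist().inv_cdf(1.0 - fraction_selected)  # truncation point
    return NormalDist().pdf(z) / fraction_selected     # mean of the selected tail, in SD units


def superior_progeny_value(mean, sd, fraction_selected):
    """Superior progeny value mu_s = mu + i * sigma (Zhong and Jannink, 2007)."""
    return mean + selection_intensity(fraction_selected) * sd


# Progeny mean and SD of GEBV for three crosses, taken from Table 1.
crosses = {"1x4": (1.971, 0.207), "1x3": (2.000, 0.183), "2x4": (1.954, 0.208)}
for name, (mu, sigma) in crosses.items():
    spv20 = superior_progeny_value(mu, sigma, 0.20)
    spv01 = superior_progeny_value(mu, sigma, 0.01)
    print(f"{name}: SPV(20%) = {spv20:.3f}, SPV(1%) = {spv01:.3f}")
```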

    Kernels with Epistatic Effects

    At present there are two kernels other than RR in the rrBLUP package. One is the Gaussian model (GAUSS):
    $$K_{ij} = \exp(-D_{ij}^2/\theta^2) \qquad [6]$$
    where
    $$D_{ij} = \frac{1}{2\sqrt{m}}\sqrt{\sum_{k=1}^{m}(G_{ik} - G_{jk})^2} \qquad [7]$$
    is the Euclidean distance between genotypes i and j, normalized to the interval [0,1] (m is the number of markers). The parameter θ is a scale parameter that influences how quickly the genetic covariance decays with distance. The other kernel is the exponential model (EXP): $K_{ij} = \exp(-D_{ij}/\theta)$.
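    A sketch of the two kernels, assuming genotypes coded {–1, 0, 1} and distances normalized by their largest possible value, $2\sqrt{m}$ for m markers, which is one way to map D onto [0,1]. The normalization used inside kinship.BLUP may differ, and the function names and genotypes are hypothetical.

```python
import math


def normalized_distance(gi, gj):
    """Euclidean distance between two genotypes coded in {-1, 0, 1},
    divided by its largest possible value 2 * sqrt(m) so that it lies in [0, 1]."""
    m = len(gi)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(gi, gj))) / (2.0 * math.sqrt(m))


def kernel_matrix(G, theta, method="GAUSS"):
    """Kernel over the rows of G: Gaussian exp(-(D/theta)^2) or exponential exp(-D/theta)."""
    n = len(G)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = normalized_distance(G[i], G[j])
            if method == "GAUSS":
                K[i][j] = math.exp(-((d / theta) ** 2))
            elif method == "EXP":
                K[i][j] = math.exp(-d / theta)
            else:
                raise ValueError("method must be 'GAUSS' or 'EXP'")
    return K


# Three hypothetical inbred lines at four markers; lines 1 and 3 differ at every marker.
G = [[1, -1, 1, 1], [1, 1, -1, 1], [-1, 1, -1, -1]]
K_gauss = kernel_matrix(G, theta=0.5, method="GAUSS")
K_exp = kernel_matrix(G, theta=0.5, method="EXP")
for row in K_gauss:
    print([round(v, 4) for v in row])
```

    Both kernels equal 1 on the diagonal and decay toward 0 as lines become more dissimilar; a smaller θ makes the decay faster.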
    These kernels are available through the rrBLUP function kinship.BLUP, which was designed to predict the genotypic values of one population based on the genotypes and phenotypes of a second, training population. To illustrate its use, consider again the 599 wheat lines from the BLR package, which have been randomly partitioned into 10 sets for use in 10‐fold cross‐validation (Pérez et al., 2010). The variable sets contains the partition number for each line. To predict the genotypic values of set 1 using the other nine sets as the training population, the R code is

    In the first call to kinship.BLUP the kernel method is not specified, so by default the realized relationship model is used. The last two lines of code calculate the correlation (rpred) between the predicted genotypic value and observed phenotype for the prediction population, which measures the cross‐validation accuracy of the prediction method.
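    A sketch of one cross‐validation fold in Python. It is a simplified stand‐in for kinship.BLUP, not its implementation: λ is fixed instead of estimated by REML, the kernel is the realized relationship matrix GG′ (a GAUSS or EXP kernel matrix could be substituted), the intercept is estimated by generalized least squares, and the data and fold labels are hypothetical. Predictions for the held‐out lines are $\hat{g}_p = K_{pt}(K_{tt} + \lambda I)^{-1}(y_t - \hat{\mu}\mathbf{1})$, where t indexes training lines and p prediction lines, and rpred is their Pearson correlation with the observed phenotypes.

```python
import math


def solve(A, B):
    """Solve A X = B (B given as a list of rows) by Gauss-Jordan elimination."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def kernel_blup_predict(K, train, test, y_train, lam):
    """BLUP of genotypic values for `test` lines from phenotyped `train` lines.

    K is the kernel over all lines and lam = sigma_e^2 / sigma_g^2 is assumed known.
    """
    H = [[K[i][j] + (lam if a == b else 0.0) for b, j in enumerate(train)]
         for a, i in enumerate(train)]
    sol = solve(H, [[1.0, v] for v in y_train])          # columns: H^-1 1 and H^-1 y
    mu = sum(r[1] for r in sol) / sum(r[0] for r in sol)  # GLS intercept
    alpha = [r[1] - mu * r[0] for r in sol]               # H^-1 (y - mu 1)
    return [sum(K[p][i] * a for i, a in zip(train, alpha)) for p in test]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: 8 lines, 5 markers coded -1/+1, two folds.
G = [[1, -1, 1, 1, -1], [1, 1, -1, 1, 1], [-1, 1, 1, -1, 1], [1, 1, 1, 1, -1],
     [-1, -1, 1, 1, 1], [1, -1, -1, -1, 1], [-1, 1, -1, 1, -1], [1, 1, 1, -1, -1]]
y = [0.9, 0.4, -0.7, 1.2, -0.1, -0.5, 0.2, 0.6]
sets = [1, 2, 1, 2, 1, 2, 1, 2]

K = [[sum(a * b for a, b in zip(gi, gj)) for gj in G] for gi in G]  # realized relationship GG'
train = [i for i, s in enumerate(sets) if s != 1]
test = [i for i, s in enumerate(sets) if s == 1]
pred = kernel_blup_predict(K, train, test, [y[i] for i in train], lam=1.0)
r_pred = pearson(pred, [y[i] for i in test])
print(round(r_pred, 3))
```

    Repeating the fold for each set in turn and averaging rpred gives the kind of summary reported in Table 2.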

    Table 2 shows the accuracies of the two methods for all 10 sets in environments 1 and 2. The results show that the performance of GAUSS relative to RR depends on both the structure of the population and the phenotype. For 9 of the 10 sets in environment 1, the accuracy was higher with GAUSS than with RR. The largest gap was for set 5, where the accuracy was 0.34 with RR vs. 0.51 with GAUSS. Across the 10 sets, the mean accuracy was 0.58 with GAUSS vs. 0.51 with RR (p = 0.009 by paired t‐test). By contrast, in environment 2 there was no significant difference between the prediction methods (p = 0.2).

    Table 2. Cross‐validation accuracies (rpred) for wheat grain yield.
                  Environment 1        Environment 2
    Set†          RR‡      GAUSS§      RR       GAUSS
    1             0.49     0.61        0.37     0.37
    2             0.44     0.52        0.49     0.51
    3             0.41     0.44        0.48     0.49
    4             0.64     0.69        0.42     0.43
    5             0.34     0.51        0.31     0.31
    6             0.43     0.36        0.59     0.60
    7             0.64     0.71        0.54     0.55
    8             0.54     0.66        0.62     0.63
    9             0.57     0.62        0.42     0.44
    10            0.65     0.69        0.56     0.53
    Mean          0.51     0.58**      0.48     0.49
    • ** Means significantly different at the 0.01 probability level in environment 1.
    • † Prediction set; the other nine sets were used for training.
    • ‡ RR, ridge regression.
    • § GAUSS, Gaussian model.
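    The paired comparison for environment 1 can be approximated from the rounded accuracies in Table 2. The sketch computes the paired t statistic on the 10 set‐wise differences; because the table values are rounded to two decimals, the statistic and its p‐value need not match the reported p = 0.009 exactly.

```python
import math
from statistics import mean, stdev

# Environment 1 accuracies for sets 1-10, copied from Table 2 (rounded to two decimals).
rr = [0.49, 0.44, 0.41, 0.64, 0.34, 0.43, 0.64, 0.54, 0.57, 0.65]
gauss = [0.61, 0.52, 0.44, 0.69, 0.51, 0.36, 0.71, 0.66, 0.62, 0.69]

diffs = [g - r for g, r in zip(gauss, rr)]            # paired differences, GAUSS minus RR
n = len(diffs)
t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))  # paired t statistic with n - 1 df
wins = sum(d > 0 for d in diffs)                      # sets in which GAUSS was more accurate

print(f"mean difference = {mean(diffs):.3f}, t = {t_stat:.2f} on {n - 1} df")
print(f"GAUSS more accurate in {wins} of {n} sets")
```

    For reference, the two‐sided 5% critical value of t with 9 degrees of freedom is 2.262.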

    To better understand these differences, Fig. 2 shows the log‐likelihood (LL) (solid circles), training population accuracy (rtrain) (dashed line), and cross‐validation accuracy (rpred) (open circles) as a function of the scale parameter θ (see Eq. [6]). The rrBLUP package uses REML (or ML) to identify the optimal scale parameter, and because the genotype distances have been normalized to the unit interval (Eq. [7]), this is also the essential range for θ. The two panels in Fig. 2 correspond to sets 5 and 6 in environment 1, which showed contrasting results in the RR vs. GAUSS comparison: for set 5 the accuracy with GAUSS was higher and vice versa for set 6 (see Table 2). In both cases the REML solution for θ (θREML) was similar, and rtrain approached 1 as θ decreased to zero.

    Figure 2. Performance of the Gaussian model (GAUSS). The figure depicts the effect of the Gaussian scale parameter (θ in Eq. [6]) on the restricted log‐likelihood (LL), the training population accuracy (rtrain), and the cross‐validation accuracy (rpred) when predicting sets 5 or 6 in environment 1. For set 5 the restricted maximum likelihood solution for θ (θREML) = 0.5, and for set 6 θREML = 0.4. In both cases rtrain approached 1 as θ → 0, but the trends for rpred were different. For set 5 rpred exhibited an interior maximum near θREML, while for set 6 rpred increased monotonically with θ. Because GAUSS is approximately ridge regression (RR) when θ is large, the contrasting behavior in this figure illustrates why GAUSS had higher rpred than RR for set 5 but vice versa for set 6 (see Table 2).

    The crucial difference lies in rpred. For set 5, rpred exhibited an interior maximum near θREML, whereas for set 6 rpred was maximized at θ = 1 and declined steadily as θ decreased. The significance of this observation for understanding Table 2 is that GAUSS behaves like RR when θ is large relative to D. This follows from the Taylor series expansion $\exp(-D_{ij}^2/\theta^2) = 1 - D_{ij}^2/\theta^2 + D_{ij}^4/(2\theta^4) - \cdots$ and the fact that $1 - D_{ij}^2$ is equivalent (up to a constant and a scale factor) to the additive model GG′ for inbred lines (Piepho, 2009). As θ decreases, the epistatic interactions in the higher‐order terms (e.g., $D_{ij}^4$) become more important. When rpred has an interior maximum near θREML, as in set 5, GAUSS will have higher accuracy than RR. When rpred increases monotonically with θ, GAUSS will not have higher accuracy than RR; whether GAUSS is lower or equivalent depends on the shape of the LL profile. In the case of set 6, the LL profile peaked at θREML = 0.4, so RR had higher accuracy. For most sets in environment 2, both LL and rpred increased monotonically with θ (not shown), so GAUSS and RR were equivalent.
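    A small numerical check of this argument, assuming inbred lines coded –1/+1 and distances normalized as $D_{ij} = \lVert G_i - G_j \rVert / (2\sqrt{m})$ (an assumption about the normalization). For such lines $\lVert G_i - G_j \rVert^2 = 2m - 2\langle G_i, G_j \rangle$, so $1 - D_{ij}^2 = 1/2 + \langle G_i, G_j \rangle/(2m)$, an affine function of the entries of GG′. The sketch also shows the gap between the Gaussian kernel and its first‐order Taylor approximation shrinking as θ grows. The genotypes are hypothetical.

```python
import math

# Hypothetical inbred lines (rows) by markers (columns), coded -1/+1.
G = [[1, -1, 1, 1, -1, 1],
     [1, 1, -1, 1, 1, -1],
     [-1, 1, 1, -1, 1, 1],
     [1, 1, 1, 1, -1, -1]]
m = len(G[0])


def dist2(gi, gj):
    """Squared normalized distance D_ij^2, assuming D = ||gi - gj|| / (2 sqrt(m))."""
    return sum((a - b) ** 2 for a, b in zip(gi, gj)) / (4.0 * m)


def dot(gi, gj):
    """Inner product of two genotype vectors (an entry of GG')."""
    return sum(a * b for a, b in zip(gi, gj))


# For inbred lines, 1 - D^2 should equal 1/2 + <gi, gj>/(2m) for every pair.
identity_gap = max(abs((1 - dist2(gi, gj)) - (0.5 + dot(gi, gj) / (2.0 * m)))
                   for gi in G for gj in G)


def taylor_gap(theta):
    """Largest |exp(-D^2/theta^2) - (1 - D^2/theta^2)| over all pairs of lines."""
    return max(abs(math.exp(-dist2(gi, gj) / theta ** 2) - (1 - dist2(gi, gj) / theta ** 2))
               for gi in G for gj in G)


gaps = [taylor_gap(theta) for theta in (0.5, 1.0, 2.0, 4.0)]
print(identity_gap, [round(g, 5) for g in gaps])
```

    The constant 1/2 and the factor 1/(2m) are absorbed by the intercept and the variance component, which is why the leading term of the Gaussian kernel behaves like RR when θ is large.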

    These phenomena are relevant to the question of whether GAUSS is prone to overfitting, which Piepho (2009) and Heslot et al. (2012) have raised as a concern. In both studies the residual error with GAUSS was much smaller than with RR, or equivalently the accuracy for the training population was nearly 1. This was also observed with the BLR wheat data, as shown by the dashed line in Fig. 2. To constitute overfitting, however, there must be a tradeoff between higher accuracy for the training set and lower accuracy for the validation set (Dietrich, 1995). The results in Heslot et al. (2012) and the present study show that such a tradeoff is rare provided the scale parameter is chosen properly. Overfitting was observed for set 6 in environment 1, but more typically rpred was either the same or higher with GAUSS compared to RR (see Table 2).

    To investigate the matter further, a different data set—279 maize lines genotyped at 2953 SNP markers—was analyzed with the rrBLUP package. The cross‐validation accuracies for maize flowering time, ear height, and ear diameter are shown alongside the results for wheat grain yield in Table 3. For wheat grain yield, the accuracy with GAUSS was 6 to 7 percentage points higher than RR in every environment but environment 2 (similar to Crossa et al. [2010]). For all three maize traits there was no significant difference between GAUSS and RR, which provides additional evidence that overfitting (i.e., a loss in cross‐validation accuracy) is not common with GAUSS. The results also suggest that most (perhaps all) of the genetic variation was additive for the maize traits.

    Table 3 includes the cross‐validation results with EXP, which was equivalent to GAUSS for all seven traits. Piepho (2009) also found little difference between these two models in his analysis of maize grain yield. Like GAUSS, EXP captures nonadditive effects but the structure of its feature space is different. For the limited plant breeding data analyzed thus far with the two methods, this difference appears to be of little consequence.

    For the sake of comparison, Table 3 also shows the accuracy of the additive Bayesian LASSO model, which was equivalent to RR for all seven traits.

    Table 3. Tenfold cross‐validation accuracy (rpred) for maize and wheat traits.
    Trait                          GAUSS    EXP     RR      BL
    Wheat yield, environment 1     0.58a    0.57a   0.51b   0.51b
    Wheat yield, environment 2     0.49a    0.49a   0.48a   0.48a
    Wheat yield, environment 3     0.45a    0.45a   0.38b   0.38b
    Wheat yield, environment 4     0.54a    0.54a   0.48b   0.47b
    Maize flowering time           0.73a    0.73a   0.73a   0.73a
    Maize ear height               0.51a    0.54a   0.51a   0.52a
    Maize ear diameter             0.53ab   0.54a   0.52b   0.53ab
    • GAUSS, Gaussian model; EXP, exponential model; RR, ridge regression; BL, Bayesian LASSO.
    • Within each trait, accuracies with the same letter were not significantly different at the 0.05 probability level.

    CONCLUSIONS

    The objective of this research was to create software that makes ridge regression and other kernel methods accessible to plant breeders interested in genomic selection. At the core of the rrBLUP package is the function mixed.solve, which can be used to solve both the marker‐based and kinship‐based versions of the genomic prediction problem. The function kinship.BLUP provides a more intuitive interface for kinship‐based prediction and offers several genetic models, such as an additive relationship matrix and the nonadditive Gaussian kernel.

    Acknowledgments

    The author thanks Jean‐Luc Jannink for his mentoring and helpful comments on the manuscript.

        • Genetic relatedness and the ratio of subpopulation‐common alleles are related in genomic prediction across structured subpopulations in maize, Plant Breeding, 10.1111/pbr.12717, 138, 6, (802-809), (2019).
        • Genomic Selection of Forage Quality Traits in Winter Wheat, Crop Science, 10.2135/cropsci2018.10.0655, 59, 6, (2473-2483), (2019).
        • Maintaining the Accuracy of Genomewide Predictions when Selection Has Occurred in the Training Population, Crop Science, 10.2135/cropsci2017.11.0682, 58, 3, (1226-1231), (2018).
        • Naturalgwas: An R package for evaluating genomewide association methods with empirical data, Molecular Ecology Resources, 10.1111/1755-0998.12892, 18, 4, (789-797), (2018).
        • Genomic Analysis and Prediction within a US Public Collaborative Winter Wheat Regional Testing Nursery, The Plant Genome, 10.3835/plantgenome2018.01.0004, 11, 3, (1-7), (2018).
        • Genome‐Wide Association Study Using Historical Breeding Populations Discovers Genomic Regions Involved in High‐Quality Rice, The Plant Genome, 10.3835/plantgenome2017.08.0076, 11, 3, (1-12), (2018).
        • Genome‐Wide Association Mapping of Host‐Plant Resistance to Soybean Aphid, The Plant Genome, 10.3835/plantgenome2018.02.0011, 11, 3, (1-12), (2018).
        • Candidate Variants for Additive and Interactive Effects on Bioenergy Traits in Switchgrass (Panicum virgatum L.) Identified by Genome‐Wide Association Analyses, The Plant Genome, 10.3835/plantgenome2018.01.0002, 11, 3, (1-18), (2018).
        • The Accuracy of Genomic Prediction between Environments and Populations for Soft Wheat Traits, Crop Science, 10.2135/cropsci2017.10.0638, 58, 6, (2274-2288), (2018).
        • Diagnostic Markers for Vernalization and Photoperiod Loci Improve Genomic Selection for Grain Yield and Spectral Reflectance in Wheat, Crop Science, 10.2135/cropsci2017.06.0348, 58, 1, (242-252), (2018).
        • Genomewide Selection for Unfavorably Correlated Traits in Maize, Crop Science, 10.2135/cropsci2017.12.0719, 58, 4, (1587-1593), (2018).
        • Impact of Mislabeling on Genomic Selection in Cassava Breeding, Crop Science, 10.2135/cropsci2017.07.0442, 58, 4, (1470-1480), (2018).
        • Applications of Machine Learning Methods to Genomic Selection in Breeding Wheat for Rust Resistance, The Plant Genome, 10.3835/plantgenome2017.11.0104, 11, 2, (1-15), (2018).
        • Effective Genomic Selection in a Narrow‐Genepool Crop with Low‐Density Markers: Asian Rapeseed as an Example, The Plant Genome, 10.3835/plantgenome2017.09.0084, 11, 2, (1-14), (2018).
        • Genomic Selection for Increased Yield in Synthetic‐Derived Wheat, Crop Science, 10.2135/cropsci2016.04.0209, 57, 2, (713-725), (2017).
        • A Simple Package to Script and Simulate Breeding Schemes: The Breeding Scheme Language, Crop Science, 10.2135/cropsci2016.06.0538, 57, 3, (1347-1354), (2017).
        • Improving Genomic Prediction for Pre‐Harvest Sprouting Tolerance in Wheat by Weighting Large‐Effect Quantitative Trait Loci, Crop Science, 10.2135/cropsci2016.06.0453, 57, 3, (1315-1324), (2017).
        • Genomic Selection for Yield and Seed Protein Content in Soybean: A Study of Breeding Program Data and Assessment of Prediction Accuracy, Crop Science, 10.2135/cropsci2016.06.0496, 57, 3, (1325-1337), (2017).
        • Mapping Agronomic Traits in a Wild Barley Advanced Backcross–Nested Association Mapping Population, Crop Science, 10.2135/cropsci2016.10.0850, 57, 3, (1199-1210), (2017).
        • Population Structure and Genetic Diversity Analysis of Germplasm from the Winter Wheat Eastern European Regional Yield Trial (WWEERYT), Crop Science, 10.2135/cropsci2016.08.0639, 57, 2, (812-820), (2017).
        • Genomewide Selection with Biallelic versus Triallelic Models in Three‐Way Maize Populations, Crop Science, 10.2135/cropsci2016.12.1001, 57, 5, (2471-2477), (2017).
        • Genomic Prediction in a Large African Maize Population, Crop Science, 10.2135/cropsci2016.08.0715, 57, 5, (2361-2371), (2017).
        • Genome‐Wide Association and Prediction of Grain and Semolina Quality Traits in Durum Wheat Breeding Populations, The Plant Genome, 10.3835/plantgenome2017.05.0038, 10, 3, (1-12), (2017).
        • Prospects for Genomic Selection in Cassava Breeding, The Plant Genome, 10.3835/plantgenome2017.03.0015, 10, 3, (1-19), (2017).
        • Increasing Genomic‐Enabled Prediction Accuracy by Modeling Genotype × Environment Interactions in Kansas Wheat, The Plant Genome, 10.3835/plantgenome2016.12.0130, 10, 2, (1-15), (2017).
        • Strategies for Selecting Crosses Using Genomic Prediction in Two Wheat Breeding Programs, The Plant Genome, 10.3835/plantgenome2016.12.0128, 10, 2, (1-12), (2017).
        • Unlocking Diversity in Germplasm Collections via Genomic Selection: A Case Study Based on Quantitative Adult Plant Resistance to Stripe Rust in Spring Wheat, The Plant Genome, 10.3835/plantgenome2016.12.0124, 10, 3, (1-15), (2017).
        • Prospective Targeted Recombination and Genetic Gains for Quantitative Traits in Maize, The Plant Genome, 10.3835/plantgenome2016.11.0118, 10, 2, (1-9), (2017).
        • Multitrait, Random Regression, or Simple Repeatability Model in High‐Throughput Phenotyping Data Improve Genomic Prediction for Wheat Grain Yield, The Plant Genome, 10.3835/plantgenome2016.11.0111, 10, 2, (1-15), (2017).
        • Genome‐Wide Analysis of Tar Spot Complex Resistance in Maize Using Genotyping‐by‐Sequencing SNPs and Whole‐Genome Prediction, The Plant Genome, 10.3835/plantgenome2016.10.0099, 10, 2, (1-14), (2017).
        • Comparison of Models and Whole‐Genome Profiling Approaches for Genomic‐Enabled Prediction of Septoria Tritici Blotch, Stagonospora Nodorum Blotch, and Tan Spot Resistance in Wheat, The Plant Genome, 10.3835/plantgenome2016.08.0082, 10, 2, (1-16), (2017).
        • GBS‐Based Genomic Selection for Pea Grain Yield under Severe Terminal Drought, The Plant Genome, 10.3835/plantgenome2016.07.0072, 10, 2, (1-13), (2017).
        • Evaluation of Genetic Diversity and Host Resistance to Stem Rust in USDA NSGC Durum Wheat Accessions, The Plant Genome, 10.3835/plantgenome2016.07.0071, 10, 2, (1-13), (2017).
        • A Comprehensive Image‐based Phenomic Analysis Reveals the Complex Genetic Architecture of Shoot Growth Dynamics in Rice (Oryza sativa), The Plant Genome, 10.3835/plantgenome2016.07.0064, 10, 2, (1-14), (2017).
        • QTLs Associated with Crown Root Angle, Stomatal Conductance, and Maturity in Sorghum, The Plant Genome, 10.3835/plantgenome2016.04.0038, 10, 2, (1-12), (2017).
        • Genome‐Wide Association and Prediction Reveals Genetic Architecture of Cassava Mosaic Disease Resistance and Prospects for Rapid Genetic Improvement, The Plant Genome, 10.3835/plantgenome2015.11.0118, 9, 2, (1-13), (2016).
        • Genomic Selection for Processing and End‐Use Quality Traits in the CIMMYT Spring Bread Wheat Breeding Program, The Plant Genome, 10.3835/plantgenome2016.01.0005, 9, 2, (1-12), (2016).
        • Germplasm Architecture Revealed through Chromosomal Effects for Quantitative Traits in Maize, The Plant Genome, 10.3835/plantgenome2016.03.0028, 9, 2, (1-11), (2016).
        • GAPIT Version 2: An Enhanced Integrated Tool for Genomic Association and Prediction, The Plant Genome, 10.3835/plantgenome2015.11.0120, 9, 2, (1-9), (2016).
        • The Triticeae Toolbox: Combining Phenotype and Genotype Data to Advance Small‐Grains Breeding, The Plant Genome, 10.3835/plantgenome2014.12.0099, 9, 2, (1-10), (2016).
        • Genome to Phenome Mapping in Apple Using Historical Data, The Plant Genome, 10.3835/plantgenome2015.11.0113, 9, 2, (1-15), (2016).
        • Software for Genome‐Wide Association Studies in Autopolyploids and Its Application to Potato, The Plant Genome, 10.3835/plantgenome2015.08.0073, 9, 2, (1-10), (2016).
        • Establishment and Optimization of Genomic Selection to Accelerate the Domestication and Improvement of Intermediate Wheatgrass, The Plant Genome, 10.3835/plantgenome2015.07.0059, 9, 1, (1-18), (2016).
        • Molecular Marker Information in the Analysis of Multi‐Environment Trials Helps Differentiate Superior Genotypes from Promising Parents, Crop Science, 10.2135/cropsci2016.03.0151, 56, 5, (2612-2628), (2016).
        • Genome‐wide Association for Plant Height and Flowering Time across 15 Tropical Maize Populations under Managed Drought Stress and Well‐Watered Conditions in Sub‐Saharan Africa, Crop Science, 10.2135/cropsci2015.10.0632, 56, 5, (2365-2378), (2016).
        • Near‐Infrared Calibration of Soluble Stem Carbohydrates for Predicting Drought Tolerance in Spring Wheat, Agronomy Journal, 10.2134/agronj2015.0173, 108, 1, (285-293), (2016).
        • Genomic Selection Performs Similarly to Phenotypic Selection in Barley, Crop Science, 10.2135/cropsci2015.09.0557, 56, 6, (2871-2881), (2016).