Early View
ORIGINAL ARTICLE
Open Access

Predicting superior crosses in winter wheat using genomics: A retrospective study to assess accuracy

Carolina Ballén-Taborda

Corresponding Author

Department of Plant and Environmental Sciences, Clemson University, Clemson, South Carolina, USA

Pee Dee Research and Education Center, Clemson University, Florence, South Carolina, USA

Correspondence

Carolina Ballén-Taborda, Department of Plant and Environmental Sciences, Clemson University, Clemson, SC 29634, USA. Email: [email protected]

Contribution: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Visualization, Writing - original draft, Writing - review & editing

Jeanette Lyerly

Crop and Soil Sciences Department, North Carolina State University, Raleigh, North Carolina, USA

Contribution: Data curation, Formal analysis, Investigation, Methodology, Writing - review & editing

Jared Smith

USDA-ARS, Plant Science Research Unit, Raleigh, North Carolina, USA

Contribution: Data curation, Formal analysis, Investigation, Methodology

Kimberly Howell

USDA-ARS, Plant Science Research Unit, Raleigh, North Carolina, USA

Contribution: Data curation, Formal analysis, Investigation, Methodology

Gina Brown-Guedira

Crop and Soil Sciences Department, North Carolina State University, Raleigh, North Carolina, USA

USDA-ARS, Plant Science Research Unit, Raleigh, North Carolina, USA

Contribution: Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Writing - review & editing

Noah DeWitt

School of Plant, Environmental and Soil Sciences, Louisiana State University, Baton Rouge, Louisiana, USA

Contribution: Data curation, Formal analysis, Investigation, Methodology, Writing - review & editing

Brian Ward

Forage Genetics International, West Salem, Wisconsin, USA

Contribution: Data curation, Formal analysis, Investigation, Methodology, Writing - review & editing

Md Ali Babar

Agronomy Department, University of Florida, Gainesville, Florida, USA

Contribution: Data curation, Funding acquisition, Investigation, Methodology, Resources, Writing - review & editing

Stephen A. Harrison

School of Plant, Environmental and Soil Sciences, Louisiana State University, Baton Rouge, Louisiana, USA

Contribution: Data curation, Funding acquisition, Investigation, Methodology, Resources, Writing - review & editing

Richard E. Mason

College of Agricultural Sciences, Colorado State University, Fort Collins, Colorado, USA

Contribution: Data curation, Funding acquisition, Investigation, Methodology, Resources, Writing - review & editing

Mohamed Mergoum

Department of Crop and Soil Sciences, University of Georgia, Griffin, Georgia, USA

Contribution: Data curation, Funding acquisition, Investigation, Methodology, Resources, Writing - review & editing

J. Paul Murphy

Crop and Soil Sciences Department, North Carolina State University, Raleigh, North Carolina, USA

Contribution: Data curation, Funding acquisition, Investigation, Methodology, Resources, Writing - review & editing

Russell Sutton

Department of Soil and Crop Sciences, Texas A&M University, Commerce, Texas, USA

Contribution: Data curation, Funding acquisition, Investigation, Methodology, Resources, Writing - review & editing

Carl A. Griffey

School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, Virginia, USA

Contribution: Data curation, Funding acquisition, Investigation, Methodology, Resources, Writing - review & editing

Richard E. Boyles

Department of Plant and Environmental Sciences, Clemson University, Clemson, South Carolina, USA

Pee Dee Research and Education Center, Clemson University, Florence, South Carolina, USA

Contribution: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Visualization, Writing - original draft, Writing - review & editing

First published: 24 May 2024

Assigned to Associate Editor Sivakumar Sukumaran.

Abstract

In plant breeding, selecting cross-combinations that are more likely to result in superior lines for cultivar development is critical. This step, however, is subjective with decisions being based on available genomic and phenotypic data for prospective parents. Genomic prediction (GP) provides new opportunities to accelerate genetic gain for a target trait by identifying superior crosses through simulation of progeny performance. In this context, this study deployed GP using the phenotype and genotype of potential parents to predict the progeny genetic variance (VG) and means of overall, inferior 10%, and superior 10% (μ, μip, and μsp, respectively). This retrospective experimental design investigated whether the crosses that produced superior soft red winter wheat breeding lines would have been made if progeny simulations had guided crossing decisions of breeding programs. Here, data from historical wheat breeding lines were used to train GP models and predict VG and means for yield, test weight, heading date, and plant height for all combinations of 217 parents. Predicted and observed data for 670 lines derived from biparental crosses were compared to assess the accuracy of progeny simulations, and low-to-moderate prediction accuracy was observed for the four traits (0.25–0.52). Of the pedigrees that produced lines that were selected and advanced into later stage nurseries, 76% were predicted to give rise to progeny with above-average yield. The moderate correlation found between predicted progeny means and observed line per se performance justifies using cross-combination prediction as a tool to reduce crossing number and focus on segregating populations that harbor future cultivars.

Abbreviations

  • BLUPs: best linear unbiased predictors
  • DON: deoxynivalenol
  • GAWN: Gulf Atlantic Wheat Nursery
  • GBS: genotyping-by-sequencing
  • GEBV: genomic estimated breeding value
  • GP: genomic prediction
  • GSSP: GAWN, SunWheat, SPE, and SPL nurseries
  • h2: narrow-sense heritability
  • H2: broad-sense heritability
  • HapMap: haplotype map
  • HD: heading date
  • MP: mid-parent
  • PH: plant height
  • PopVar: population genetic variance
  • RILs: recombinant inbred lines
  • rrBLUP: ridge-regression best linear unbiased prediction
  • SNP: single nucleotide polymorphism
  • SPE: SunGrains preliminary early nursery
  • SPL: SunGrains preliminary late nursery
  • SRWW: soft red winter wheat
  • SunGrains: Southeastern University Small Grains Cooperative
  • SunWheat: SunGrains Advanced Wheat Nursery
  • TP: training population
  • TW: test weight
  • USW: Uniform Southern Soft Red Winter Wheat nursery
  • VA: additive variance
  • VG: genetic variance
  • YLD: grain yield

    1 INTRODUCTION

    In plant breeding, the selection of parents is critical for developing superior progeny with the highest mean performance, increasing or maintaining genetic variation, and enhancing the rate of genetic gain for a target trait (Jean et al., 2021; Yao et al., 2018). Often, new inbreds are developed from hybridizations among elite lines, and other combinations are not considered because a limited number of crosses can be managed by a breeding program each year (Bernardo, 2003; Yao et al., 2018). Further, lines derived from a high percentage of crosses that do not deliver superior performance are discarded in subsequent cycles (Lado et al., 2017; Witcombe et al., 2013). Therefore, having the biparental progeny simulations among a large number of parents, before the crosses are made and evaluated, would be helpful to improve the efficiency of line development (Bernardo, 2014; Utz et al., 2001). This is particularly important to select crosses for accelerated advancement using doubled haploids or speed breeding, which are valuable tools to hasten the breeding cycle but are often costly and labor intensive (Dwivedi et al., 2015; Li et al., 2013; Wanga et al., 2021).

    Breeders use all information available (genetic markers and phenotypes) to prioritize the crosses to be made (Lado et al., 2017), but these data can be limited, particularly when implementing rapid cycling. Superior crosses produce segregating populations with a large genetic variance (VG) and improved trait mean (μ). Predicting both parameters would be important for identifying the best crosses and maximizing genetic gain (Beckett et al., 2019; Utz et al., 2001; Yao et al., 2018; Zhong & Jannink, 2007). Breeders can rely on expected progeny μ (or mid-parent [MP] value) because it can be accurately estimated by averaging the phenotypic values of two parents (Bernardo, 2014; Jean et al., 2021). However, when two breeding populations exhibit similar μ, a precise estimation of VG may determine which populations have greater potential (Beckett et al., 2019). Previously, obtaining accurate estimations of VG was difficult (Bernardo, 2014; Zhong & Jannink, 2007), but prediction of this genetic parameter has improved with the application of genomic prediction (GP) methods (Osthushenrich et al., 2018; Tiede et al., 2015).

    Before the widespread use of genome-wide markers, the usefulness criterion (Up) was proposed to measure the short-term genetic gain achieved in biparental crosses (Schnell & Utz, 1975). The Up is a function of the population mean (μ), additive variance (VA), narrow-sense heritability (h2), and selection intensity (i) (Schnell & Utz, 1975). Assigning value to a given cross without a priori information on its μ or VA is now possible using genomic information (Bernardo, 2014).
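
    The usefulness criterion is commonly written as Up = μ + i·h·σA, where h is the square root of h2 and σA is the square root of VA. The following is a minimal Python sketch of that standard form, not code from this study; the function names and the example values for the two crosses are hypothetical, and the selection intensity assumes truncation selection on a normally distributed trait.

```python
from math import sqrt
from statistics import NormalDist


def selection_intensity(selected_fraction: float) -> float:
    """Selection intensity i for truncation selection on a normal trait.

    i = pdf(z) / p, where z is the truncation point that leaves a
    fraction p of the population above it.
    """
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - selected_fraction)
    return std_normal.pdf(z) / selected_fraction


def usefulness(mu: float, v_a: float, h2: float, selected_fraction: float) -> float:
    """Usefulness criterion U_p = mu + i * h * sigma_A."""
    i = selection_intensity(selected_fraction)
    return mu + i * sqrt(h2) * sqrt(v_a)


# Two hypothetical crosses with the same mean but different additive variance.
cross_a = usefulness(mu=4300.0, v_a=100.0, h2=0.5, selected_fraction=0.10)
cross_b = usefulness(mu=4300.0, v_a=400.0, h2=0.5, selected_fraction=0.10)
print(round(selection_intensity(0.10), 3), round(cross_a, 1), round(cross_b, 1))
```

    With equal means, the cross with the larger additive variance receives the higher usefulness value, which is the situation in which an estimate of variance separates otherwise similar crosses.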

    GP provides new opportunities to accelerate genetic gain within modern crop breeding programs (Crossa et al., 2017; Voss-Fels et al., 2019). This method uses calibrated statistical models with genotypes and phenotypes of a training population (TP) to predict the additive merit (termed as genomic estimated breeding values [GEBVs]) of new, unphenotyped breeding lines based on genome-wide marker data. In principle, GEBVs can be used to advance genotypes with favorable trait values in the breeding pipeline prior to phenotyping and to select parents for new cross-combinations (Voss-Fels et al., 2019). Using a GP approach to evaluate all possible parental combinations in silico could allow breeders to identify crosses that would produce useful progenies (Jean et al., 2021) based on both VG and μ (Beckett et al., 2019; Utz et al., 2001; Yao et al., 2018; Zhong & Jannink, 2007). Implementing GP methodology, the R package PopVar (population genetic variance) uses genotype and phenotype data of a set of potential parents to predict VG and progeny means for all possible biparental crosses in a half-diallel mating design (Mohammadi et al., 2015). Based on predicted population parameters, superior crosses can theoretically be identified.

    This study investigated the usefulness of progeny simulations to select the best cross-combinations in soft red winter wheat (SRWW). It was examined whether the crosses that produced superior wheat breeding lines, defined as lines that advanced into later stage nurseries or were released, would have been prioritized by SunGrains’ (Southeastern University Small Grains Cooperative) breeders if progeny simulations had been available. The simulated performance of biparental populations was completed using genotype and phenotype data for 217 parental lines and compared with data from 670 lines previously evaluated in multi-year, multilocation trials. Four traits were investigated in the study, including grain yield (YLD), test weight (TW), heading date (HD), and plant height (PH). Results indicate that predicted VG and μ—and, by extension, μsp—in progeny simulations could collectively inform breeders to help identify the most valuable crosses and allocate resources toward useful cross-combinations and downstream progeny selection. Predicting parental combinations using genome-wide markers could significantly increase the efficiency of breeding programs, including reducing cost, by allowing breeders to focus efforts on pedigrees that are more likely to give rise to superior lines for cultivar development.

    2 MATERIALS AND METHODS

    2.1 Plant materials and historical phenotype data

    As part of the SunGrains multi-institutional breeding cooperative, superior SRWW breeding lines are evaluated annually across the greater Southeastern United States in five regional nurseries: the Uniform Southern Soft Red Winter Wheat Nursery (USW), the Gulf Atlantic Wheat Nursery (GAWN), the SunGrains Advanced Wheat Nursery (SunWheat), and the SunGrains preliminary early and late nurseries (SPE and SPL) (Figure 1). A total of 3084 lines have been tested in 24 trial locations across 11 states over 15 years (2008–2022) (Table S1). The SunGrains multi-year, multilocation, and multi-trait historical phenotypic dataset used in the present study consisted of 30,382 observations for YLD (kg ha−1), 25,470 observations for TW (kg hL−1), 18,742 observations for HD (Julian days), and 13,761 observations for PH (cm). The number of replications per site-year combination varied, with an average of 1.4.

    FIGURE 1. Workflow for advancing lines from preliminary to advanced nurseries: F6 generation → SPE/SPL (SunGrains Preliminary Early/Late Nursery) → SunWheat (SunGrains Advanced Wheat Nursery) → GAWN (Gulf Atlantic Wheat Nursery) → USW (Uniform Southern Soft Red Winter Wheat Nursery).

    Two methods were used to subset the historical phenotypic dataset into training populations (TPs). TP1 included historical data collected from the GAWN, SunWheat, SPE, and SPL nurseries (GSSP) from 2008 to 2022, and TP2 included only data collected from the GAWN and SunWheat nurseries (GSW) over the same years (Table 1). For TP2, SPE and SPL were excluded to investigate the effect of including or excluding these preliminary, unreplicated nurseries.

    Core Ideas

    • In plant breeding, selecting parents to be crossed is critical for developing superior progeny.
    • Historical winter wheat data were used to assess the usefulness of genomic prediction for parental selection.
    • Predicted yield and SunGrains breeders’ assessment and selection largely agreed.
    • Simulated progeny performance could allow breeders to focus on the most promising crosses.
    TABLE 1. Four combinations (numbered 1–4) of input data for population genetic variance (PopVar) simulation of population parameters.
    Training population    1500 SNPs (randomly distributed)    12,917 SNPs
    TP1 (GSSP)             1                                   2
    TP2 (GSW)              3                                   4
    Abbreviations: GSSP, GAWN, SunWheat, SPE, and SPL nurseries; GSW, GAWN and SunWheat nurseries; SNP, single nucleotide polymorphism; TP, training population.
    For each TP, genotype values across environments for YLD, TW, HD, and PH were estimated by fitting the following linear model (Yao et al., 2018) using the function “lmer” of the “lme4” package in R (Bates et al., 2015):
    $$Y_{ijk} = \mu + G_i + E_j + R_{k(j)} + GE_{ij} + e_{ijk}$$
    where $Y_{ijk}$ represents the phenotypic observation of genotype $i$ in environment $j$ and replication $k$; $\mu$ is the overall mean; $G_i$ is the effect of genotype $i$; $E_j$ is the effect of environment (site-year combination) $j$; $R_{k(j)}$ is the effect of replication $k$ nested in environment $j$; $GE_{ij}$ is the G × E interaction between genotype $i$ and environment $j$; and $e_{ijk}$ is the residual effect associated with genotype $i$ in environment $j$ and replication $k$. All terms except genotype were estimated as independent and identically distributed random effects. Genotype was defined as a random effect (Yao et al., 2018) to calculate best linear unbiased predictors (BLUPs) using the “coef” function of the “lme4” package in R. The Cullis broad-sense heritability (H2) (Cullis et al., 2006), recommended for unbalanced datasets (Covarrubias-Pazaran, 2019), was estimated with the “H2cal” function of the “inti” R package (Lozano-Isla, 2022):
    $$H_{\mathrm{Cullis}}^2 = 1 - \frac{\bar{V}_\Delta^{\mathrm{BLUP}}}{2 \times \sigma_g^2}$$
    where $\bar{V}_\Delta^{\mathrm{BLUP}}$ refers to the average variance of the difference between two genotypic BLUPs, $\sigma^2$ refers to variance, and $g$ refers to genotype.
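
    As a small numerical illustration of the Cullis et al. (2006) formula, the Python sketch below computes the heritability from a genotypic variance and the mean variance of a difference between two genotypic BLUPs. It is not the “H2cal” implementation; the function name and the input values are hypothetical, and in practice both quantities would come from the fitted mixed model.

```python
def cullis_h2(mean_var_blup_diff: float, sigma_g2: float) -> float:
    """Cullis heritability: 1 - vbar_delta_BLUP / (2 * sigma_g^2).

    mean_var_blup_diff: average variance of a difference between two
        genotypic BLUPs (hypothetical input, taken from a fitted model).
    sigma_g2: estimated genotypic variance.
    """
    if sigma_g2 <= 0:
        raise ValueError("genotypic variance must be positive")
    return 1.0 - mean_var_blup_diff / (2.0 * sigma_g2)


# Hypothetical values: the larger the pairwise BLUP-difference variance
# relative to the genotypic variance, the lower the heritability.
print(round(cullis_h2(mean_var_blup_diff=0.8, sigma_g2=1.0), 2))
print(round(cullis_h2(mean_var_blup_diff=1.6, sigma_g2=1.0), 2))
```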

    2.2 Genotype data

    Genotyping was performed as reported in previous studies that used the SunGrains dataset (Sarinelli et al., 2019; Winn et al., 2022). DNA was isolated using the sbeadex maxi plant kit (LGC Genomics) according to the manufacturer's instructions. Genotyping-by-sequencing (GBS) was performed as previously described (Poland et al., 2012), and libraries were prepared at the USDA-ARS Eastern Regional Small Grains Genotyping Laboratory and sequenced on an Illumina HiSeq 2500 or NovaSeq 6000. Reads were mapped to the wheat RefSeq 1.0 genome assembly (Appels et al., 2018) using Burrows–Wheeler aligner (v.0.7.12) (Li & Durbin, 2009). Single nucleotide polymorphism (SNP) discovery was completed with TASSEL 5 GBSv2 (v.5.2.35; Glaubitz et al., 2014). Data were filtered by removing taxa with >85% missing data while retaining SNPs with a minor allele frequency ≥5%, a heterozygous proportion ≤20%, and missing data ≤25%. Finally, missing SNP calls were imputed with Beagle v.5.1 (B. Browning et al., 2018; S. Browning & Browning, 2007). The generated variant call format file containing 19,232 SNPs for 6399 breeding lines was converted to a diploid haplotype map (HapMap) using TASSEL 5 (Bradbury et al., 2007). The HapMap was then converted into a numerical matrix (0, 1, 2) using GAPIT (v.3.1.0) in R (Lipka et al., 2012).
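
    The filtering thresholds above can be expressed as simple per-taxon and per-SNP checks. The Python sketch below is an illustration only and is not the TASSEL pipeline; it assumes calls coded as 0, 1, 2 copies of the alternate allele with None for missing data, and it assumes that allele frequency and heterozygosity are computed over non-missing calls. The function names are hypothetical.

```python
from typing import Optional, Sequence

Call = Optional[int]  # 0, 1, 2 = copies of the alternate allele; None = missing


def taxon_passes(calls: Sequence[Call], max_missing: float = 0.85) -> bool:
    """Keep a line (taxon) unless more than 85% of its calls are missing."""
    missing_rate = sum(c is None for c in calls) / len(calls)
    return missing_rate <= max_missing


def snp_passes(
    calls: Sequence[Call],
    min_maf: float = 0.05,
    max_het: float = 0.20,
    max_missing: float = 0.25,
) -> bool:
    """Keep a SNP with MAF >= 5%, heterozygous proportion <= 20%, missing <= 25%."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return False
    missing_rate = 1.0 - len(observed) / len(calls)
    alt_freq = sum(observed) / (2 * len(observed))
    maf = min(alt_freq, 1.0 - alt_freq)
    het_rate = sum(1 for c in observed if c == 1) / len(observed)
    return missing_rate <= max_missing and maf >= min_maf and het_rate <= max_het


# Toy SNP columns (one entry per line), purely illustrative.
print(snp_passes([0, 0, 2, 2, 0, 2, None, 0]))  # polymorphic, little missing data
print(snp_passes([0] * 20))                     # monomorphic, fails the MAF filter
```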

    2.3 Genetic map

    The genetic map was constructed using a population of 906 recombinant inbred lines (RILs) derived from the cross Synthetic W7984 × Opata M85 (SynOpRIL) (Gutierrez-Gonzalez et al., 2019). This map was used to interpolate recombination distances for the GBS SNP datasets obtained for SunGrains breeding programs. The MonoPoly R package (Turlach & Murray, 2019) was used to fit a ninth-degree monotonically increasing spline to the SynOpRIL genetic map for each chromosome. Recombination distances for the SNP markers used in the present study were then obtained from the fit splines.
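
    The idea of reading genetic positions for new markers off a monotone curve fitted to a reference map can be sketched as follows. This Python example deliberately swaps in a simpler technique, a running maximum followed by piecewise-linear interpolation, instead of the ninth-degree monotone fit from MonoPoly; the function name and the toy positions are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def make_monotone_map(physical_bp, genetic_cm):
    """Return a function mapping a physical position (bp) to a genetic position (cM).

    Reference markers are sorted by physical position, genetic positions are
    forced to be non-decreasing with a running maximum, and new positions are
    obtained by linear interpolation between neighbouring reference markers.
    """
    pairs = sorted(zip(physical_bp, genetic_cm))
    xs = [bp for bp, _ in pairs]
    ys = list(accumulate((cm for _, cm in pairs), max))

    def interpolate(bp):
        # Positions outside the reference range are clamped to the map ends.
        if bp <= xs[0]:
            return ys[0]
        if bp >= xs[-1]:
            return ys[-1]
        j = bisect_right(xs, bp)
        x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
        return y0 + (y1 - y0) * (bp - x0) / (x1 - x0)

    return interpolate


# Toy reference map for one chromosome; the third marker is out of order in cM.
to_cm = make_monotone_map([0, 10, 20, 30], [0.0, 5.0, 4.0, 12.0])
print(to_cm(5), to_cm(15), to_cm(25))
```

    A smooth monotone fit such as the one used in the study avoids the flat segments and kinks that this piecewise-linear shortcut produces, which matters when recombination rates between adjacent markers are needed.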

    2.4 Progeny simulations

    With the assumption that useful crosses give rise to superior breeding lines that are advanced and later released by breeders, a retrospective analysis was performed to understand the relationship between line advancement and predicted cross merit. Pedigrees of 3084 SunGrains advanced breeding lines were used to identify 2294 lines derived from two-way crosses, from which a total of 217 unique parents with available genome-wide SNP data were identified. Phenotype data were available for 194 of the 217 lines. There were 670 breeding lines, including four released cultivars (GA09436-16LE12, AR09137UC-17-2, LA06146E-P4, and ARLA06146E-1-4), that had a two-way pedigree where both parents had genomic and phenotypic data available. The last two released cultivars (also known as AGS3000 and AGS3000-late, respectively) are full sibs that were selected and released as early and late heading lines, respectively. The R package PopVar (Mohammadi et al., 2015) was used to simulate progenies of all 23,436 possible pairwise combinations in a half-diallel mating design [(P × (P − 1))/2, where P = number of parents (= 217)]. The estimated phenotypic BLUPs for YLD, TW, HD, and PH and the high-density genotype data of the TPs (TP1 and TP2) were used as input for PopVar. The genetic map described above was used to allow an accurate estimation of recombination across the genome. Ridge-regression best linear unbiased prediction (rrBLUP) was used for cross-validation and to estimate marker effects, with the number of simulation iterations per population (“nSim”) set to 25, the simulated progeny size per cross (“nInd”) set to 200, and other parameters left as default. To determine whether input data (SNP number or phenotype) could influence predictions, four simulation experiments were completed using four combinations of input parameters (Table 1).
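
    The half-diallel count and the kind of summary statistics produced per cross (μ, VG, μip, and μsp) can be illustrated with a toy simulation. The Python sketch below is not PopVar: it assumes fully inbred parents coded 0/2, unlinked loci (no genetic map), purely additive marker effects, and it takes the superior tail to be the highest values. All names, genotypes, and effects are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean, pvariance


def n_half_diallel(n_parents: int) -> int:
    """Number of unique biparental crosses, excluding selfs and reciprocals."""
    return n_parents * (n_parents - 1) // 2


def simulate_cross(parent1, parent2, effects, n_ind=200, top=0.10, seed=0):
    """Simulate inbred progeny of one cross and summarise their genetic values.

    Each progeny inherits every locus from one parent at random, which
    assumes unlinked loci; linkage would require a genetic map.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_ind):
        genotype = [rng.choice((a, b)) for a, b in zip(parent1, parent2)]
        values.append(sum(g * e for g, e in zip(genotype, effects)))
    values.sort()
    k = max(1, round(top * n_ind))  # size of each 10% tail
    return {
        "mu": mean(values),
        "v_g": pvariance(values),
        "mu_ip": mean(values[:k]),
        "mu_sp": mean(values[-k:]),
    }


print(n_half_diallel(217))  # number of crosses among 217 parents

# Hypothetical parents and marker effects for three lines at four loci.
parents = {"P1": [2, 0, 2, 0], "P2": [0, 2, 2, 0], "P3": [2, 2, 0, 0]}
effects = [1.0, 0.5, 0.3, 0.2]
for name1, name2 in combinations(parents, 2):
    stats = simulate_cross(parents[name1], parents[name2], effects)
    print(name1, name2, round(stats["mu"], 2), round(stats["v_g"], 2))
```

    In this simplified setting a cross only has variance at loci where the parents differ, so two parents with identical genotypes yield a predicted VG of zero regardless of their mean.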

    Upon review, the mean correlation for predicted parameters (μ, VG, μip, and μsp) among the four simulation combinations (Table 1) was 0.90 for all agronomic traits (YLD, TW, HD, and PH) (Figure S1). Given the consistency among these combinations, only simulations using 1500 SNPs and BLUPs of TP1 (GSSP) are reported in the results for simplicity and brevity.

    2.5 Data analysis

    Pearson's correlations were estimated for the four predicted PopVar outputs: (1) progeny mean (μ) or MP GEBV, (2) VG, (3) mean of inferior 10% progeny (μip), and (4) mean of superior 10% progeny (μsp). To assess whether the progeny simulations matched the empirical data on the 670 breeding lines, Pearson's correlations between observed values and predicted progeny parameters for YLD, TW, HD, and PH were calculated, and relationships were visualized through scatter plots between each trait's progeny mean (μ) and observed values. Correlations between observed MP values and MP GEBV for the four traits were analyzed to assess accuracy further. Pairwise correlations among the four phenotypes were also studied. Correlations were computed using the “cor” function of the “stats” R package (R Core Team, 2013) and the “corrplot” and “cor.mtest” functions of the “corrplot” R package (Wei & Simko, 2017).
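
    For reference, the Pearson correlation used throughout this section can be written out directly. The short Python sketch below is a plain implementation of the textbook formula rather than the R “cor” call used in the study; the function name and the example vectors are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical predicted progeny means versus observed BLUPs for four lines.
predicted = [1.0, 2.0, 3.0, 4.0]
observed = [1.0, 3.0, 2.0, 4.0]
print(round(pearson_r(predicted, observed), 2))
```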

    Scatter plots between predicted values of μ versus μsp and VG versus μ for each trait were generated to study whether predictions for the 670 SunGrains lines agreed with breeders’ decisions to advance lines to later stage evaluation nurseries (SunWheat, GAWN, and USW) or to drop from the breeding program following preliminary evaluation (SPE and SPL). Unlike YLD and TW, where breeders directionally select for higher values, optimal values for HD and PH are intermediate and breeders frequently select against extremes (i.e., stabilizing selection). To identify HD and PH windows where progenies could have top yield potential, scatter plots between predicted μ for YLD versus agronomic trait (HD or PH) were generated.

    To examine the performance of populations derived from best-by-best crosses, scatter plots of predicted progeny mean (μ) versus mean of superior 10% progeny (μsp) for YLD were labeled in two different ways. First, based on YLD estimated BLUPs, 30 top-yielding parents were identified, and their 435 derived simulated progenies were highlighted. Additionally, categories representing the number of years between the two parents' last years of evaluation (0–3, 4–7, 8–11, and 12–14) were highlighted to compare rapid cycling (new line by new line crosses) with the benefit of potentially reusing or recycling older lines as parents for superior crosses. Second, based on YLD means for each nursery–year combination (e.g., SPL-2008 or GAWN-2022), the top two yielding parents were selected, and their simulated progeny were marked in the plot. All plots were generated using the “ggplot2” package in R (Wickham, 2016).

    To understand why two distinct clusters were observed in the PH scatter plots of μ versus μsp and VG versus μ, various analyses were completed. First, marker effects were estimated with the “mixed.solve” function of the R “rrBLUP” package (Endelman, 2011). Second, KASP marker data of major PH genes (Rht1, Rht2, and Rht8) were inspected along with the parental allelic combinations on the 23,436 predicted progenies. Third, a principal component analysis was performed on the SunGrains lines' SNP data with the “prcomp” function of the “stats” R package (R Core Team, 2013) and plotted with “ggplot2.” Fourth, a genome-wide association study (GWAS) was conducted using a mixed linear model in TASSEL 5 (Bradbury et al., 2007) to identify SNPs associated with the distinct clustering. The Manhattan plot was created using the “qqman” R package (Turner, 2014), and thresholds were calculated with the Bonferroni method using the “CalcThreshold” function of the “RAINBOWR” package (Hamazaki & Iwata, 2020). Parental allelic combinations at the significant SNPs were assessed on both the set of 670 SunGrains lines and the 23,436 cross simulations.
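
    A Bonferroni threshold for a Manhattan plot is the family-wise significance level divided by the number of tests, usually shown on the −log10 scale. The Python sketch below illustrates that arithmetic only and is not the “CalcThreshold” implementation; the significance level of 0.05, the marker count, and the example p values are assumptions, since the text does not state them.

```python
from math import log10


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Return the Bonferroni significance threshold on the -log10(p) scale."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return -log10(alpha / n_tests)


def significant(p_values, alpha: float = 0.05):
    """Indices of tests whose p value is below alpha / number of tests."""
    cutoff = alpha / len(p_values)
    return [i for i, p in enumerate(p_values) if p < cutoff]


# Assumed marker count and hypothetical p values, for illustration only.
print(round(bonferroni_threshold(1000), 3))
print(significant([0.2, 1e-6, 0.03, 0.004]))
```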

    3 RESULTS

    3.1 Relationships among observed phenotypes

    Using observed data for the 670 wheat lines evaluated in multilocation yield plots, the broad-sense heritability was 0.49 for YLD, 0.35 for TW, 0.63 for HD, and 0.58 for PH. Correlations among trait BLUPs were low, ranging from −0.05 to 0.29 (Figure S2). Fortunately, the two most correlated traits were YLD and TW (r = 0.29), which favors improving both traits in tandem. Using BLUPs generated from the wide range of testing environments (Table S1) resulted in no significant relationship between YLD and HD (r = 0.05), while YLD had a small positive correlation (r = 0.12) with PH. Given the absent or low relationships among the four traits under study, with no negative correlations in particular, as well as the predominant focus on increasing YLD, a multi-trait selection index was not explored for predicting progeny performance.

    The released and commercialized cultivars LA06146E-P4 and ARLA06146E-1-4, which are full sibs from the same cross, were independently selected in contrasting environments (Louisiana and Arkansas, respectively). As a result, the two cultivars have a nearly 10-day difference in HD (97.4–106.6 Julian days), demonstrating high VG for the cross. Meanwhile, the predicted μ and VG for HD of this family were 102.4 and 0.86, respectively, indicating early maturity with a slightly greater than average VG (Figures 2, 3, and 4c).

    FIGURE 2. Relationship between best linear unbiased predictors (BLUPs) of observed data (x-axis) and predicted progeny mean (μ) (y-axis) for yield (r = 0.30) (a), test weight (r = 0.25) (b), heading date (r = 0.52) (c), and plant height (r = 0.27) (d). Data for 670 SunGrains breeding lines are represented by colored/shaped dots. Red stars denote the four released lines. The most advanced nursery in which each of the 670 lines was tested is indicated by the color–shape combination of the dots as follows: USW (Uniform Southern Soft Red Winter Wheat Nursery) indicated by light blue squares, GAWN (Gulf Atlantic Wheat Nursery) by dark blue triangles, SunWheat (SunGrains Advanced Wheat Nursery) by dark gold diamonds, SPE (SunGrains Preliminary Early Nursery) by light green circles, and SPL (SunGrains Preliminary Late Nursery) by dark green circles. Horizontal and vertical dotted gray lines represent the means of each variable. Correlations are shown in Figure S2A,B (in pink).
    FIGURE 3. Relationship between predicted progeny mean (μ) (x-axis) and superior 10% progeny (μsp) (y-axis) for yield (kg ha−1) (r = 0.98) (a), test weight (kg hL−1) (r = 0.99) (b), heading date (Julian days) (r = 0.98) (c), and plant height (cm) (r = 0.93) (d). All simulated progenies are represented by black dots and colored/shaped dots (670 SunGrains breeding lines). Red stars denote the four released lines. The most advanced nursery in which each of the 670 lines was tested is indicated by the color–shape combination of the dots as follows: USW (Uniform Southern Soft Red Winter Wheat Nursery) indicated by light blue squares, GAWN (Gulf Atlantic Wheat Nursery) by dark blue triangles, SunWheat (SunGrains Advanced Wheat Nursery) by dark gold diamonds, SPE (SunGrains Preliminary Early Nursery) by light green circles, and SPL (SunGrains Preliminary Late Nursery) by dark green circles. Horizontal and vertical dotted gray lines represent the means of each variable. The table in each quadrant details counts and percentages of lines tested in each nursery and overall. Correlations are shown in Figure S2B (in green).
    FIGURE 4. Relationship between predicted genetic variance (VG) (x-axis) and predicted progeny mean (μ) (y-axis) for yield (kg ha−1) (r = −0.01) (a), test weight (kg hL−1) (r = 0.07) (b), heading date (Julian days) (r = −0.11) (c), and plant height (cm) (r = −0.29) (d). All simulated progenies are represented by black dots and colored/shaped dots (670 SunGrains breeding lines). Red stars denote the four released lines. The most advanced nursery in which each of the 670 lines was tested is indicated by the color–shape combination of the dots as follows: USW (Uniform Southern Soft Red Winter Wheat Nursery) indicated by light blue squares, GAWN (Gulf Atlantic Wheat Nursery) by dark blue triangles, SunWheat (SunGrains Advanced Wheat Nursery) by dark gold diamonds, SPE (SunGrains Preliminary Early Nursery) by light green circles, and SPL (SunGrains Preliminary Late Nursery) by dark green circles. Horizontal and vertical dotted gray lines represent the means of each variable. The table in each quadrant details counts and percentages of lines tested in each nursery and overall. Correlations are shown in Figure S2B (in orange).

    3.2 Predicted population variance parameters and their correlation with observed BLUPs

    As expected, correlations between trait μ and μsp were consistently high, with all traits having r ≥ 0.93 (Figure S2B). Conversely, correlations between μ and VG were low or nonexistent, ranging from −0.29 to 0.07. To evaluate prediction accuracy, pairwise correlations were calculated between predicted progeny μ and observed BLUPs of 670 progeny lines selected from predicted families and later evaluated. Low-to-moderate correlations were observed for each trait (Figure 2), ranging from 0.25 (TW) to 0.52 (HD). For YLD, r = 0.30, indicating a moderate relationship between predicted progeny μ and the mean performance of derived progeny lines over environments. The correlation between MP BLUP values and MP GEBVs was 0.68, 0.78, 0.82, and 0.72 for YLD, TW, HD, and PH, respectively.

    Using the 23,436 progeny simulations, the average predicted within-family progeny μ for YLD was 4313.9 ± 78.7 kg ha−1 with a VG of 0.75 ± 0.18 (Table 2). For TW, average progeny μ was 73.3 ± 0.3 kg hL−1 with a VG of 0.04 ± 0.01. A progeny mean of 103.4 ± 1.2 Julian days to heading was obtained with a VG of 0.84 ± 0.23, while PH had an average μ of 88.5 ± 1.7 cm and VG of 0.35 ± 0.19. See Table 2 for more descriptive statistics of all progeny predicted parameters.

    TABLE 2. Descriptive statistics including average (Ave.), standard deviation (SD.), minimum (Min.), and maximum (Max.) values for four predicted population genetic variance (PopVar) parameters (μ, VG, μip, and μsp) for four agronomic traits.
    Trait          PopVar parameter  Ave.     SD.     Min.     Max.
    Yield          μ                 4313.9   78.7    4031.1   4579.1
    Yield          VG                0.75     0.18    0.00     1.68
    Yield          μip               4213.5   82.1    3943.1   4540.6
    Yield          μsp               4414.4   77.2    4096.0   4658.7
    Test weight    μ                 73.3     0.3     72.1     74.5
    Test weight    VG                0.04     0.01    0.00     0.08
    Test weight    μip               72.8     0.3     71.6     74.4
    Test weight    μsp               73.7     0.3     72.5     74.9
    Heading date   μ                 103.4    1.2     98.9     107.5
    Heading date   VG                0.84     0.23    0.01     2.19
    Heading date   μip               101.8    1.2     97.6     106.0
    Heading date   μsp               105.0    1.2     99.8     109.1
    Plant height   μ                 88.5     1.7     82.7     94.0
    Plant height   VG                0.35     0.19    0.00     0.97
    Plant height   μip               86.1     2.0     80.3     92.6
    Plant height   μsp               91.0     1.6     84.4     96.0
    • Abbreviations: μ, progeny mean; μip, inferior 10% progeny mean; μsp, superior 10% progeny mean; VG, genetic variance.
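
    The four parameters in Table 2 can be read as summaries of a simulated progeny distribution for one cross. The sketch below illustrates the definitions given in the abbreviations, under stated assumptions: `family_parameters` is a hypothetical helper (not the PopVar interface), VG is taken as the sample variance of the simulated values, "superior" is taken as the upper tail, and the simulated values are random toy numbers.

```python
import random
import statistics

def family_parameters(progeny_values, fraction=0.10):
    """Summarize one simulated family.

    Returns the progeny mean (mu), the variance of the simulated values (VG),
    and the means of the lowest and highest `fraction` of progeny
    (mu_ip and mu_sp). Assumes higher values are the 'superior' tail.
    """
    ordered = sorted(progeny_values)
    k = max(1, round(len(ordered) * fraction))  # number of lines in each tail
    return {
        "mu": statistics.fmean(ordered),
        "VG": statistics.variance(ordered),
        "mu_ip": statistics.fmean(ordered[:k]),
        "mu_sp": statistics.fmean(ordered[-k:]),
    }

# Hypothetical family of 200 simulated progeny values (toy numbers only)
rng = random.Random(1)
toy_family = [rng.gauss(73.3, 0.2) for _ in range(200)]
params = family_parameters(toy_family)
print({name: round(value, 3) for name, value in params.items()})
```

    For traits where lower values are preferred, the "superior" tail would be the lower one; Table 2 reports μsp above μ for all four traits, so the upper tail is used here.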

    3.3 Ability of predicted progeny means to inform superior cross-combinations for key traits

    The predicted progeny μ and μsp were plotted to retrospectively assess how predicted progeny simulation parameters would relate to observed data for the 670 derived lines that were advanced to field testing (Figure 3 and Table S2). The same assessment was made using plots of VG and progeny μ. In addition, the scatter plots contained all 23,436 possible simulations from intercrossing all 217 parents for which genotype and phenotype data were available. The stepwise field testing framework (Figure 1) enabled cross-combination prediction parameters to be generalized for every nursery stage, where each nursery stage (SPE, SPL, SunWheat, GAWN, and USW) represented a collection of lines that was their final destination before being discarded (or released). This answered the following questions: (1) Did lines that were advanced to later stage nurseries (SunWheat, GAWN, and USW) have favorable progeny predictions based on the cross-combination? (2) Were lines derived from pedigrees that were predicted to underperform discarded following preliminary testing (SPE and SPL)?
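
    The 23,436 simulations correspond to every unordered pair among the 217 parents. As a quick check, assuming reciprocal crosses and selfs are not counted separately:

```python
from math import comb

n_parents = 217
# Number of distinct biparental combinations, ignoring cross direction and selfs
n_crosses = comb(n_parents, 2)
print(n_crosses)
```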

    For YLD, lines that persisted longer in the breeding pipeline were predicted to have superior YLD, including two released cultivars (GA09436-16LE12 and AR09137UC-17-2), with progeny μ equal to or higher than the overall mean and μsp values higher than the overall mean of top progeny (Table S2). All five lines that advanced to the USW, the most advanced nursery (Figure 1), had above-average predicted YLD (Figure 3a), with a collective average μ and μsp of 4365.5 and 4469.4 kg ha−1, respectively. Additionally, about 75% of the lines that advanced to the GAWN (58 of 77) and to the SunWheat (106 of 141) nurseries also had above-average means (Figure 3a). GAWN lines had a collective mean of 4362.6 kg ha−1 for μ and 4468.1 kg ha−1 for μsp, while the SunWheat class of lines was lower with a μ of 4341.8 kg ha−1 and μsp of 4444.5 kg ha−1. Lines that did not pass the preliminary testing stage (SPE and SPL) were much more uniformly distributed across the entire range (from low to high) of predicted YLD values, with 67% having a progeny μ over the grand mean of combinations. As such, the SPE/SPL YLD means for μ and μsp were more modest at 4339.7/4322.6 and 4440.8/4424.5 kg ha−1, respectively (Table S2).

    Unlike YLD, TW simulations did not display a trend despite having a wide range for progeny μ and μsp across cross-combinations, with a grand mean falling between 73 and 74 kg hL−1 for all nurseries (Figure 3b and Table S2). Three of the four released lines (GA09436-16LE12, LA06146E-P4, and ARLA06146E-1-4) showed above-average TW, while AR09137UC-17-2 was near the grand prediction mean.

    Similar to TW, HD progeny prediction parameters for later stage nurseries (SunWheat, GAWN, and USW) had a wide overall range (103–105 Julian days). There was a clear trend in which lines evaluated in the preliminary early nursery (SPE) came from families predicted to head earlier: 156 of 220 lines (71%) had progeny predictions for μ and μsp that were both below average. The percentage of lines tested in the late nursery (SPL) that arose from families predicted to head later than the grand mean was even greater (81%; 162 of 199 lines). Interestingly, the four released cultivars included in the study were all predicted to head earlier than average (Figure 3c).

    With respect to PH, lines that were advanced to later stage nurseries were much more likely to emanate from a progeny that was simulated to have below average height (132 below average, 63 above average). The grand mean predicted parameters for PH were 88.5 and 91 cm for μ and μsp, respectively (Figure 3d).

    3.4 Role of genetic variance estimations in determining progeny usefulness

    As previously mentioned, there were no significant correlations between VG and other predicted cross-combination parameters. In general, progeny lines that advanced to multilocation field testing exhibited a wide predicted VG for YLD (Figure 4a and Table S2). Notably, the majority of lines advancing to later stage nurseries tended to derive from families predicted to have above-average VG for YLD (80% USW, 68% GAWN, and 55% SunWheat), with predicted crosses that gave rise to preliminary lines being more evenly distributed (52%). Despite the two traits being phenotypically correlated, the opposite trend was observed for TW, where the majority of 670 progeny lines demonstrated a narrow VG (Figure 4b). Despite observing a wide range of predicted VG for HD, there was no clear distinguishable trend for this trait (Figure 4c), and the bimodal distribution observed for PH made it difficult to assess trends (Figure 4d). The simulations for PH revealed this clustering pattern that grouped predictions for all population variance parameters (Figures 3d and 4d), including VG, into two distinct clusters. Despite completing numerous analyses to find the cause of the bimodal distribution (Figure S3), a clear explanation was not elucidated.
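
    The percentages above, and the quadrant tables in Figures 3 and 4, amount to counting lines on either side of the grand means of two predicted parameters. A minimal sketch of that tally follows; `quadrant_counts` is a hypothetical helper, the points are toy values, and treating a value equal to the mean as "high" is an assumption.

```python
from collections import Counter

def quadrant_counts(points, x_mean, y_mean):
    """Count (x, y) points in each quadrant defined by two grand means.

    Values equal to a mean are counted on the 'high' side (an assumption).
    """
    counts = Counter()
    for x, y in points:
        horiz = "high_x" if x >= x_mean else "low_x"
        vert = "high_y" if y >= y_mean else "low_y"
        counts[(horiz, vert)] += 1
    return counts

# Hypothetical (VG, mu) pairs for five lines of one nursery (toy values only)
toy_lines = [(0.90, 4380.0), (0.80, 4350.0), (0.60, 4330.0),
             (0.95, 4300.0), (0.78, 4400.0)]
counts = quadrant_counts(toy_lines, x_mean=0.75, y_mean=4313.9)

# Share of lines from families with above-average VG
high_vg = sum(n for (horiz, _), n in counts.items() if horiz == "high_x")
print(counts, f"{100 * high_vg / len(toy_lines):.0f}% above-average VG")
```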

    3.5 Assessment of pedigrees with agronomic trait intervals that give rise to high-yielding progeny

    In winter wheat, breeders typically select (phenotypically) against extreme values of HD and PH on both sides of their distributions, especially when selecting cultivars for broad adaptation. To mimic this approach and thus identify HD and PH windows where cross-combinations have an increased likelihood of producing superior progeny, scatter plots between predicted μ for YLD versus HD or PH were analyzed. As noted, correlations for progeny μ between YLD versus HD and YLD versus PH were r = −0.07 and r = 0.16, respectively (Figure S2B). It was observed that 3790 parental combinations (including 135 existing crosses) with high predicted YLD (≥ μ + SD) headed between 100.1 and 106.3 Julian days (Figure 5a) and had a PH ranging from 83.8 to 93.0 cm (Figure 5b). Further, a total of 370 cross-combinations (including 26 existing crosses) with outstanding predicted YLD (≥ μ + 2SD) headed between 101.4 and 105.8 Julian days, and their PH ranged from 86.0 to 91.1 cm. In general, derived lines from predicted cross-combinations that advanced to late-stage field evaluation and demonstrated the highest progeny μ for YLD tended to have predicted μ PH very near the grand mean, while predicted μ HD for these theoretically high-yielding families tended to be either near the grand prediction mean or later than average (Figure 5).
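
    The windows described above follow from thresholding predicted YLD at μ + SD or μ + 2SD and taking the range of the secondary trait among the retained crosses. With the rounded Table 2 values, 4313.9 + 78.7 ≈ 4392.6 and 4313.9 + 2 × 78.7 ≈ 4471.3 kg ha−1, which agrees within rounding with the 4392.7 and 4471.4 kg ha−1 given in the Figure 5 caption. The sketch below illustrates the filtering step; `trait_window` is a hypothetical helper, the sample standard deviation is an assumption, and the data are random toy values.

```python
import random
import statistics

def trait_window(yld, trait, n_sd=1.0):
    """Range of a secondary trait among crosses with high predicted yield.

    A cross is retained when its predicted yield is at least
    mean + n_sd * SD of all predicted yields. Returns the threshold,
    the number of retained crosses, and (min, max) of the secondary trait.
    """
    threshold = statistics.fmean(yld) + n_sd * statistics.stdev(yld)
    selected = [t for y, t in zip(yld, trait) if y >= threshold]
    if not selected:
        return threshold, 0, None
    return threshold, len(selected), (min(selected), max(selected))

# Hypothetical predicted means for 1000 crosses (toy values only)
rng = random.Random(0)
toy_yld = [rng.gauss(4313.9, 78.7) for _ in range(1000)]
toy_hd = [rng.gauss(103.4, 1.2) for _ in range(1000)]

for n_sd in (1.0, 2.0):
    thr, n, window = trait_window(toy_yld, toy_hd, n_sd)
    print(f"YLD >= {thr:.1f}: {n} crosses, HD window {window}")
```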

    Relationship between predicted progeny means (μ) (x-axis) for heading date (Julian days) (a) and plant height (cm) (b) versus yield (kg ha−1) (y-axis). Correlations are shown in Figure S2B (in red). All simulated progenies are represented by black dots and colored/shaped dots (670 SunGrains breeding lines). Red stars denote the four released lines. The most advanced nursery in which each of the 670 lines was tested is indicated by the color–shape combination of the dots as follows: USW (Uniform Southern Soft Red Winter Wheat Nursery) indicated by light blue squares, GAWN (Gulf Atlantic Wheat Nursery) by dark blue triangles, SunWheat (SunGrains Advanced Wheat Nursery) by dark gold diamonds, SPE (SunGrains Preliminary Early Nursery) by light green circles, and SPL (SunGrains Preliminary Late Nursery) by dark green circles. Horizontal and vertical dotted gray lines represent the means of each variable. Blue and gray areas indicate heading date (a) or plant height (b) windows where progenies have predicted yield ≥4392.7 kg ha−1 (μ + SD) or ≥4471.4 kg ha−1 (μ + 2SD), respectively.

    3.6 Comparison of cross-combinations selected from simulations versus phenotypic best-by-best

    To study performance of the traditional approach of crossing phenotypic best-by-best relative to GEBVs from predicted progeny simulations, two different graphical depictions were completed using the progeny μ versus mean of superior 10% progeny (μsp) scatter plot for YLD (Figure 6). First, 435 progenies derived from 30 parents with top YLD BLUPs were highlighted, along with the information of the difference in years when each parent was last evaluated. Of the phenotypic best-by-best crosses, 96.6% (420 total, including 25 existing crosses derived from 10 pedigrees) were predicted to have above-average YLD. It was noted that regardless of how chronologically distanced two parents were (from 0- to 14-year difference), the offspring tended to have outstanding predicted YLD (Figure 6a). Further, of the 42 nursery–year combinations, the predicted performance of progeny derived from two parents with top mean YLD was identified. In most cases (91%), they displayed higher predicted YLD than the overall means, whereas four progenies developed from the two best-by-best parents from SunWheat-2016, SPE-2015, 2016, and SPL-2015 fell below predicted YLD averages (Figure 6b).
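
    The 435 best-by-best progenies correspond to all pairwise crosses among the 30 top-ranked parents. A minimal sketch of that enumeration follows, assuming parents are ranked by their YLD BLUP; `best_by_best`, the line names, and the BLUP values are hypothetical.

```python
from itertools import combinations

def best_by_best(parent_blups, n_top):
    """Return all pairwise crosses among the n_top parents ranked by BLUP."""
    ranked = sorted(parent_blups, key=parent_blups.get, reverse=True)
    return list(combinations(ranked[:n_top], 2))

# Hypothetical YLD BLUPs for 217 parents (toy values only)
toy_blups = {f"line_{i:03d}": 4000.0 + 3.0 * i for i in range(217)}

crosses = best_by_best(toy_blups, n_top=30)
print(len(crosses))

# Share of best-by-best crosses reported with above-average predicted YLD
share_above = round(100 * 420 / len(crosses), 1)
print(share_above)
```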

    Relationship between predicted progeny mean (μ) (x-axis) and superior 10% progeny mean (μsp) (y-axis) for yield (kg ha−1) (r = 0.98; Figure S2B in green). All simulated progenies are represented by black dots. Horizontal and vertical dotted gray lines represent the means of each variable. In (a) colored dots represent simulated progenies derived from 30 top parental lines based on yield best linear unbiased predictors (BLUPs); blue diamonds, green circles, yellow triangles, and red squares represent progenies where the corresponding parents were last evaluated with 0–3, 4–7, 8–11, and 12–14 years difference, respectively. In (b) the shape–color combination denotes a simulated progeny originating from two top parents identified based on yield means in each nursery–year combination. Shapes indicate the nursery: GAWN (Gulf Atlantic Wheat Nursery) by triangles, SunWheat (SunGrains Advanced Wheat Nursery) by diamonds, SPE (SunGrains Preliminary Early Nursery) by circles, and SPL (SunGrains Preliminary Late Nursery) by squares. Colors denote 15 years from 2008 to 2022. Black crosses indicate populations that have been developed.

    4 DISCUSSION

    The selection of parental lines that are most likely to generate high-performing genotypes is critical to increasing the genetic gain of a target trait and enhancing the efficiency of the cultivar development pipeline (Jean et al., 2021; Yao et al., 2018). Before the genomic era, the usefulness criterion (Up) was utilized to identify the best parental combinations (Schnell & Utz, 1975). Nowadays, genome-wide selection methods have been leveraged to identify promising cross-combinations by simulating progeny means and VG (Bernardo, 2014; Lado et al., 2017; Mohammadi et al., 2015). In this context, tools such as PopVar have allowed the estimation of biparental population parameters that could inform breeders of the most valuable crosses (Mohammadi et al., 2015).

    With the assumption that certain populations possess a greater propensity to give rise to superior breeding lines that are advanced and later released by breeders, a retrospective analysis was conducted to assess the accuracy and utility of key simulated population parameters to select valuable crosses. Historical phenotype and genotype data of wheat breeding lines (TPs) that included 217 parental lines (representing 670 field-tested breeding lines) were entered into PopVar to simulate 23,436 potential populations, a number that far exceeds the capacity of any breeding program. To examine the practical value and accuracy of progeny simulations and the persistence of lines through the narrowing selection pipeline, predicted progeny means (μ and μsp) and VG were compared to observed BLUPs for YLD, TW, HD, and PH of 670 breeding lines. Low-to-moderate prediction accuracies were observed between the predicted progeny parameters and observed values (0.25–0.52). Values for YLD, TW, and HD were expected due to the nature and heritability of the traits; however, PH exhibited lower values than anticipated (Ballén et al., 2022; DeWitt et al., 2021; Heffner et al., 2011). Additionally, the significant correlations between observed and predicted MP values (0.68–0.82) indicated that GP is useful for cross-prediction.

    Of the 670 lines subjected to field evaluation and selection for YLD, 223 (33.3%) were retained and tested in advanced nurseries, including four releases. Of these 223 breeding lines, 169 (76%) were derived from cross-combinations that were predicted to have above-average YLD. This suggests that GP of progeny performance for YLD, a primary target trait, and SunGrains breeders’ assessment largely agreed. Similar results were reported in a comparable retrospective analysis in soybean, where 91% of the superior crosses retained by breeders were predicted to have above-average YLD (Jean et al., 2021), and in barley, where crosses selected and advanced by breeders (≥F6) had high predicted YLD means (Abed & Belzile, 2019). In contrast, a number of retrospective validation crosses predicted to generate high-yielding progeny were not preserved by breeders. The reasons could be that lines may not have shown satisfactory performance for non-yield traits (e.g., disease resistance or HD), predictions might not have been accurate (Abed & Belzile, 2019; Jean et al., 2021), or breeders may have mistakenly discarded valuable crosses. In this context, predicted superior multi-trait performance (Mohammadi et al., 2015) (e.g., high YLD, improved resistance to a pest/disease, and optimal agronomic trait values) could be helpful to develop lines adapted to specific conditions across target production environments (Benaouda et al., 2022; Boyles et al., 2019; Crespo-Herrera et al., 2022). Furthermore, although the correlation coefficients between predicted YLD versus HD and YLD versus PH were trivial (−0.07 and 0.16, respectively), it was possible to identify windows of HD and PH where parental combinations were more likely to produce above-average yielding progenies. Looking at predicted values for agronomic traits (including HD and PH) would allow breeders to identify cross-combinations that are more likely to produce better adapted lines across geographic regions (Boyles et al., 2019).
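
    The 223 and 169 totals cited above can be recovered from the per-nursery counts reported in the Results (USW: 5 of 5; GAWN: 58 of 77; SunWheat: 106 of 141). The short tally below is only an arithmetic check of those figures; the variable names are illustrative.

```python
# (lines with above-average predicted YLD, total lines) per advanced nursery,
# as reported in the Results text
advanced = {"USW": (5, 5), "GAWN": (58, 77), "SunWheat": (106, 141)}
n_evaluated = 670  # all field-tested lines

n_above = sum(above for above, _ in advanced.values())
n_advanced = sum(total for _, total in advanced.values())

pct_advanced = round(100 * n_advanced / n_evaluated, 1)
pct_above = round(100 * n_above / n_advanced)

print(f"{n_advanced} of {n_evaluated} lines advanced ({pct_advanced}%)")
print(f"{n_above} of {n_advanced} from above-average crosses ({pct_above}%)")
```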

    Generally, crossing the best with the best, based on both line per se performance and the line's record of having produced superior progeny, is expected to have a higher probability of generating superior progeny due to additive gene action (van Ginkel & Ortiz, 2018). To assess how well this traditional approach works in the absence of GPs, and how well progeny simulations agree with this, offspring derived from parents with top observed YLD values were studied. Crossing phenotypic best-by-best still appears efficient in enhancing genetic gain for a target trait, given PopVar progeny simulations consistently aligned with best-by-best wheat crosses examined in this study. Part of this effect may relate to the greater replication of older lines, which produces less shrinkage of predicted line performance to the mean relative to less-observed, newer lines. While it is rare to cross top old lines (recycled) with newer advanced lines, such as those with 8- to 14-year difference between the last evaluation year, the predictions demonstrated that these could potentially generate new breeding materials with superior performance. Hybridizing old and new elite lines from within the breeding program is necessary to enhance the genetic gain over time. The genotypic and phenotypic data collected on progenies from these crosses can be utilized to train and improve the predictive ability of GP models (Rutkoski et al., 2022). However, if old lines have been discarded, the best lines tested closer in time (0- to 7-year difference) could also produce superior progeny.

    Breeders commonly use high progeny mean (or MP value) to identify superior parental combinations for a given trait (Bernardo, 2014; Jean et al., 2021); however, the expected VG would be a valuable criterion on which to discriminate among populations with similar progeny means (e.g., when crossing elite parents) (Beckett et al., 2019; Merrick et al., 2022; Osthushenrich et al., 2018). Previous work suggests that the μ is the major determinant for identifying superior crosses for yield (YLD) in wheat, whereas the VG has more influence on end-use quality traits (Lado et al., 2017; Yao et al., 2018). In barley, two scenarios were described: (1) when the primary trait for selection was YLD, both predicted progeny μ and VG were meaningful determinants to identify lines with outstanding yield but not for lower deoxynivalenol (DON) levels where their distribution was more scattered, and (2) when the emphasis for selection was placed on DON (pre-breeding), the predicted μ and VG were more variable for both YLD and DON (Abed & Belzile, 2019). In soybean, progeny μ facilitated the identification of superior crosses with above-average YLD (64%) and below-average maturity (73%), and VG did not influence selection for either high YLD or a specific maturity window (Jean et al., 2021). Results from this study reinforced that μ, and by extension μsp, was a strong driver for highlighting superior crosses for YLD (advanced into SunWheat, GAWN, and USW nurseries). In contrast, VG did not have an influence on identifying top-yielding crosses. Conversely, there was no clear effect of predicted μ or VG to improve overall cross-selection for the other three evaluated traits (TW, HD, and PH). Though the influence of VG was a minor contributor to identifying improved progeny lines in SRWW, it remains valuable for maintaining genetic diversity within the breeding materials (Lado et al., 2017).

    5 CONCLUSION

    This retrospective study predicted progeny performance using genome-wide marker effects and historical phenotypic data to inform breeders of the most valuable crosses among all possible parental combinations. Predicted progeny μ of biparental populations was moderately correlated with per se trait performance of derived inbred lines that breeders advanced to regional yield trials. These positive correlations suggest that selecting crosses based on simulated trait means and variances would allow breeders to reduce the total number of crosses made each cycle and allocate more resources to downstream segregating populations that are more likely to yield superior lines for cultivar development. Increased predicted within-cross genetic variation is associated with greater genetic diversity between parents. As such, even if the information on predicted cross variance only marginally improves yearly genetic gain, selecting based on cross variability will improve genetic gain in the long term by maintaining the diversity of the breeding program. Further, simulating progeny performance to select the best cross-combinations and coupling it with doubled haploid technology would accelerate the breeding cycle and theoretically increase genetic gain. Results from this study provide considerable evidence that progeny predictions using genomic information of prospective parents can be leveraged by breeding programs to concentrate on more rewarding populations for crop improvement. Finally, this study in winter wheat and previous reports in soybean and barley have shown that progeny simulations to select superior cross-combinations could be implemented in other self-pollinating crops. Extensive, high-quality phenotypic and genotypic information is key to building reliable GP models.

    AUTHOR CONTRIBUTIONS

    Carolina Ballén-Taborda: Conceptualization; data curation; formal analysis; investigation; methodology; visualization; writing - original draft; writing - review and editing. Jeanette Lyerly: Data curation; formal analysis; investigation; methodology; writing - review and editing. Jared Smith: Data curation; formal analysis; investigation; methodology. Kimberly Howell: Data curation; formal analysis; investigation; methodology. Gina Brown-Guedira: Data curation; formal analysis; funding acquisition; investigation; methodology; writing - review and editing. Noah DeWitt: Data curation; formal analysis; investigation; methodology; writing - review and editing. Brian Ward: Data curation; formal analysis; investigation; methodology; writing - review and editing. Md Ali Babar: Data curation; funding acquisition; investigation; methodology; resources; writing - review and editing. Stephen A. Harrison: Data curation; funding acquisition; investigation; methodology; resources; writing - review and editing. Richard E. Mason: Data curation; funding acquisition; investigation; methodology; resources; writing - review and editing. Mohamed Mergoum: Data curation; funding acquisition; investigation; methodology; resources; writing - review and editing. J. Paul Murphy: Data curation; funding acquisition; investigation; methodology; resources; writing - review and editing. Russell Sutton: Data curation; funding acquisition; investigation; methodology; resources; writing - review and editing. Carl A. Griffey: Data curation; funding acquisition; investigation; methodology; resources; writing - review and editing. Richard E. Boyles: Conceptualization; data curation; formal analysis; funding acquisition; investigation; methodology; project administration; resources; supervision; visualization; writing - original draft; writing - review and editing.

    ACKNOWLEDGMENTS

    This work was primarily supported by the USDA NIFA AFRI Foundational project SC-2020-03599 awarded to Richard E. Boyles “Cultivar Development: Combining Genomics-Enabled Breeding with Coordinated Regional Testing to Accelerate Wheat Genotype to Market (award no. 2021-67014-33941)”, but also funded in part by the US Wheat & Barley Scab Initiative within the Variety Development and Host Resistance Southern Winter Wheat Coordinated Project (FY22-SW-004). Historical phenotypic and genomic data were provided by the SunGrains breeding cooperative as many scientists, postdocs, students, and technicians have greatly contributed over the years to the collection and compilation of these data. The authors acknowledge the continuous genomics and genetics support from the USDA-ARS Eastern Regional Small Grains Genotyping Laboratory (ERSGGL) in Raleigh, NC.

      CONFLICT OF INTEREST STATEMENT

      The authors declare no conflicts of interest.

      DATA AVAILABILITY STATEMENT

      Genomic and phenotypic SunGrains datasets are not readily available. Requests to access these data should be directed to the corresponding author.