Utilization of Multiyear Plant Breeding Data to Better Predict Genotype Performance

Despite the availability of multiyear, multicycle, and multiphase data in plant breeding programs for annual crops, selection is often based on single-year, single-cycle, and single-phase data. As genotypes in the same fields are usually grown under the same management practice, data from these fields can and should be analyzed together. In Monsanto’s North American maize (Zea mays L.) breeding program, this approach enables a spatial model to be fitted in each field, providing an estimate of spatial trend and a better estimate of residual variance in each field. Multiyear, multicycle analysis showed that the estimates of genotype ́ year variance (VGY) and genotype ́ year ́ location variance (VGYL) were still the largest components of the estimated phenotypic variance. Analysis of any single-year subset of the data inflated the estimate of genotypic variance (VG) by the size of the estimate of VGY, resulting in potential bias in the estimates of genotype performance. These results demonstrate the advantage of a combined analysis of data across years and cycles to make selection decisions for genotype advancement. V. N. Arief, I. H. DeLacy, and K.E. Basford, The Univ. of Queensland, School of Agriculture and Food Sciences, St. Lucia, QLD 4072, Brisbane, Australia; H. Desmae, International Crops Research Institute for Semi-Arid Tropics–West & Central Africa (ICRISAT– WCA), Bamako, Mali; C. Hardner, The Univ. of Queensland, Queensland Alliance for Agriculture and Food Innovation, Brisbane, QLD 4072, Australia; A. Gilmour, Statistical and ASReml Consultant, Cargo, NSW 2800, Australia; J. K. Bull, The Climate Corporation, 4 Cityplace Dr., St. Louis, MO 63141, USA. Received 15 Mar. 2018. Accepted 26 Nov. 2018. *Corresponding author (k.e.basford@uq.edu.au). Assigned to Associate Editor Marcio Resende Jr. Abbreviations: BLUE, best linear unbiased estimate; BLUP, best linear unbiased prediction; MET, multi-environment trial; TPE, target population environments. Published in Crop Sci. 59:1–11 (2019). doi: 10.2135/cropsci2018.03.0182 © Crop Science Society of America | 5585 Guilford Rd., Madison, WI 53711 USA This is an open access article distributed under the CC BY license (https:// creativecommons.org/licenses/by/4.0/). Published online February 14, 2019

variability for each individual environment (Piepho et al., 2012).In contrast, a two-stage analysis is computationally efficient (Möhring and Piepho, 2009) and can handle a larger amount of data or more complex models than a one-stage analysis.A two-stage analysis can be used to model any specific randomization layout and withinenvironment error for each individual environment in the first stage (Smith et al., 2001;Welham et al., 2010;Piepho et al., 2012), then use the adjusted genotype mean for the across-environment second-stage analysis.In the analysis of plant breeding MET data, a spatial model is usually fitted for each environment (Cullis and Gleeson, 1991;Cullis et al., 1998;Smith et al., 2001) to produce a spatially adjusted genotype mean in each environment.The spatially adjusted genotype means are then combined as the data for the second stage, across-environment analysis.The loss of efficiency in a two-stage analysis is reduced by weighting the adjusted means according to the (inverse) predicted accuracy of the value in the individual environment analysis (Cullis et al., 1996;Möhring and Piepho, 2009;Piepho et al., 2012).
It is a common practice in plant breeding programs to conduct multiphase selection to identify the elite genotypes from all those derived from each cycle of the breeding program.A cycle of breeding is a period from initial crossing to final selection phase.For annual crops, each cycle of breeding usually consists of two or three selection phases, each corresponding to 1 yr of multilocation field trials.These multilocation field trials are also used to test genotypes from multiple breeding cycles that were in the same or different selection phases.This practice results in a large number of genotypes being tested every year in any given location.Often, these genotypes are divided into sets depending on their breeding cycle, selection phase, genetic relationship in the breeding program, and the TPE for which they were bred.Some established genotypes and common checks are usually included in each trial to provide connection among sets, especially for sets that target the same TPE (Piepho et al., 2006).All genotypes in the same field are normally grown under the same field management, although they are from different sets, so an overall experimental design that minimizes the estimated error within a field should be used (Basford et al., 1996;Williams and John, 1996;Federer, 2005).Moreover, a spatial analysis for a field can be conducted using all available data in that field.
For annual crops, genotype performance is usually evaluated on a single phase and single cycle of data across locations, hence a single-year analysis.The results from the analysis of any single year's data have been shown to overestimate the genotypic (V G ) and genotype ´ location (V GL ) variances, as they were confounded with the genotype ´ year (V GY ) and genotype ´ year ´ location (V GYL ) variances, respectively (Nyquist and Baker, 1991;Holland and Nyquist, 2010;Arief et al., 2015).Moreover, as the same genotypes appear in different phases within and across cycles, it is logical to combine data across selection phases and across cycles to provide better estimates of the performance of the common genotypes.In the past, multiyear analysis was impractical due to the lack of computing power to handle large unbalanced data.However, a two-stage mixed model analysis is now enabling multiyear data analysis.
It has been shown that multiyear data analysis would provide better estimates of V G and V GL , thus better estimating genotype performance (Arief et al., 2015).In this study, we evaluate the advantage of using multiyear, multiphase data for selection through simulated data for one of Monsanto's North American maize (Zea mays L.) breeding programs.

Maize Data
Data were obtained from METs from one of Monsanto's North American maize breeding programs.The hybrids, which from this point on are referred to as genotypes, were evaluated in three phases (P1 to P3) of testing in a breeding cycle.The best genotypes from P1 were advanced to P2 the following year, and the best genotypes from P2 were advanced to P3 the third year.In these METs, genotypes were grouped into sets.These sets corresponded to breeding cycles, selection phases, relative maturity zones, breeding objectives, or special purposes.
Yield data (t ha −1 ) collected from the advancement trials for genotypes targeted for deployment in the Midwest (referred to as Midwest genotypes) were used for this study.The data consist of >10,000 unique genotypes tested in >1000 fields across hundreds of locations in several states for a 6-yr period.A field is where the genotypes were grown, whereas a location is a collection of nearby fields.These genotypes were obtained from eight breeding cycles, of which four cycles (C3-C6) had data for all three phases.There were common genotypes across years (Table 1, lower triangular values) corresponding to the selection phases (i.e., 3-yr selection cycle).However, as indicated in the paragraph above, there were also common genotypes across the different phases and cycles within a year and across years.Common genotypes across phases within a cycle were genotypes that were selected to be advanced to the next phase.Common genotypes across cycles were genotypes that were selected from the previous cycles or checks.As expected, the number of common genotypes across phases within cycles was greater than the number of common genotypes across cycles.The number of common locations was relatively consistent across years (Table 1, upper triangular values).

Data Analysis
The two-stage analysis approach (Smith et al., 2001, Welham et al., 2010) was used to estimate the variance components in the maize MET with mixed-model restricted maximum likelihood (Gilmour et al., 1997) implemented in ASReml (Gilmour et al., 2009).
The first-stage analysis was applied to all data in a field, including data for genotypes that were not targeted for the Midwest  region (referred to as "other" genotypes) but were planted in the same fields as Midwest genotypes and assumed to be grown under the common agronomic conditions.In the second-stage analyses, only data for Midwest genotypes were analyzed.

First-Stage Analysis: Field Spatial Analysis
This analysis was conducted for each of the fields where genotypes targeted for the Midwest were grown.These fields varied in size and in field arrangement (Fig. 1).Some fields were also used to test other deployment regions for non-advancement trials (Fig. 1).Replications, if present, were placed within genotype sets (Fig. 1a, 1e, and 1f ).The heatmap for observed yield showed patterns that were associated with environmental factors such as ponding (Fig. 1a  and 1b), the effects of relative maturity and set (Fig. 1c), or management factors such as usage of pivot irrigation (Fig. 1d).A spatial model (Cullis and Gleeson, 1991;Cullis et al., 1998;Smith et al., 2001) was fitted to take account of those patterns.An extra step was added to the first-stage analysis to cope with the limited number of replications.In the first step, genotype was fitted as a random effect to provide an estimate of the spatial trend.This step enables all genotypes to be used to estimate the trend.In the second step, genotype was fitted as a fixed effect to obtain best linear unbiased estimates (BLUEs) that were required for the second-stage analysis (Smith et al., 2001), and the variance components were fixed to the values obtained from the first step.
Ideally, a customized spatial model would be fitted for each field.However, due to the size of the data and number of trials, this approach was not practical.Moreover, it is a company requirement that the analysis must be automated.Therefore, the following spatial model was chosen as the base model for both steps in the first-stage analysis for each field: where y ipq is the grain yield for genotype i in row p and column q; m is the intercept; lin(r) and lin(c) are the fixed effect of linear function of rows and columns, with lin(c).lin(r)being their interaction; (b|s) jk is the effect of block j within set k; g i is the effect of genotype i; (r|b|s) pjk and (c|b|s) qjk are the random effects 4. Across-year, single-cycle data for cycles with complete phases (i.e., C3 to C6, four analyses) to obtain variance estimates for each cycle.5. Across-year, multicycle data for cycles with complete phases (i.e., data for C3 to C6, one analysis) to obtain variance estimates for each cycle.The analysis of this dataset provided estimates of genotypic variance for each cycle with a pooled estimate of genotype ´ year, genotype ´ location, and genotype ´ year´ location, respectively.6.All across-year data (two analyses) to obtain an overall estimate of genotypic variance and an estimate for each selection phase with a pooled estimate of genotype ´ year, genotype ´ location, and genotype ´ year ´ location, respectively.
The estimate of genotypic variance from any single-year data (i.e., Datasets 1 and 2) was expected to be inflated by the genotype ´ year variance.Similarly, the estimate of genotype ´ location variance from any single year was expected to be inflated by the genotype ´ year ´ location variance.The size of these confounded effects were evaluated by comparing the variance components from single-year analysis with the variance components from multiyear analysis (i.e., Datasets 3 to 4).
The models used to obtain the estimate of variance components were as follows: e it jk applied to single-year, single-stage data (Dataset 1) and singleyear data (Dataset 2) to obtain an estimate of genotypic variance and genotype ´ location variance for each phase and each year for Dataset 1, and for each year across stages for Dataset 2. 2. z it( jk)  w it( jk) = m + e t( jk) + g i + (gl) ij + (gy) ik + (gly) ijk + ( ) e it jk applied to single-phase across-year data (Dataset 3) and single-cycle data for cycles with complete phases (Dataset 4) to obtain an estimate of genotypic variance for each phase for Dataset 3, and for each cycle for cycles with complete phases for Dataset 4. This model was also applied to all data (Dataset 6) to obtain overall genotypic variance across years, stages, and cycles.In addition to genotypic variance, this model also provided variance estimates of genotype ´ location, genotype ´ year, and genotype ´ year ´ location, respectively.3.
e it jk applied to multicycle data (Dataset 5) to obtain genotypic variance for each cycle with pooled variance for genotype ´ year, genotype ´ location, and genotype ´ year ´ location variance, respectively.This model was used to remove the confounded effect of genotype ´ year variance in the estimate of genotypic variance for each cycle.4. z it( jk)  w it( jk) = m + e t( jk) + g i |s m + (gl) ij + (gy) ik + (gly) ijk + ( ) e it jk applied to all data (Dataset 6) to obtain genotypic variance for each phase with pooled variance for genotype ´ year, genotype ´ location, and genotype ´ year ´ of row p and column q, respectively, within block j and set k; spl(r) and spl(c) are the spline effects of row r and column c, respectively; and e ipq is the residual effect modeled using firstorder autoregression (AR1) for row r and column c.As noted above, the first-step analysis was used to obtain estimates of the variance components of the random spatial terms (i.e., rows and columns terms) when genotype was fitted as a random effect.In the second step, these estimates of variance components were used to obtain the genotype BLUEs and their respective weights by fitting the genotype as a fixed effect.The weights were calculated as the inverse of residual variance for each field multiplied by the replication number for each genotype and the pooled residual variance across fields (Cullis et al., 1996).
As trial sizes varied from field to field, this base spatial model was adjusted according to the following criteria: 1.No spatial model was fitted for a field with 10 or fewer replicated observations (i.e., when no. of nonmissing observations − no. of unique genotypes £ 10).For these fields, yield and mean yield were used as BLUEs, and the weights were set as 1 (as the residual variance for these fields was assumed to be equal to the estimated pooled variance and the number of replications was set to 1). 2. For fields where a spatial model was fitted: a.The term (b|s) jk was only fitted when all genotypes were replicated; b.The term lin(c).lin(r)was only fitted when the number of rows and columns were both more than five; c.The terms spl(r), (r|b|s) pjk , and AR1(r) were only fitted when the number of rows was five or more; and d.The terms spl(c), (c|b|s) qjk , and AR1(c) were only fitted when the number of columns was five or more.
These first-stage analyses were conducted as an automated process using R software (R Development Core Team, 2008).This process was designed to report fields with convergence problems.This automated process also produced a series of plots for each field to display the results from the first-stage analysis.A variogram was produced to display the autoregression correlation of residuals, and a heatmap each was produced for centered yield data, genotype BLUEs, residual effects, and trend effects (calculated as the sum of all spatial terms in the model).These plots were used to evaluate the fit of the base spatial model and could be used by the breeder to get a good understanding of the fields being used for these trials.

Second-Stage Analysis: Across-Location Analysis
Across-location analyses were conducted using the BLUEs and weights for genotypes designated for Midwest advancement trials.Across-location analyses were conducted for the following data: 1. Single-year, single-phase data (18 analyses) to obtain variance estimates for each year and each phase.2. Single-year, across-phase data (six analyses) to obtain variance estimates for each year.3. Across-year, single-phase data (three analyses) to obtain variance estimates for each phase.
location, respectively.This model was used to remove the confounded effect of genotype ´ year variance in the estimate of genotypic variance for each phase; where z it( jk)  w it( jk) is the weighted BLUE from the first-stage analysis for genotype i in field t within location j and year k; e t( jk) is the effect of environments defined as fields t within location j and year k; g i is the effect of genotype i; (gl) ij is the effect of genotype i in location j; (gy) ik is the effect of genotype i in year k; (gly) ijk is the effect of genotype i in location j and year k; g i |f n is the effect of genotype i within cycle n; g i |s m is the effect of genotype i within phase m; and ( ) e it jk is the pooled residual.In a case when all fields in the datasets had <10 replicated observations, the weights were set to 1 and the pooled residual was set as the pooled residual from overall data.
The variance components obtained from these analyses were presented as a percentage of phenotypic variance (V P ).Phenotypic variance was calculated as the sum of genotypic variance (V G ), variance of genotype ´ year interaction (V GY ), variance of genotype ´ location interaction (V GL ), variance of genotype ´ year ´ location (V GYL ), and residual variance (V R ).For Models 3 and 4, V P was calculated using an average V G across cycles and phases, respectively.

Simulated Multi-environment Trial Data
Simulated data were used to evaluate the use of single-year and multiyear data for selection.The phenotypic data were calculated as the sum of genotypic (G), genotype ´ year (GY), genotype ´ location interaction (GL), genotype ´ year ´ location (GYL), and residual (R) effects.These effects were drawn from normal distributions with zero mean and variance of V G , V GY , V GL , V GYL , and V R , respectively.These variance components were estimated using all data (Dataset 6) analyzed with Model 4 and expressed as the percentage of V P (i.e., V P = 100%).As V G was truncated due to selection (Cochran, 1951), the values of G were generated based on Stage 1's V G and retained throughout the phases within a cycle (Arief et al., 2013).The values of GY, GL, GYL, and the residual were generated for each phase and each cycle assuming selection does not affect these variances.
For each cycle, a fixed percentage of top-performing genotypes were advanced from phase to phase.The residual variance was estimated from two replications of 20 checks.The phenotypic value for these checks was randomly generated in a similar way to the genotypes.These checks were not included in selection, and two of these checks were replaced every year.Simulated data were generated for 20, 40, and 60 locations for Phases 1, 2, and 3, respectively.
Three selection scenarios were evaluated.In all these scenarios, selection was always made within phase and cycle with no reselection (i.e., nonselected genotypes could not be reselected in later phases or later cycles).There was also no common genotypes across cycles, except for checks.In the first scenario, selection was based on genotype performance calculated from single-year, single-phase data, as in current practice.This scenario was appropriate for analyzing Dataset 1.In the second scenario, selection was based on genotype performance calculated from single-cycle data (i.e., three consecutive years of data).This scenario was appropriate for analyzing Dataset 4. In the third scenario, selection was based on genotype performance calculated from all available data from 1 to 8 yr in the 8-yr period.This scenario was appropriate for analyzing Dataset 6.Each scenario was evaluated using 500 simulated datasets.
Three criteria were calculated for each simulation scenario: 1.The ratio of V G to V P (i.e., per-plot heritability) to evaluate the accuracy of variance component estimation.This value was compared with the expected ratio calculated from the parameter variance components.2. The correlation between the estimated genotype performances and their parameter G values to evaluate the accuracy in estimating genotype performance.This correlation was calculated based on the performance of the current-year Phase 3 genotypes and their G value. 3. The proportion of common selected genotypes between the two selection methods.A set of 100 genotypes was selected either based on the estimated performance or based on the parameter G values.The number of genotypes that were selected in both methods was calculated and divided by 100.This proportion of common genotypes was used to evaluate how similar the two selection methods were in ranking the genotypes.As in the second criterion, selection was based on the performance of the current-year Phase 3 genotypes.

First-Stage Analysis of Real Data
The spatial model managed to model the spatial trends in all 81% of the fields where the model was fitted.The spatial model was not fitted for the remaining 19% of the overall total fields, as these fields had <10 repeated observations.There were no convergence problems with this spatial model.This ease of calculation was likely due to the two-step approach, where in the first step genotype was fitted as random, hence all plots were used to estimate the trend.The terms used to model the trend (i.e., spline terms for row and column, random row and column, and replication) generally captured most of the pattern in the field and removed that trend from the genotype effects (e.g., Fig. 2b, 2e, and 2f ).The AR1 model fitted for rows and columns also adequately modeled the residual correlation (Fig. 2b, c, and d).However, as one base spatial model was used in all fields, there were some fields where the patterns were still confounded with residuals (Fig. 2a and  2d).In a few cases, the field trends were confounded with set effects (Fig. 2c).

Estimates of Variance Components
All variance components, except for environment variance (V E ), were expressed as a percentage of V P .Environment variance was expressed as a percentage of total variance (i.e., V P + V E ) and varied from 57 to 83% across datasets, with an average of 73%.The V R was the largest Fig. 2. Results from the first-stage analysis for the six example fields.Each field has five plots: a variogram to display the autoregression (AR1) correlation model for the residual, a heatmap for centered yield data (t ha −1 ) (Yield), a heatmap for genotype effects (best linear unbiased estimates [BLUEs]) (Genotype), a heatmap for residual effects (Residual), and a heatmap for the sum of block, column, row, linear row, linear column, spline column, and spline row effects (Trend).The color intensity from blue to red indicates the range of the values from small to large.
component of V P and was relatively consistent across datasets (mean = 58%, minimum = 47%, and maximum = 66%).The V G from single-year analysis (Fig. 3a and 3b) was always greater than the one estimated from across-year analysis (Fig. 3c-3e, black bars).The V G from any singleyear analysis (Fig. 3a and 3b) was roughly equivalent to the sum of V G and V GY from across-year analysis (Fig. 3c-3e; black and white bars, respectively).The same pattern was observed for V GL , where V GL from any single-year analysis (Fig. 3h and 3i) was roughly equivalent to the sum of V GL and V GYL from across-year analysis (Fig. 3j-3n; black and white bars, respectively).For any multiyear analysis, V GL was always the smallest component of V P (Fig. 3j-3n, black bars).
The V G from single-phase analysis showed an increasing pattern across phases (Fig. 3c), whereas V G estimated from the analysis of all data showed decreasing pattern from P1 to P3 (Fig. 3g).This discrepancy and the decreasing V GY across phases from single-phase analysis (Fig. 3j) indicated the possibility of some confounding effects between V G and V GY , especially as there were not many common genotypes across cycles (Table 1).In contrast, the estimates of V G from single-cycle analysis (Fig. 3d) were consistent with the estimates of V G from multicycle analysis (Fig. 3e).The two models (Models 2 and 4) fitted to all data produced similar estimates of V GY , V GL , and V GYL (Fig. 3m and 3n), with V G from Model 2 (Fig. 3f ) similar to the average V G across phases from Model 4 (Fig. 3g).

Simulation Results
The simulation results showed that across-year analysis provided better estimates of the variance components than single-year analysis (i.e., single-year, singlephase, and 1-yr data; Fig. 4a).The V G /V P ratio from single-year data was higher than the expected ratio of 0.1 and closer to the (V G + V GY )/V P ratio of 0.17.The correlation of genotypic performance (i.e., best linear unbiased prediction [BLUP]) and the genotype parameters (G) from multiyear analysis was also higher than those from single-year analysis (Fig. 4b).As expected, the proportion of common top 100 genotypes showed a similar pattern as the correlation (Fig. 4c).Given those three criteria, the results from single-cycle analysis were similar to the ones from multiyear analysis with three or more years (Fig. 4).In general, there was no improvement in all three criteria after 3 yr.´ year (V GY , white), and the bottom row contains the genotype ´ location (V GL , black) and genotype ´ year ´ location (V GYL , white) variances.These variance components were estimated from (a, h) single-year, single-cycle, and single-phase data; (b, i) single-year, across-phase data; (c, j) across-year, single-phase data; (d, k) acrossyear, single-cycle data; (e, l) across-year, multicycle data; (f, m) across-year data with overall V G , V GY , V GL , and V GYL ; and (g, n) across-year data with estimated V G for each cycle and overall V GY , V GL , and V GYL .

DISCUSSION
The maize data used for our analysis were unbalanced, with the design and size varying across fields.Under such a scenario, a two-stage analysis approach is commonly used, whereby genotype means are first estimated separately for each field and then combined to form a data array for an overall mixed model analysis (Frensham et al., 1997).A weighted two-stage analysis approach has been found to perform adequately for variety prediction (Smith et al., 2001;Welham et al., 2010;Piepho et al., 2012).The first-stage analysis for each individual environment offers a convenient analysis that accounts for any specific randomization layout and within-environment error modeling (Piepho et al., 2012)-for example, a fitting spatial model for each environment.
We argue that the spatial model should use data for all genotypes under the same management in that field regardless of their designated set.This approach offers several advantages.First, it provides a better estimate of genotype performance, especially for genotypes that were repeated in a field.Second, it provides better estimates of variance components.Third, it enables a spatial model to be fitted for small trials.For example, in this study, spatial analysis could not be conducted for the Midwest advancement trials only, as these trials were relatively small (Fig. 1a-1e).
While using all available data of a field increases the ability to fit spatial model, most fields have a small number of repeated observations.This created convergence problems when a spatial model was fitted.Therefore, in this study, an additional step was added to the first stage of the two-stage approach.In this additional step, a genotype was fitted as a random effect to obtain estimates of variance components for the spatial terms.These variance components were then used in the second step where genotype was fitted as a fixed effect.This second step is necessary to obtain a genotype BLUE, rather than a genotype BLUP, for the second-stage analysis (Smith et al., 2001).This two-step procedure enables the spatial model to be fitted in fields with a small number of repeated observations and eliminate the convergence problem.Moreover, a better estimate of spatial variability is expected from this two-step procedure, as information from all genotypes was used.Automatization of data analysis is required for any large breeding program.Due to the large number of fields, a customized model for each field is not practical.Thus, an implementation of a one-model-fits-all approach is necessary.In this study, one base spatial model was fitted for all fields, with some adjustment to accommodate different field sizes.A series of heatmaps produced for each field can provide a quick tool to evaluate the fit of this one-model-fits-all in capturing the pattern in each field.In general, the base spatial model fitted for all fields managed to capture the spatial trend in the fields (Fig. 2), except for a few fields where some residual structure was still observed (Fig. 2a and 2d) or for special-purpose trials where treatment effects were confounded with field trends (Fig. 2c).However, due to the large number of fields used in the second-stage analysis, the differences between a one-model-fits-all and customized models are likely to be negligible (Qiao et al., 2004).
A single-year analysis invariably overestimates V G and V GL components (Fig. 3), as these estimates are confounded with the V GY (Nyquist and Baker, 1991;Holland and Nyquist, 2010;Arief et al., 2015).Multiyear analysis provides an estimate for the V GY and V GYL components, and therefore better estimates of V G and V GL .The accuracy of V GY and V GYL estimates are affected by the number of genotype connections across years, as indicated by the differences between V G estimated from single-phase data and all data (Fig. 3a).The V G was expected to decrease from P1 to P3 due to selection (Cochran, 1951).However, analysis of single-phase data showed an opposite pattern: V G increased from P1 to P3 (Fig. 3c, black bars).These results were likely to be an artifact of a poorly estimated V GY , as V GY was decreasing from P1 to P3 (Fig. 3j, black bars).The lack of genotype connections across years within stages caused V G to be partially confounded with V GY , resulting in an overestimated V G and an underestimated V GY .In contrast, V G estimated from all data showed the expected decrease across phases (Fig. 3g).There were more genotype connections across years in all data than in single-phase data, and that helps in the estimate of V GY .When the genotype connections across years were sufficient, similar estimates of V G for each cycle were obtained from both single-and multicycle data (Fig. 3d and 3e).
The ratios of variance components from all data were consistent with the ones from Texas commercial maize (Barrero Farfan et al., 2013).However, the ratio of V GL from this study was smaller than the one in other crops (Cullis et al., 1996;Butler et al., 2000;de la Vega et al., 2007;Arief et al., 2015).The genotypes in these trials were targeted for the Midwest and were only tested in locations assigned to these maturity zones.As the genotypes were highly adapted and the test locations have high similarity, small estimates of V GL were expected.It is interested to note that the estimated V GL values (Fig. 3j-3n, black bars) were relatively smaller than the V GY values (Fig. 3c-3g, white bars).These results could indicate the effectiveness of relative maturity zone in partitioning TPEs for maize in North America.However, V GYL is still the largest component of the phenotypic variance (Fig. 3c-3n, white bars), so multiyear data analysis would provide an advantage over single-year data analysis.
The simulation results indicated that, under the assumption of constant V G, V GY, V GL, V GYL , and V R across years and cycles, multiyear analysis provides better estimates of variance components and genotype performance, as well as increased accuracy of selection (Fig. 4).The simulation results also showed no advantage of using >3 yr of data (Fig. 4b and 4c).This was due to the current simulation setting where there were no overlapping genotypes across cycles.No improvement after 3 yr of data was also due to the assumption that spatial variations had been equally accounted for in all three simulation scenarios.For the real data, although the number of overlapping genotypes across cycles was minimal (Table 1), multiyear data could provide an advantage over single-cycle data by providing better estimates of spatial variation.However, it is clear that the analysis of multiyear data always provides better estimates of genotype performance than the analysis of single-year data.

CONCLUSION
The combined across-year analysis enables the use of information from all years to improve estimates of genotype performance.It enables a spatial model to be fitted for each field and separation of the effects of genotype and genotype ´ year interaction in the estimation of variance components across fields.In simulated data, this resulted in a better correlation between estimated performance and the parameter genotypic values.As the simulation contained no overlapping cycles, there was no advantage from analyzing >3 yr of data (containing one complete cycle).In practice, the analysis of more years and across-cycle data can provide better estimates of genotype performance than the analysis of single-cycle data.Thus, in general, the combined analysis of data across years and cycles is better for making selection decisions for genotype advancement.
of testing.Each year contains multiple breeding cycles in multiple selection phases.‡ Selection phase.For each breeding cycle, selection is conducted in three phases (P1-P3).Selected genotypes are advanced from P1 to P2 and P2 to P3.§ Breeding cycle.Each cycle contains genotypes from the same crosses.¶ No. of locations in common in the particular row and column of year, phase, and cycle.# No. of genotypes in common in the particular row and column of year, phase, and cycle.† † No. of total unique genotypes in each year, phase, and cycle.‡ ‡ Total no. of locations used to test genotypes from each year, phase, and cycle.Each location contains multiple fi elds.§ § Total no. of fields used to test genotypes from each year, phase, and cycle.Field is where the genotypes are grown.

Fig. 3 .
Fig.3.Variance components expressed as percentage of phenotypic variance from the analysis of Monsanto's Midwest Advancement Trials.The top row contains the genotypic (V G , black) and genotype ´ year (V GY , white), and the bottom row contains the genotype ´ location (V GL , black) and genotype ´ year ´ location (V GYL , white) variances.These variance components were estimated from (a, h) single-year, single-cycle, and single-phase data; (b, i) single-year, across-phase data; (c, j) across-year, single-phase data; (d, k) acrossyear, single-cycle data; (e, l) across-year, multicycle data; (f, m) across-year data with overall V G , V GY , V GL , and V GYL ; and (g, n) across-year data with estimated V G for each cycle and overall V GY , V GL , and V GYL .

Fig. 4 .
Fig. 4. Simulation results.(a) Ratio of genotypic (V G ) and phenotypic variance (V P ) (i.e., per-plot heritability).The red line indicates the expected ratio.(b) Correlation between the true genotype values (i.e., G) and their performance (i.e., best linear unbiased prediction [BLUP]) for all the current-year Stage 3 genotypes.(c) Proportion of the common top 100 genotypes selected in the current-year Stage 3 trial based on BLUP and G values.This proportion was calculated as the number of the same genotypes in the top 100 selected genotypes based on BLUP and G values divided by 100.Dotted lines in the boxplot indicate the mean.

Table 1 .
Number of genotypes tested in common (lower triangular values) and locations used in common (upper triangular values) across selection-phases, breeding cycles and years, and total number of unique genotypes, locations, and fields in each year, phase, and cycle in Monsanto's Advancement Trials.