Registration of USDA-Max × Soja Core Set-1: Recovering 99% of Wild Soybean Genome from PI 366122 in 17 Agronomic Interspecific Germplasm Lines

USDA-Max × Soja Core Set-1 (USDA-MxS-CS1-1 to USDA-MxS-CS1-17 [Reg. No. GP-417 to GP-433, PI 689053 to PI 689069]) is a group of 17 interspecific breeding lines developed from the hybridization of lodging-resistant soybean cultivar N7103 [Glycine max (L.) Merr.] with wild soybean plant introduction PI 366122 [G. soja Siebold & Zucc.]. These materials were released by the USDA-ARS and the North Carolina Agricultural Research Service (March 2017) to expand the North American soybean breeding pool. The full-sib breeding lines are 50% wild soybean by pedigree and were developed through bulk breeding and pedigree selection. Marker analysis of 2455 well-distributed polymorphic single-nucleotide polymorphism loci revealed that individual breeding lines ranged from 21 to 40% alleles derived from wild soybean. Collectively, most of the wild soybean genome was transferred to the core set in that 5, 10, and 17 breeding lines captured 83, 98, and 99% of G. soja–derived polymorphic alleles, respectively. Physical linkage maps suggested that extensive recombination occurred between the G. max and G. soja genomes. The 17 breeding lines are well adapted to the southeastern United States, exhibited seed yield ranging from 75 to 97% of the domesticated parent, and are group VI or VII maturity. Some breeding lines displayed increased seed protein, oil, or methionine content, and all exhibited increased seed size as compared to the domesticated parent. The novel genetic diversity, positive agronomic performance, and improved seed composition of these lines suggest that they are valuable genetic resources for US soybean breeding.


Soybean [Glycine max (L.) Merr.] production in the United States set records recently for hectarage (33.9 million ha, 2016), mean yield per hectare (3.23 t ha⁻¹, 2016), and mean farmgate value ($40 billion, 2012-2016; USDA-NASS, 2018). Despite soybean's importance to the US economy, its yield potential and production stability are jeopardized by major short-term extrinsic agricultural threats, such as abiotic and biotic stressors, and by more subtle and equally important long-term intrinsic agricultural threats, such as depletion of genetic diversity in plant breeding programs. This latter threat, depletion of economic and agronomic diversity in the crop, usually results from genetic bottlenecks in breeding and can slow cultivar improvement programs, impose yield ceilings for the farmer, and end in reduced competitiveness of the crop in international markets (Carter et al., 2009). An important antidote to these agricultural challenges in soybean is the incorporation of novel global genetic resources into applied US breeding programs (Carter et al., 2004).
The USDA germplasm collection is a storehouse of freely available and diverse genetic resources to aid this effort. Using publicly available DNA marker data (SoySNP50K Infinium BeadChips set; Song et al., 2013) to remove genetic duplicates, Song et al. (2015) showed that the USDA germplasm collection preserves over 14,000 unique (<99.9% similarity) domesticated G. max accessions and over 800 unique (<99.9% similarity) accessions of the wild annual progenitor species (G. soja Siebold & Zucc.). The G. max portion of the USDA germplasm collection is benefiting agriculture at present, and typical success stories involve introgression of disease and pest resistance into cultivars (Carter et al., 2004; Rincker et al., 2014). In contrast, breeding success with the wild soybean has been rare because of the overwhelmingly poor agronomic quality exhibited in the progeny and the complex inheritance of the wild soybean's vinelike architecture (Abdel-Haleem et al., 2015; Carter et al., 2004; Delheimer, 2012; Weber, 1950).
Wild soybean is genetically diverse within the species and also genetically distinct from G. max. Hyten et al. (2006) demonstrated that the majority of rare genetic alleles in wild soybean are not present in the domesticated soybean. Phenotypic evaluation of wild soybean accessions in the USDA germplasm collection suggests that they carry traits of agronomic importance such as elevated seed protein content and improved amino acid composition (Thang et al., 2018), salt tolerance (Luo et al., 2005), and yield-enhancing genes (Li et al., 2008). Thus, G. soja accessions could serve as a valuable reservoir of unique genetic resources for applied breeding.
Extensive introgression of novel genetic diversity from wild soybean into adapted upright breeding lines is key to effective use of wild soybean in basic and applied plant breeding. Collectively, the interspecific germplasm releases reported here represent the first successful transfer of a large portion of a wild soybean genome to adapted breeding lines. More than 99% of 2455 single-nucleotide polymorphism (SNP) alleles from the wild soybean were transferred to 17 breeding lines (Fig. 1-3), and they exhibited extensive recombination between the wild and domesticated genomes (Fig. 4-20). The breeding lines yielded from 75 to 97% of the G. max parent, resisted pod dehiscence and lodging, and exhibited a range in seed composition beyond that of the domesticated parent. All 17 lines also produced larger seed than the G. max parent, indicating that alleles for increased 100-seed weight were inherited from the small-seeded wild soybean. This unique set of lines should facilitate the exploration of the potential of the wild soybean (G. soja) genome of PI 366122 (USDA-ARS National Genetic Resources Program, 2017) to benefit agriculture, especially as it relates to novel alleles for seed yield, protein content, and amino acid composition.

Pedigree
The USDA-Max × Soja Core Set-1 (USDA-MxS-CS1-1 to USDA-MxS-CS1-17 [Reg. No. GP-417 to GP-433, PI 689053 to PI 689069]) is derived from a bi-parental hybridization of USDA-ARS cultivar N7103 (PI 615695, Carter et al., 2003) and wild soybean PI 366122 (Table 1). PI 366122 is a maturity group (MG) IV accession collected near Aizumisato (formerly Aizutakada), Fukushima Prefecture, Japan. This wild soybean accession was selected for this study based on visual appearance. When grown at Clayton, NC, the accession exhibited dark green leaf color and a near absence of foliar diseases. N7103 is a small-seeded and extremely lodging-resistant MG VII cultivar derived from the hybridization of breeding line NTCPR90-143 and 'Pearl' (PI 583367, Carter et al., 1995). NTCPR90-143 was derived from the hybridization of 'GaSoy 17' (Baker and Harris, 1979) and 'Vance' (PI 553048). Pearl was derived from the hybridization of G80-1515 and Vance. G80-1515 was derived from a cross of 'Pickett 71' (Hartwig et al., 1971) and 'Bedford' (Hartwig and Epps, 1978).

Initial Breeding Line Development
Population development for this material was described previously (Delheimer, 2012). Briefly, hybridization was performed between N7103 and PI 366122 at the Central Crops Research Station (CCRS), Clayton, NC, in 2002. The CCRS is operated by the North Carolina Department of Agriculture and North Carolina State University.
[...] (Delheimer, 2012). Qiagen's DNeasy Plant Mini Kit was used with Qiagen's QIAcube to obtain high-quality DNA. Illumina's GoldenGate assay (Illumina) was used to genotype the breeding lines with 1536 SNP markers (USLP 1.0), with the analysis performed on the Illumina BeadStation 500G, as previously described by Hyten et al. (2008). Automatic allele calling, followed by visual inspection and verification of the data, was performed using Illumina's GenomeStudio software (version 2011.1; Illumina, 2010). The 558 SNP markers that were polymorphic between the parents were used for further analysis (Delheimer, 2012).

Identification of the USDA-Max × Soja Core Set-1
The average yield across environments for each of the 192 breeding lines was converted to a percentage of the control cultivar. The control cultivar was appropriate for the maturity group. All breeding lines with a percentage of control value <75% were dropped from further analysis, leaving 50 lines for study. The SNP marker data for this truncated set were then used to identify a core set of lines that recovered, in the aggregate, 99% of the polymorphic alleles from wild soybean. A customized program was developed in-house using the software environment R (R Development Core Team, 2013) to perform the core set identification. Briefly, the program began by making two-way pairings between all lines and then selecting the two-way combination that recovered the largest number of unique G. soja alleles, regardless of their location in the genome. Using this two-way pairing approach, we created a cumulative pseudo-genome that included all G. soja alleles derived from either breeding line. The pseudo-genome was then paired with all remaining lines, and the identification process was repeated. After each iteration, the number of breeding lines present in the pseudo-genome increased by one. Based on evaluation of the program's output, 17 breeding lines were selected for further evaluation, because this set recovered approximately 99% of the G. soja alleles for polymorphic loci.
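The pairing procedure described above can be sketched as a greedy selection. The authors' in-house R program is not reproduced in the source, so the following is only a minimal Python illustration under stated assumptions: the function name `greedy_core_set`, the input layout (a mapping from line name to the set of polymorphic loci at which that line carries the G. soja allele), the `n_loci` and `target` parameters, and tie-breaking by line name are all hypothetical, and the toy data are invented.

```python
from itertools import combinations


def greedy_core_set(soja_loci, n_loci, target=0.99):
    """Greedy core-set sketch (hypothetical helper, not the authors' R program).

    soja_loci: dict mapping line name -> set of polymorphic loci at which the
               line carries the G. soja allele (assumed input layout).
    n_loci:    total number of polymorphic loci (e.g. 558 or 2455 in the text).
    Returns the selected line names in order of selection and the cumulative
    fraction of G. soja alleles captured after each step.
    """
    if len(soja_loci) < 2 or n_loci <= 0:
        raise ValueError("need at least two lines and a positive locus count")
    names = sorted(soja_loci)  # sorting makes tie-breaking deterministic

    # Step 1: pick the two-way pairing that recovers the most unique alleles.
    first, second = max(
        combinations(names, 2),
        key=lambda pair: len(soja_loci[pair[0]] | soja_loci[pair[1]]),
    )
    chosen = [first, second]
    pseudo_genome = soja_loci[first] | soja_loci[second]
    coverage = [len(pseudo_genome) / n_loci]

    # Step 2: pair the pseudo-genome with every remaining line and keep the
    # line that contributes the most new alleles; one line is added per pass.
    remaining = [n for n in names if n not in chosen]
    while remaining and coverage[-1] < target:
        best = max(remaining, key=lambda n: len(soja_loci[n] - pseudo_genome))
        if not soja_loci[best] - pseudo_genome:
            break  # no remaining line adds a new G. soja allele
        pseudo_genome |= soja_loci[best]
        chosen.append(best)
        remaining.remove(best)
        coverage.append(len(pseudo_genome) / n_loci)
    return chosen, coverage


# Invented toy data: four lines scored at six polymorphic loci.
toy = {"A": {1, 2, 3}, "B": {3, 4}, "C": {5}, "D": {1, 2}}
chosen, coverage = greedy_core_set(toy, n_loci=6)
```

In this toy case the selection stops short of the target because no line carries the G. soja allele at locus 6, which mirrors the observation in the text that 100% recovery was not attainable.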

Agronomic Evaluation of USDA-Max × Soja Core Set-1
The 17 breeding lines that ultimately became USDA-Max × Soja Core Set-1 were compared initially in a large series of independent field trials, where genotypes were assigned to various trials at random. Random assignment in these trials precluded side-by-side comparisons of the 17 releases for agronomic performance (data not shown). Thus, we designed an additional field trial to generate a side-by-side evaluation of the 17 breeding lines over six environments, using the following checks: the G. max parent N7103, 'Dillon' (Shipe et al., 1997), 'NC-Raleigh' (Burton et al., 2006), and 'NC-Roy' (Burton et al., 2005). A randomized complete block design was used to evaluate the materials at two locations in 2013 (CRS and CCRS) and four locations in 2014 (CCRS, CRS, Tidewater Research Station, Plymouth, NC, and the Plant Science Farm, Athens, GA), with three blocks per environment. The Tidewater Research Station is operated by the North Carolina Department of Agriculture and North Carolina State University. The Plant Science Farm is owned and operated by the University of Georgia.
At the North Carolina locations (CCRS, CRS, and Tidewater), three-row plots were used, with the outside rows serving as borders. Planting length was 6.1 m and the inter-row spacing was 0.97 m. At maturity, plots were end trimmed to 4.6 m, resulting in a harvested area of approximately 4.4 m². In Georgia, four-row plots were used, with the outside rows serving as borders. Planting length was 4.9 m, with an inter-row spacing of 0.76 m. At maturity, plots were end trimmed to 3.7 m, producing a harvested area of 5.6 m². Approximately 340,000 seeds ha⁻¹ were planted at all locations. Cultural practices implemented for soybean management were location specific, based on best management practices for the area.
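The harvested areas quoted above follow from trimmed length × row spacing × number of harvested rows. The short sketch below reproduces that arithmetic, assuming that only the non-border rows were harvested (one row in the three-row plots, two rows in the four-row plots); this is inferred from "outside rows serving as borders" and is consistent with the reported areas. The function names and the yield conversion helper are hypothetical additions for illustration.

```python
def harvested_area_m2(trimmed_length_m, row_spacing_m, total_rows, border_rows=2):
    """Area of the harvested (non-border) rows of one plot, in square metres."""
    harvested_rows = total_rows - border_rows
    if harvested_rows < 1:
        raise ValueError("plot has no harvested rows")
    return trimmed_length_m * row_spacing_m * harvested_rows


def plot_yield_t_per_ha(plot_seed_kg, area_m2):
    """Convert plot seed weight (kg) to t per ha (1 ha = 10,000 m2, 1 t = 1000 kg)."""
    return plot_seed_kg / area_m2 * 10.0


# North Carolina: three-row plots, 0.97-m rows, trimmed to 4.6 m.
nc_area = harvested_area_m2(4.6, 0.97, total_rows=3)  # about 4.46 m2
# Georgia: four-row plots, 0.76-m rows, trimmed to 3.7 m.
ga_area = harvested_area_m2(3.7, 0.76, total_rows=4)  # about 5.62 m2
```

The computed values (about 4.46 and 5.62 m²) agree with the approximately 4.4 and 5.6 m² given in the text.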

Traits Evaluated
Agronomic traits (seed yield, plant lodging at maturity, plant height at maturity, maturity date, and 100-seed weight) were recorded for all plots at all locations (Table 1). Plant lodging was rated on a scale of 1 to 5, where 1 = all plants upright and 5 = all plants prostrate (Fehr, 1987). Plant height was recorded as the mean of three randomly chosen plants per plot. Maturity date was defined as the first day on which 95% of the pods were mature. Seed yield was reported at approximately 8% moisture.
Soluble seed carbohydrates (inositol, glucose, fructose, sucrose, raffinose, and stachyose) were determined for two replications from the 2013 CRS and CCRS locations and three replications from the 2014 CRS and Plymouth locations (Table 2). Three replicate assays were performed per experimental unit (field plot) and averaged to determine plot means prior to statistical analysis. Carbohydrate analyses were performed using high performance liquid chromatography (HPLC) by the USDA-ARS Market Quality and Handling Research Unit at Raleigh, NC, incorporating slight modifications to the HPLC method of Pattee et al. (2000).
Fatty acid composition of seed oil was measured on seed from four environments (two replications from CRS and CCRS in 2013 and three replications from CRS and Plymouth in 2014) (Table 3). Fatty acids were analyzed using slight modifications to the method described by Burkey et al. (2007). Fatty acids were derivatized to their methyl esters and analyzed using an Agilent Technologies 6890N gas chromatograph in the USDA-ARS Soybean and Nitrogen Fixation Research Unit facilities at Raleigh. Three laboratory replicates were performed for each experimental unit, and averaged values were used for statistical analysis.
Seed protein and oil content were determined for all three replications at all six field environments using a Perten DA 7250 near-infrared resonance spectrometer (Perten Instruments) in the Soybean and Nitrogen Fixation Unit facilities and reported on a 0% moisture basis (Table 1). Calibration curves for the analysis were derived by the Perten Instruments Soybean Consortium (T. Carter, unpublished data, 2018). Seed amino acid and N concentrations were determined by the University of Missouri's Agricultural Experiment Station Chemical Laboratories (Table 4). Amino acid and N concentrations were assayed on two replications from the 2013 CRS and CCRS locations and three replications from the 2014 CRS and Plymouth locations. Amino acid measurements were standardized by expressing the observations as a percentage of total protein. Crude N was determined by combustion analysis and converted to protein concentration as N concentration multiplied by 6.25. Amino acids were determined by a single oxidation 4-h hydrolysis method followed by cation exchange chromatography in a Beckman 6300 Amino Acid Analyzer (Beckman Instruments, Inc.).

Statistical Analysis of Phenotypic Traits for USDA-Max × Soja Core Set-1
All agronomic and seed composition traits were analyzed using SAS 9.4 (SAS Institute, 2014). Outlier analysis was first performed using an analysis of variance (ANOVA) developed in the GLM procedure, with environment, block within environment, and genotype × environment serving as random effects and genotype treated as a fixed effect. Standardized residuals were evaluated, and any observation whose standardized residual had an absolute value >3 was eliminated from the analysis for agronomic traits (Carter et al., 2016). Two observations were removed for yield, four observations for seed protein, and eight observations for oil. Outliers for carbohydrates, fatty acids, and amino acids were investigated but dropped only if more than two replicates of data within a location were present. At most, a total of only three observations were dropped for seed composition analysis within an environment. Analysis of variance was then performed using PROC MIXED with the same model described above. Hypothesis testing was performed using a Dunnett multiple comparison correction, with the domesticated parent (N7103) serving as the control. All results were reported as least squares means.

Seed Purification and Conformational SNP Genotyping Using BARCSoySNP6K
Seed purification of the 17 breeding lines was initiated at CRS in 2013. At maturity, five uniform F4:9 plants from a 6.1-m plant row were selected from each breeding line. Plants were individually threshed using an Almaco BT14 belt thresher with forced-air clean-out between each threshing. In 2014, individual progeny rows (4.6 m) were planted at CCRS for visual observation, genotyping, and seed increase. For each progeny row, an individual leaf was harvested from 20 plants, bulked into a 5.7- by 8.9-cm coin envelope (ULINE), and stored at -80°C. A tissue core (40 mm²) was taken from each leaf (20 cores per breeding line) and placed into FastPrep 2-mL tubes containing Lysing Matrix A (MP Biomedicals). Tissue samples were frozen using liquid N and homogenized using a FastPrep-96 (MP Biomedicals) instrument.
DNA was isolated using a cetyltrimethylammonium bromide procedure (Stein et al., 2001). The 85 (17 breeding lines × 5 sublines) DNA samples and parental samples were genotyped using the Illumina Infinium BARCSoySNP6K BeadChip (Song et al., 2014). Genotyping was performed by the Soybean Genomics and Improvement Laboratory, USDA-ARS, in Beltsville, MD. Of the 5403 SNP markers evaluated, 2455 were retained after removing SNPs that were monomorphic, missing, or heterozygous between the parents. Polymorphic SNPs were then used to perform principal components analysis using the prcomp function in R. Visualization of the first two principal components was used to select the most representative single plant from each breeding line. In most cases, all five plants from a breeding line tended to cluster closely, indicating that residual variation in the F4-derived breeding line was modest (Fig. 1). These representative plants were increased at CCRS in 2016 using an 18.3-m plant row. F9:12 seed from this process were designated as the release source.

Botanical and Morphological Description
Maturity date ranged approximately 11 d among breeding lines, with seven lines significantly (p < 0.05) earlier in maturity than parental type N7103 (MG VII) and one breeding line significantly (p < 0.05) later (Table 1). All 17 breeding lines exhibited moderate lodging resistance; however, 16 of the 17 breeding lines lodged significantly (p < 0.05) more than lodging-resistant parent N7103 (Table 1). Only one of the germplasm lines differed significantly (p < 0.05) for plant height compared with the parental check (14 cm taller, Table 1). Nine of the 17 germplasm release lines had gray pubescence (Table 5). Fifteen of 17 germplasm releases were white flowered, suggesting that the purple flower allele inherited from the wild soybean may have been linked to a deleterious trait that was affected by selection (Table 5). Similarly, only one line exhibited brown seed coat and only two were black (the rest were yellow), suggesting that dark seed coat color was unconsciously selected against by visual selection during the development process or, alternatively, was affected by linkage between seed coat color and deleterious alleles in the G. soja genome (Table 5). Only one of the 17 lines carried the narrow leaf trait inherited from N7103, suggesting that the allele for this qualitative trait (derived from G. max) may also be linked to a deleterious trait that was affected by selection (Table 5).

Transgressive Segregation for Seed Size
All breeding lines had numerically greater 100-seed weight than N7103, with 15 breeding lines showing significantly (p < 0.05) greater 100-seed weight.Tanksley et al. (1996) reported that alleles for large fruit size in tomato (Solanum lycopersicum L.) reside in a small-fruited wild species of tomato (Solanum pimpinellifolium Jusl.).Our research indicates that alleles for large seed size reside in the small-seeded wild soybean, supporting the notion that wild species with small size-related traits may harbor genes for larger size in their genome.Domestication of a wild species often involves increases in size for traits (Harlan, 1976), and our finding provides an opportunity to explore how this may occur for seed.

Yield Performance
Over six test environments, the 17 breeding line yields ranged from 75 to 97% of the domesticated parent N7103 (Table 1).N7103 yielded very similarly to the three adapted checks in the study.Although variation in maturity date was present among breeding lines, regression analysis of seed yield with maturity date as a covariate was not significant (p = 0.42).A significant (p < 0.05) genotype × environment interaction was identified, but the genotype × environment variance component was the smallest of all random effects and only 38% of the error variance (data not shown).

Seed Protein, Amino Acid, and Oil Content
Seed protein content was significantly (p < 0.05) increased in seven breeding lines and significantly (p < 0.05) decreased in two breeding lines as compared to N7103 (Table 1). Seven breeding lines exhibited seed protein content near or above 45 g 100 g⁻¹. Diers et al. (1992) showed that G. soja contains alleles for improved seed protein content on chromosomes 15 (LG E, formerly LG A) and 20 (LG I, formerly LG K). Using approximately 12,000 G. max accessions, Bandillo et al. (2015) performed a genomewide association analysis for seed protein content and identified large-effect quantitative trait locus (QTL) regions on chromosomes 15 (at approximately 4 Mb) and 20 (at approximately 31 Mb). Using the map position on chromosome 20, we evaluated those seven germplasm releases that had elevated protein content. We observed that four carried the G. max rather than G. soja allele, indicating that they lacked the large-effect high protein seed QTL from wild soybean (Fig. 8, 13, 16, 20). Evaluation of the chromosome 15 position for these same seven breeding lines revealed that none carried the G. soja allele and, thus, none appeared to inherit the large-effect high protein QTL on chromosome 15 from wild soybean (Fig. 8, 10, 11, 13, 16, 19, 20).
After normalizing for seed protein content, 15 of 17 germplasm releases were significantly (p < 0.05) greater than the G. max parent N7103 for cysteine (Table 4).Two lines (USDA-MxS-CS1-11 and USDA-MxS-CS1-15) appeared to have elevated levels of both methionine and cysteine.The simultaneous increase in both S-containing amino acids appeared to be independent of the well-known G. soja DNA segments on chromosomes 15 and 20 that promote elevated seed protein content.Neither line carried the high-protein related DNA segments.Seed oil content was increased significantly (p < 0.05) in six breeding lines and significantly (p < 0.05) decreased in two breeding lines as compared to N7103 (Table 1).One breeding line (USDA-MxS-CS1-12) had both elevated seed protein and oil content (p < 0.05) compared with N7103.
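The normalization behind these amino acid comparisons (amino acid expressed as a percentage of total protein, with protein taken as N × 6.25, as stated in the methods) can be written out as a small sketch. The function names and the numeric inputs in the example are invented for illustration and are not values from the study.

```python
N_TO_PROTEIN = 6.25  # nitrogen-to-protein conversion factor stated in the text


def protein_from_nitrogen(n_g_per_100g):
    """Crude protein (g per 100 g) from combustion N (g per 100 g)."""
    return n_g_per_100g * N_TO_PROTEIN


def amino_acid_pct_of_protein(aa_g_per_100g, n_g_per_100g):
    """Express an amino acid concentration as a percentage of total protein."""
    protein = protein_from_nitrogen(n_g_per_100g)
    if protein <= 0:
        raise ValueError("protein concentration must be positive")
    return 100.0 * aa_g_per_100g / protein


# Invented example: 7.2 g N per 100 g corresponds to 45 g protein per 100 g,
# so 0.63 g of an amino acid per 100 g is 1.4% of total protein.
example_protein = protein_from_nitrogen(7.2)
example_pct = amino_acid_pct_of_protein(0.63, 7.2)
```

Expressing amino acids relative to protein separates a change in amino acid balance from a simple change in total protein, which is why lines can differ for cysteine or methionine without carrying the high-protein segments.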

Seed Carbohydrate and Fatty Acid Content
Genetic variation was observed among breeding lines for the carbohydrates inositol, glucose, fructose, sucrose, raffinose, and stachyose (Table 2). The anti-nutritional carbohydrate raffinose was significantly (p < 0.05) increased in four breeding lines as compared to N7103, whereas the anti-nutritional carbohydrate stachyose was significantly (p < 0.05) decreased in four breeding lines as compared to N7103 (Table 2). When analyzed together, the combination of the anti-nutritional carbohydrates (stachyose + raffinose) was significantly (p < 0.05) decreased in two breeding lines (USDA-MxS-CS1-10 and USDA-MxS-CS1-17; Table 2). However, the two breeding lines with lowered raffinose + stachyose were not different from the lowest check cultivar, NC-Roy.
Modest genetic variation was observed among breeding lines for fatty acid content (Table 3).Reduced palmitic acid (16:0) and linolenic acid (18:3) contents were observed in the USDA-Max × Soja Core Set-1 compared with the adapted parent, N7103.Eight and four breeding lines were reduced significantly (p < 0.05) for those fatty acids, respectively.Only one breeding line was reduced significantly (p < 0.05) for both palmitic acid and linolenic acid compared with N7103.None of the breeding lines exhibited the high linolenic acid content of the wild soybean parent.It is assumed that this trait may be linked to deleterious trait(s) such as shattering or lodging susceptibility, to such an extent that the trait was selected against and not present in advanced lines from the breeding process.

BARCSoySNP6K Genomic Analysis to Characterize Germplasm Releases
Agronomic performance of the germplasm releases reported here was based on F4-derived materials. The final germplasm releases were F9-derived selections from the F4-derived breeding lines. In the F9 generation, five individual plants (F10 seed) were selected from each breeding line and subjected to 6K (BARCSoySNP6K) genotyping (approximately 2455 polymorphic loci for the parents). The five F9 plants within a breeding line clustered closely with each other based on SNP marker analysis; the most representative F9 plant from each breeding line was selected and increased as the final purified release source (Table 1, Fig. 1).
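In the study, the representative plant was chosen by inspecting a principal component plot. As a rough stand-in for that visual step, and explicitly not the authors' method, the sketch below picks the medoid subline: the plant whose marker calls differ least, on average, from those of its sibling plants. The function names, the single-letter allele coding ("S" for the G. soja allele, "M" for the G. max allele), and the toy genotypes are all assumptions made for illustration.

```python
def mismatch_fraction(calls_a, calls_b):
    """Fraction of loci at which two equally long call sequences differ."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("call sequences must be non-empty and equally long")
    return sum(a != b for a, b in zip(calls_a, calls_b)) / len(calls_a)


def most_representative(sublines):
    """Return the subline with the smallest mean mismatch to its siblings.

    sublines: dict mapping plant name -> sequence of allele calls.
    Ties are broken by plant name (an arbitrary, hypothetical rule).
    """
    names = sorted(sublines)
    if len(names) < 2:
        raise ValueError("need at least two sublines to compare")

    def mean_distance(name):
        others = [n for n in names if n != name]
        total = sum(mismatch_fraction(sublines[name], sublines[o]) for o in others)
        return total / len(others)

    return min(names, key=mean_distance)


# Invented toy data: five plants from one breeding line scored at six loci.
plants = {
    "p1": "SSMMMM",
    "p2": "SSMMMS",
    "p3": "SMMMMM",
    "p4": "SSSMMM",
    "p5": "SSMMSM",
}
representative = most_representative(plants)
```

Here `p1` differs from every sibling at a single locus, while the siblings differ from one another at two loci, so `p1` is selected.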
Principal component analysis of polymorphic SNP data revealed a clear shift in the marker profile of the germplasm releases toward the domesticated parent, with the contribution of G. soja to the genomes of the breeding lines (percentage of polymorphic alleles from G. soja) varying from 21 to 40% (Fig. 2, Table 6). The shift toward the G. max genome was attributed to the intense selection pressure applied during breeding line development (Delheimer, 2012). Output from the Core-Set R program indicated that the first seven breeding lines of the core set captured >90% of the G. soja genome, whereas the remaining 10 were needed to recover approximately 99% of the genome (Fig. 3). It was not possible to recover 100% of the G. soja SNP alleles because of active selection to remove unfavorable G. soja traits such as lodging, pod dehiscence, vinelike growth, and small seed.
A limitation of the marker analysis of the germplasm releases, especially regarding statements about genome recovery, is that the SNP marker platforms were designed mainly for G. max. Ascertainment bias is expected with respect to G. soja. However, the favorable distribution of polymorphic markers in this study suggests that this potential problem may be relatively small. Sequencing of the breeding lines, followed by alignment to a G. soja pan-genome (Li et al., 2014), should resolve this question. The rather high frequency of large linkage blocks in these germplasm releases may in part result from the methodology used to identify the core set. Breeding lines with large linkage blocks derived from G. soja are likely to be selected on average, because they exhibit an increased overall percentage of G. soja in their genome. However, it is unclear how such large linkage blocks and in some cases entire chromosomes appeared to remain intact while under intense phenotypic selection. Future breeding efforts will focus on determining if smaller linkage blocks inherited from wild soybean improve the breeding value of the adapted progeny.
[...] introduction, obviating the need to make new hybridizations with the original viny G. soja source.
To provide numerical estimates of genetic relatedness between F9-derived breeding lines in the release set, simple matching coefficients were calculated using the polymorphic BARCSoySNP6K SNP data (Table 6). Genetic relatedness between breeding lines ranged from 53 to 82%, with an average genetic relatedness of 61%. The deviation in genetic relatedness from the expected 50% for full siblings was attributed to the intense phenotypic selection pressure applied during breeding line development toward the domesticated parent (Fig. 2).
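A simple matching coefficient is the proportion of compared loci at which two genotypes carry the same allele. The sketch below shows the calculation for two lines, and for one line against an all-G. soja reference, which gives that line's percentage of G. soja alleles. The allele coding ("S" and "M"), the skipping of missing calls, the function name, and the ten-locus genotypes are assumptions for illustration only, not data from the study.

```python
def simple_matching(calls_a, calls_b, missing=None):
    """Proportion of loci with identical calls, skipping loci with a missing call."""
    if len(calls_a) != len(calls_b):
        raise ValueError("call sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(calls_a, calls_b):
        if a == missing or b == missing:
            continue  # assumed handling: ignore loci missing in either line
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no loci with calls in both lines")
    return matches / compared


# Invented ten-locus genotypes ("S" = G. soja allele, "M" = G. max allele).
line_a = list("SMMSMMMMSM")
line_b = list("SMMMMMSMSM")
soja_parent = ["S"] * 10

relatedness = simple_matching(line_a, line_b)         # 8 of 10 loci match
soja_fraction = simple_matching(line_a, soja_parent)  # 3 of 10 loci are G. soja
```

With only the polymorphic loci included, as in the text, two unselected full sibs from a biparental cross are expected to match at about half of the loci, which is the 50% baseline the observed 53 to 82% is compared against.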
Using the polymorphic SNP calls on the F9-derived lines, physical maps were produced for each breeding line to visually evaluate the size and distribution of linkage blocks within the lines (Fig. 4-20). Observation of the physical maps showed considerable recombination within the breeding lines, which was expected given the phenotypic variation observed for all traits among the lines. Despite the generally high degree of recombination, an unanticipated result was that in some cases, all SNP marker alleles from entire chromosomes were inherited from G. max (Fig. 4, 5, and 8), and in other cases, inherited only from G. soja (Fig. 7).
Estimates of the percentage of G. soja alleles recovered within each breeding line were similar whether the results were based on the initial 1536 SNP analysis (~500 polymorphic loci) of F6 seed from the F4-derived breeding lines or the follow-up 6K SNP analysis (2455 polymorphic loci) of F9-derived lines from the BARCSoySNP6K array. We attribute the high correlation between the F4 and F9 results (Fig. 1) to meticulous subline selection and to the close relationship between the 1536 and 6K SNPs. The 1536 SNP array is a subset of the 6K SNP array.

Conclusion
The release of these 17 germplasm lines demonstrates the value of research at the intersection between plant breeding and genomics. The large populations and intensive phenotypic selection program we used produced agronomic releases. However, the value of these releases was made clear only by quantifying the contribution of the wild soybean to their genomes. Authenticating the pedigree of the germplasm releases through genomic analysis coupled with extensive phenotypic evaluation clearly showed that the wild soybean provided genes for improvement of seed protein and oil content as well as amino acid composition. Heterosis associated with improved yield was reported for one of these lines, USDA-MxS-CS1-4, when it was backcrossed with parent N7103, suggesting that this material may also carry yield-enhancing alleles (Taliercio et al., 2017). The extensive genomic analysis of these germplasm lines should prove valuable to researchers as additional traits are discovered in PI 366122, in that the traits will likely be present in at least one of these 17 adapted lines as opposed to the unadapted wild parental plant introduction, obviating the need to make new hybridizations with the original viny G. soja source.

Fig. 1. Principal component analysis (PCA) for five F9-derived sublines tracing to each of 17 F4-derived experimental lines. Analysis was based on 2455 polymorphic single-nucleotide polymorphism (SNP) markers. One representative F9 subline was selected from each F4-derived line for germplasm release. Selected lines are indicated with red circles. The 17 experimental lines were developed from the hybridization of G. soja line PI 366122 × G. max cultivar N7103. The 17 experimental lines were selected on the basis of acceptable agronomic performance (yield at least 70% of the parental cultivar) and maximum presence of G. soja alleles. SNP analysis was performed using the Illumina Infinium BARCSoySNP6K BeadChip (Song et al., 2014).

Fig. 2. Three-dimensional principal component analysis for USDA-MxS-CS1 F9-derived breeding lines (17) and parents based on 2455 polymorphic single-nucleotide polymorphism (SNP) markers. The 17 experimental lines were developed from the hybridization of G. soja line PI 366122 × G. max cultivar N7103. Experimental lines were selected for inclusion in USDA-Max × Soja Core Set-1 on the basis of acceptable agronomic performance (yield at least 70% of the parental cultivar) and maximum incorporation of G. soja alleles. SNP analysis was performed using the Illumina Infinium BARCSoySNP6K BeadChip (Song et al., 2014).

Table 4. Amino acid least squares means for USDA-Max × Soja Core Set-1 breeding lines (17) and check cultivars. The 17 experimental lines were developed from a cross of G. soja line PI 366122 × G. max cultivar N7103. Experimental lines were selected for inclusion in the core set based on acceptable agronomic performance (yield at least 70% of parental cultivar N7103) and maximum incorporation of G. soja alleles. Amino acids were measured for two environments in 2013 and two environments in 2014.

Significantly (p < 0.05) different from G. max parent N7103 after Dunnett's multiple comparison correction.
† %G. soja calculations based on simple matching coefficients of polymorphic SNPs.
‡ 1536 and 6K SNP arrays used 558 and 2455 polymorphic SNPs, respectively.
§ 1 = no lodging, 5 = complete lodging (Fehr, 1987).

[...] the North Carolina Department of Agriculture and North Carolina State University. To increase F1 seed supply, additional hybridizations were performed at CCRS in 2003. N7103 rather than the wild soybean accession was used as the female parent because of N7103's larger flower size, which facilitated efficient emasculation for manual pollination. After each crossing season, the F1 plants were grown at the USDA-ARS Tropical Agricultural Research Station, Isabela, PR (winter seasons of 2002-2003 and 2003-2004). Subsequently, approximately 2500 F2 plants were grown at CCRS in 2003 and 2004 (~5000 plants total). When the majority of the F2 plants neared maturity, individual late-maturing (MG VIII) plants were rogued from the nursery (~4% of the population). At maturity, the remaining F2 plants were bulk harvested. F3 seed was sent to the University of Georgia, where it was scarified to improve germination. Combined over the summers of 2004 and 2005, nearly 1.2 million F3 plants were evaluated at the Caswell Research Station (CRS), Kinston, NC, which is operated by the North Carolina Department of Agriculture and North Carolina State University. Over the 2 yr, a total of approximately 375 F3 plants were selected on the basis of upright growth habit and were threshed individually. Pedigree selection was implemented during the summers of 2006 and 2007, and one to five F4 plants were harvested from agronomic F3:4 plant rows. Approximately 1300 F4:5 plant rows were evaluated for agronomic appearance. After eliminating agronomically unacceptable breeding lines on the basis of lodging and shattering, 192 F4-derived lines were advanced for replicated yield testing. Breeding lines were evaluated in three to five environments (CCRS and CRS) between 2008 and 2010 (data not shown).