Identification of Founding Accessions and Patterns of Relatedness and Inbreeding Derived from Historical Pedigree Data in a White Clover Germplasm Collection in New Zealand

Pedigrees provide suitable input for tracking breeding crosses and monitoring plant and animal populations. Pedigree analysis is an essential tool to visualize and describe population structure and genetic diversity. Pasture plant breeding relies on a range of strategies, from closed-system recurrent selection using elite cultivars to diversity-focused programs using ecotypes and wild germplasm as parents. There is currently no published pedigree analysis of white clover (Trifolium repens L.), despite over a century of breeding efforts in New Zealand. A pedigree map was constructed for white clover germplasm stored in the Margot Forde Germplasm Centre located in Palmerston North, New Zealand. Inbreeding and kinship coefficients and the effective number of founders through time were estimated to assess genetic relatedness in the reference population. A total of 15,265 accessions were included in the map. The maximum number of traced generations was 15, and the mean completeness of parentage for the reference population was 72%. The mean number of offspring was 5.14. The origin of founding and introduced accessions was assessed. The effective number of founders was 175.68 or 17.5% of the total number of reference founders. Founding accessions and influencing ancestors were identified. One of the earliest parental accessions identified was ‘North Canterbury Type I’ from 1941. Relatedness was determined using the k coefficient, with k = 1 as complete relatedness. The overall mean relatedness was k = 0.002. Overall mean inbreeding level was 0.39%. Relatedness and inbreeding coefficients revealed distinct germplasm pools formed across time that are of interest to pre-breeding efforts.
L.M. Egan and V. Hoyos-Villegas, AgResearch Lincoln Research Centre, PB 4749, Christchurch, New Zealand; L.M. Egan and R.W. Hofmann, Faculty of Agriculture and Life Sciences, Lincoln Univ., Lincoln, New Zealand; B.A. Barrett and K. Ghamkhar, AgResearch Grasslands Research Centre, PB 11008, Palmerston North, New Zealand; V. Hoyos-Villegas, current address, Dep. of Plant Science, McGill Univ., Ste-Anne-de-Bellevue, QC, Canada. Received 18 Nov. 2018. Accepted 17 Apr. 2019. *Corresponding author (valerio.hoyos-villegas@mcgill.ca). Assigned to Associate Editor Bradley Bushman.
Abbreviations: COP, completeness of parentage; MFGC, Margot Forde Germplasm Centre; NPGS, National Plant Germplasm System.
Published in Crop Sci. 59:2087–2099 (2019). doi: 10.2135/cropsci2018.11.0688 © 2019 The Author(s). This is an open access article distributed under the CC BY license (https://creativecommons.org/licenses/by/4.0/). Published September 5, 2019


Lucy M. Egan, Rainer W. Hofmann, Brent A. Barrett, Kioumars Ghamkhar, and Valerio Hoyos-Villegas*
... introduced pivotal farm cultural practices in conjunction with pasture management to increase pasture performance (Brock et al., 1989; Woodfield and Caradus, 1994). Depending on the objectives of a breeding program, different strategies are adopted to release a cultivar. Many methods have been used in white clover breeding in New Zealand and worldwide. Mass selection (Wricke and Weber, 1986) and recurrent phenotypic selection are common strategies (Yamada et al., 1989; Woodfield and Caradus, 1994; Caradus and Woodfield, 1998; Van Den Bosch et al., 1999; Mercer et al., 2000). Polycrossing is often used in forage species that display heterosis: parent clones are grown in isolation and all pollinated together, and the progeny are often combined and tested (Taylor, 2008). However, pollen flow is uneven, and the greater the distance between the male and female genotypes, the lower the chance of successful fertilization (George, 2014). This method is suitable for complex traits if progeny testing is included in the strategy (Taylor, 2008). The outcrossing nature of white clover means that breeding strategies such as mass and recurrent selection use the available variation in the genetic material while decreasing the risk of inbreeding depression. Although gains continue to occur in white clover breeding (Woodfield and Caradus, 1994; Hoyos-Villegas et al., 2019), the genetic consequences of breeding strategies are difficult to assess, particularly if strategies are executed as closed systems with no new genetic variation introduced over long periods of time.
Pedigrees are a conventional method to monitor breeding crosses and population structure in both plants and animals (Navabi et al., 2014). Pedigree analysis is an important and often essential tool to visualize and describe the population structure and genetic diversity within a population. Molecular tools are frequently used in mixed models alongside pedigrees (Valera et al., 2005) in genome-wide association and genomic selection studies (Yu and Buckler, 2006; Zhao et al., 2011; Chen et al., 2017). Although many efforts are focused on conservation breeding programs, there is a need for pedigree analysis in breeding programs for plant cultivar development with heavy selection pressure (Jones and Bingham, 2010). This is particularly the case if these programs are heavily reliant on germplasm collections. With the advancement in marker and next-generation sequencing technologies, there are now methods to analyze relatedness and population structure by using genetic marker data. However, genetic diversity bottlenecks in germplasm stored in gene banks and the capacity and costs associated with genotyping large numbers of individuals or populations may limit the power of these studies. Therefore, pedigree-based information on population structure and relatedness can serve as an appropriate tool in prioritizing plant breeding and genetics efforts. Pedigree analysis has not previously been performed for a white clover collection in New Zealand or anywhere else in the world; the closest source of this kind of information is Caradus and Woodfield (1997). The authors of that publication compiled a checklist of white clover cultivars, indicating their parentage and some attributes that relate to the release. However, no relatedness data or quantitative analysis were associated with the handbook.
Safeguarding germplasm is the least expensive and most efficient method of genetic conservation of wild germplasm of agriculturally valuable plants. The Margot Forde Germplasm Centre (MFGC) hosts New Zealand and international germplasm of forage and pasture plants. The mission of the MFGC is to make a broad range of genetic diversity available in the form of seeds, providing a spectrum of new forage traits to future breeding programs. The most collected species are those commercially prioritized for cultivar development by breeders worldwide and specifically in New Zealand, such as white clover, but the forages of the future are also included in collection and exchange programs and should receive more attention. The MFGC conserves and occasionally regenerates accessions of wild germplasm, domestic and naturalized germplasm, bred lines, and pre-breeding material. This diverse collection makes the MFGC unique among forage collections around the globe.
We used historical pedigree data from the white clover collection maintained at the MFGC in Palmerston North, New Zealand. The objectives of this study were (i) to create a pedigree map of the collection, (ii) to identify founding accessions and determine the effective number of founders, and (iii) to detect patterns affecting inbreeding and kinship.

Data Filtering
The term "accession" is used to refer to any seed material entered into the MFGC with an identification number. The terms "Grasslands cultivar" and "Other cultivar" refer to cultivars that were bred and released by different organizations. "Grasslands cultivars" are all cultivars that were trademarked under the "Grasslands" trademark. "Other cultivar" refers to cultivars bred worldwide by other organizations that are not trademarked as "Grasslands." To date, the MFGC database holds data for 26,703 accessions of white clover from 40 countries. These accessions were recorded over a timeframe of 75 yr, from 1941 to 2016, using a range of breeding techniques including polycrossing and biparental crossing. Of this total number, 13,687 accessions (Fig. 1) were used in the construction of the pedigree map using Helium, a software package for the visualization of large pedigrees (Shaw et al., 2014). Accessions were categorized into different subsets based on missing data (1633 accessions) and their specific lineage as part of breeding efforts (e.g., seed increases and isolated nodes in the pedigree were not included; Fig. 1). A total of 12,154 accessions were used for the derivation and analysis of relatedness and kinship parameters. These accessions were selected on the basis of the type of cross they involved (e.g., biparental crosses), whether they had full or half parentage indicated, and whether they had decipherable parental information (Fig. 1). Polycrosses were excluded from the parameter analysis subset because polycrosses do not fit the allele frequency expectations of biparental crosses; as a result, the inbreeding and kinship values would have been overestimated. The data structure was: accession ID, Parent 1, Parent 2, accession date, seed weight, and country of collection. In total, 2479 accessions had specific collection countries listed.
Founders were defined as the first accessions in 1941 that had no parentage listed, assuming no breeding had occurred. Every other accession introduced into the database after 1941 was considered an introduction. Likewise, parents with high contributions to pedigree size were declared as those having >50 full- or half-sib offspring. The 50-offspring cutoff was defined arbitrarily as a value well above the mean number of offspring.

Data Analysis
The number of offspring, kinship, and inbreeding were calculated with the R package 'pedigree' (Coster, 2015). Kinship was calculated from the pedigree information by recursive application of the two formulas:
$$F_{yy} = \tfrac{1}{2}\left(1 + F_{y}\right) \tag{1}$$
$$F_{xy} = \tfrac{1}{2}\left(F_{x m_y} + F_{x f_y}\right) \tag{2}$$
where $F_y$ is the coefficient of inbreeding of y, and the kinship of two individuals, given x is not a descendant of y, is $F_{xy}$. In Eq. [2], $F_{xy} = 0$ when x and y are both from the founder population (Fernando and Habier, 2006). The two genes, one from each parent, at a given neutral locus inherited randomly, are transcribed as $m_y$ and $f_y$ of y. To quantify the relationship between $m_y$ and $f_y$, the coefficient of kinship is calculated between the two genes. The coefficient of kinship of y with itself is Eq. [1]. The kinship coefficient between x and y is Eq. [2]. Influential parents were defined as accessions with the highest mean kinship (k) within their corresponding cluster.
A heat map was used to visualize pairwise relatedness among the 12,154 accessions (Fig. 2a). A dendrogram was used to represent the clustering of the population based on kinship coefficients (Fig. 2b). Important ancestors were identified as the accessions with the highest mean kinship found at the 22,500 distance coefficient on the dendrogram. This was confirmed by pedigree lineages.
Inbreeding was calculated using the formula (Crow and Kimura, 1970; Wright, 1984; Wiggans et al., 1995):
$$F_{y} = 1 - \frac{H}{H_o}$$
The unconditional probability that y is heterozygous at any given locus is symbolized by $H$. The conditional probability that y is heterozygous at a given locus where the genes are not identical-by-descent is symbolized by $H_o$ (Fernando and Habier, 2006). Founders were not included in the kinship or inbreeding analysis, as they did not have pedigree data associated with their records. The effective number of founders ($f_e$) was calculated by the program PyPedal, using this equation from Cole (2007):
$$f_e = \frac{1}{\sum_{i} p_i^{2}}$$
where $p_i$ is the proportion of the genes of the living, descendant population contributed by founder i (Lacy, 1989). The dataset used in the calculation of $f_e$ was the full parental dataset of T. repens, containing 5727 accessions (Fig. 1). Only accessions with full parental data and original founders were used. Attempts to use the full dataset including half parentage resulted in an unreliable (inflated) estimate of $f_e$.

Pedigree Map Size and Complexity
Between 0 and 15 generations were traced. Of the 13,687 accessions, 4724 (34.51%) had full parentage, 7430 accessions (54.29%) had one parent listed, and 1533 accessions (11.20%) had no parentage listed. There were 11,643 terminal lines identified in the pedigree. Terminal lines are defined as lines that are at the end of a lineage. Completeness of parentage (COP) across the entire map was generally high. The COP is defined as how complete the immediate pedigree for the accession was (i.e., one, two, or no parents listed). The COP for each accession was assessed for 16 generations. The mean COP for the reference population was 72%. Generations 5, 13, 14, and 15 had a mean COP of >80%. In general, when the number of accessions per generation decreased, COP increased (Fig. 3). Within the pedigree map, we identified five notable parents who contributed a substantial number of progeny to the overall population. The mean offspring number for the whole population was 5.14, and of the accessions with offspring, the mean was 39.52. The node sizing feature of Helium was used, which is based on the contribution of an accession to the overall population. The five parents with the largest nodes on the pedigree map, indicating the largest numbers of offspring, were C2413 (243 offspring), C6525 (181 offspring), C10850 (305 offspring), C15117 (236 offspring), and C19756 (151 offspring). As parents, these accessions made the largest contributions to the pedigree. Accession C2413 arose from a polycross of C72, C759, C792, C809, and C822 and contributed 243 progeny to the population. The progeny from accession C2413 were used to create a further six generations in the population. Included in the third generation was accession C6525, which contributed 181 progeny and arose from a pairwise cross of C4748 and C4785. Accession C10850 had no listed parentage and contributed 305 offspring to the population. C15117 was an introduction from Spain and contributed 236 progeny to the population. Out of the 236 progeny, only C18732 was advanced for selection and produced accession C22640, which in turn produced four terminal accessions: C24250, C22938, C24252, and C24251. Accession C19756 was the progeny of C10576 and produced 151 offspring; 19 of these progeny carried on for further breeding.
Accession C15117 was frequently used as a parent in the early stages of white clover breeding. Originating from Spain, it produced 236 accessions, making it one of the most important parents recorded. The drought-tolerant phenotypes often produced in the Mediterranean countries are exceptionally desirable in white clover, explaining why C15117 was well utilized (Cattivelli et al., 2008).
In the 1940s, there were 22 accessions introduced into the MFGC, all naturalized in New Zealand. The 1950s saw two introductions from Spain and 10 from New Zealand. The 1960s was a decade when the range of geographic origin of introductions rose sharply, indicating the importance of germplasm collecting trips. Accessions were collected from a total of 15 countries, with France, Israel, Morocco, Spain, Lebanon, and Turkey (Fig. 4) as the most represented countries in the collection. Figure 4 shows the number of white clover introductions into the MFGC for the decades 1950 to 2010. In the 1970s, there was a total of 36 countries contributing to introductions, with France, Germany, Greece, Iran, Israel, Italy, Portugal, Spain, Sweden, Turkey, and the United States contributing 48.23% of the number of collections. Collections also increased within New Zealand, with 182 (37.83%) out of a total of 481 accessions introduced. This pattern carried into the 1980s, where although the number of countries was reduced to 17, the number of accessions was higher, and that included 544 (58.12%) accessions originating from New Zealand. Other countries were Portugal, Spain, and Australia, highlighting the awareness of the requirement to collect diversity from countries with arid environments to enhance adaptation to abiotic stress. In the 1990s, a dramatic increase in collections from the United States, 149 accessions (40.82%) out of a total of 365, as well as collections from 27 countries, was recorded. The 2000s and 2010s were representative of a leveling-off period in collecting missions and small increases across a number of countries, with a noticeable increase in local breeding activity. A total of 294 introductions were collected in the 2000s, and 212 were collected in the 2010s. The only significantly large introduction in the 2010s was from Russia, where 112 (52.83%) introductions were collected.
The peak in both the New Zealand and international introductions was in the 1980s (Fig. 5a). From the 1990s onward, fewer accessions were collected from New Zealand compared with international collections. Biparental and single crosses were a common method, whereas polycrossing was rare (Fig. 5b), with a peak in the practice of single crosses in the 1990s.
The idea that there is untapped variation in wild germplasm that can be brought into germplasm centers and breeding programs has motivated the interest in collection trips globally (Hawkes, 1977; Richards and Volk, 2010). The numbers of introduced accessions show that wild germplasm has been traditionally recognized as an important source for breeding white clover in New Zealand (Fig. 4 and 5a). Accession C40 was a founding ancestor of the Trifolium repens L. section of MFGC classified as "wild" and greatly influenced a large proportion of the structure of the collection.
Many of the influencing ancestors of the population can be traced back to introduced germplasm from various countries, showing the influence of wild germplasm (Fig. 5c). With increased biosecurity laws, early collection trips provided a wide base of available plant genetic resources. Further expansion of the collection will enable breeding high-performing white clover cultivars to address the future challenges of agriculture.

Offspring Distribution
Of the 12.9% of accessions that had recorded progeny, offspring number ranged from 1 to 7356. The mode of the progeny count distribution was 0, and the mean was 5.14. When accessions with 0 offspring were excluded, the mode was 1 and the mean was 39.52. Inspection of the pedigree map and offspring distribution identified 90 heavily used parents with >50 offspring each. Influential parents were introduced across a wide range of years, and no geographical collection data were available for these accessions. The year with the highest number of influential parents (9) was 1941, and the most prominent decade was the 1980s with 26.67% of influential parents, closely followed by the 1970s with 22.22%.
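To illustrate how offspring counts of this kind can be derived, the following minimal Python sketch tallies direct offspring from a pedigree table with the structure described under Data Filtering (accession ID, Parent 1, Parent 2). The function names and the toy accessions are hypothetical, and this is not the R 'pedigree' routine used in the study; it only shows the bookkeeping and the identity linking the two reported means.

```python
def offspring_counts(pedigree):
    """Count direct offspring per accession.

    pedigree maps accession ID -> (parent1, parent2); None marks an
    unknown parent. A parent listed twice (selfing) is counted once.
    """
    counts = {acc: 0 for acc in pedigree}
    for parents in pedigree.values():
        for parent in set(parents):
            if parent is not None:
                counts[parent] = counts.get(parent, 0) + 1
    return counts


def summarize(counts):
    """Return (mean over all accessions, mean over parents only, share with offspring)."""
    values = list(counts.values())
    with_offspring = [v for v in values if v > 0]
    mean_all = sum(values) / len(values)
    mean_parents = sum(with_offspring) / len(with_offspring)
    share = len(with_offspring) / len(values)
    return mean_all, mean_parents, share


# Hypothetical toy pedigree, not MFGC data.
toy = {
    "A": (None, None),
    "B": (None, None),
    "C": ("A", "B"),
    "D": ("A", "B"),
    "E": ("C", None),
}
toy_counts = offspring_counts(toy)
mean_all, mean_parents, share = summarize(toy_counts)
# By construction, mean_all equals mean_parents * share.
```

Applied to the reported values, the same identity gives 5.14 / 39.52, or about 0.13, which is in line with the roughly 13% of accessions that had recorded progeny.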
There were 12 accessions that had >1000 offspring, ranging from 1359 to 7356 offspring, with six of the accessions coming from the 1970s. Accessions C72 and C1067 were the two most influential parents in that decade. Accession C72 was referred to as 'North Canterbury type 1', a white clover with a desirable phenotype, and C1067 was an introduction from Spain. C72 had 2860 offspring, and C1067 had 7356 offspring.

Founders and Founder Effects
Founders are defined here as accessions that initiated a population (Ladizinsky, 1985). A total of 34 accessions from 1941 were found in the white clover germplasm database as having no recorded parentage and were thus declared as founder accessions.
The overall mean relatedness and inbreeding level of the MFGC white clover collection was <4%. Also, visual inspection of the pedigree map did not reveal any obvious bottlenecks. However, confirmation of bottlenecks or founder effects using marker data would be required (Reynolds and Fitzpatrick, 2013). The majority of the introductions were made between the 1970s and 1990s (Fig. 5a). This agrees with Fig. 6, as the kinship and inbreeding levels decrease when the population reaches the 1990s, possibly due to new germplasm being integrated into breeding programs.
The earliest literature contains reports of 'Type 1' white clover, an ecotype that was certified and commercially produced in 1930 (Caradus et al., 1996). Three additional ecotypes were also collected, presumably around the same time. Subsequent reselections from these four early ecotypes were recorded under their respective types. When white clover breeding was first recorded, 'Type 1' germplasm outperformed the three other types of white clover (Brock et al., 1989). Most prominently, a breeding program was created to reselect and perform crosses from the commercialized 'Type 1' ecotype population until 1957, when a final selection was completed. This selection was named 'Grasslands Huia' in 1964 and remains a current cultivar. To clarify, "type" will hereafter refer to accessions belonging to a particular group rather than the original ecotype collected.
The four main types of white clover found in New Zealand pastures were identified among accessions in our study through differences in genetic distance (Brock et al., 1989). Type 1 was 'New Zealand Wild White No.1', a medium-leaved, productive perennial. Type 2 was 'New Zealand Wild White No. 2', a small-leaved perennial that was less productive than Type 1. Type 3 was 'Ordinary New Zealand White', a medium-leaved, nonpersistent clover. Type 4 was 'Lax early-flowering New Zealand and ordinary European', a small-leaved, nonpersistent type (Brock et al., 1989).
Of the 20 founders that were associated with Types 1, 2, 3, or 4 in the database, 14 were associated with 'Type 1 clover'. Direct parentage in the pedigree indicated that Type 1 was a class to which 70% of the founders belonged. The remaining 14 founders were not associated with any particular type.
Interestingly, 13 of the 34 founders were collected from regions in New Zealand such as Hawkes Bay, Canterbury, Whenuakura, and Southland. Hawkes Bay was the most common collection site, with nine founders associated with the region. Hawkes Bay is a temperate region of New Zealand but is prone to localized and widespread drought. Being a coastal region, extreme weather patterns are common and strong winds often contribute to erosion in paddocks (Thompson, 1987;Mullan et al., 2005). Hawkes Bay would then have been an environment with high natural selection pressure suitable for finding white clover germplasm with abiotic stress tolerance.
Out of 34, three founders resulted in distinct lineages associated with highly influential parents with high mean k values. Interestingly, accession C40 was known as 'North Canterbury Type 1' and was a founder of the original breeding efforts from 1941, collected from the Canterbury region in New Zealand. The impact of the elite Type 1 parents was significant, as C40 was the ancestor of the accession that had the largest influence over the whole population. Accession C40 was a founder and produced 51 direct dependent accessions.
The accession C43 was a founder in the white clover breeding population introduced in 1941. It is described as phenotypically similar to Types 1 and 3 white clovers. Accession C43 contributed 17 progeny to the population, as evidenced from its diverse pedigree. The 17 progeny went on to become successful populations themselves, leading to a greater number of indirect progeny.
Accession C63 was known as 'Imported Kent Type 5' and was a founder of the original breeding efforts introduced in 1941. It originated from Kent, UK. Kent is one of the warmest parts of Britain; however, it is prone to high winds, and because it borders the River Thames and the North Sea to the north and the Straits of Dover and the English Channel to the south, it can be prone to flooding. These climatic conditions, along with the political relationship between New Zealand and England, were likely reasons why this accession influenced breeding efforts.

Influential Parents
Mean kinship values suggested that there were six highly influencing parents in the population structure (Fig. 2b). The first influential accession was C121 with k = 0.059 (Cluster 2 in Fig. 2b). The parentage shown for C121 is C104/C101. The parentage for C104 is C64/C40, and the parentage for C101 is C63/C40.
Another influencing ancestor was accession C7743 (k = 0.016), which divided the cluster from C121 (Cluster 1 in Fig. 2b). In the pedigree, the earliest parentage that could be traced back for C7743 was C963/C40. Accession C963 was collected from Spain but harvested in Australia in 1951. The phenotype of a plant traditionally adapted to semiarid environments such as Spain and Australia was beneficial for breeding programs targeting drought-tolerant traits (Cattivelli et al., 2008).
Accession C23964 (k = 0.013) was an influential parent (Cluster 4 in Fig. 2b) in that it had two parental cultivars listed, 'Crusader' and 'Kopu II' (Caradus and Woodfield, 1997). Both of these cultivars are Grasslands cultivars. 'Grasslands Kopu II', formerly known as 'Ranger', was developed from persistent genotypes that were identified in the fourth year of a trial under rotational sheep grazing. Crusader was bred from pair crosses from five 'Crau' genotypes and six genotypes from Syria, which had been identified as having desirable drought tolerance capabilities and high dry matter yield (Woodfield et al., 2001).
Accession C18977 (k = 0.012) was an influential accession and had a large pedigree (Cluster 6 in Fig. 2b). Accession C18977 was listed in 1996 and arose out of C9473/C12186. This accession is in the same pedigree as C121. The founder C40 and another early accession, C72 (another progeny of C40), were in the ancestry of C18977. These two accessions both fall under the 'North Canterbury Type 1' category, emphasizing the role that the Type 1 class had in population structuring.
Accession C21996 (k = 0.009) was the final influencing ancestor found in the dataset (Cluster 5 in Fig. 2b). The parentage was C19756/C8421, with the earliest recorded ancestors C4145/(C11225/C11248). Accession C4145 was an introduction from Blonei, Poland, in 1978 and only produced one recorded progeny, C8421, which went on to produce 56 progenies.
As can be observed from the size of Cluster 3 in Fig. 2b, accession C24138 was the least influential parent with k = 0.006; it had the parentage of C19756/C21047.
The main result from finding influencing ancestors (Clusters 1, 2, and 6 in Fig. 2b) was the impact of accessions associated with the 'Type 1' phenotype. The accessions bred from parents derived from this class influenced the population structure strongly. The founders, although not directly producing large amounts of progeny, produced high-performing offspring that continue to result in many commercial cultivars. Second, downstream of the dendrogram, clusters started to diverge on the basis of geographic origin and plant breeder decision-making patterns. There are some clear distinctions where the geographic origin and the relevant desirable phenotype would influence the structure, such as accessions introduced from the Mediterranean. In contrast, Cluster 4 in Fig. 2b is represented by germplasm released by a single plant breeder selecting material. Eight cultivars were found in Cluster 4, containing 12% of the total population.

Effective Number of Founders
There were 5727 accessions analyzed, subset on the basis of full parentage plus founder data. A total of 1004 accessions were identified as founders and 4723 accessions were identified as descendants. The estimated f_e was 175.68, indicating the number of founders needed to recreate the population with the same amount of genetic diversity. Figure 7 shows the trend in f_e across decades. The f_e value increases proportionately when the number of founders increases. However, it increases at a faster rate when there is a large reduction in the number of founders. In 1970, there were 310 founders and an f_e of 29.28, compared with 1980, when there were 212 founders and an f_e of 64.22. Cumulatively, f_e remained stable between 1940 and 1970, before an increasing rate between 1970 and 2000 and then a decreasing rate in 2010. This may be due to a large number of crosses performed between 1970 and 1990. Large numbers of crosses showed that as the population size increased through breeding, new introductions declined. As a consequence, the influence of founding accessions was reduced as genetic distance increased between founders and offspring multiple generations downstream.
Estimates of f_e are useful in predicting future changes in genetic variability (Hamann and Distl, 2008). The resulting f_e was 17.50% of the total number of reference founders. A lower number of effective founders relative to the number of reference founders indicates that the contributions of individual reference founders towards the genetic makeup of the population are low. A degree of redundancy in the contributions among founder groups was found, likely between founders from similar genetic backgrounds. Some level of redundancy in germplasm collections can serve as a means to minimize the risk of allele loss and of increased inbreeding in further generations. In practice, our estimate of f_e is also affected by a population with a small group of founders with large contributions.
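The dependence of f_e on how evenly founders contribute can be made concrete with a short Python sketch of the standard definition, f_e = 1 / sum(p_i^2), where p_i is the proportional genetic contribution of founder i (Lacy, 1989). The function name and the toy contribution vectors are hypothetical; this is an illustration under those assumptions and not the PyPedal routine used for the reported estimate.

```python
def effective_founders(contributions):
    """Effective number of founders, f_e = 1 / sum(p_i ** 2).

    contributions holds non-negative founder contributions; they are
    rescaled here so that the proportions p_i sum to 1.
    """
    total = sum(contributions)
    proportions = [c / total for c in contributions]
    return 1.0 / sum(p * p for p in proportions)


# Four founders contributing equally: f_e equals the founder count.
equal = effective_founders([1, 1, 1, 1])

# Three founders with one dominant contributor: f_e falls below 3.
unequal = effective_founders([0.5, 0.25, 0.25])

# Ratio reported in the text: f_e of 175.68 against 1004 reference founders.
reported_ratio_pct = round(175.68 / 1004 * 100, 2)
```

With equal contributions f_e matches the number of founders, and any imbalance pulls it below that number, which is why a few heavily used founders can lower the estimate even when many founders are present.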
Interestingly, an increase in the number of reference founders in a given decade was followed by an increase in f_e one decade later. For example, in the 1970s, there were 310 reference founders with an f_e of 29.28. In the 1980s, there was a decrease in the number of reference founders (212) but a rise in f_e (64.22). This is shown again in the 1990s, when the number of reference founders rose to 301 and there was a steady f_e of 63.17, likely caused by the drop in reference founders in the previous decade. However, in the 2000s, the f_e rose to 92.15 as a consequence of an increase in reference founders in the previous decade. The f_e then stabilized in the 2010s because there were no further increases in reference founders in the prior decade.
The impact of founders and introductions on the genetic structure of white clover populations at the MFGC was largely unknown. Without the construction of a pedigree map, the derivation of inbreeding and kinship coefficients, and the visualization of relatedness in a dendrogram, the population structure and the relationships within it would remain unknown. With this information, the impacts and results of human decision-based breeding over the decades can be evaluated, better information will be available for future planning, and germplasm exchange efforts will be improved.
With increased use of next-generation sequencing, information on population structure remains a key piece of information to guide germplasm surveys and genomic selection efforts. The effectiveness of genomic selection relies on high prediction accuracies. To predict the performance of genomic selection models, it is often useful to simulate prediction accuracies by applying expected prediction accuracy estimators. Often, expected prediction accuracy equations contain a term that requires the number of independent chromosome segments or the effective number of loci (M e ). The M e term is partially defined by an estimate of the effective population size. An extension of the f e estimate would allow for an accurate and empirical determination of effective population size (N e ) over a large sample population to be used in genomic selection program planning.
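As an illustration of how M e enters such an estimator, the sketch below uses one commonly cited deterministic form, r = sqrt(N h2 / (N h2 + M e)), where N is the training population size and h2 is the trait heritability. This particular equation is an assumption made for illustration and is not named in the source; the function name and all input values are hypothetical.

```python
import math


def expected_accuracy(n_train, h2, m_e):
    """Expected genomic prediction accuracy, r = sqrt(N*h2 / (N*h2 + M_e)).

    Assumed estimator for illustration only; other published
    forms of the expected-accuracy equation exist.
    """
    if n_train < 0 or not 0.0 <= h2 <= 1.0 or m_e <= 0:
        raise ValueError("need n_train >= 0, 0 <= h2 <= 1 and m_e > 0")
    signal = n_train * h2
    return math.sqrt(signal / (signal + m_e))


# Hypothetical inputs: heritability 0.5 and two training population sizes,
# evaluated for a smaller and a larger number of independent segments.
for m_e in (500, 2000):
    for n_train in (1000, 4000):
        r = expected_accuracy(n_train, 0.5, m_e)
        print(f"M_e={m_e:5d}  N={n_train:5d}  expected r={r:.3f}")
```

Under this form, a larger M e (which follows from a larger effective population size) lowers the expected accuracy for a fixed training set, which is why an empirical estimate of N e matters for program planning.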

Diversity and Inbreeding
Diversity
A total of 73,865,935 pairwise combinations were calculated for the kinship coefficient. The values ranged from 0 (no relatedness) to 1 (full relatedness). A heat map was used to visualize the relatedness across the population (Fig. 2a). The black diagonal represents the perfect relationship of each accession with itself, and the symmetric off-diagonal elements represent kinship measures for pairs of accessions. Relatedness is indicated by color intensity. Overall mean relatedness was k = 0.002. The yellow or red clusters in Fig. 2a represent high-relatedness clusters, for example, C1061/C480 (k = 0.75), C1042/C703 (k = 0.562), C2146/C1863 (k = 0.531), and C104/C64 (k = 0.5).
At the pairwise level, 96.5% of the combinations had 0 relatedness, 3.49% had indirect relationships, and 0.01% had half or more kinship. When kinship was averaged over all possible combinations for each accession, 5529 accessions (45.49%) had a mean kinship of 0, whereas 6625 accessions (54.51%) had a mean kinship level of <0.2. These kinship values indicate that genetic relatedness within the germplasm collection is low.
Accessions that had a kinship of 0 in both the pairwise and the per-accession datasets may show promise for the exploration of divergent parental combinations.
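The reported counts can be cross-checked with a short calculation. The sketch assumes that the number of accessions in the kinship analysis is the sum of the two per-accession groups reported above (5529 + 6625) and that the pairwise total includes each accession paired with itself (the diagonal of the heat map); both are inferences from the figures in the text rather than statements from the source, and the function name is hypothetical.

```python
def pair_count_with_self(n):
    """Number of unordered pairs among n items, including self-pairs."""
    return n * (n + 1) // 2


# Assumed number of accessions: the two per-accession groups added together.
n_zero_kinship = 5529
n_low_kinship = 6625
n_accessions = n_zero_kinship + n_low_kinship

total_pairs = pair_count_with_self(n_accessions)
share_zero = 100 * n_zero_kinship / n_accessions
share_low = 100 * n_low_kinship / n_accessions

print(f"accessions: {n_accessions}")
print(f"pairwise combinations (with diagonal): {total_pairs:,}")
print(f"mean kinship of 0: {share_zero:.2f}%  |  below 0.2: {share_low:.2f}%")
```

Under these assumptions the calculation reproduces the 73,865,935 combinations and the 45.49% and 54.51% shares given in the text.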
In T. repens, inbreeding depression has been proven to affect some morphological traits. Michaelson-Yeates et al. (1997) used inbred lines of white clover, utilizing the self-fertility (Sf ) allele. It was noted that only half of the hybrids showed positive heterosis, and no other trait showed significant heterosis. However, the degree of heterosis was related to the extent of variation in morphological characters between the parental lines. Nichols et al. (2007) assessed the impact of inbreeding depression on nodal root system morphology. The roots became shorter and thicker, but the root architecture was largely unaffected; however, there was reduced nutrient uptake efficiency compared to the parent clover. These studies emphasize the risk that is associated with inbreeding depression in white clover.
Although both of the studies above show that in white clover inbreeding is deleterious to some morphological traits, it should be acknowledged that when monitored and used correctly, inbreeding can lead to increased genetic gain. Inbreeding can unmask positive recessive genetic variation and can be used to remove unfavorable genetic load (Rotili, 1991; Humphreys, 1997). Hybrid vigor has been shown in crosses between inbred lines (Michaelson-Yeates et al., 1997). Atwood (1945) used inbred lines produced by the self-fertility allele, and positive heterosis for dry matter production was seen in half of the hybrids.

Inbreeding
Inspection of the overall pedigree suggested no visible bottlenecks occurring, which relates to the low inbreeding coefficients found. The mean inbreeding level was 0.39%, and among the accessions with nonzero inbreeding coefficient, the mean was 8.83%. The frequency of inbreeding within the inbred accessions peaked at both coefficients 0.04 and 0.13. Among the accessions with nonzero inbreeding coefficient, the highest level of inbreeding was 0.33 and the lowest was 0.0002. Across the whole dataset, 11,624 accessions (95.64%) had an inbreeding coefficient of 0, whereas 530 accessions (4.36%) showed inbreeding.
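For readers unfamiliar with how inbreeding coefficients are derived from two-parent pedigree records, the following minimal sketch applies the classical tabular (recursive kinship) method. It is not the software used for the analysis, the pedigree and accession names are hypothetical, and the kinship scale used here (0.5 for a non-inbred individual with itself) may differ from the scaling of the k coefficient reported in this study.

```python
def kinship_matrix(pedigree):
    """Tabular-method kinship for a two-parent pedigree.

    pedigree: list of (id, parent1, parent2) with parents listed before
    their offspring; None marks an unknown parent (founder).
    """
    ids = [row[0] for row in pedigree]
    index = {name: i for i, name in enumerate(ids)}
    n = len(ids)
    k = [[0.0] * n for _ in range(n)]

    def kin(a, b):
        # Kinship with an unknown parent is taken as 0.
        if a is None or b is None:
            return 0.0
        return k[index[a]][index[b]]

    for i, (_, p1, p2) in enumerate(pedigree):
        # Kinship with each earlier individual: mean of that individual's
        # kinship with the two parents of the new individual.
        for j in range(i):
            value = 0.5 * (kin(ids[j], p1) + kin(ids[j], p2))
            k[i][j] = k[j][i] = value
        # Self-kinship depends on the kinship between the parents.
        k[i][i] = 0.5 * (1.0 + kin(p1, p2))
    return ids, k


def inbreeding(pedigree):
    """Inbreeding coefficient F = 2 * self-kinship - 1 for every individual."""
    ids, k = kinship_matrix(pedigree)
    return {name: 2.0 * k[i][i] - 1.0 for i, name in enumerate(ids)}


# Hypothetical pedigree: two founders, two full sibs, a full-sib cross, a selfing.
example = [
    ("A", None, None),
    ("B", None, None),
    ("C", "A", "B"),
    ("D", "A", "B"),
    ("E", "C", "D"),  # full-sib mating
    ("S", "A", "A"),  # self-fertilisation of founder A
]
F = inbreeding(example)
print(F)
```

In this toy pedigree the founders and their first-generation offspring are non-inbred, the full-sib cross has F = 0.25, and the selfed progeny has F = 0.5. As a rough consistency check on the reported values, 530 inbred accessions out of 12,154 with a mean coefficient of 8.83% correspond to a collection-wide mean of about 0.39%.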
The trend in inbreeding from 1940 to 2010 shows that from 1940 to 1960, there was an initial increase in inbreeding of 0.0018. From 1970 to 1990, the steepest increase in inbreeding was found at 0.009. There was an increase from 1990 to 2000 of 0.0013, before another increase of 0.0024 from 2000 to 2010 (Fig. 6).
The genetic consequences of inbreeding in outcrossing species can be adverse. Due to the high levels of heterozygosity, most outcrossing species carry a high genetic load of deleterious alleles and suffer from severe inbreeding depression (Jones and Bingham, 2010). Inbreeding depression is the reduced fitness of a population as a result of inbreeding, where the recessive alleles increase in frequency but are less favorable than the dominant alleles, resulting in a reduction in performance.
Inbreeding depression has severe effects in alfalfa (Medicago sativa L.). Wilsie (1958) showed that one generation of selfing resulted in a mean loss of 20 to 30% in vegetative vigor and 80 to 90% in self-fertility. Dessureaux and Gallais (1969) investigated the pattern of inbreeding depression in two specific alfalfa genotypes and the impact on the first-generation hybrid as the parents became more inbred. Inbreeding depression increased in each generation, and by the third generation, the progeny had practically become self-sterile. In contrast, Busbice and Wilsie (1966) observed a 30% reduction in forage yield in alfalfa. Acquaah (2012) indicated that mating systems such as half-sib mating, full-sib mating, and backcrossing can increase inbreeding. Autopolyploids have multiple alleles and can accumulate deleterious alleles that may not show up until later generations. Inbreeding depression is usually more severe in autopolyploids than in diploids; however, the rate to homozygosity is much slower in autopolyploids.

Commercial Cultivar Development
The dendrogram in Fig. 2b shows six significant clusters and 10 breeding pools occurring in the population. To assess the impact of the MFGC white clover collection on the development of commercial cultivars of interest to research and industry, accessions labeled as "Grasslands cultivar" or "Non-Grasslands cultivar" were extracted from the relatedness data.
Cluster 4 in Fig. 2b had 37 accessions linked to commercial cultivars. The accessions in Cluster 4 were introduced between July 1996 and January 2016. The geographic origin of commercial accessions within Cluster 4 was confined to Australasia, with agronomic traits common to Australia and New Zealand that influenced the divergence of clusters. Eight accessions (C26366, C26367, C26368, C26594, C26794, C27071, C27072, and C27073) in Cluster 4 had specific commercial cultivars listed in their data. These cultivars were all listed within 4 yr of each other, and common phenotypic characteristics between these cultivars were found (Supplemental Table S1). Six out of the eight cultivars were medium leaved, and six were bred with persistence as a breeding objective (Beuselinck et al., 1994).
The majority of New Zealand white clover cultivar releases thus far have occurred between 1970 and 1990, and the increase in cultivar releases in these decades is mirrored by the number of introductions over the same decades, as shown in Fig. 5c. Patterns of introductions and releases peaked in 1990 and decreased thereafter. The historical patterns shown in Fig. 5c are evidence of the direct role that germplasm centers and collections play in the development of elite cultivars for farmers.
The biological and economic importance of white clover germplasm and breeding to the New Zealand pastoral sector is immense, with high recognition during the 1990s. Mather et al. (1996) reported that the 1994 Organisation for Economic Cooperation and Development (OECD) Register listed 93 white clover cultivars, with a further 25 to 30 cultivars also known to commerce. Therefore, there are well in excess of 100 cultivars to fill the global annual market of 8500 to 10,500 Tg. As New Zealand provides 50 to 55% of this seed, there is increased motivation and economic benefit to breed elite cultivars for the market, and germplasm forms one of the foundations for cultivar development, food, and agriculture (Ghimiray and Vernooy, 2017).
To have the capacity to address climate change, pre-breeding efforts will increasingly need to rely on germplasm banks. Worldwide, there are now >700 seed collections holding an estimated 2.5 million entries (Plucknett et al., 1987; Tanksley and McCouch, 1997). For example, the USDA-ARS National Plant Germplasm System (NPGS) is a major source for plant genetic resources worldwide. As of November 2016, the number of accessions in the USDA-ARS NPGS was 576,325, representing 15,116 species, and in 2015, 239,118 of those accessions were distributed (Clark et al., 1997; Byrne et al., 2018). The United States spends approximately US$20 million annually on the maintenance of those collections (Tanksley and McCouch, 1997).

Effect of Forage Breeding Strategies
Frequent monitoring of breeding programs and breeding decisions prevents the creation of genetic bottlenecks, which limit the ability to generate new genetic variation that enables continued genetic gain. Our results show that inbreeding, albeit low, warrants careful attention going forward. Casler et al. (1996) and Casler (1998) noted that the lack of improvement in forages despite immense breeding efforts can be attributed primarily to the long breeding cycle, as the majority are perennials, and secondly to the negative correlation between yield and many other economically important traits in the forages (Casler and Brummer, 2008).
White clover breeding programs have mainly relied on recurrent phenotypic selection and crossing the most elite plants (Bell, 1977; Hill, 2014). Quantitative genetics principles applied to plant breeding, developed in the 1940s, led to the present-day integration of population genetics theory into plant breeding programs, which has led to a better understanding of the genetic architecture of traits via next-generation sequencing (Hill, 2014). However, for genome-wide association studies to be successful, germplasm and knowledge of the population structure are crucial.
Forage breeding around the world is mainly performed by half-sib family selection. Half-sib family selection reduces genetic gain by half because there is only control over the female parent. Gain can be doubled by selfing each parent to obtain S1, then crossing to obtain half-sibs (Wilkins and Humphreys, 2003; Acquaah, 2012). These are most often crossed through a polycross mating system and are useful when selecting traits of high heritability. In contrast, full-sib families can be generated from biparental crosses using parents from the base population. The families are evaluated and the elite families are selected. Half-sib/full-sib family selection has a number of merits: it has been in place for a long period of time, has produced a number of cultivars, and is cost effective and resource efficient (Acquaah, 2012). Although straightforward and cost effective, it does not capture the full variation and potential in the population. Reciprocal half-sib selection, also known as reciprocal recurrent selection, is a strategy for interpopulation improvement; two diverging populations are used, and each population is used as a tester to evaluate the other. Reciprocal full-sib selection is used for interpopulation improvement for species where the commercial product is hybrid seed. The selection cycle is completed in the fewest number of seasons by using plants from which both selfed and hybrid seed can be obtained (Fehr, 1991). Casler and Brummer (2008) and Hoyos-Villegas et al. (2018) proposed theoretical gains that could be captured if among- and within-family selection was used in the forages. Their findings showed that among- and within-half-sib-family selection is the most efficient and is superior to family selection under all circumstances for any positive value of within-family heritability. Among- and within-family selection and progeny testing are more expensive and resource intensive than half-sib/full-sib family selection. However, the theoretical gain that could come from using these techniques is strong enough that the resources used in among- and within-family selection could be justified.
There were significant clusters that diverged from each other in our study (Fig. 2b), most likely due to different selection pressures. There is both theoretical and empirical evidence that supports the idea that hybrids developed by crossing populations that have diverged can outyield the better performing parental population (Brummer, 1999). However, it is likely that without proper intrapopulation selection, nonfavorable alleles of strong effects will be present, and any benefits from additive × additive variation will not be observed. In contrast, with the right strategy, pre-breeding efforts aimed at generating new genetic variation in white clover will benefit from large-scale population structure information.

CONCLUSIONS
To our knowledge, this is the first study of its kind in white clover. The construction of pedigree maps and relevant demographic information showed that Australia, France, Germany, Greece, Italy, and Spain were the countries that had the most consistent introductions over the 75 yr that the MFGC has collected germplasm. The genetic diversity in white clover was >96%, reflected by low inbreeding levels. Although there was a steep increase in inbreeding from 1970 to 1990, it should be noted that inbreeding did not exceed 2%. Low inbreeding is a positive sign of the amount of diversity contained within the collection, slowing the loss of unique and favorable alleles occurring in the species if properly used.
The founder accessions related to 'Type 1' white clover had a large influence on the population. Identification of founder accessions will inform future studies on the uniqueness of germplasm stored at the MFGC in relation to other collections worldwide.
The results of our study allowed the visualization of historical patterns of relatedness and inbreeding in white clover germplasm. The ultimate aim of this process will be to increase genetic gain in white clover. Information obtained from population structure and breeding pools will enable opportunities to perform new crosses, design new breeding strategies among and within clusters, and contribute to better germplasm utilization.
Increases in knowledge and application of quantitative and population genetics models in combination with new technologies and use of interpopulation and intrapopulation pre-breeding strategies will enable efficiencies in breeding. However, germplasm collection remains a critical component to maintain progress.
The limitations of pedigree analysis are largely due to the quality of the records maintained. In addition, in white clover, groups of plants are often polycrossed. Current pedigree analysis software can only handle two parents, which excludes polycross data from analyses and limits the scope of results.
Overall, estimates of kinship and inbreeding indicate that the MFGC has been successful in maintaining and elevating genetic diversity in white clover. This achievement has been realized by continuous germplasm collection trips and exchange in a demonstrated relationship with cultivar development. Despite successful breeding efforts, the increasing demand for adaptation to climate change and more sustainable animal production requires better and more aggressive utilization of white clover germplasm in the future.

Supplemental Material
Supplemental material is available online for this article.

Conflict of Interest
The authors declare that there is no conflict of interest.

Data and Germplasm Availability
Data and pedigree map queries are available on request to the authors. Requests for MFGC germplasm may be considered depending on availability of germplasm and the purpose of the request.