Development of a High-Density Linkage Map and Tagging Leaf Spot Resistance in Pearl Millet Using Genotyping-by-Sequencing Markers
SMP and JGW contributed equally to this work.
Abstract
Pearl millet [Pennisetum glaucum (L.) R. Br; also Cenchrus americanus (L.) Morrone] is an important crop throughout the world but better genomic resources for this species are needed to facilitate crop improvement. Genome mapping studies are a prerequisite for tagging agronomically important traits. Genotyping-by-sequencing (GBS) markers can be used to build high-density linkage maps, even in species lacking a reference genome. A recombinant inbred line (RIL) mapping population was developed from a cross between the lines ‘Tift 99D2B1’ and ‘Tift 454’. DNA from 186 RILs, the parents, and the F1 was used for 96-plex ApeKI GBS library development, which was further used for sequencing. The sequencing results showed that the average number of good reads per individual was 2.2 million, the pass filter rate was 88%, and the CV was 43%. High-quality GBS markers were developed with stringent filtering on sequence data from 179 RILs. The reference genetic map developed using 150 RILs contained 16,650 single-nucleotide polymorphisms (SNPs) and 333,567 sequence tags spread across all seven chromosomes. The overall average density of SNP markers was 23.23 SNP/cM in the final map and 1.66 unique linkage bins per cM covering a total genetic distance of 716.7 cM. The linkage map was further validated for its utility by using it in mapping quantitative trait loci (QTLs) for flowering time and resistance to Pyricularia leaf spot [Pyricularia grisea (Cke.) Sacc.]. This map is the densest yet reported for this crop and will be a valuable resource for the pearl millet community.
Abbreviations
- DArT: Diversity Arrays Technology
- GBS: genotyping-by-sequencing
- LD: linkage disequilibrium
- LG: linkage group
- LOD: logarithm of odds
- NGS: next-generation sequencing
- QTL: quantitative trait locus
- RFLP: restriction fragment length polymorphism
- RIL: recombinant inbred line
- SNP: single-nucleotide polymorphism
- SSR: simple sequence repeat
Core Ideas
- Pearl millet [Pennisetum glaucum (L.) R. Br; also Cenchrus americanus (L.) Morrone] is an important forage and grain crop in many parts of the world but genomic resources for this species are needed to facilitate crop improvement.
- The reference genetic map developed using 150 recombinant inbred lines contained 16,650 single-nucleotide polymorphisms and 333,567 sequence tags spread across all seven chromosomes.
- This map is the densest yet reported for this crop and will be a valuable resource for the pearl millet community.
- Genome mapping studies are a prerequisite for tagging agronomically important traits.
- Genotyping-by-sequencing markers can be used to build high-density linkage maps, even in species lacking a reference genome.
Pearl millet, widely known for its tolerance to heat, drought and soil toxicity, is grown for both grain and forage in many parts of the world, particularly in warm, dry regions (Burton and Powel, 1968; Chemisquy et al., 2010). Pearl millet has higher water-use efficiency and nitrogen-use efficiency than many other cereals (Muchow, 1988; Maman et al., 2006; Vadez et al., 2012) and shows useful genetic variation for tolerance to high temperatures during seedling establishment (Peacock et al., 1993; Howarth et al., 1994) and during reproductive growth stages (Gupta et al., 2015) and can thrive on acidic, sandy, or infertile soils where few other crops can grow (Andrews and Kumar, 1992). For these reasons, pearl millet is an essential staple food grain and/or fodder crop in many developing countries.
The market for pearl millet grain is also increasing in the United States because of consumer preference for gluten-free food and demand for millet flour by many ethnic groups (Dahlberg et al., 2004; Gulia et al., 2007). In addition, alternative sources to maize- (Zea mays L.) and soybean [Glycine max (L.) Merr.]-based livestock feed are sought to lower production costs for the poultry industry in the southeastern United States (Durham, 2003; Farrell, 2005; Cunningham and Fairchild, 2012). Whole pearl millet grain has been shown to be a satisfactory feed ingredient for broiler chickens and for egg production while reducing feed costs (Collins et al., 1997; Davis et al., 2003; Garcia and Dale, 2006). Compared to sorghum [Sorghum bicolor (L.) Moench], pearl millet grain offers lower starch, superior protein quality and content, a higher protein efficiency ratio, and greater metabolizable energy levels for poultry diets (Sullivan et al., 1990; Bramel-Cox et al., 1992; Andrews et al., 1993; Nambiar et al., 2011). Over 70% of the approximately 10 Mha of pearl millet grown annually in India is sown to F1 hybrids (Yadav and Rai, 2011; Yadav et al., 2011a) and the development of pearl millet grain hybrids in the United States has shown some progress. For example, the USDA-ARS at Tifton, GA, in collaboration with the University of Georgia, released ‘TifGrain 102’ as a commercial grain hybrid (Durham, 2003; Lee et al., 2004). TifGrain 102 offers several advantages compared to other row crops, especially its ability to grow on sandy, acidic soils with minimum inputs and its resistance to root knot nematode (Meloidogyne incognita Kofoid & White), rust (Puccinia substriata Ellis & Barth. var. indica Ramachar & Cummins), and Pyricularia leaf spot [Pyricularia grisea (Cke.) Sacc. (teleomorph: Magnaporthe grisea (T.T. Herbert) M.E. Barr)] (Hanna and Wells, 1989; Wilson et al., 1989; Timper et al., 2002; Gupta et al., 2012).
Because of its high forage quality, pearl millet is also grown as an annual fodder crop in the southeastern United States (Burton and Powel, 1968; Chemisquy et al., 2010).
Pearl millet is diploid with seven pairs of homologous chromosomes and an estimated genome size of 2350 Mb (or 2C = 4.71 pg based on flow cytometry), much of which consists of repetitive sequences (Bennett and Smith, 1976; Wimpee and Rawson, 1979; Martel et al., 1997; Jauhar and Hanna, 1998; Thomas et al., 2000). Some DNA markers have been developed and used over the past two decades in pearl millet for genetic research or for applied breeding and selection (Hash and Bramel-Cox, 2000; Bidinger and Hash, 2004; Gale et al., 2005). Nonetheless, pearl millet crop improvement suffers from a relative lack of genetic and genomic resources compared to most other cereals. Characterization and utilization of pearl millet diversity can be aided by expanding the (currently few) genomic resources available in this crop.
Genetic markers are the building blocks for constructing linkage maps. Linkage maps further support numerous applications in plant breeding. Genetic maps of several pearl millet populations have been made using different marker sets over the past 20 y (Liu et al., 1994; Devos et al., 2000; Qi et al., 2004; Pedraza-Garcia et al., 2010; Supriya et al., 2011; Sehgal et al., 2012). Recently, a simple-sequence repeat (SSR) consensus map with 174 loci was developed using four RIL mapping populations (Rajaram et al., 2013). Despite these efforts, pearl millet linkage maps frequently have large gaps at the distal ends, which is probably caused by (i) a lack of either sufficient markers or polymorphisms in these regions, (ii) extremely high rates of genetic recombination in these regions requiring large numbers of physically closely linked markers to permit linkage detection, and/or (iii) the nature of the markers and parents used in these studies (Devos et al., 2000; Vadez et al., 2012). A suggestion that these gaps are caused by some combination of the latter two explanations is provided by Supriya et al. (2011), who demonstrated greatly improved genome coverage with Diversity Arrays Technology (DArT) markers compared with that provided by available SSR markers. Most of the remaining previous maps have generally relied on SSRs, restriction fragment length polymorphisms (RFLPs), or related markers; however, in many crops, linkage maps based on SNPs are now becoming common because of the low cost of high-throughput sequencing methods (Ganal et al., 2009; Kumar et al., 2012). Because of their abundance in the genome, SNPs can be used to build much denser linkage maps than other types of markers. Such SNP-based genetic maps are highly informative as they not only reveal the complexity of genome architecture (structure and organization) but also trace the genetic basis of QTLs underlying a trait with better resolution (Krawczak, 1999; Mammadov et al., 2012). 
Next-generation sequencing (NGS) technologies have facilitated the rapid detection of genome-wide SNP markers. Genotyping-by-sequencing is one such powerful approach to develop genome-wide SNP datasets (Elshire et al., 2011). This technique uses restriction enzymes to selectively digest genomic DNA; next, ‘barcoded’ DNA adapters are ligated to the fragments to multiplex many samples in a single sequencing lane (Elshire et al., 2011). The choice of restriction enzyme(s) and multiplexing makes GBS a versatile system and the ability to multiplex enables low-cost, high-throughput-marker discovery (Poland and Rife, 2012). Importantly, it also works in less exploited crops, including those for which no reference genome sequence is available publicly, such as pearl millet. The potential utility of GBS markers in developing high-density molecular maps for several cereal crops, including maize, barley (Hordeum vulgare L.) and oat (Avena sativa L.) has been extensively reviewed and shown to be useful (He et al., 2014). Recently, a pearl millet linkage map was also developed with 2809 high-quality SNP markers using a modified GBS protocol (Moumouni et al., 2015).
The objectives of this study were to construct a high-density linkage map using GBS-derived markers to provide a platform for downstream studies and to develop genomic resources for the greater pearl millet research community. The two parents used in this experiment (Tift 99D2B1 and Tift 454) are also the parents of the commercial grain hybrid TifGrain 102 (Hanna et al., 2005a,2005b). Flowering time was chosen as a phenotypic trait for QTL analysis to demonstrate the utility of this map. Also, Tift 99D2B1 carries genes for resistance to Pyricularia leaf spot (Hanna and Wells, 1989) and hence this mapping population was used to evaluate resistance to this disease as well.
Materials and Methods
Pearl Millet Mapping Population
The parental lines used in this study are Tift 99D2B1 and Tift 454, where Tift 99D2B1 was used as the female parent. Both Tift 99D2B1 and Tift 454 are dwarf, early-maturing grain types that share genes from Tift 23D2. Tift 99D2B1 has rust and Pyricularia leaf spot resistance alleles and Tift 454 has nematode resistance and pollen fertility restorer capability. This population was developed by Dr. Jeffrey P. Wilson (USDA-ARS (retired), Tifton, GA) and was provided to Fort Valley State University as part of the collaborative pearl millet project funded by USDA-National Institute of Food and Agriculture, Grant # GEOX-2008–02595 to Dr. Bharat Singh (retired). The population used for sequencing was a set of 184 RILs at the F7 generation.
Plant DNA Preparation for Sequencing
Plant leaf tissue was collected from 1.5-mo-old seedlings raised in the greenhouse. The tissue was lyophilized for 8 h and then genomic DNA was isolated with a DNeasy 96 Plant Kit (6) (Qiagen Inc., Valencia, CA). The DNA was quantified and adjusted to 10 ng μL–1 per sample, and 50 μL of each sample from 184 lines was sent in 96-deep well plates to the Genomic Diversity Facility at Cornell University in Ithaca, NY, for GBS marker development. Each plate included DNA samples from both parents and TifGrain 102 in random wells as well as a random blank well containing only water.
Genotyping-by-Sequencing
Library preparation and sequencing were performed by the Genomic Diversity Facility at Cornell University, Ithaca, NY. Genomic complexity reduction was performed with the ApeKI restriction enzyme (recognition site G/CWGC) and samples were sequenced in 96-plex on an Illumina HiSeq 2000 (Illumina Inc., San Diego, CA). One hundred and eighty-four RILs were sequenced; five samples yielded fewer than 5000 reads each and were excluded from further analysis. Single-nucleotide polymorphisms were called from the remaining 179 lines.
Single-Nucleotide Polymorphism Calls
Raw FASTQ files were processed to SNP calls using the GBS pipeline in TASSEL (version 4.3.6) (Glaubitz et al., 2014). Reads were aligned against ∼19,000 contigs of pearl millet genome sequence provided by the Pearl Millet Genome Sequencing Consortium (Varshney et al., unpublished data, 2015) using Bowtie2 (Langmead and Salzberg, 2012). To see the effect of using a reference genome for sequence alignment, we also generated a map that did not use the reference genome to align tags. This pipeline was identical to that used for the reference-based map, except that tags were aligned to each other using the UNEAK (Lu et al., 2013) filter in TASSEL version 5.2.1.15 (commands --UTagCountToTagPairPlugin and --UTagPairToTOPMPlugin).
Initial Map Generation and Ordering
Map creation was done in three iterative steps. All scripts and parameters used in this process are included in Supplemental File S1. First, high-quality SNP calls were selected by filtering for those with at least 70% coverage across RILs and with allele frequencies between 0.25 and 0.75. Sites showing >12% heterozygosity were removed as probable paralog misalignments. All RILs showing >50% missing data or >10% heterozygosity were then removed. Potential outcrosses were also identified by using the fraction of rare alleles (minor allele frequency ≤ 0.05) in each RIL to define a normal distribution. All RILs whose value had <1% probability after Benjamini–Hochberg correction (Benjamini and Hochberg, 1995) were excluded. Filtering resulted in a dataset of 146 RILs and 17,400 SNPs.
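The site-level filters of this first step can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the scripts in Supplemental File S1; the genotype encoding (0 and 2 for the two homozygous classes, 1 for heterozygotes, None for missing calls) and the function name `passes_site_filters` are assumptions made for the example.

```python
def passes_site_filters(calls, min_coverage=0.70, min_freq=0.25,
                        max_freq=0.75, max_het=0.12):
    """Return True if one SNP site passes the first-iteration site filters.

    calls: genotype calls across RILs, coded 0/2 for the two homozygous
    classes, 1 for heterozygous, None for missing (hypothetical encoding).
    """
    called = [g for g in calls if g is not None]
    # Coverage: fraction of RILs with a genotype call at this site.
    if not calls or len(called) / len(calls) < min_coverage:
        return False
    # Frequency of the allele coded "2" among called genotypes.
    freq = sum(called) / (2 * len(called))
    # Fraction of called genotypes that are heterozygous.
    het = sum(1 for g in called if g == 1) / len(called)
    return min_freq <= freq <= max_freq and het <= max_het


good_site = [0] * 40 + [2] * 40 + [1] * 4 + [None] * 16
paralog_like_site = [0] * 30 + [2] * 30 + [1] * 30 + [None] * 10
sparse_site = [0] * 30 + [2] * 30 + [None] * 40
```

Here `good_site` passes all three filters, `paralog_like_site` fails on heterozygosity, and `sparse_site` fails on coverage. The RIL-level filters (missing data, heterozygosity, and rare-allele fraction) would be applied in the same way along the other axis of the genotype matrix.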
To order SNPs, heterozygous calls were first set to “missing” and the genotypes were transformed to numerical equivalents using TASSEL (Bradbury et al., 2007). The SNPs were then clustered using the hclust() function in R (R Core Team, 2014). The cluster trees were split at various levels and linkage disequilibrium (LD) among clusters was manually inspected for the smallest level that clearly separated all seven linkage groups (LGs). Each LG was separated and markers were imputed on the basis of nearest-neighbor analysis; only perfectly cosegregating SNPs were used to impute each other. Redundant markers were then removed and 100 bootstraps of each LG were made by randomly resampling the RILs. Each bootstrap was ordered independently using MSTmap (Wu et al., 2008) and the results were merged, keeping the 95% most stable markers. Marker position was fine-tuned with the ripple() function in R/qtl (Broman et al., 2003). Map distances were also estimated with R/qtl using the Kosambi mapping function.
Using the first iteration map as a base, a second iteration map was built by testing all original SNPs’ linkage to one of the first iteration LGs; only those with an R2 value ≥0.6 were taken as being anchored. These SNPs were then filtered for those with calls in at least 60 RILs, minor allele frequencies ≥0.25, and heterozygosity ≤0.05. Each LG was then bootstrapped and reordered with MSTmap (Wu et al., 2008) as above, then cleaned with PLUMAGE (Spindel et al., 2014) and rippled with R/qtl (Broman et al., 2003).
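The anchoring rule can be illustrated with a short Python sketch. It assumes that R2 is the squared Pearson correlation between numerically coded genotype vectors and that a SNP is assigned to the LG containing its single best-correlated marker; both are assumptions for illustration, and the function names are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation over RILs where both calls are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic vector carries no linkage information
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    return sxy * sxy / (sxx * syy)


def anchor_snp(snp, linkage_groups, threshold=0.6):
    """Return the LG name with the best R2 to the SNP, or None if below threshold.

    linkage_groups: dict mapping LG name -> list of marker genotype vectors.
    """
    best_lg, best_r2 = None, 0.0
    for name, markers in linkage_groups.items():
        for marker in markers:
            r2 = r_squared(snp, marker)
            if r2 > best_r2:
                best_lg, best_r2 = name, r2
    return best_lg if best_r2 >= threshold else None


# Toy data: one marker per LG, eight RILs, genotypes coded 0/2.
toy_lgs = {"LG1": [[0, 0, 2, 2, 0, 2, 0, 2]],
           "LG2": [[0, 2, 0, 2, 2, 0, 0, 2]]}
linked = anchor_snp([0, 0, 2, 2, 0, 2, 0, None], toy_lgs)
unlinked = anchor_snp([0, 2, 2, 0, 0, 0, 2, 2], toy_lgs)
```

In this toy case the first SNP cosegregates with the LG1 marker and is anchored there, whereas the second is uncorrelated with both markers and is left unanchored.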
The marker order from this second iteration was used to impute marker genotypes using FSFHap (Swarts et al., 2014), which uses a hidden Markov model to impute genotypes in bi-parental populations. The imputed genotypes were again bootstrapped, ordered, and cleaned as above. To get the final map, we put the original genotypes into the order identified by the imputed map, cleaned them with PLUMAGE (Spindel et al., 2014), and estimated map distances with R/qtl (Broman et al., 2003).
Comparison to the Consensus Map
Map LGs were numbered and oriented on the basis of their best correlation to the consensus map of Rajaram et al. (2013). Three hundred and five SSR primer sequences from the consensus map were aligned to the contigs used in SNP calling using Bowtie2 (Langmead and Salzberg, 2012). Each contig's position in the consensus map was taken to be that of the SSR aligned to it; its corresponding location in the current map was calculated as the consensus location of the SNPs originating from that contig.
Anchoring Sequencing Tags
Sequence tags were anchored on the basis of the dominant-marker method of Elshire et al. (2011), where each tag's distribution across RILs was compared to the SNPs from the final map using a binomial test of segregation. Tags whose best p-value was below 0.0001 were considered to be anchored; all others were discarded. In this way, 333,567 tags (out of 9.33 million in total) were anchored to the genetic map.
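A simplified version of this anchoring test can be sketched in Python. The sketch treats each tag as a presence or absence marker and asks, for every mapped SNP, whether the RILs carrying the tag are split between the two parental alleles more unevenly than expected by chance (p = 0.5). It is an illustration of the idea rather than the TASSEL implementation; the data layout and function names are assumptions.

```python
from math import comb


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value (sum of outcomes no more likely than k)."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))


def anchor_tag(tag_present, snps, alpha=1e-4):
    """Return (snp_name, p) for the best-matching SNP, or None if p >= alpha.

    tag_present: list of bools, one per RIL (tag observed or not).
    snps: dict mapping SNP name -> list of 'A'/'B'/None calls per RIL.
    """
    best = None
    for name, calls in snps.items():
        # Parental allele at this SNP in every RIL that carries the tag.
        observed = [c for c, t in zip(calls, tag_present) if t and c is not None]
        if not observed:
            continue
        k = sum(1 for c in observed if c == "A")
        p = binom_two_sided_p(k, len(observed))
        if best is None or p < best[1]:
            best = (name, p)
    return best if best is not None and best[1] < alpha else None


# Toy data: 40 RILs; the tag is present only in the first 20.
tag = [True] * 20 + [False] * 20
toy_snps = {"S1_0001": ["A"] * 20 + ["B"] * 20,   # cosegregates with the tag
            "S2_0001": ["A", "B"] * 20}           # independent of the tag
hit = anchor_tag(tag, toy_snps)
```

With these toy data the tag is anchored to `S1_0001`, because all 20 RILs carrying the tag also carry the same parental allele at that SNP.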
Test for Segregation Distortion
In a recombinant inbred population, the expected segregation ratio for any given marker should be 50% from each parent. Each of the 16,650 markers was tested for segregation distortion using a χ2 test with 1 degree of freedom at α = 0.05 using Microsoft Excel (Microsoft Corp., Redmond, WA). The critical χ2 value was adjusted for multiple testing using the false discovery rate procedure of Benjamini and Hochberg (1995).
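The test can be written compactly in Python for readers who prefer a scripted version to a spreadsheet. The sketch below computes the 1:1 χ2 p-value for each marker and applies the Benjamini–Hochberg step-up rule to the p-values, which is equivalent in outcome to adjusting the critical value; the function names and the toy allele counts are hypothetical.

```python
from math import erfc, sqrt


def chi2_p_1df(n_a, n_b):
    """P-value of a 1:1 segregation chi-square test with 1 degree of freedom."""
    expected = (n_a + n_b) / 2
    chi2 = (n_a - expected) ** 2 / expected + (n_b - expected) ** 2 / expected
    # For 1 df, the chi-square survival function equals erfc(sqrt(chi2 / 2)).
    return erfc(sqrt(chi2 / 2))


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the test is significant under BH."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff_rank = rank  # largest rank meeting the BH condition
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        significant[i] = rank <= cutoff_rank
    return significant


# Toy counts of (parent A allele, parent B allele) among 150 RILs.
counts = [(75, 75), (100, 50), (80, 70), (110, 40)]
p_vals = [chi2_p_1df(a, b) for a, b in counts]
distorted = benjamini_hochberg(p_vals)
```

In this toy example the second and fourth markers are flagged as distorted and the other two are not.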
Field Layout for RILs
One hundred and seventy-nine RILs, two parental lines, and TifGrain 102 were sown in single-row plots that were 1.5 m long and 0.7 m apart, with a 1-m alley between plots at Fort Valley Agricultural Research Station farm (32°31′N, 83°53′W) on 16 July 2013. The experimental design was a randomized complete block with three replications. Grain sorghum was planted as a border around the experiment plot.
Measurement of Phenotypic Traits
Multiple phenotypic traits were scored among the 179 RILs for one season; two traits are reported here as test cases for the linkage map and the others will be reported in a separate publication. The number of days from sowing to 50% flowering was recorded for each plot. A plot was considered to have reached 50% flowering when at least half of its plants had started flowering and half of the panicles on individual plants had exserted stigmas. Pyricularia leaf spot infection occurred under natural conditions because of the rainy, humid weather during our experiment. Ten plants in each plot were visually scored and given an average rating for that plot. Disease symptoms were clear and conspicuous on all RILs and on their parents. Disease was scored on the ICRISAT 1–9 scale (Thakur et al., 2011), where 1 indicates no disease and 9 indicates complete death of the plant from disease. As disease progress depends on growth stage, some of the late-maturing lines showed a different disease response from other lines. Therefore, disease scores for lines that varied widely in maturation rate were adjusted on the basis of their maturity and disease progress curve (Wilson and Hanna, 1992).
QTL Analysis
Before performing QTL mapping, the raw flowering time scores were transformed using Box–Cox transformation as coded in the MASS package for R (Venables and Ripley, 2002) because the data were not normally distributed. The optimal value for λ was determined by testing all values between −2.0 and +2.0 in steps of 0.008 (500 steps in total); λ = –1.335 had the highest log-likelihood value and so was used for transformation.
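The grid search for λ can be sketched as follows. This is a plain-Python illustration of the standard Box–Cox profile log-likelihood for an intercept-only model, not the MASS code itself; the function names and the example flowering times are hypothetical.

```python
from math import log


def boxcox_transform(y, lam):
    """Box-Cox transform of positive values y for a given lambda."""
    if abs(lam) < 1e-12:
        return [log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]


def boxcox_loglik(y, lam):
    """Profile log-likelihood of lambda for an intercept-only model."""
    n = len(y)
    z = boxcox_transform(y, lam)
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n  # maximum-likelihood variance
    return -0.5 * n * log(var) + (lam - 1) * sum(log(v) for v in y)


def best_lambda(y, lo=-2.0, hi=2.0, steps=500):
    """Grid search over lambda; 500 steps of 0.008 span -2.0 to +2.0."""
    grid = [lo + i * (hi - lo) / steps for i in range(steps + 1)]
    return max(grid, key=lambda lam: boxcox_loglik(y, lam))


# Hypothetical right-skewed days-to-flowering values for ten plots.
days = [48, 50, 51, 52, 53, 55, 58, 62, 70, 85]
lam_hat = best_lambda(days)
transformed = boxcox_transform(days, lam_hat)
```

For right-skewed data such as these, the selected λ is expected to fall below 1, which compresses the long upper tail in the same way as the negative λ reported above.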
Mapping of QTLs was then performed using single-marker regression as coded in the R/qtl package for R (Broman et al., 2003). The phenotypes used were the raw disease scores and Box–Cox transformed flowering time scores and the genotypes were the final linkage map. We also smoothed the logarithm of odds (LOD) scores in 5-cM sliding windows, taking the maximum LOD within each window to identify peaks of association more clearly.
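Both steps can be illustrated with a small Python sketch. The LOD for a single marker is computed from the residual sums of squares of the null and one-marker models, LOD = (n/2) log10(RSS0/RSS1), and the smoothing takes the maximum LOD within a window around each marker. This is not the R/qtl code; centering the 5-cM window on each marker and the 0/1 genotype coding are assumptions made for the example.

```python
from math import log10


def single_marker_lod(genotypes, phenotypes):
    """LOD for one marker: (n/2) * log10(RSS_null / RSS_marker).

    genotypes: 0/1 codes for the two parental homozygotes, None if missing.
    """
    data = [(g, y) for g, y in zip(genotypes, phenotypes) if g is not None]
    n = len(data)
    grand_mean = sum(y for _, y in data) / n
    rss_null = sum((y - grand_mean) ** 2 for _, y in data)
    if rss_null == 0:
        return 0.0  # no phenotypic variation to explain
    rss_marker = 0.0
    for code in (0, 1):
        group = [y for g, y in data if g == code]
        if group:
            mean = sum(group) / len(group)
            rss_marker += sum((y - mean) ** 2 for y in group)
    if rss_marker == 0:
        return float("inf")  # genotype explains the phenotype perfectly
    return (n / 2) * log10(rss_null / rss_marker)


def smooth_lod_max(positions_cm, lods, window_cm=5.0):
    """For each marker, return the max LOD among markers within a centered window."""
    half = window_cm / 2
    return [max(l for p, l in zip(positions_cm, lods) if abs(p - pos) <= half)
            for pos in positions_cm]


lod = single_marker_lod([0, 0, 0, 1, 1, 1], [1, 2, 3, 4, 5, 6])
smoothed = smooth_lod_max([0.0, 1.0, 2.0, 6.0, 10.0], [0.5, 3.2, 1.0, 0.8, 2.1])
```

In the second call, the three markers within 2.5 cM of one another all take the value of the local peak, while the isolated markers keep their own LOD scores.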
Results
Genotyping-by-Sequencing Analysis and SNP Calling
One hundred and eighty-four RILs were sequenced, generating a total of 438.6 million reads that, with the exception of five failed samples, were spread mostly evenly across the samples (Supplemental Fig. S1). As the two parental lines and their commercial hybrid were sequenced twice, we had high-depth coverage of 5,964,312 reads for Tift 99D2B1, 3,077,835 reads for Tift 454, and 4,704,803 reads for TifGrain 102. The mean read depth across all successful samples was 2.2 ± 0.95 million (CV = 0.43), the pass filter rate was 88%, and the median was 2.16 million reads per sample. The total number of good reads among 179 RILs was 387,339,046; the individual with the fewest reads had 390,047 and the individual with the most reads had 4,854,147. Raw reads were then converted to SNP calls using the TASSEL-GBS pipeline (Glaubitz et al., 2014; see the Methods section for the parameters used). Since pearl millet does not yet have a published reference genome, we aligned the reads against a collection of ∼19,000 scaffolds and contigs kindly provided by the Pearl Millet Genome Sequencing Consortium (Varshney et al., unpublished data, 2015). During SNP calling, 88.8% of the total reads were mapped to scaffolds and contigs from the pearl millet genomic sequence.
Relationship between Founder Lines and the RILs
The sequencing data from the RILs show a close relationship to the parental lines used in this study (Supplemental Fig. S2). The RILs cluster around the theoretical value of 50% relatedness to each parent (0.5, 0.5). As expected, a few individuals show 80% or higher relatedness to one parent or the other as a result of stochastic gamete sampling during meiosis.
Identification of Polymorphic Markers
Calling of SNPs resulted in >500,000 raw SNPs, many of which were false positives caused by sequencing errors. Filtering for SNPs with calls in at least 60% of lines and with minor allele frequencies above 0.25 resulted in ∼24,000 high-quality polymorphic SNPs. To filter out false SNPs from paralogous sequences aligning together, we also removed sites that showed >12% heterozygosity (the 12% cutoff was determined empirically by looking at the distribution of heterozygous sites). We then also removed any RILs with >50% missing data or >10% heterozygosity. This resulted in using 17,400 sites across 146 RIL individuals, where missing data was in the range of 0.5 to 43.4% per individual (median 8.7%) and 12.7% missing across the entire dataset.
Construction of the Genetic Map
We built the linkage map in a series of iterative steps. First, a subset of very high-quality “core” SNPs was taken and used to define LGs and an initial ordering. We obtained a core set of 1192 unique markers with stringent filtering (see Materials and Methods) covering seven LGs. Once the core map was assembled, lower-quality SNPs were anchored to LGs and the ordering was repeated. This second ordering was then used to impute all the markers using FSFHap (Swarts et al., 2014), which uses a hidden Markov model to impute individuals in biparental populations. These iterative steps added another 15,458 markers to the map across 150 RILs. A heat map of LD shows clear clusters between seven different LGs corresponding to the seven pearl millet chromosomes (Supplemental Fig. S3). These three iterative steps resulted in the final genetic map of 16,650 SNPs in 1191 unique recombination bins (Fig. 1). Our LGs were then renumbered and reoriented to match those of Rajaram et al. (2013), which were based on mapping to the amplicons used to generate their map. For comparison, we also created a map without using the genomic contigs to anchor the sequencing reads. Instead, GBS reads were aligned against each other with the UNEAK filter (Lu et al., 2013; see Materials and Methods); all other steps were identical. The final genome-free map included 4900 markers. This is still a substantial number of markers, and if no genomic data were available, they would still form a useful map. However, the >3× higher number of markers in the original map demonstrates the value of having genomic sequences to align against, even if these sequences are not assembled into a reference genome.
Expanding the Genetic Map with Sequencing Tags
After obtaining the final map, we then anchored sequencing tags (the same 64-bp reads used in the GBS pipeline) to it. We used the dominant-marker method of Elshire et al. (2011), which anchored 333,567 (out of 9.33 million) tags onto the genetic map.
To gauge the accuracy of mapping, we looked at the overlap between sequencing reads and the SNPs they generated. There is only partial overlap between the set of tags that give rise to SNPs in the map and the tags that were mapped on their own (Fig. 2). This is mostly because (i) a tag can still be anchored even if any SNPs it gives rise to are filtered out, (ii) some tags are caused by presence–absence variation and so will not give rise to SNPs themselves but can still be anchored to nearby SNPs, and (iii) sequencing errors can make a tag appear unique, so even if a good SNP can be called in one part of a tag, an error elsewhere in it makes the tag too rare for the binomial segregation test to work.
Of the tags that do overlap, ∼87% of them anchor to within 10 cM of their associated SNP, many of them to the exact same recombination bin. This implies that our mapping accuracy is high and the positions of the reads should be very close to their true position.
In total, 16,650 SNPs and 333,567 additional tags were distributed on all seven LGs (Table 1). Overall, 20.10% of data were missing for 16,650 loci across 150 individuals. The missing values across these loci ranged from 4 to 55 per locus (2.6–36.6%). In the final map containing 16,650 SNPs and 333,567 tags, the average densities across all chromosomes were 23.23 SNP markers and 465.42 additional tags per cM, covering a genome length of 716.7 cM. The marker densities per LG ranged from a minimum of 9.24 cM–1 on LG 4 to a maximum of 35.13 cM–1 on LG 7. When only unique linkage bins were counted, marker densities ranged from 0.81 bins cM–1 (LG 4) to 1.90 bins cM–1 (LG 2), with an average density of 1.66 bins cM–1 across the genome.
LG | Length (cM) | Markers | Anchored tags | Marker density per cM | Tag density per cM |
---|---|---|---|---|---|
LG 1 | 96.9 | 2509 | 49,855 | 25.89 | 514.50 |
LG 2 | 98.1 | 2986 | 62,754 | 30.44 | 639.69 |
LG 3 | 175.3 | 3000 | 61,367 | 17.11 | 350.07 |
LG 4 | 55.5 | 513 | 15,789 | 9.24 | 284.49 |
LG 5 | 118.3 | 3085 | 58,902 | 26.08 | 497.90 |
LG 6 | 112.6 | 2449 | 46,685 | 21.75 | 414.61 |
LG 7 | 60.0 | 2108 | 38,215 | 35.13 | 636.92 |
Total | 716.7 | 16,650 | 333,567 | 23.23 | 465.42 |
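The densities in Table 1 follow directly from the marker counts, tag counts, and LG lengths. The short Python check below recomputes them from the tabulated values; the variable names are illustrative only.

```python
# (length in cM, SNP markers, anchored tags) per linkage group, from Table 1
table1 = {
    "LG 1": (96.9, 2509, 49855),
    "LG 2": (98.1, 2986, 62754),
    "LG 3": (175.3, 3000, 61367),
    "LG 4": (55.5, 513, 15789),
    "LG 5": (118.3, 3085, 58902),
    "LG 6": (112.6, 2449, 46685),
    "LG 7": (60.0, 2108, 38215),
}

total_len = sum(v[0] for v in table1.values())
total_markers = sum(v[1] for v in table1.values())
total_tags = sum(v[2] for v in table1.values())

# Per-LG densities: counts divided by genetic length.
for lg, (length, markers, tags) in table1.items():
    print(f"{lg}: {markers / length:.2f} SNPs/cM, {tags / length:.2f} tags/cM")

# Genome-wide densities use the summed counts and summed length.
print(f"Total: {total_len:.1f} cM, "
      f"{total_markers / total_len:.2f} SNPs/cM, "
      f"{total_tags / total_len:.2f} tags/cM")
```

Note that the genome-wide densities are ratios of totals rather than averages of the per-LG densities, which is why a short, marker-poor group such as LG 4 has little influence on the overall figure.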
Among all the chromosomes, LG 4 had the fewest markers and tags, whereas LG 5 had the most SNP markers (3085 in the final map) and LG 2 had the most tags.
Comparison to an Existing Pearl Millet Consensus Map
Rajaram et al. (2013) recently produced a consensus pearl millet map by combining SSR data from four different linkage populations. The current map was matched to the consensus using 305 SSR primer pairs from Rajaram et al. (2013). Of these, 191 aligned uniquely while being in the correct relative orientations and distances apart; 16 aligned concordantly but at multiple locations, one aligned discordantly (incorrect orientation), and 97 either did not align at all or had only partial alignments (meaning one primer was aligned but not both) (Supplemental Fig. S4).
The lengths of the chromosomes in the current map ranged from 55.5 cM (LG 4) to 175.3 cM (LG 3), with an average length of 102.3 cM per chromosome. In the core map made from 1192 sites, the average distance between adjacent markers ranged from 0.52 cM (LG 2) to 1.23 cM (LG 4), with an overall average of 0.67 cM across the entire core genetic map. The smallest intermarker distance, 0.01 cM, occurred on LG 2 and LG 3, and the largest, 11.71 cM, occurred on LG 4. Only three intervals between neighboring markers exceeded 5 cM [5.52 cM (LG 4), 6.14 cM (LG 2), and 11.71 cM (LG 4)]; all other intervals were below 5 cM, so more than 99% of the intervals in the map were small. Linkage Group 3 was longer than LG 3 of the consensus map, whereas LG 7 was similar in length to its counterpart. The rest of the chromosomes were shorter than in the consensus map. We also compared our LG lengths with four LGs (LGA, LGB, LGC, and LGG) in the GBS-based SNP map by Moumouni et al. (2015), which revealed that our map was extended in LG 1 and LG 6 but was shorter in LG 2, LG 4, and LG 7. Such extensions are very common in telomeric regions and have also been observed in DArT-based maps of pearl millet (Supriya et al., 2011). The maps reported in all the previous studies used the Haldane mapping function, whereas our map used the Kosambi mapping function, which could be one reason for discrepancies in map lengths.
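The effect of the mapping function on map length can be seen from the standard formulas, d = −50 ln(1 − 2r) cM for Haldane and d = 25 ln[(1 + 2r)/(1 − 2r)] cM for Kosambi, where r is the recombination fraction. The Python sketch below evaluates both for a few values of r; the function names are illustrative.

```python
from math import log


def _check(r):
    """Recombination fractions must lie in [0, 0.5) for either function."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")


def haldane_cm(r):
    """Haldane map distance in cM (assumes no crossover interference)."""
    _check(r)
    return -50.0 * log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Kosambi map distance in cM (allows for crossover interference)."""
    _check(r)
    return 25.0 * log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


for r in (0.05, 0.10, 0.20, 0.30):
    print(f"r = {r:.2f}: Haldane {haldane_cm(r):.2f} cM, "
          f"Kosambi {kosambi_cm(r):.2f} cM")
```

For any recombination fraction between 0 and 0.5, the Haldane distance exceeds the Kosambi distance, so a map built with the Haldane function will tend to be longer than one built from the same data with the Kosambi function. The difference is small for tightly linked markers and grows with r.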
The current map appears to have roughly equal coverage to the consensus map but with some caveats. Many individual markers and some groups of markers were localized to different locations in the two maps. Some of this may be a result of technical error, such as misalignment of the primer sequences or misassembly caused by sequencing errors. Some of the discrepancies are probably biological, however, and represent small- and large-scale structural variations between the populations used to make the two maps. Pearl millet has significant genetic diversity (Oumar et al., 2008), to the point that only a single SSR from the consensus map was mappable in all four of its input populations (Rajaram et al., 2013). In that context, finding significant variation with a fifth population (the one used in this study) is to be expected here as well.
Segregation Distortion
Of the 16,650 mapped SNP markers, 6652 (39.41%) showed significant segregation distortion after adjustment for multiple comparisons (Benjamini and Hochberg, 1995). Most of these distorted markers occurred in large linkage blocks. Linkage Group 3 showed the greatest amount of segregation distortion, with nearly the entire LG (98.67% of mapped markers) significantly biased in favor of Tift 99D2B1. Linkage Group 1 was also highly distorted (80.71% of mapped markers) but, in contrast to LG 3, was biased in favor of the other parent, Tift 454. Linkage Group 2 also had several highly distorted blocks, biased toward the Tift 99D2B1 parent, and LG 6 had one major linkage block biased toward Tift 99D2B1. Linkage Group 4 showed the least segregation distortion, with only two markers (0.39%) distorted (Supplemental Fig. S5).
Mapping Leaf Spot Resistance and Days to 50% Flowering Traits
The linkage map developed in this experiment was used in regression analysis to identify QTLs for two phenotypic traits: leaf spot resistance and days to 50% flowering. Broad-sense heritability (H2) was quite high for these two traits. For days to flowering, H2 = 0.7578; for Box–Cox transformed days to flowering, H2 dropped to 0.5110. For the raw disease score, H2 = 0.7978; for the adjusted disease score, H2 = 0.9163.
The two parents showed significant differences for these two traits in the field, whereas their F1 hybrid, TifGrain 102, showed good leaf spot resistance, similar to Tift 99D2B1, but flowered later, similar to Tift 454 (Supplemental Table S1). R/qtl identified leaf spot resistance loci on LG 5 and LG 7 with significant LOD values above 3.0 (Fig. 3, Table 2). These QTLs were minor, explaining 4.83 to 5.05% of the phenotypic variance, with the favorable allelic effect (lower disease score) coming from Tift 454. Two more QTLs for leaf spot resistance, with a favorable allelic effect from Tift 99D2B1, were located on LG 2 and LG 3 and had LOD values just above 2.0. A significant QTL for flowering time with a LOD value above 3.0 was located on the upper arm of LG 2; it explained 6.0% of the phenotypic variance, with the positive allelic effect (later flowering) coming from the parent Tift 454 (Fig. 3, Table 2). The remaining QTLs for flowering time were detected below LOD 3.0 on LG 1, LG 5, and LG 7, explaining 0.49 to 4.75% of the phenotypic variance, with positive additive effects coming from the other parent, Tift 99D2B1.
Flowering time

LG§ | Location (cM) | SNP interval | Peak SNP | LOD | Variance (%) | Additive effect† (d) |
---|---|---|---|---|---|---|
1 | 32.3 | S1_1423–S1_3590 | S1_2196 | 2.61 | 3.03 | 1.8 |
2 | 23.3 | S2_1896–S2_2803 | S2_2223 | 4.86 | 6.00 | −2.0 |
5 | 0.0 | S5_0012–S5_1669 | S5_0451 | 2.38 | 4.75 | 1.5 |
7 | 14.4 | S7_0244–S7_2067 | S7_0774 | 2.48 | 0.49 | 1.3 |

Leaf spot disease

LG | Location (cM) | SNP interval | Peak SNP | LOD | Variance (%) | Effect‡ |
---|---|---|---|---|---|---|
2 | 85.0 | S2_7773–S2_8331 | S2_7983 | 2.18 | 1.78 | −0.6 |
3 | 114.2 | S3_0019–S3_4763 | S3_4544 | 2.25 | 1.82 | −0.5 |
5 | 30.5 | S5_2145–S5_4145 | S5_3817 | 4.56 | 4.83 | 0.9 |
7 | 30.5 | S7_0738–S7_3864 | S7_2251 | 3.01 | 5.05 | 0.9 |
- † A negative sign indicates that the later flowering allele was derived from the Tift 454 parent, whereas a positive sign indicates that the allele from parent Tift 99D2B1 delayed flowering.
- ‡ 1 indicates no disease symptoms; 9 indicates complete susceptibility. A negative sign indicates that the Tift 99D2B1 allele increased resistance (lower score), whereas a positive sign indicates that the Tift 454 allele increased resistance.
- § LG, linkage group; SNP, single-nucleotide polymorphism; LOD, logarithm of odds.
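As an illustration of how a LOD score and an additive effect arise from single-marker regression, the following Python sketch regresses a phenotype on one marker. It is not the R/qtl implementation used in this study; the genotype coding (−1 and +1 for the two parental homozygotes), the function name, and the data are assumptions made for the example, and the sign convention here need not match that of Table 2.

```python
import math

def single_marker_lod(genotypes, phenotypes):
    """Regress phenotype on a marker coded -1/+1 and return (LOD, slope, R^2).

    Individuals with a missing genotype (None) are skipped. With this coding
    the slope is half the difference between the two homozygote means.
    """
    pairs = [(g, y) for g, y in zip(genotypes, phenotypes) if g is not None]
    n = len(pairs)
    mean_g = sum(g for g, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((g - mean_g) ** 2 for g, _ in pairs)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in pairs)
    rss0 = sum((y - mean_y) ** 2 for _, y in pairs)  # null model: mean only
    slope = sxy / sxx
    rss1 = rss0 - slope * sxy                        # model with the marker
    lod = (n / 2.0) * math.log10(rss0 / rss1)
    r2 = 1.0 - rss1 / rss0
    return lod, slope, r2

# Hypothetical data: eight RILs scored for days to flowering at one marker.
geno = [-1, -1, -1, -1, 1, 1, 1, 1]
days = [50, 52, 51, 53, 55, 54, 56, 57]
lod, effect, r2 = single_marker_lod(geno, days)
print(f"LOD = {lod:.2f}, additive effect = {effect:.1f} d, R^2 = {r2:.2f}")
```

The R² from a single-marker fit like this one is not necessarily the same quantity as the variance column in Table 2, which may come from a different model fit.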
Discussion
Importance of a High-Density Genetic Map and Its Comparison to Existing Maps
Next-generation sequencing technologies have revolutionized marker discovery and enabled high-throughput plant genotyping through several new marker platforms like GBS (Poland and Rife, 2012). Genotyping-by-sequencing is a cost-effective and efficient system for developing high-density markers, which are concurrently discovered and genotyped in larger mapping populations (He et al., 2014). These abundant markers, coupled with well-developed bioinformatics, facilitate the development of dense molecular linkage maps. In this experiment, we had high-depth coverage and abundant high-quality SNPs.
Ever since the first pearl millet genetic map was made from RFLPs in 1994 (Liu et al., 1994), there has been a continuous effort to improve such maps with greater marker density and uniformity. Many of these maps had large gaps in the distal regions of chromosomes, probably caused by very high recombination rates, so most improvement efforts targeted these regions. For example, expressed sequence tag and genomic SSRs were added by Senthilvel et al. (2008), DArT markers by Supriya et al. (2011), and gene-based SNP and conserved intron-spanning primer markers by Sehgal et al. (2012). Despite these efforts, large gaps of more than 30 cM were still present in most of the distal regions of chromosomes. The most recent consensus map (Rajaram et al., 2013) used expressed sequence tag SSRs and also contained large gaps in the range of 18 to 27 cM on every chromosome. Using NGS, Moumouni et al. (2015) made a GBS map from 314 nonredundant SNPs. Although the map developed by Moumouni et al. (2015) was uniform in coverage, with no interval greater than 20 cM in length and only 10 intervals larger than 10 cM, it still had a maximum gap of 19.7 cM on LG 2, which corresponds to 3.0% of the total map length. The linkage map in the current study has a maximum gap of 11.71 cM on LG 4, equating to 1.6% of total map length, a substantial reduction in gap size.
To our knowledge, this map is the densest genetic map of pearl millet so far. It contains 16,650 SNPs and 333,567 sequence tags covering all seven LGs. Here, we report an average density of 1.66 linkage bins cM⁻¹ and 23.23 SNP cM⁻¹ in the final map, which greatly surpasses the 0.51 SNP cM⁻¹ of the next-densest map (Moumouni et al., 2015). The linkage map constructed in this study is denser, more uniform, and more highly saturated than any previously published pearl millet genetic map, as reflected by its smaller marker spacing (<5 cM). The mean distance between two neighboring markers is the smallest published so far: 0.6 cM, compared with 2.1 cM (Moumouni et al., 2015) and 3.7 cM (Supriya et al., 2011). The small marker spacings on every chromosome, with several cosegregating redundant markers, show that, with the exception of LG 4, this map is extensive and reasonably uniform in genome coverage. Therefore, our map complements the recent pearl millet linkage map developed by Moumouni et al. (2015), which contains 2809 GBS markers from 85 F2 progenies. At 716.7 cM in total length, our map is slightly longer than that of Moumouni et al. (2015) (640.6 cM), which used an F2 population and thus is expected to be shorter. The high quality and quantity of markers found in this experiment were possible because of high-depth coverage of the two parents for SNP calling and the large number of RILs (150 individual progenies) available after stringent filtering.
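The summary statistics quoted for this map can be checked with simple arithmetic. The short Python sketch below uses only figures stated in the text; treating the mean spacing between neighboring markers as the reciprocal of the linkage-bin density is our reading, not something the text states explicitly.

```python
n_snps = 16650          # mapped SNPs
map_length_cm = 716.7   # total genetic distance
bins_per_cm = 1.66      # unique linkage bins per cM
max_gap_cm = 11.71      # largest gap, on LG 4

snp_density = n_snps / map_length_cm                 # SNPs per cM
mean_bin_spacing = 1.0 / bins_per_cm                 # cM between neighboring bins
max_gap_share = 100.0 * max_gap_cm / map_length_cm   # percent of map length

print(f"SNP density: {snp_density:.2f} SNP/cM")
print(f"Mean bin spacing: {mean_bin_spacing:.1f} cM")
print(f"Largest gap: {max_gap_share:.1f}% of map length")
```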
Genetic map distances are relative distances based on recombination frequencies, unlike physical maps, which estimate actual distances in base pairs. The map distances and positions of individual markers can vary from one mapping population to another depending on the parents used in the initial cross and the type of mapping population. Our map distances were calculated with the Kosambi mapping function, whereas previous studies used the Haldane mapping function, which may explain some of the differences in map length. The comparison between our map and the previous consensus map shows some agreement but also some discrepancies. For example, some markers are at different locations in the two maps (Supplemental Fig. S4). Our total map length is shorter than the total map lengths reported by Supriya et al. (2011), Sehgal et al. (2012), and Rajaram et al. (2013). Although some of these disagreements are probably caused by technical differences in the ways each map was prepared, many are probably a result of biological differences, including a few large linkage blocks that may represent actual translocations in one population relative to the other. Given the quality of LD within the current map (Supplemental Fig. S3), any major discrepancies are probably caused by structural variations originating from the germplasm used in the current study.
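To show why the choice of mapping function affects map length, the following Python sketch converts the same recombination fraction into centimorgans with the standard Haldane and Kosambi formulas. The function names and the example recombination fractions are illustrative only.

```python
import math

def haldane_cm(r):
    """Map distance (cM) from recombination fraction r, assuming no interference."""
    return -50.0 * math.log(1.0 - 2.0 * r)

def kosambi_cm(r):
    """Map distance (cM) from recombination fraction r, allowing for interference."""
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))

# Illustrative recombination fractions between adjacent markers.
for r in (0.01, 0.05, 0.10, 0.20, 0.30):
    print(f"r = {r:.2f}: Haldane {haldane_cm(r):6.2f} cM, Kosambi {kosambi_cm(r):6.2f} cM")
```

For any recombination fraction between 0 and 0.5, the Haldane distance is at least as large as the Kosambi distance, and the difference grows with r. On a dense map, where adjacent markers recombine rarely, the two functions give nearly identical interval lengths.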
High-density maps developed through GBS not only support functional genomics by connecting phenotype to genotype but also serve as reference maps in fundamental studies such as genome sequencing, where they are used to refine, order, and assemble scaffolds and contigs of pseudochromosomes (Poland and Rife, 2012; Ward et al., 2013). This map has been partly used in contig assembly for the pearl millet genome sequencing project led by ICRISAT. Furthermore, a well-ordered dense map allows comparative analysis of genome structure and can reveal important evolutionary changes (Gale and Devos, 1998). This linkage map will also help other researchers working on mapping traits in pearl millet. For example, others can directly use the 64-bp tags used to develop SNPs in this study for the same purpose. The resulting datasets can be used to make genetic maps, mine alleles, and characterize diverse pearl millet accessions.
Imputation of SNP Data
The major drawback of sequencing-based genotyping technology is the large amount of missing data; GBS is no exception. Several approaches can be used to reduce these missing data, such as sequencing to high depth, filtering to save only high-quality data, or performing imputation of haplotypes (Poland and Rife, 2012). We used careful filtering to achieve a missing rate of 20.1% in our final (unimputed) genetic map, although one of the steps used to generate it included imputing another version down to only ∼3% missing data. We focus our analyses on the unimputed map because imputation can introduce biases. Both the imputed map and unimputed map are available in Supplemental File S2.
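Filtering on missing data can be sketched as follows. This minimal Python example computes per-marker and overall missing rates for a small genotype matrix and keeps markers below a chosen threshold. The marker names, genotype calls, and the 0.25 threshold are hypothetical; the filters actually applied in this study are given in Supplemental File S1.

```python
def missing_rate(calls):
    """Fraction of genotype calls that are missing (None)."""
    return sum(c is None for c in calls) / len(calls)

def filter_markers(matrix, max_missing):
    """Keep markers whose missing rate does not exceed max_missing."""
    return {name: calls for name, calls in matrix.items()
            if missing_rate(calls) <= max_missing}

def overall_missing(matrix):
    """Missing rate across all markers and individuals."""
    total = sum(len(calls) for calls in matrix.values())
    missing = sum(c is None for calls in matrix.values() for c in calls)
    return missing / total

# Hypothetical calls for three markers in five RILs ("A"/"B" = parental alleles).
matrix = {
    "m1": ["A", "B", None, "A", "B"],
    "m2": [None, None, "A", None, "B"],
    "m3": ["A", "A", "B", "B", "A"],
}
kept = filter_markers(matrix, max_missing=0.25)  # assumed threshold
print(sorted(kept), f"{overall_missing(kept):.1%} missing after filtering")
```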
The Parents and Their Ancestry
The parents of this mapping population, Tift 99D2B1 and Tift 454, are dwarf, early-maturing grain types. Both parents carry the recessive dwarfing gene d2, which lies on LG 4 (Parvathaneni et al., 2013). We discovered very few markers on LG 4 compared to other LGs. Since the two parents inherited genomic regions from Tift 23D2B1, it is possible that this LG has few SNPs because of a region of common descent around the dwarfing gene d2. The male-sterile A-line Tift 99D2A1 and Tift 454 are the parents of the commercial hybrid known as TifGrain 102 (Hanna et al., 2005a, 2005b). Tift 99D2B1 was selected for resistance to rust and is derived from Tift 89D2 and also shares some genomic regions with Tift 23D2 (Hanna and Wells, 1993; Hanna et al., 2005b). It also appears to have resistance to Pyricularia leaf spot. Tift 454 was derived from an interspecific cross between pearl millet Tift 23D2A1 and a napiergrass [Cenchrus purpureus (Schum.) Morrone]–pearl millet hybrid and carries at least one A′ chromosome from the napiergrass parent (Hanna et al., 2005a). Tift 454 is resistant to nematodes [Meloidogyne arenaria (Neal) Chitwood and Meloidogyne incognita Kofoid & White] and has male-fertility restorer capability in A1 cytoplasm.
Regions of significant segregation distortion have been reported in previous genetic mapping studies in pearl millet (Qi et al., 2004; Rajaram et al., 2013; Moumouni et al., 2015), so it is not surprising that they were detected in this population as well. However, we found two regions of segregation distortion in this population that each span nearly an entire LG (LG 1 and LG 3) (Supplemental Fig. S5). Such large regions of segregation distortion have not been reported in previous studies in pearl millet. Linkage Groups 1 and 3 also had the highest number of discrepancies in comparison to the map of Rajaram et al. (2013) (Supplemental Fig. S4). According to Hanna et al. (2005a), the parental line Tift 454 (2n = 2x = 14) carries at least one pair of chromosomes from the A′ genome of napiergrass in place of a homologous chromosome pair from the A genome of pearl millet. The evidence here, namely nearly complete segregation distortion of two entire LGs along with a large number of map discrepancies, suggests that Tift 454 may in fact carry two napiergrass chromosomes. Linkage Groups 1 and 3 appear to represent two A–A′ chromosome pairs. Though the A and A′ genomes are reported to be homologous (Hanna, 1990), it is possible that the rate of recombination between the napiergrass and pearl millet chromosomes is lower than the rate of recombination between chromosomes originating from the same species. Evidence reported by Techio et al. (2006) suggests that the A and A′ chromosomes are likely to be homeologous rather than homologous. In addition, meiotic irregularities have been reported in triploid (Techio et al., 2006) and hexaploid (Paiva et al., 2012) pearl millet–napiergrass hybrids. Interestingly, most of LG 1 is biased in favor of the Tift 454 parent, suggesting that this A′ chromosome is transmitted more frequently, whereas LG 3 is biased in favor of Tift 99D2B1, suggesting that this A′ chromosome is transmitted less frequently.
Though the RILs were selected randomly, the bias toward one parent or the other may also be an artifact of unintentional selection based on characteristics such as pollen viability or seed set under the selfing bag.
Utility of the Map in Tagging Disease Resistance Loci and Flowering Traits
The high-density GBS-based linkage map was validated by mapping QTLs for flowering time and Pyricularia leaf spot resistance. The leaf spot resistance loci identified in this study indicate that this trait is controlled by several loci from different LGs. In a previous study, a random amplified polymorphic DNA marker was identified as being associated with Pyricularia leaf spot resistance but was not assigned to any LG (Morgan et al., 1998). Research from ICRISAT, India, has mapped a leaf spot resistance QTL to LG 4 in a RIL population based on ‘ICMB841-P3’ × ‘863B-P2’ (Dr. R K Srivastava, personal communication, 2015), which was also associated with stover quality traits and was introgressed into the hybrid seed parent ‘ICMA/B 95222’ (Nepolean et al., 2006). ICMA/B 95222 is the seed parent of hybrid ‘HHB 146’ released from Chaudhary Charan Singh Haryana Agricultural University, Hisar (Dwivedi et al., 2012). The present study also identified a significant flowering time QTL on LG 2, the same LG where the PHYC gene was significantly associated with flowering time (Saïdou et al., 2009) and several other flowering and drought tolerance QTLs were reported (Yadav et al., 2002, 2004, 2011b; Bidinger et al., 2007; Sehgal et al., 2012). Primer sequences from these studies were used to compare their locations on our map (Supplemental Table S2). Based on their marker positions in our map, our flowering time QTL locations do not correspond to the locations reported in these previous studies. However, it will be interesting to explore the potential candidate genes once the complete pearl millet genome sequence is available. The amount of phenotypic variation explained by these QTLs was low for these two traits despite the fact that H² was quite high [H² = 0.511 for days to flower (transformed) and H² = 0.916 for adjusted disease score] and was comparable to that in other studies (Yadav et al., 2002, 2004; Nepolean et al., 2006; Dwivedi et al., 2012; Sehgal et al., 2015).
Heritabilities for flowering traits were reported to be in the range of 47 to 94% in previous studies (Yadav et al., 2004; Sathya et al., 2014; Sehgal et al., 2015). One explanation for the low variance explained is that numerous QTLs, each with a very small effect, contribute to these traits (Yadav et al., 2003). Additionally, the QTL detection method used here (single-marker regression in R/qtl software) may underestimate individual QTL effects (Lander and Botstein, 1989; Zeng, 1994). The relative lack of markers on LG 4 [because of the apparent descent of much of LG 4 in both parents of the RIL population from a common ancestor (Pyricularia leaf spot-susceptible Tift 23D2B1)], where a leaf spot resistance QTL was previously identified, could also explain why we did not identify this QTL. When these traits were mapped using a genetic map made without genomic sequences, many of the QTLs were still identifiable but appeared to have lost some significance, probably because they lacked the SNPs that were in tightest linkage with the causal locus (Supplemental Fig. S6). This also indicates that having genome sequence information will enhance QTL mapping. The QTL results reported here are based on a single season of data, so they will need to be validated by additional studies in more environments. Even so, the examples presented here demonstrate the utility of this genetic map for identifying QTLs.
This study used a RIL population, which allowed for a replicated field screen for disease response and flowering time. Such replication increases the accuracy of phenotyping, despite having only one season of data, and is not possible with F2 populations. Additionally, seeds of the RIL population can be distributed to other researchers to map other traits of interest without the need to reconstruct the genetic map.
Conclusions
Pearl millet is considered a minor crop in the United States and Europe, so development of genetic and genomic resources in this crop has lagged behind other cereals. It is, however, an essential staple crop in many parts of the world, particularly developing countries in hot semiarid and arid regions where little else will grow. Thus improvement of this crop is critically important for food security in these areas and may become critical to currently more favorable areas if global climate change continues unabated. Tools like molecular markers can facilitate rapid advances in crop improvement but the development of such resources was a formidable task in pearl millet until the advent of NGS-based markers like GBS. In this experiment, GBS markers were successfully used to make a high-density map containing 16,650 SNPs and 333,567 additional sequence tags, which is the densest map yet created in pearl millet. High-density linkage maps provide better map resolution and abundant genomic resources. A recombinant inbred mapping population created from an elite germplasm was used to construct this map so that useful and repeatable variation can be studied using this resource. These genome-wide markers can be used for applications such as marker-assisted selection, genomic selection, diversity studies, and comparative genomic analyses. The results will also help to identify and tag several traits related to disease and nematode resistance in pearl millet. In addition, understanding the genes underlying important traits in pearl millet, such as drought tolerance and nitrogen use efficiency, could help to improve these traits in other crops.
Supplemental Information Available
Supplemental material is available with this article.
Supplemental Table S1: Leaf spot scores and days to 50% flowering for parental lines, their F1 hybrid (TifGrain 102), and the RIL population.
Supplemental Table S2: Marker positions on the current map based on basic local alignment search tool (BLAST) hits.
Supplemental Figure S1: Read depth per sample. The number of sequencing reads matched to each individual is shown in order of increasing read depth. Gray bars represent RILs that were sequenced once each; black bars are the two parents (Tift 99D2B1 and Tift 454) and their F1 hybrid (TifGrain 102), which were sequenced twice (once on each plate). Five samples were removed because they had <5000 mapped reads each.
Supplemental Figure S2: Relatedness of RILs to parents. RILs (pale blue circles) are plotted according to their degree of relatedness relative to both parents. Darker colors indicate where points have stacked on top of each other.
Supplemental Figure S3: Linkage disequilibrium heat map. Linkage disequilibrium (r²) heat map shown across the final genetic map for all pairwise SNP comparisons. Single-nucleotide polymorphisms are arrayed in map order on both the x and y axes and each point shows the pairwise linkage disequilibrium between a pair of SNPs. The size of each block is proportional to the number of SNPs in each LG; the small number of SNPs in LG 4 is probably caused by a large chromosomal segment that is identical in both parents and is likely to have been inherited from their common ancestor, Tift 23D2B1.
Supplemental Figure S4: Comparison to existing pearl millet consensus map. Simple sequence repeat primer sequences from an existing SSR consensus pearl millet map (Rajaram et al., 2013) were aligned against the contigs used to call SNPs in the current map. The linkage map from this study (left-hand side, dark gray) is compared with the SSR consensus map (right-hand side, light gray). Black bars indicate markers that could be identified in both maps, with colored lines connecting each marker position to its corresponding position in the other map. Solid lines indicate markers that map to matching linkage groups (LGs); dashed lines indicate markers that map to different LGs; and line color indicates the LG in the current SNP-based map. Although many markers show good correlation, many also show inconsistent ordering. Large blocks of inconsistent markers may represent large translocations, such as between the consensus LG 1 and our LG 4 and between the consensus LG 6 and our LG 1.
Supplemental Figure S5: Map of segregation distortion in the pearl millet RIL population. Markers shaded in red are biased in favor of Tift 99D2B1; markers shaded in blue are biased in favor of Tift 454. Markers with a χ² value greater than the critical value are significantly distorted.
Supplemental Figure S6: Effect of genomic sequence on mapping quality. Quantitative trait locus maps for flowering time and leaf spot disease compared between the full linkage map and the map made without aligning sequences to the pearl millet genomic data.
Supplemental File S1 (Text files): All scripts and parameters used in the current experiment.
Supplemental File S2 (Excel files): Genotypic data for 16,550 loci used for final map creation and phenotypic data for leaf spot disease and flowering traits in 179 RILs.
Acknowledgments
We thank Eli Rodgers-Melnick for part of the R code for rippling LGs, the Pearl Millet Genome Sequencing Consortium for use of prepublication contigs and the Genomic Diversity Facility (Cornell University) for helpful advice on GBS analysis. We thank Ms. Chrisdon B. Bonner for helping us to improve the quality of the manuscript. We are also grateful for funding support received from the capacity building project by USDA-National Institute of Food and Agriculture Grant # GEOX-2008-02595, NSF Grant IOS-1238014, the University of Georgia, and the USDA-ARS. The authors declare that they have no competing interests related to the contents of this manuscript. Mention of trade names or commercial products in this article is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture.