
Volume 5, Issue 1 e20030
Open Access

A UAV-based high-throughput phenotyping approach to assess time-series nitrogen responses and identify trait-associated genetic components in maize

Eric Rodene

Dep. of Agronomy and Horticulture, Univ. of Nebraska-Lincoln, Lincoln, NE, 68583 USA

Center for Plant Science Innovation, Univ. of Nebraska-Lincoln, Lincoln, NE, 68583 USA

Contribution: Data curation, Formal analysis, Methodology, Writing - original draft

Gen Xu

Dep. of Agronomy and Horticulture, Univ. of Nebraska-Lincoln, Lincoln, NE, 68583 USA

Center for Plant Science Innovation, Univ. of Nebraska-Lincoln, Lincoln, NE, 68583 USA

Contribution: Data curation, Formal analysis, Writing - review & editing

Semra Palali Delen

Dep. of Agronomy and Horticulture, Univ. of Nebraska-Lincoln, Lincoln, NE, 68583 USA

Center for Plant Science Innovation, Univ. of Nebraska-Lincoln, Lincoln, NE, 68583 USA

Contribution: Data curation

Xia Zhao

Cereal Institute, Henan Academy of Agricultural Sciences, Henan International Joint Laboratory on Maize Precision Production, Zhengzhou, Henan, 450002 China

Contribution: Writing - review & editing

Christine Smith

Dep. of Agronomy and Horticulture, Univ. of Nebraska-Lincoln, Lincoln, NE, 68583 USA

Contribution: Data curation

Yufeng Ge

Center for Plant Science Innovation, Univ. of Nebraska-Lincoln, Lincoln, NE, 68583 USA

Dep. of Biological Systems Engineering, Univ. of Nebraska-Lincoln, Lincoln, NE, 68583 USA

Contribution: Data curation, Writing - review & editing

James Schnable

Dep. of Agronomy and Horticulture, Univ. of Nebraska-Lincoln, Lincoln, NE, 68583 USA

Center for Plant Science Innovation, Univ. of Nebraska-Lincoln, Lincoln, NE, 68583 USA

Contribution: Data curation, Resources, Writing - review & editing

Jinliang Yang

Corresponding Author

Dep. of Agronomy and Horticulture, Univ. of Nebraska-Lincoln, Lincoln, NE, 68583 USA

Center for Plant Science Innovation, Univ. of Nebraska-Lincoln, Lincoln, NE, 68583 USA


Jinliang Yang, Dep. of Agronomy and Horticulture, Univ. of Nebraska-Lincoln, Lincoln, NE 68583, USA.

Email: [email protected]

Contribution: Conceptualization, Data curation, Formal analysis, Funding acquisition, Project administration, Supervision, Writing - review & editing

First published: 11 February 2022
Citations: 10

Assigned to Associate Editor Seth Murray.


Advancements in the use of genome-wide markers have provided unprecedented opportunities for dissecting the genetic components that control phenotypic trait variation. However, cost-effectively characterizing agronomically important phenotypic traits on a large scale remains a bottleneck. Unmanned aerial vehicle (UAV)-based high-throughput phenotyping has recently become a prominent method, as it allows large numbers of plants to be analyzed in a time-series manner. In this experiment, 233 inbred lines from the maize (Zea mays L.) diversity panel were grown in the field under different nitrogen treatments. Unmanned aerial vehicle images were collected at different plant developmental stages throughout the growing season. A workflow was developed for extracting plot-level images, filtering images to remove nonfoliage elements, and calculating canopy coverage and greenness ratings based on vegetation indices (VIs). After applying the workflow, about 100,000 plot-level image clips were obtained for 12 different time points. High correlations were detected between VIs and ground truth physiological and yield-related traits. A genome-wide association study was performed, resulting in n = 29 unique genomic regions associated with image-extracted traits from two or more of the 12 total time points. A candidate gene, Zm00001d031997, a maize homolog of the Arabidopsis HCF244 (high chlorophyll fluorescence 244) located underneath the leading single nucleotide polymorphisms of the canopy coverage-associated signals, was repeatedly detected under both nitrogen conditions. The plot-level time-series phenotypic data and the trait-associated genes provide great opportunities to advance plant science and to facilitate plant breeding.


  • FDR: false discovery rate
  • GWAS: genome-wide association study
  • HTP: high-throughput phenotyping
  • NDVI: Normalized-Difference Vegetation Index
  • RGB: red–green–blue
  • SNP: single nucleotide polymorphism
  • UAVs: unmanned aerial vehicles
  • VARI: Visible Atmospherically Resistant Index
  • VEG: Vegetative Index
  • VIs: vegetation indices

    Crop improvement has long been an important goal in agriculture, and much research has been conducted toward improving plant traits such as grain yield and nitrogen (N) use efficiency. High-throughput phenotyping is an important and recent development for agriculture. It enables the quick and economical scoring of crop phenotypes on a large scale, potentially accelerating efforts toward further crop improvement via plant breeding. Numerous data acquisition methods have been used to collect data on crop plants for high-throughput phenotyping, including the use of red–green–blue (RGB), thermal, infrared, and hyperspectral cameras carried on either unmanned aerial vehicles (UAVs) or orbital platforms (Araus & Cairns, 2014; Liebisch et al., 2015; Sankaran et al., 2021; Zaman-Allah et al., 2015). Unmanned aerial vehicle imaging can collect data from all plots in large field experiments more frequently than a person can walk the field and manually inspect plant plots. Aerial images can also identify differences in plot-level plant health, which may not be readily apparent from visual inspection on the ground. Furthermore, different crop genotypes often react differently to the same growing conditions.

    Unmanned aerial vehicle imagery has seen extensive use in agriculture for a variety of purposes. It has been investigated for estimating plant height (Anderson et al., 2020; Han et al., 2018; Li et al., 2016; Pugh et al., 2018) and detecting genetic loci influencing this trait (Anderson et al., 2020). Numerous papers have investigated using UAVs to collect spectral data from crop field trials, which can be used to predict yield (Geipel et al., 2014; Ramos et al., 2020), generate crop surface models (Geipel et al., 2014), estimate plant height (Han et al., 2018; Li et al., 2016) or biomass (Li et al., 2016), and detect N stress (Buchaillot et al., 2019). Spectral data is often assessed in the form of vegetation indices (VIs).

    Numerous VIs have been defined and evaluated for quantifying different plant properties from the sensor data collected from cameras carried by aerial or orbital platforms (Araus & Cairns, 2014; Liebisch et al., 2015; Zaman-Allah et al., 2015). For cameras that incorporate measurement of near-infrared or red edge wavelengths, these include the Normalized-Difference Vegetation Index (NDVI) (Rouse et al., 1973), the Soil-Adjusted Vegetation Index (Huete, 1988), and the Leaf Water Content Index (Hunt et al., 1987). Indices also exist for use with RGB image data. As a result, aerial photography presents a simple way to rate the environmental response of each genotype through the plants’ greenness ratings derived from such RGB-based indices (Casadesús et al., 2007; Liebisch et al., 2015; Zaman-Allah et al., 2015). The greenness ratings can also correlate with the onset of flowering or senescence (Liebisch et al., 2015; Zaman-Allah et al., 2015). This process provides a high-throughput method by which various genotypes can easily be rated, and with proper understanding of these correlations, the developmental progression of plants in individual plots can be predicted based on the photography. Several previous studies have also investigated the feasibility of employing VIs to predict crop yield (Bolton et al., 2013; Gracia-Romero et al., 2017; Maresma et al., 2016; Panda et al., 2010; Satir et al., 2016; Vergara-Díaz et al., 2016). Many indices have been developed, each with unique properties and applications. The NDVI was found to be generally more effective than the Visible Atmospherically Resistant Index (VARI) and the Triangular Greenness Index in assessing field health, although there were individual use cases where other indices were more effective (McKinnon et al., 2017). Indices with strong correlations to leaf area index, such as the VARI and NDVI, were found to be the most effective at predicting grain yield in rice, and indices incorporating near-infrared or red edge reflectance also tended to perform better than RGB-based indices (Zhou et al., 2017).

    Core Ideas

    • A workflow was developed to clip, filter, and analyze aerial RGB images.
    • Correlations were detected between image-extracted traits and ground truth phenotypes.
    • Biologically relevant GWAS signals were repeatedly detected under different nitrogen conditions.

    Before the image ratings can be calculated, it is best to eliminate debris, shadows, and bare soil (Meyer et al., 2008). These features are unrelated to the crop foliage and would therefore reduce the accuracy of the greenness ratings. A common method to eliminate unwanted pixels in crop images is to use VIs. Probably the most commonly used index is the NDVI (Rouse et al., 1973). However, the NDVI relies on near-infrared spectral data, which is not available from the more cost-effective and accessible RGB cameras.

    In this study, RGB imagery data was collected for a maize (Zea mays L.) diversity panel using UAV-based high-throughput phenotyping from a replicated field experiment with standard and N-deficient management practices. A pipeline was developed using commercial software to extract images pertaining to individual genotypes in each plot. With these plot-level images, several RGB-based VIs were calculated to compare N responses for different genotypes and over the different plant developmental stages. Correlation studies of the VIs with the plant physiological and yield-related traits suggested that some VIs can be employed as indicators for predicting the phenotypic traits of interest. Finally, a genome-wide association study (GWAS) was conducted to identify genetic loci associated with image-extracted traits. This plot-level high-throughput phenotyping method and the identified genetic loci have great potential to benefit plant breeding.


    2.1 Field experimental design

    The plants were grown at the Havelock Research Farm of the University of Nebraska-Lincoln on the east edge of Lincoln, NE. Maize had been planted in the field in the previous year, so soil N was assumed to have been exhausted. The field was partitioned into four quadrants (NE, SE, NW, and SW). Before planting, urea (dry fertilizer) was applied to two quadrants as a source of N at the rate of 120 lb/acre (approximately 134.5 kg/ha) (the +N quadrants, NE and SW), and two quadrants received no treatment (the –N quadrants, NW and SE). Each quadrant was divided into six ranges, with 42 two-row subplots in each range, for a total of 252 subplots per quadrant. Row spacing was 30 inches (76.2 cm) and within-row plant spacing was 6 inches (15.24 cm) with 38 plants per row, resulting in total subplot dimensions of 5 ft × 20 ft (1.524 m × 6.096 m). A total of 233 inbred genotypes from the maize diversity panel were grown using an incomplete split plot block design with between 27 and 37 plots of the check genotype inserted per quadrant. The planting date for the experiment was 1 June 2019.

    2.2 Phenotypic data collection

    On 1 Aug. 2019, when roughly 50% of the genotypes were tasseling or had already tasseled, leaf samples were collected from a representative plant. Following our previous sampling procedure (Ge et al., 2019), three leaves (i.e., leaf 2, 3, and 4; leaf 1 was the flag leaf) were cut at the stem and immediately stored in an ice cooler. For each genotype, leaf area was measured and computed as the average of the three leaves. Later, the dried leaf samples were sent to Midwest Laboratories (a commercial lab) to measure N concentration using a LECO FP428 N analyzer.

    After maturity, three ears per two-row plot were hand harvested and dried at 37 °C for 2–3 d. Ears were hand shelled and 20 undamaged and otherwise representative kernels were collected from the bulked seed and weighed to determine 20 kernel weight.

    2.3 UAV imagery data collection and processing

    A total of 12 flights were conducted on dates between 6 July and 5 September using a Phantom 4 Pro UAV equipped with an RGB sensor with a resolution of 5,472 × 3,648 pixels. All raw image data is available via CyVerse (10.25739/4t1v-ab64). The resulting individual images were used to construct separate orthomosaic models of the field for each time point using Plot Phenix (Progeny Drone Inc.). Plot boundaries were manually defined for each time point. Plot Phenix was then used to extract the cropped original images of each plot from the raw images employed to generate the original orthomosaic. A given plot is often captured multiple times from different angles as the UAV makes its way across the field. Plot Phenix designates one of these replicates as that plot's reference replicate, which is the most nadir image available out of those replicates. Extracted plot images were of varying resolution (typically between 250 × 1,000 and 300 × 1,200 pixels). The reference image designated by Plot Phenix for each plot was employed for downstream analyses.

    2.4 Vegetation indices calculation

    The process used in this study was to extract clipped images representing individual plots from aerial photographs, filter the plots to eliminate soil, shadows, and debris, and calculate a series of greenness ratings for each plot so that fluctuations in the ratings could be tracked throughout the season (see Supplementary Information for more details). In the analysis, different filtering methods were employed. To determine which method performed best at eliminating soil and shadows while retaining the most foliage, we calculated the genetic variance using the check plots and selected the Excess Green Index filtration method with a threshold of 131, as this method returned the smallest genetic variance for the check plots.
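
    The filtering step can be sketched roughly as follows. This is a minimal illustration rather than the pipeline used in the study: it assumes the Excess Green value is computed on raw 0–255 channel values as 2G − R − B and that pixels above the reported threshold of 131 are kept as foliage. The exact channel scaling is given only in the Supplementary Information, and the function names and the tiny example clip are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0-255

# Threshold reported in the text; applying it to raw channel values is an assumption.
EXG_THRESHOLD = 131


def excess_green(pixel: Pixel) -> int:
    """Excess Green on raw channel values: 2G - R - B (assumed scaling)."""
    r, g, b = pixel
    return 2 * g - r - b


def foliage_mask(image: List[List[Pixel]], threshold: int = EXG_THRESHOLD) -> List[List[bool]]:
    """Binary mask that is True where a pixel is kept as foliage."""
    return [[excess_green(px) > threshold for px in row] for row in image]


def canopy_coverage(mask: List[List[bool]]) -> float:
    """Percentage of pixels in the mask that were kept as foliage."""
    total = sum(len(row) for row in mask)
    kept = sum(sum(row) for row in mask)
    return 100.0 * kept / total if total else 0.0


# Hypothetical 2 x 2 plot clip: two leaf-like pixels, one soil pixel, one shadow pixel.
clip = [
    [(40, 200, 30), (60, 180, 50)],
    [(120, 100, 80), (20, 25, 20)],
]
mask = foliage_mask(clip)
coverage = canopy_coverage(mask)
```

    The same mask serves two purposes: its proportion of retained pixels gives a canopy coverage estimate, and it marks which pixels should contribute to the vegetation index averages.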

    2.5 GWAS

    The best linear unbiased prediction values were used as phenotypes in GWAS. They were estimated using the R add-on package “lme4” (Bates et al., 2015). In the analysis, the following model was fitted to the data: lmer(Y ∼ (1|Check) + (1|Genotype) + (1|Block) + (1|SubBlock) + (1|SubSplitBlock)), where Y represents the phenotype. In the model, the Check, Genotype, Block, Sub Block, and Sub Split Block were treated as random effects. The single nucleotide polymorphism (SNP) genotypes of the maize diversity panel were downloaded from maize HapMap3 (Bukowski et al., 2018) with AGPv4 coordinates. After filtering out SNPs with minor allele frequency ≤ 0.05 among the 231 lines phenotyped in this study, approximately 21 million SNPs were retained for further analysis. The GWAS was conducted using a mixed linear model (MLM) implemented in GEMMA (v 0.98.3) (Zhou et al., 2012). In conducting GWAS, the first three principal components calculated by PLINK 1.9 (Chang et al., 2015) and the genomic relationship matrix (or kinship) computed by GEMMA were fitted to control for the population structure and genetic relatedness, respectively. The threshold for significantly associated SNPs was set to 1.2 × 10⁻⁶ (1/n, where n = 769,690 is the number of independent SNPs). Here, the independent SNP number was determined by using PLINK 1.9 with the indep-pairwise option (window size 10 kb, step size 10, r² ≥ 0.1). GWAS peaks were then determined by considering a window 50 kb upstream and downstream of the significant SNPs. Overlapping regions were merged, and the regions with more than five significant SNPs were defined as high confidence association regions.
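
    The peak-calling rule described above (±50 kb windows around significant SNPs, merging of overlapping windows, and retention of regions with more than five significant SNPs) can be sketched as below. This is an illustrative reading of the text, not the authors' script: the function names, the input format of (chromosome, position, p-value) tuples, the treatment of touching windows as overlapping, and the example SNPs are all assumptions.

```python
from typing import Dict, List, Tuple

P_THRESHOLD = 1.2e-6  # significance threshold reported in the text
WINDOW_BP = 50_000    # 50 kb upstream and downstream of each significant SNP
MIN_SNPS = 5          # a region needs MORE than five significant SNPs

Snp = Tuple[str, int, float]  # (chromosome, position in bp, p-value); hypothetical input format


def merge_windows(positions: List[int], window: int = WINDOW_BP) -> List[Tuple[int, int, int]]:
    """Merge overlapping +/- window intervals on one chromosome.

    Returns (start, end, n_significant_snps) tuples sorted by start.
    """
    regions: List[List[int]] = []
    for pos in sorted(positions):
        start, end = max(0, pos - window), pos + window
        if regions and start <= regions[-1][1]:
            # Overlaps (or touches) the previous region: extend it and count the SNP.
            regions[-1][1] = max(regions[-1][1], end)
            regions[-1][2] += 1
        else:
            regions.append([start, end, 1])
    return [(s, e, n) for s, e, n in regions]


def high_confidence_regions(snps: List[Snp]) -> List[Tuple[str, int, int, int]]:
    """Return (chromosome, start, end, n_snps) for regions with more than MIN_SNPS hits."""
    by_chrom: Dict[str, List[int]] = {}
    for chrom, pos, pval in snps:
        if pval < P_THRESHOLD:
            by_chrom.setdefault(chrom, []).append(pos)
    result = []
    for chrom in sorted(by_chrom):
        for start, end, n in merge_windows(by_chrom[chrom]):
            if n > MIN_SNPS:
                result.append((chrom, start, end, n))
    return result


# Hypothetical SNPs: six clustered hits, one isolated hit, one non-significant SNP.
example = [("1", 1_000_000 + i * 10_000, 1e-8) for i in range(6)]
example += [("1", 5_000_000, 1e-9), ("2", 1_000, 0.01)]
regions = high_confidence_regions(example)
```

    In this toy input only the cluster of six SNPs survives; the isolated hit forms a region with a single SNP and is dropped.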


    3.1 A computational workflow to extract plot-level images from a replicated field experiment

    In the summer of 2019, a maize diversity panel consisting of 233 inbred lines drawn from the maize association panel (Flint-Garcia et al., 2005) was grown at a University of Nebraska-Lincoln experimental station based on an incomplete block design with two replications (Supplementary Figure S1). Each replication included two main plots either with or without N fertilizer treatment. For each plot, four split plots were blocked by plant height and maturity (see Materials and Methods). Each split plot was further subdivided into three split plot blocks. A hybrid check was randomized within each split plot block as a subplot. During the growing season, a Phantom 4 Pro UAV equipped with an RGB sensor was flown at about 22–27 m above ground and captured between 210 and 360 images with about 60% overlap on each flight. A total of 12 flights were conducted between 6 July and 5 September. An average of approximately 2,500 clipped plot-level RGB images per quadrant per date were generated from the initial UAV imagery.

    To obtain phenotypic values for each genotype (i.e., the two-row subplots) from these images, a computational pipeline was developed to process the UAV data into plot-level images (see Materials and Methods). Briefly, (a) the aerial UAV images collected from each day were used to form an orthomosaic model of the entire field; (b) clipped images of individual subplots, depicting separate genotypes, were extracted using the commercial software Plot Phenix (Progeny Drone Inc.); (c) these plot-level images were filtered to remove non-foliage pixels; and (d) the resulting binary mask was then used to exclude non-foliage pixels when calculating phenotypic values (i.e., VIs) (Figure 1). After the individual plot images were extracted, a pixel filtration procedure was implemented to remove non-foliage pixels (see Materials and Methods), as including non-foliage pixels in these calculations affects the accuracy of the greenness ratings (Meyer et al., 2008). It should also be noted that the ground area covered by the plot-level images varied somewhat, depending on the UAV's altitude relative to the plot.

    FIGURE 1. Workflow diagram of the unmanned aerial vehicle (UAV) image data processing. Individual UAV images (a) were used to generate an orthomosaic model (b) using Pix4D and Plot Phenix software. The partitioned genotypes (c) were extracted from the original UAV images by Plot Phenix, generating multiple replicates per plot. These images were then filtered using binary masks (d) to remove non-foliage pixels (e). The resulting images were then processed using a variety of vegetation indices to calculate average greenness ratings for each genotype at the point in the growing season the images were collected (f)

    3.2 Elevated canopy coverage in the N applied field

    Each designated two-row plot was captured in multiple UAV images taken from different positions above the field. The viewing angle between replicated photos taken of a single plot during the same flight is highly variable, significantly altering estimates of canopy coverage based on the binary masks (as seen in Figure 2). The most nadir image (or the image clip that has the shortest distance from the center of the photo to the UAV) in the set of images collected for each individual plot was selected as the reference replicate, and these reference images were used to calculate percent canopy coverage for each of the two-row plots at each time point (Supplementary Table S1).

    FIGURE 2. Variability of plot replicate images. These images represent all replicate images of an example plot after being filtered to remove non-foliage pixels. They have been sorted in increasing order by the percentage of white pixels, representing foliage. These percentages give an estimate of canopy coverage of the plot. However, depending on the viewing angle, this percentage can vary dramatically. It is important to use the reference replicate (red box) as a standard, as this guarantees better consistency when comparing different plots

    The canopy coverages of the hybrid check in the +N quadrants were higher than in the –N quadrants, although there is overlap in the overall ranges of values (Figure 3a). The differences were statistically significant on earlier days, that is, 6 July (8.0% difference, false discovery rate (FDR) = 1.6 × 10⁻¹⁰) and 12 August (10% difference, FDR = 2.4 × 10⁻³¹), but the differences became insignificant as plants reached maturity and began to senesce, that is, 5 September (1.6% difference, FDR = 0.06). This observation was expected: because the N fertilizer was applied before planting, the positive reaction of the hybrid check to N treatment would diminish as the plants developed. A similar pattern of reduced N reactions was observed for the n = 233 diverse inbred lines (Figure 3b). On 6 July, the average ratio of canopy coverage (+N/–N) was 1.2 (paired t test, FDR = 2.0 × 10⁻³⁶). The ratio decreased to 1.1 on 12 August (paired t test, FDR = 7.4 × 10⁻⁵), stayed around this value afterward, and became insignificant on 14 August and 5 September (Figure 3b). Although the results suggested most of the genotypes react positively in terms of canopy coverage under +N treatment (i.e., 17 genotypes exhibiting a ratio > 1.5), there were a number of genotypes that reacted negatively to N (i.e., 76 genotypes showing a ratio < 0.8) (Supplementary Figure S2A). The responses of these genotypes to N treatments seemed largely consistent throughout the growing season (Supplementary Figure S2B).
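
    The +N/–N comparison can be illustrated with a short sketch. It makes several assumptions that the text does not confirm: that the average ratio is the mean of per-genotype ratios, that the paired t test is computed on per-genotype differences, and that the FDR adjustment is the Benjamini–Hochberg procedure. Only the test statistic is computed here, since converting it to a p-value needs a t distribution that the Python standard library does not provide. All names and numbers are hypothetical.

```python
import math
import statistics
from typing import List


def mean_ratio(plus_n: List[float], minus_n: List[float]) -> float:
    """Mean of per-genotype canopy coverage ratios (+N / -N)."""
    return statistics.fmean([p / m for p, m in zip(plus_n, minus_n)])


def paired_t_statistic(plus_n: List[float], minus_n: List[float]) -> float:
    """Paired t statistic on per-genotype differences (+N minus -N)."""
    diffs = [p - m for p, m in zip(plus_n, minus_n)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    return statistics.fmean(diffs) / (sd / math.sqrt(len(diffs)))


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical canopy coverage fractions for three genotypes on one date.
plus_n = [0.6, 0.5, 0.7]
minus_n = [0.5, 0.4, 0.5]
ratio = mean_ratio(plus_n, minus_n)
t_stat = paired_t_statistic(plus_n, minus_n)

# Hypothetical raw p-values, one per flight date, adjusted jointly.
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

    Adjusting the per-date p-values together, rather than one date at a time, is what allows a single FDR value to be reported for each time point.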

    FIGURE 3. Time series canopy coverages throughout the plant growing season. (a) Canopy coverage for the hybrid checks with (+N) and without (–N) nitrogen treatment. The reference replicate images of each of the plots containing the hybrid check were used on each date. (b) The ratio of canopy coverage (+N/–N) for the n = 233 maize genotypes throughout the season. Asterisk indicates multiple test adjusted p-value < 0.05 using the false discovery rate (FDR) approach

    3.3 VIs correlated with leaf nutrient traits and ear-related traits

    Eight VIs were calculated from the images collected in this study: the Excess Green Index, the Red–Green–Blue Vegetation Index, the Normalized Green–Red Difference Index, the Green Leaf Index, the Modified Green–Red Vegetation Index, the VARI, the Vegetative Index (VEG), and the Woebbecke Index (Table 1). Consistent with previous reports (Hague et al., 2006), the VIs were sensitive to the light value and color saturation of the images (Supplementary Figure S3). After dividing by light and color saturation values, the normalized VIs clearly distinguished the +N and –N quadrants using the hybrid check, where most of the VIs (7/8) exhibited higher values under +N treatment and the Woebbecke Index showed significantly lower values (Supplementary Figure S4). Consistently, average ratios of the VIs (+N/–N) for the 233 genotypes deviated from 1 (Supplementary Figure S5), suggesting their sensitivity in detecting N treatments (Supplementary Table S2).

    TABLE 1. A listing of the vegetation indices used in this study, with their formulas

    Vegetation index                           Formula
    Normalized Green–Red Difference Index /    $\frac{G - R}{G + R}$
      Normalized Difference Index /
      Green–Red Vegetation Index
    Modified Green–Red Vegetation Index        $\frac{G^2 - R^2}{G^2 + R^2}$
    Red–Green–Blue Vegetation Index            $\frac{G^2 - RB}{G^2 + RB}$
    Visible Atmospherically Resistant Index    $\frac{G - R}{G + R - B}$
    Excess Green Index                         $2g - r - b$
    Green Leaf Index                           $\frac{2G - R - B}{2G + R + B}$
    Vegetative Index                           $\frac{g}{r^{0.667} b^{0.333}}$
    Woebbecke Index                            $\frac{g - b}{|r - g|}$
    • Note. The variables R, G, and B represent the values of the red, green, and blue channels, respectively, of a given pixel. The variables r, g, and b represent these values normalized by dividing the red, green, or blue channel of a given pixel by the sum of the values of all three channels.
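
    As a rough illustration of how the formulas in Table 1 translate into a per-plot rating, the sketch below evaluates all eight indices for each foliage pixel and averages them. The formulas follow the table; everything else is an assumption made for the example, including the function names, the choice to skip a pixel for an index whose denominator is zero, and the omission of the light and saturation normalization described in the text.

```python
from typing import Dict, Iterable, Optional, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B) raw channel values


def _ratio(num: float, den: float) -> Optional[float]:
    """Return num / den, or None when the index is undefined (zero denominator)."""
    return num / den if den != 0 else None


def vegetation_indices(R: float, G: float, B: float) -> Dict[str, Optional[float]]:
    """Evaluate the eight Table 1 indices for a single pixel with R + G + B > 0."""
    total = R + G + B
    r, g, b = R / total, G / total, B / total  # channel-normalized values
    return {
        "NGRDI": _ratio(G - R, G + R),
        "MGRVI": _ratio(G**2 - R**2, G**2 + R**2),
        "RGBVI": _ratio(G**2 - R * B, G**2 + R * B),
        "VARI": _ratio(G - R, G + R - B),
        "ExG": 2 * g - r - b,
        "GLI": _ratio(2 * G - R - B, 2 * G + R + B),
        "VEG": _ratio(g, r**0.667 * b**0.333),
        "Woebbecke": _ratio(g - b, abs(r - g)),
    }


def plot_means(pixels: Iterable[Pixel]) -> Dict[str, float]:
    """Average each index over the foliage pixels of one plot image."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for R, G, B in pixels:
        if R + G + B == 0:
            continue  # a pure black pixel has no defined normalized channels
        for name, value in vegetation_indices(R, G, B).items():
            if value is not None:
                sums[name] = sums.get(name, 0.0) + value
                counts[name] = counts.get(name, 0) + 1
    return {name: sums[name] / counts[name] for name in sums}


# Hypothetical foliage pixel and a two-pixel "plot" made of identical pixels.
single = vegetation_indices(50, 150, 50)
means = plot_means([(50, 150, 50), (50, 150, 50), (0, 0, 0)])
```

    Note that the Woebbecke Index is undefined whenever the normalized red and green values are equal, which is one reason a per-index skip rule (or some other convention) is needed in practice.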

    The VI values were correlated with measurements of leaf N levels, leaf areas, and 20 kernel weights from sampled leaves and mature ears collected from the same field to evaluate the reliability of using VIs to predict physiological and yield-related traits (see Materials and Methods). Overall, statistically significant correlations were observed with mean Pearson correlation coefficients of r = 0.27 ± 0.17 for leaf N level, r = 0.24 ± 0.12 for leaf areas, and r = 0.16 ± 0.08 for 20 kernel weight (Figure 4a). Over the growing season, the degree of correlation between VIs and ground truth traits exhibited remarkable fluctuations. The coefficient values peaked on 12 August and 22 August and dramatically reduced on 16 August and 26 August, partly because images were taken late on these days (between 7:30 and 8:00 p.m., local time), which had a noticeable effect on the light values of the images: 12 August had light values between 9.97 and 10.97, 22 August had light values between 10.64 and 11.95, but 16 August and 26 August both had light values between 6.97 and 8.32. Notably, unlike the other VIs, the VEG performed better on these unusual days, especially at predicting leaf area.
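
    The per-date correlation analysis can be sketched as follows. This is a generic Pearson correlation written out with the standard library, not the authors' code; the dictionary layout, function name, and values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math
from typing import Dict, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical plot-level VI values on two dates and one ground truth trait,
# listed in the same plot order.
vi_by_date: Dict[str, List[float]] = {
    "12 Aug": [0.10, 0.20, 0.30, 0.40],
    "16 Aug": [0.40, 0.10, 0.30, 0.20],
}
leaf_n = [1.0, 2.0, 3.0, 4.0]

r_by_date = {date: pearson_r(vi, leaf_n) for date, vi in vi_by_date.items()}
```

    Computing one coefficient per index, trait, and date in this way yields the kind of time series that reveals the date-to-date fluctuations described above.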

    FIGURE 4. Correlation of the vegetation indices (VIs) with physiological and yield-related traits. (a) Time-series correlation coefficients of the eight VIs with leaf nitrogen levels, leaf areas, and 20 kernel weights. The grey dashed line indicates the selected date for further investigation. The initials MGR, RGB, and NGR refer to the Modified Green–Red Vegetation Index, Red–Green–Blue Vegetation Index, and Normalized Green–Red Difference Index, respectively. Scatter plots of the VEG vs. nitrogen (b), Woebbecke vs. leaf area (c), and Visible Atmospherically Resistant Index (VARI) vs. 20 kernel weight (d) on the selected date of 22 August for +N and –N fields separately. Blue lines denote the fitted regression line using a linear model and grey shaded areas indicate the 95% confidence intervals. Text in the plots shows the Pearson correlation coefficients and associated p-values

    The eight VIs varied substantially in their predictive ability for different traits. The VEG outperformed other indices for the leaf N level, exhibiting the highest correlations on 22 August (Figure 4b). Although the Woebbecke Index displayed a negative correlation with other traits, its absolute r values were mostly above those of other indices for predicting the leaf area trait (Figure 4c). It was more challenging to predict the yield-related trait (i.e., 20 kernel weight). Occasionally, the VARI exhibited a slightly higher correlation coefficient than the other VIs (e.g., on 22 August) (Figure 4d), but the overall performance for yield-related trait prediction was considerably lower than for the leaf N level and leaf area traits.

    3.4 GWAS identified canopy coverage and VI-associated loci

    GWAS was conducted using a mixed linear model for the time-series canopy coverage and VI traits and a set of more than 20 million SNPs obtained from the whole-genome sequencing data (Bukowski et al., 2018). Using a modified Bonferroni threshold of 1.2 × 10⁻⁶ (1/n, where n = 769,690 is the number of independent SNPs) (Yang et al., 2021) and more than five significant SNPs within a 100-kb window as criteria (see Materials and Methods), a total of 14 unique regions were detected with significant associations with variation in canopy coverage (Figure 5, Supplementary Figures S7 and S8) and 129 regions exhibited significant associations with VI traits (see Supplementary Table S3 for trait-associated SNPs). For canopy coverage traits, three overlapping GWAS regions were detected between +N and –N fields, while nine (+N) and two (–N) treatment-specific GWAS regions were found, suggesting that genetic control of canopy coverage is moderately distinct under different N treatments. Compared with the canopy coverage traits, fewer consistent GWAS peaks at different dates were detected for each VI trait. For example, 10 (+N) and 10 (–N) unique GWAS regions were detected for the VEG, the index exhibiting the highest correlation with leaf N content; however, very few of these regions were repeatedly detected on more than two different dates (Supplementary Figures S9–S11).

    FIGURE 5. Genome-wide association study (GWAS) results for canopy coverage at different dates with (a) and without (b) nitrogen treatments. Red dots highlight the GWAS regions with more than five significant single nucleotide polymorphisms (SNPs) within a 100-kb window. Horizontal dashed lines indicate the GWAS thresholds

    A set of 29 unique regions were associated with the same image-extracted feature in distinct analyses conducted on data from two or more of the 12 total time points phenotyped. Among these GWAS signals, three (10.3%) regions were also associated with variation in two or more distinct image-extracted features (Figure 6a) (Bates et al., 2015). Notably, a canopy coverage-associated GWAS peak located in the Chr1 208.8–209.1 Mb region was repeatedly detected on multiple days under both N conditions (Figure 5 and Figure 6a). The Zm00001d031997 gene, located 4.6 kb downstream of the leading SNP (Figure 6b), is a homolog of the Arabidopsis gene HCF244 (high chlorophyll fluorescence 244), which plays a role in the assembly of photosystem II (Chotewutmontri et al., 2020). The abundance of mRNA transcripts derived from Zm00001d031997 is much higher in mature leaves than in other tissues (Figure 6c), suggesting its potential effects on plant development.

    FIGURE 6. Single nucleotide polymorphism (SNP)–trait association frequency for the canopy coverage and vegetation index (VI) traits. (a) Physical positions and detection frequency (1 of 12 total time points, signals with SNP–trait association frequency < 1 were excluded to reduce the potential false discoveries) of genomic intervals associated with variation in canopy coverage and the VI traits. Dots and triangles indicate the phenotypes collected under +N and –N conditions, respectively. The vertical green dashed lines indicate regions significantly associated with multiple traits. (b) The zoom-in plot of the association signals at the Chr1 208.8–209.1 Mb region for the canopy coverage trait collected in the –N field on 12 August. The dashed horizontal line indicates the Bonferroni-adjusted significance threshold (1.2 × 10⁻⁶). The gray rectangles indicate the gene models and the red rectangle highlights the location of the candidate gene Zm00001d031997. (c) The gene expression level of Zm00001d031997 in different tissues (data obtained from Kremling et al., 2018; Zhou et al., 2020)


    In this study, a UAV-based high-throughput phenotyping pipeline has been developed to extract plot-level images, which were then used for GWAS to map a number of genetic loci associated with image-extracted traits. For the plot-level image processing pipeline, an algorithm was applied to the clipped images to filter out non-foliage elements and to calculate vegetation index averages over the surviving pixels. These greenness ratings provide a “snapshot” of that plot at specific points in the season. Some of the greenness indices have been demonstrated to be effective predictors of genotype response. For instance, the VEG can be used to predict response to N treatment, especially under –N conditions.

    Using the VIs and canopy coverage as phenotypic traits, we identified about 30 unique trait-associated loci on at least two different dates; requiring repeated detection may reduce the false discovery rate, a major drawback of the GWAS method (Miao et al., 2018). However, we did not correct for the number of different traits tested, because conventional multiple-testing correction procedures might be too stringent for correlated, time-series traits. To resolve this limitation, more rigorous statistical methods need to be developed for GWAS of time-series phenotypes. Nevertheless, a candidate gene (HCF244 or Zm00001d031997) was located in one of the most significant GWAS peaks for the canopy coverage trait under both +N and –N conditions. The HCF244 gene encodes a putative NAD(P)H-binding protein (Komenda et al., 2019). Recent evidence from studies in maize and Arabidopsis suggests that HCF244 is part of a protein complex involved in photosystem II (PS II) assembly and repair (Chotewutmontri et al., 2020). Because the protein in the PS II reaction center is subject to photodamage, the HCF244-related complex may play a key role in replacing the damaged protein with a nascent one. Therefore, we speculate that natural variation in the HCF244 gene or in its regulation might affect the efficiency of PS II protein assembly or repair and eventually lead to phenotypic consequences that can be detected by UAV. However, additional experiments are needed to further elucidate the molecular mechanism at this genetic locus.

    Compared with multispectral systems, RGB cameras are less expensive and tend to provide high-resolution data. In addition, RGB data are easy to analyze because they can be inspected by eye to verify data quality. High-resolution RGB images are especially valuable for manually labeling objects to train machine learning models. Recently, several machine learning models have been developed to detect flowering time using manually labeled RGB images (Alzadjali et al., 2021). Our finding that the Woebbecke index showed a moderate level of correlation with the yield-related trait may provide a useful RGB-based index for further plant improvement.

    Nevertheless, the study of the UAV-based RGB image data has revealed several limitations. Virtually all of the selected VIs are highly sensitive to the color saturation of the images, producing higher values for images with greater saturation. This can be corrected by normalizing the calculated VI ratings by the average saturation of each image. As the difference between the July 6 images and those from the rest of the season showed, it is also crucial to use identical camera settings. Furthermore, as exhibited in Figure 4a, the images should be taken at a similar time of day, because images taken much later in the day will have lower light values. The Pearson correlation coefficient of the VIs has been shown to be sensitive to the light values of the image, and it will be important in the future to minimize variation in this regard. It should also be noted that there will naturally be some variation in the canopy coverage estimates of the plots, because this measurement is affected not only by how large the plants are but also by how well centered the plot is in the image, the ground surface area the image encompasses (which can be affected by UAV altitude), and how much intrusion there is from leaves in adjacent rows. However, this variation can be minimized by flying at a consistent altitude on each date and by using the reference replicate images, which in most cases are the most centered and also depict a given plot from the most nadir angle.
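
    The saturation correction mentioned above could take roughly the following form. This Python sketch assumes that saturation is the HSV definition, (max − min) / max per pixel, averaged over the whole image, and that normalization means dividing the VI rating by that average; both choices, along with the function names, are assumptions for illustration and not a description of the authors' implementation.

```python
import numpy as np


def mean_saturation(rgb):
    """Average HSV saturation, (max - min) / max, over all pixels."""
    rgb = np.asarray(rgb, dtype=np.float64)
    cmax = rgb.max(axis=2)
    cmin = rgb.min(axis=2)
    # Black pixels (max == 0) get a saturation of 0 instead of dividing by 0
    safe_max = np.where(cmax > 0, cmax, 1.0)
    saturation = np.where(cmax > 0, (cmax - cmin) / safe_max, 0.0)
    return float(saturation.mean())


def normalize_vi(vi_value, rgb):
    """Divide a plot-level VI rating by the image's mean saturation."""
    sat = mean_saturation(rgb)
    if sat == 0:
        raise ValueError("image has zero saturation; cannot normalize")
    return vi_value / sat


# Synthetic image with uniform color (50, 200, 50): saturation = 150 / 200
demo = np.full((4, 4, 3), (50, 200, 50), dtype=np.uint8)
normalized = normalize_vi(3.0, demo)
```

    Averaging saturation over foliage pixels only, rather than the whole image, is an equally plausible variant and would change the resulting values.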

    Provided that the limitations of UAV imagery data are adequately compensated for by consistent data collection techniques, we hope that UAV data can provide accurate predictions of crop response, for example by using canopy coverage as a proxy to predict plant architecture, quantify photosynthetic efficiency, and ultimately predict final grain yield. In addition, the promising GWAS signals detected using the image-extracted traits provide great opportunities to advance plant sciences. Ultimately, the UAV-based pipeline and advanced statistical models have the potential to greatly benefit crop improvement in the future.


    Unmanned aerial vehicle-based high-throughput crop phenotyping has received increased attention in recent years. The ability to accurately characterize the plants growing in a testing plot, and to assess their phenotypes, provides a useful tool for plant breeding and precision agriculture. The pipeline we have developed focuses on plot-level, genotype-specific phenotyping in a time-series manner. Assessing a given plot's health, and how it changes through a season, based on VI-derived ratings is a useful way to track a given genotype's development, including its response to N treatment. This study pursued several aims. We selected a variety of traits of interest, such as leaf N level and 20-kernel weight. By analyzing a variety of different VIs, we sought to identify specific formulas that would serve as accurate predictors of these traits. To this end, this study has determined the VEG to be an accurate predictor of leaf N level. The Woebbecke index exhibits a negative correlation with leaf area. Finally, the VARI exhibited a fairly strong correlation with 20-kernel weight. Our GWAS analysis also identified a number of candidate genes relating to the traits extracted from the UAV imagery data, which may be elucidated further in future studies. Such high-throughput phenotyping methods could benefit agriculture not only by predicting crop response but also by identifying potential genes of interest for future crop improvement.


    This project is supported by the National Science Foundation under award number OIA-1557417 for Center for Root and Rhizobiome Innovation (CRRI), the Agriculture and Food Research Initiative Grant number 2019-67013-29167 from the USDA National Institute of Food and Agriculture, and the University of Nebraska-Lincoln Start-up fund. We thank Brandi Sigmon for assistance designing and conducting the maize field experiment and Alexandra Bradley, Nate Pester, Leighton Wheeler, and Mackenzie Zwiener for assistance in conducting the maize field experiment.


      Eric Rodene: Data curation; Formal analysis; Methodology; Writing – original draft. Gen Xu: Data curation; Formal analysis; Writing – review & editing. Semra Palali Delen: Data curation. Xia Zhao: Writing – review & editing. Christine Smith: Data curation. Yufeng Ge: Data curation; Writing – review & editing. James Schnable: Data curation; Resources; Writing – review & editing. Jinliang Yang: Conceptualization; Data curation; Formal analysis; Funding acquisition; Project administration; Supervision; Writing – review & editing.


      The authors declare no competing interests.


      The original UAV images, the clipped plot-level images, and the associated metadata used in this study have been deposited in CyVerse (10.25739/4t1v-ab64). The scripts used for processing and evaluating the imagery data have been deposited on GitHub (