Volume 5, Issue 1 e20055
ORIGINAL RESEARCH
Open Access

EarCV: An open-source, computer vision package for maize ear phenotyping

Juan M. Gonzalez
Plant Cellular & Molecular Biology Dep., Univ. of Florida, Gainesville, FL 32611, USA
Contribution: Conceptualization, Data curation, Formal analysis, Methodology, Software, Validation, Writing - original draft

Nayanika Ghosh
Computer & Information Science & Engineering Dep., Univ. of Florida, Gainesville, FL 32611, USA
Contribution: Software

Vincent Colantonio
Horticultural Sciences Dep., Univ. of Florida, Gainesville, FL 32611, USA
Contribution: Data curation, Methodology, Software, Validation, Writing - review & editing

Francielly de Cássia Pereira
Federal Univ. of Lavras, Minas Gerais, Brazil
Contribution: Formal analysis, Resources

Ricardo A. Pinto Jr.
Federal Univ. of Lavras, Minas Gerais, Brazil
Contribution: Resources

Chase Wasson
Horticultural Sciences Dep., Univ. of Florida, 3200 E Canal St. S, Belle Glade, FL 33430, USA
Contribution: Resources

Kristen A. Leach
Horticultural Sciences Dep., Univ. of Florida, Gainesville, FL 32611, USA
Contribution: Resources, Writing - review & editing

Marcio F. R. Resende Jr. (Corresponding Author)
Plant Cellular & Molecular Biology Dep., Univ. of Florida, Gainesville, FL 32611, USA
Horticultural Sciences Dep., Univ. of Florida, Gainesville, FL 32611, USA
Correspondence: Marcio F. R. Resende Jr., Sweet Corn Breeding & Genomics Lab, Horticultural Sciences Dep., University of Florida, Gainesville, FL 32611, USA. Email: [email protected]
Contribution: Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Writing - review & editing

First published: 19 October 2022

Assigned to Associate Editor Melanie Ooi.

Abstract

Fresh market sweet corn (Zea mays L.) is a row crop commercialized as a vegetable, resulting in strict expectations for ear size, color, and shape. Ear phenotyping in breeding programs is typically done manually and can be subjective, time consuming, and unreliable. Computer vision tools have enabled an inexpensive, high-throughput, and quantitative alternative to phenotyping in agriculture. Here we present a computer vision tool using open-source Python and OpenCV to measure yield component and quality traits relevant to sweet corn from photographs. This tool increases accuracy and efficiency in phenotyping through high-throughput, quantitative feature extraction of traits typically measured qualitatively. EarCV worked in variable lighting and background conditions, such as under full sun and shade and against grass and dirt backgrounds. The package compares ears in images taken at varying distances and accurately measures ear length and ear width. It can measure traits that were previously difficult to quantify such as color, tip fill, taper, and curvature. EarCV allows users to phenotype any number of ears, dried or fresh, in any orientation while tolerating some debris and silk noise. The tool can categorize husked ears according to the predefined USDA quality grades based on length and tip fill. We show that the information generated from this computer vision approach can be incorporated into breeding programs by analyzing hybrid ears, capturing heritability of yield component traits, and detecting phenotypic differences between cultivars that conventional yield measurements cannot. Ultimately, computer vision can reduce the cost and resources dedicated to phenotyping in breeding programs.

Abbreviations

  • a*, axis relative to the green–red opponent colors
  • b*, axis relative to the blue–yellow opponent colors
  • BGEM, double haploid germplasm enhancement of maize
  • BLUPs, best linear unbiased predictors
  • CIELAB, International Commission on Illumination (CIE) L*a*b* color space
  • COV, coefficient of variation
  • CV, computer vision
  • DSLR, digital single-lens reflex camera
  • HSV, hue, saturation, value
  • KRN, kernel row number
  • L*, lightness value
  • ML, machine learning
  • PCA, principal component analysis
  • QR, quick-response
  • RGB, red–green–blue
  • RMSc, root mean squared distance for a single color checker in RGB space
  • RMSt, root mean squared distance averaged over all 24 color checkers

1 INTRODUCTION

    Continuous yield increase of agricultural plants is imperative to sustainably meet the global demand for food by a growing population. Breeding combines field experimentation, selection, and quantitative genetics to drive genetic gain in most major crops, as exemplified in corn (Zea mays L.; Andorf et al., 2019), rice (Oryza L.; Yu et al., 2020), and wheat (Triticum aestivum L.; Venske et al., 2019). In sweet corn, conventional field breeding programs are still the main method driving genetic gain. Genomic selection is expected to accelerate vegetable breeding as it has in row crops but will require efficient and robust phenotyping to calibrate models (Shakoor et al., 2019). One of the biggest challenges toward increasing genetic gain in breeding programs is the ability to phenotype large populations, especially in the case of vegetable breeding programs where many phenotypes are jointly utilized to drive selection decisions. In large breeding programs it is often impractical, time-consuming, and expensive to have trained experts measure many plant features (Wu et al., 2019).

    High-throughput plant phenotyping via computer vision (CV) and machine learning (ML) can be used to address a diversity of phenotyping challenges in plant sciences, including increasing plant breeding efficiency and understanding the molecular underpinnings of traits of interest (Gaillard et al., 2020; Gehan et al., 2017; Shakoor et al., 2019). In the last decade, many high-throughput plant phenotyping methods have improved phenotyping by increasing the number of individuals phenotyped, providing quantitative alternatives to previously qualitative approaches, increasing the speed at which features are measured, and reducing subjectivity, time, and labor (Yang et al., 2021). Deployment of high-throughput phenotyping applications can be split into collection, extraction, and modeling. Methods in each category have different barriers to entry based on cost, skill, generalizability, and sensitivity. Some examples of data collection systems include tower-based, gantry-based, ground mobile, low- and high-altitude aerial, and satellite-based systems (Jiang & Li, 2020). However, these methods can generate massive amounts of data, which must be efficiently stored, processed, and managed to maximize their utility (Coppens et al., 2017). Image data can vary by type, such as digital, near-infrared, fluorescence, thermal, multi/hyperspectral, and 3D imaging (Yang et al., 2021). There are many methods used for data extraction, all of which must be developed for a specific data type. Data collection, extraction, and modeling must balance resources while maximizing data volume, accuracy, customizability, and ease of use. For example, PlantCV is an image analysis software package that contains processing and normalization tools, leaf segmentation, morphometrics tools, and ML modules (Gehan et al., 2017). Machine learning and deep learning can be used to extract complex features from images, usually by training a model on a ground truth dataset. This model is applied to new images to extract relevant information. For example, these models have been used for semantic segmentation of maize ears, Arabidopsis leaves, maize kernels, and rice foliar diseases (S. Chen et al., 2021; Hüther et al., 2020; Shakoor et al., 2019; Warman & Fowler, 2021; Yang et al., 2021). Although high-throughput plant phenotyping can address major phenotyping bottlenecks in plant science research, digital phenotyping comes with its own set of challenges that need to be addressed to obtain the desired information accurately and efficiently. Ultimately, these approaches can improve the speed and accuracy of phenotyping while reducing manual labor (Perez-Sanz et al., 2017; Shakoor et al., 2019).

    Core Ideas

    • EarCV enables high-throughput phenotyping of corn ear traits and yield components.
    • Tip fill, taper, curvature, and color can be quantitatively measured.
    • We analyzed 5,392 diverse inbred, early-stage hybrid, and commercial hybrid ears with EarCV.
    • EarCV has demonstrable applications in quantitative genetics and breeding.

    First developed by Indigenous American peoples in pre-Columbian times, sweet corn is now consumed on all continents. Unlike field corn or popcorn, sweet corn is picked roughly 21 d after pollination while the water content is high. One of its main markets is the commercialization of fresh ears with little to no processing, resulting in unique ear requirements (Hallauer, 2000). Ear length, ear width, tip fill, taper, curvature, color, and kernel row number (KRN) are key sweet corn traits that determine marketable yield and may affect the selling price for the grower (USDA-Agricultural Marketing Service, 2021). Measuring ear length, ear width, and KRN is possible but time-consuming, labor-intensive, and prone to errors, which constrains most breeding programs in their ability to perform large-scale evaluation of these traits. Interest in characterizing these traits in breeding programs for other types of corn is also growing, because they are component traits of total yield that generally have higher heritabilities than overall yield (Peng et al., 2011). Therefore, it is possible to characterize and select distinct lines with enhanced individual yield components before developing a combined commercial line (Miller et al., 2017). Using this principle, multiple groups have developed CV and ML approaches for ear and kernel phenotyping (Brichet et al., 2017; Kienbaum et al., 2021; Liang et al., 2016; Makanza et al., 2018; Miller et al., 2017; Warman et al., 2021). These tools have demonstrable advantages compared with manual phenotyping. However, the deployment of some of these tools is not feasible in our use-case due to material expenses and/or inflexibility in the image capturing requirements. In addition, none of these available tools comprehensively cover feature extraction for phenotypes relevant to sweet corn. We did not explore the use of tools with large initial costs such as drones or hyperspectral sensors, which are expensive and have significant data-processing bottlenecks (Shi et al., 2016). We avoided supervised ML methods because these require a large initial investment in labeling data (Kienbaum et al., 2021). We sought to develop a phenotyping tool able to make comparisons across images and environments and focused on ear traits that are relevant for fresh market sweet corn breeding. Our goal was to develop a tool with a low barrier to entry in terms of cost and imaging requirements, while balancing ease-of-use, generalizability, and sensitivity.

    Here we report the implementation of a newly developed package EarCV, which uses the open-source CV library, OpenCV, and Python to automatically measure whole-ear traits for any number of nonoverlapping ears in a single image regardless of background and ear orientation. Our objectives were to (a) develop a CV-based approach for efficient, reproducible, accurate, and objective whole-ear phenotyping; (b) validate package performance against manually acquired data; and (c) probe EarCV's application in a public sweet corn plant breeding program. First, we tested EarCV's performance in variable lighting and background conditions using a common set of sweet corn ears. Then, we validated EarCV-derived length and width estimates against ground-truth measurements. We justified complex feature extraction methods for color, tip fill and curvature by exploring the resulting distributions of an inbred sweet corn ear dataset. We tested ML predictive ability on KRN. Lastly, we deployed the tool in applied breeding by using it to evaluate commercial hybrid sweet corn cultivars and early-stage hybrids. This algorithm is open-source and can be used by the public to accelerate their own phenotyping efforts.

    2 MATERIALS AND METHODS

    2.1 Image collection

    A Nikon D750 camera with a Nikon DX lens (29.87 cm aperture, 52 mm diameter) at ISO 1200 was fixed to the ceiling using a double-jointed ball-head clamp. The clamp allows the camera to be fixed over a table with a uniformly colored background. We linked the camera to a phone via a wireless connection for rapid picture taking. Images for Datasets 1 and 2 included a color checker for color normalization and/or a square piece of paper of known dimensions for pixel-to-centimeter (or -inch) conversion. Images were saved in .raw and .jpeg formats at a size of 2,586 × 3,235 pixels. An 8-megapixel iPhone 6 camera was used under standard automatic focus and exposure settings.

    2.2 Datasets

    We built three different sweet corn ear datasets: commercial hybrids (Dataset 1), early-stage hybrids (Dataset 2), and diverse inbreds (Dataset 3). In addition, to demonstrate the broad application of the algorithm, a collection of 182 publicly available photographs from the double haploid germplasm enhancement of maize (BGEM) inbred collection was downloaded and phenotyped using EarCV (Dataset 4) (Vanous et al., 2018).

    The commercial hybrid trial (Dataset 1) consisted of a set of eight hybrid cultivars grown in two replicates in Hastings, FL, in the winter of 2020 using a randomized complete block design (Ribeiro da Silva et al., 2021). Each rep consisted of 16 blocks, and each treatment was four 9 m rows of the same genotype, with 1 m spacing between rows and 15.50 cm spacing between plants. These blocks were repeated twice per genotype per rep for a total of 16 blocks. Only the middle two rows per block were harvested at 24–28 d after anthesis. Five ears from each subplot, for a total of 247 hybrid ears, were phenotyped by hand for width, USDA fancy grade, and KRN. A subset of 120 ears was phenotyped for length. Length was measured using a traditional ruler, and the width of the middle of each ear was obtained using a digital caliper. KRN was counted by hand. Quality grade was assigned based on the published USDA guidelines (USDA-Agricultural Marketing Service, 2021). This dataset was phenotyped without drying to simulate grading for quality and fresh market consumption. An analysis of variance was used to determine whether there were any significant differences in features, ear weights, and yield estimates between cultivars at a significance threshold of p < .05 in R, followed by mean separation using Tukey's honestly significant difference test with a confidence level of 0.95. To test how lighting, background color, and camera can affect the performance of EarCV, a subset of six hybrid ears was photographed in 21 different lighting and background conditions using two cameras, a Nikon DSLR (digital single-lens reflex camera) and an iPhone 6 camera.

    The early-stage hybrid trial (Dataset 2) was planted in Belle Glade in the winter of 2020. The hybrid trial was derived from a set of 155 inbreds crossed with each other using a North Carolina II scheme, resulting in a total of 531 lines. Seeds were hand planted approximately 15.5 cm apart in 1 m rows. Ears were harvested at 21 d after pollination, dried for 3 wk, and photographed. Using a pedigree, broad-sense heritability was calculated for each feature, and hybrids were ranked based on the resulting best linear unbiased predictors (BLUPs) using the following mixed linear model in ASREML-R:
    \begin{equation} Y_{ijk} = \mu + R_i + C_m/B_k + B_k + T_l + e_{ijkl} \end{equation} (1)
    where Yijk is the phenotypic performance of the hybrid; μ is the overall mean; Ri is the fixed effect of rep i; Cm/Bk is the fixed effect of incomplete block k given by check m; Bk is the random effect of incomplete block k, where Bk ∼ N(0, Iσk²); Tl is the random effect of maternal genotype l, where Tl ∼ N(0, Aσg²); A is a numerator pedigree matrix; and eijkl is the random error effect, distributed as eijkl ∼ N(0, Iσe²). Here σk² is the block variance, σg² is the genetic variance, σe² is the residual variance, and i = (m + l). Broad-sense heritability was calculated as the proportion of genetic variance over the total phenotypic variance.

    The diverse inbred dataset (Dataset 3) is a subset of a recently established sweet corn inbred diversity population that expanded on previously developed sweet corn populations (Baseggio et al., 2019). The population consists of 693 diverse genotypes, most of which are inbred lines representing the sweet corn diversity in the United States (Hu et al., 2021). Diverse inbreds were grown in Citra, FL, in the spring of 2019 in rows of 12 plants spaced 18 cm apart with 1 m row spacing. Two hundred fifty distinct inbred ears were dried for at least 3 wk after being harvested about 60 d after pollination. Ears were manually phenotyped for length and width and photographed. Length was measured using a traditional ruler, and the width of the middle of each ear was obtained using a digital caliper.

    2.3 Optional features: QR Code, color normalization, and pixels per metric

    We developed three optional features to enable robust comparisons across images in a breeding context. The quick-response (QR) code extractor module is an implementation of the pyzbar.decode function to analyze QR codes printed on white envelope stickers (Natural History Museum of London, 2016/2020). This module scans the image for the QR code, extracts the information from the code, and masks the sticker for downstream analysis (Figure 2a). The pixels per metric module uses a filtering approach based on size and aspect ratio to segment out the single largest uniform-colored square in the image and measure its side length. The function takes a numerical argument that specifies the length units (e.g., centimeters or inches) and then translates the square's pixel length to unit length. This pixels per metric ratio is used downstream for ear feature extraction (Figure 2b).
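    As a rough illustration of these two helpers, the sketch below decodes a QR code with pyzbar and estimates a pixels-per-metric ratio from the largest near-square contour. The function names and the exact filtering thresholds are illustrative assumptions, not EarCV's actual API.

```python
import cv2
import numpy as np
from pyzbar.pyzbar import decode  # pyzbar's QR/barcode decoder


def read_qr(image):
    """Decode the first QR code found and mask the sticker for later steps."""
    results = decode(image)
    if not results:
        return None, image
    qr = results[0]
    x, y, w, h = qr.rect                      # bounding box of the sticker
    masked = image.copy()
    masked[y:y + h, x:x + w] = 0              # black out the sticker region
    return qr.data.decode("utf-8"), masked


def pixels_per_metric(gray, known_side):
    """Find the largest near-square contour and convert its side to px/unit."""
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    squares = []
    for c in contours:
        _, _, w, h = cv2.boundingRect(c)
        if 0.9 < w / float(h) < 1.1:          # aspect ratio near 1 -> square
            squares.append((cv2.contourArea(c), w))
    if not squares:
        return None
    _, side_px = max(squares)                 # largest square in frame
    return side_px / known_side               # pixels per cm (or inch)
```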

    The color normalization module is an implementation of the PlantCV color palette extraction module and Finlayson's color homography correction approach (Berry et al., 2018; Finlayson et al., 2016; Gehan et al., 2017). This module segments out the color checker passport from the reference and target images, creates grayscale masks for each color checker, calculates a transformation matrix, and applies this transformation matrix to the source image for color normalization. An alternating least squares technique (Finlayson et al., 2016) is used to calculate the color homography matrix between the color spaces of the target and reference images. The root mean square error is computed before and after normalization for each target image with respect to the reference image to monitor performance:
    \begin{equation} \mathrm{RMS}_c = \frac{\sqrt{(R_s - R_t)^2 + (G_s - G_t)^2 + (B_s - B_t)^2}}{3} \end{equation} (2)
    \begin{equation} \mathrm{RMS}_t = \frac{\sum_{c=1}^{24} \mathrm{RMS}_c}{24} \end{equation} (3)
    where Rs, Rt, Gs, Gt, Bs, and Bt are the red, green, and blue intensity values for the reference and target images, respectively; RMSc is the root mean squared distance for each checker, and RMSt is the root mean squared distance averaged across all 24 checkers. Theoretically, the RMSt difference can range from 0 to 255, or the full spectrum of grayscale intensities.
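    The sketch below computes these two error measures, assuming the 24 checker patch colors have already been extracted as (24, 3) arrays of mean RGB values; the function names are illustrative.

```python
import numpy as np


def rms_per_checker(ref_rgb, tgt_rgb):
    """Equation 2: RMS color distance for each of the 24 checker patches."""
    diff = ref_rgb.astype(float) - tgt_rgb.astype(float)
    return np.sqrt(np.sum(diff ** 2, axis=1)) / 3.0


def rms_total(ref_rgb, tgt_rgb):
    """Equation 3: RMS distance averaged over all 24 checker patches."""
    return rms_per_checker(ref_rgb, tgt_rgb).mean()
```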

    2.4 Finding, cleaning up, and orienting ears

    Find_ears.py is the first default module in the package. Its purpose is to segment out any number of ears in frame, filter out silks and debris, and orient the ears. This module is subdivided into four customizable functions: background segmentation, connected component filtration, clean-up, and orientation. User-defined channel and threshold settings allow for default or custom background segmentation. The resulting foreground is then piped into the ear filter function, which removes foreground objects based on area, aspect ratio, and solidity. Area is calculated using Green's theorem (Lang, 1987). Solidity is a measure of density, useful to quantify the amount and size of concavities in an object boundary (Zdilla et al., 2016). Under default settings, this filter selects objects that are 1.5% < x < 15% of the total image area, have an aspect ratio 0.19 < x < 0.6, and a solidity < 0.98. These settings were optimized using all included datasets to find roughly oval shaped objects of a reasonable size range. These filter parameters are customizable.
    \begin{equation} \mathrm{Solidity} = \frac{\mathrm{Area\ of\ object}}{\mathrm{Area\ of\ convex\ hull}} \end{equation} (4)
    \begin{equation} \mathrm{Aspect\ ratio} = \frac{\mathrm{Width}}{\mathrm{Length}} \end{equation} (5)
    To avoid two ears in an image being analyzed as one, or co-segmenting (Figure 3c), EarCV monitors the area coefficient of variation (COV) of all the ears in a single image:
    \begin{equation} \mathrm{Area\ COV} = \frac{\sqrt{\frac{\sum |x - \bar{x}|^2}{n}}}{\bar{x}} \end{equation} (6)
    where x is the area of each ear, x̄ is the average area of all ears in the image, and n is the number of ears. Co-segmentation of ears artificially inflates the area COV, triggering the ear clean-up function if the area COV is greater than 0.35. This value was found to be generally robust in the identification of co-segmented ears. Co-segmentation is addressed by performing morphological opening to remove noise until the area COV falls below the default threshold or 10 iterations are reached. If the area COV is high due to natural variation in ear sizes as opposed to co-segmentation, then the area COV should remain above the threshold after the 10th iteration. At this point, the COV filter is ignored, and the next module is executed.
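    A minimal sketch of this guard is shown below, assuming a binary ear mask; the growing kernel size mirrors the iterative opening described for the clean-up functions, and all names are illustrative.

```python
import cv2
import numpy as np


def area_cov(mask):
    """Equation 6 over all connected components treated as candidate ears."""
    n, _, stats, _ = cv2.connectedComponentsWithStats(mask)
    areas = stats[1:, cv2.CC_STAT_AREA].astype(float)  # skip background label 0
    return areas.std() / areas.mean() if len(areas) else 0.0


def split_cosegmented(mask, cov_thresh=0.35, max_iter=10):
    """Open the mask until the area COV falls below the threshold (max 10 passes)."""
    for i in range(1, max_iter + 1):
        if area_cov(mask) <= cov_thresh:
            break
        kernel = np.ones((3 * i, 3 * i), np.uint8)   # stronger opening each pass
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    return mask
```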
    Besides co-segmentation, silk debris may remain, which can affect the accuracy of feature extraction for traits such as width and length (Figure 3b). An adaptive silk clean-up feature was developed to remove silk noise from ears. To clean up silk and other debris, the change in convexity is monitored after morphological opening. Convexity is a measure of roundness, or the ratio of the perimeter of an object's convex hull to the perimeter of the object. Convexity is sensitive to differences in the perimeter, which makes it a good measurement for detecting silk and debris noise. Under default settings, a convexity score of less than 0.87 triggers the convexity clean-up module. This value was optimized based on our datasets to clean up ears with silks.
    \begin{equation} \mathrm{Convexity} = \frac{\mathrm{Arclength\ of\ convex\ hull}}{\mathrm{Arclength\ of\ object}} \end{equation} (7)
    \begin{equation} \Delta \mathrm{convexity} = \mathrm{Convexity}_1 - \mathrm{Convexity}_0 \end{equation} (8)

    Morphological opening is performed to remove noise until the Δconvexity is >4% or 10 iterations are reached. A threshold is placed on the change in convexity such that each ear is cleaned up relative to its own convexity score, capped at 10 iterations. Each iteration increases the strength of the opening morphological operation by increasing the opening kernel size in proportion to the iteration (e.g., if i = 3, then the kernel is 3 × 3).
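    A hedged sketch of the convexity-driven clean-up follows, for a mask containing a single ear; the convexity here follows Equation 7 (hull perimeter over object perimeter), and the function names are illustrative.

```python
import cv2
import numpy as np


def convexity(contour):
    """Equation 7: convex hull perimeter over object perimeter."""
    hull = cv2.convexHull(contour)
    return cv2.arcLength(hull, True) / cv2.arcLength(contour, True)


def largest_contour(mask):
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea)


def clean_silks(mask, trigger=0.87, delta_stop=0.04, max_iter=10):
    """Open with a growing kernel until convexity improves by >4% (Equation 8)."""
    c0 = convexity(largest_contour(mask))
    if c0 >= trigger:                          # ear already smooth enough
        return mask
    for i in range(1, max_iter + 1):
        kernel = np.ones((3 * i, 3 * i), np.uint8)
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
        if convexity(largest_contour(mask)) - c0 > delta_stop:
            break
    return mask
```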

    Lastly, the rotation function is optional and useful if the ears were photographed with the tips of different ears pointing in different directions in the same image. This simple function works by rotating all the ears such that the length is always longer than the width. Once all the ears are arranged vertically, they are broken into three equal portions. If the tip is wider than the base, then the object is rotated 180°. This approach is based on the assumption that ears have a wider base than tip, which may not hold true in very diverse germplasm or in fasciated ears. The final part of this module finds the smallest possible rectangle that will encompass the entire object. Each ear is segmented using these rectangular coordinates to generate and save the ear region of interest as a .png file.
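    The orientation step can be sketched as below, assuming each ear arrives as its own binary mask and that, after alignment, the tip should point up; the width comparison of the outer thirds is the flip test described above, and the names are illustrative.

```python
import cv2
import numpy as np


def orient_ear(mask):
    """Rotate an ear mask so its long axis is vertical and its base is wider."""
    h, w = mask.shape[:2]
    if w > h:                                   # lay the long axis vertically
        mask = cv2.rotate(mask, cv2.ROTATE_90_CLOCKWISE)
        h, w = mask.shape[:2]
    thirds = np.array_split(np.arange(h), 3)    # top, middle, bottom row indices

    def footprint(rows):
        return int((mask[rows].sum(axis=0) > 0).sum())  # occupied columns

    if footprint(thirds[0]) > footprint(thirds[2]):     # wider end (base) on top
        mask = cv2.rotate(mask, cv2.ROTATE_180)         # flip so base is at bottom
    return mask
```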

    2.5 Curvature, taper, shape descriptors, color, and tip fill

    Tip fill is an important trait in sweet corn. Tip fill is a ratio calculated between the kernel area and the total ear area. An algorithm was developed to segment kernels from exposed cob. To simplify the approach, the task is partitioned into two steps: (a) segment kernels from exposed cob at the base of the ear and (b) segment kernels from exposed cob at the tip of the ear. First, ears are separated by kernel color. White kernels are best segmented from the cob in the saturation channel, whereas yellow-orange kernels are easily segmented in the hue channel. Both channels are handled similarly downstream: the best threshold to separate the cob from the kernel is calculated via Otsu's method. This algorithm calculates an intensity threshold that minimizes the weighted within-class variance arising from separating grayscale pixel values into two classes (Otsu, 1975). Morphological closing of the resulting object using a 3×3 kernel is applied to address inconsistencies in segmentation between ears, such as aborted embryos around the tip that have a kernel-like appearance but are not proper kernels. For the tip fill calculation, only the largest object is considered the kernel-containing portion of the ear and everything else is cob.
    \begin{equation} \mathrm{Tip\ fill} = \frac{\mathrm{Kernel\ area}}{\mathrm{Total\ ear\ area}} \end{equation} (9)
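    The sketch below illustrates this kernel/cob split and the tip fill ratio of Equation 9, given a color crop of one ear and its binary mask; in a full implementation the background pixels would be excluded before Otsu's threshold is computed, and all names here are illustrative.

```python
import cv2
import numpy as np


def tip_fill(ear_bgr, ear_mask, white_kernels=False):
    """Segment kernels from exposed cob via Otsu and return Equation 9."""
    hsv = cv2.cvtColor(ear_bgr, cv2.COLOR_BGR2HSV)
    # saturation channel for white kernels, hue channel for yellow-orange ones
    channel = hsv[:, :, 1] if white_kernels else hsv[:, :, 0]
    channel = cv2.bitwise_and(channel, channel, mask=ear_mask)
    _, kern = cv2.threshold(channel, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kern = cv2.morphologyEx(kern, cv2.MORPH_CLOSE, np.ones((3, 3), np.uint8))
    # keep only the largest connected blob as the kernel-bearing region
    n, labels, stats, _ = cv2.connectedComponentsWithStats(kern)
    if n < 2:
        return 0.0
    biggest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    kernel_area = float((labels == biggest).sum())
    return kernel_area / float(np.count_nonzero(ear_mask))
```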

    To analyze curvature, the ear is split into 20 regions along its length. For each region, EarCV calculates the standard deviation of the center of gravity along the x axis. The curvature of an ear is proportionate to the standard deviation along the x axis. Taper is calculated by measuring the object width within regions along the top half of the ear. The more these widths vary, the higher the taper. Convexity and solidity are used to describe the shape of the ear. For a detailed description of all ear features, see Supplemental Table 1.
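    A compact sketch of these two descriptors for a vertically oriented binary ear mask (tip assumed at the top) follows; the names are illustrative.

```python
import numpy as np


def curvature_and_taper(mask):
    """Slice the ear into 20 bands; spread of centers ~ curvature, of widths ~ taper."""
    bands = np.array_split(mask, 20, axis=0)          # 20 slices along the length
    centers, widths = [], []
    for band in bands:
        cols = np.where(band.any(axis=0))[0]          # occupied columns in slice
        if len(cols):
            centers.append(cols.mean())               # x center of gravity
            widths.append(cols.max() - cols.min())    # slice width
    curvature = float(np.std(centers))                # x-centroid spread
    taper = float(np.std(widths[:len(widths) // 2]))  # width spread, top half only
    return curvature, taper
```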

    Length and width were extracted from the contour. Ear length and width measurements are approximated in three different ways: (a) by drawing the smallest possible box enclosing the contour and using its dimensions, (b) by finding the longest possible line along the length and width of the contour, and (c) by fitting an ellipse to the contour. Hereafter these methods are referred to as box dimensions, maximum distance, and ellipse fit, respectively. These three approaches were compared with ground truth measurements for the diverse inbred trial and the commercial hybrid trial. To compare which of the three approaches yielded the best results, a multivariate model was fit with the CV methods as the response variable and the ground truth measurement as the predictor variable. Here the regression coefficients are equal to the correlation coefficients.
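    The three estimators can be sketched with standard OpenCV contour tools, as below; the "maximum distance" variant here uses the contour's vertical and horizontal extents as an approximation, and all function names are assumptions.

```python
import cv2
import numpy as np


def box_dimensions(contour):
    """Smallest rotated bounding box: its long/short sides are length/width."""
    (_, _), (w, h), _ = cv2.minAreaRect(contour)
    return max(w, h), min(w, h)


def max_distance(contour):
    """Longest vertical and horizontal spans of the contour points."""
    pts = contour.reshape(-1, 2).astype(float)
    length = float(pts[:, 1].max() - pts[:, 1].min())
    width = float(pts[:, 0].max() - pts[:, 0].min())
    return length, width


def ellipse_fit(contour):
    """Major and minor axes of the best-fit ellipse (needs >= 5 points)."""
    (_, _), (ax1, ax2), _ = cv2.fitEllipse(contour)
    return max(ax1, ax2), min(ax1, ax2)
```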

    EarCV measures dominant colors in the red–green–blue (RGB), HSV (hue, saturation, value), and CIELAB color spaces using a K-means approach. Pixels are clustered into two groups, the foreground (the ear excluding cob) and the background, and the average color of the foreground represents the dominant color of each ear.
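    A minimal sketch of this dominant-color step: K-means with k = 2 on the masked pixels, taking the larger cluster's mean as the ear color. Names are illustrative.

```python
import cv2
import numpy as np


def dominant_color(ear_bgr, ear_mask):
    """Return the dominant ear color in RGB, HSV, and CIELAB."""
    pixels = ear_bgr[ear_mask > 0].astype(np.float32)      # (N, 3) masked pixels
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    _, labels, centers = cv2.kmeans(pixels, 2, None, criteria, 3,
                                    cv2.KMEANS_RANDOM_CENTERS)
    dominant = centers[np.argmax(np.bincount(labels.ravel()))]  # larger cluster
    bgr = np.uint8([[dominant]])                           # 1x1 "image" for cvtColor
    rgb = dominant[::-1]
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)[0, 0]
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)[0, 0]
    return rgb, hsv, lab
```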

    2.6 KRN

    First, we tested the possibility of predicting KRN, a 3D feature, from a 2D image using a CV and signal processing approach. For each ear, peak detection analysis is performed iteratively over the bottom two-thirds of the ear in overlapping windows. Each 2D ear slice is converted to a 1D signal by taking the sum of green intensity values along the horizontal axis. The green channel is used because it maximizes contrast between kernel boundaries and the middle of the kernel. The dark regions around the edges of the kernels correspond to lower values in the 1D array, whereas the brighter regions of the kernel correspond to higher values (Figure 7a). This signal is smoothed with scipy by fitting a fourth-degree polynomial within sliding windows 81 pixels wide along the x axis (Virtanen et al., 2020) (Figure 7a). Local minima corresponding to kernel boundaries are called. The average distance between minima is used to estimate the mean kernel width and kernel width standard deviation. The kernel width, or length of the chord, is calculated as the distance between the two inner peaks. These features are averaged across slices along the length of the ear. Kernels are assumed to be roughly periodic, resulting in an expected low kernel width standard deviation. To prevent erroneous minima calls, kernels with a width standard deviation higher than 10 are filtered out. The known diameter of the cob and the chord distance, or kernel width, are used to calculate how many chord slices (kernels) fit in the predicted area of the circle (cob) to estimate KRN:
    1. Chord: c = median distance between peaks; radius: r = half the width of the ear

    2. Central angle: a = 2 sin⁻¹(c / (2r))

    3. Area of the sector contained in the central angle: A_s = a r² / 2

    4. Area of the circle: A_c = π r²

      \begin{equation} \mathrm{KRN} = \frac{A_c}{A_s} \end{equation} (10)
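    Putting the signal processing and the chord geometry together, a hedged sketch might look like the following, using scipy's Savitzky-Golay filter for the windowed polynomial smoothing; the windowing details and names are illustrative.

```python
import numpy as np
from scipy.signal import savgol_filter, find_peaks


def estimate_krn(slice_green, ear_width_px):
    """Estimate KRN for one horizontal ear slice (Equation 10)."""
    signal = slice_green.sum(axis=0).astype(float)    # collapse rows into a 1D signal
    smooth = savgol_filter(signal, window_length=81, polyorder=4)
    minima, _ = find_peaks(-smooth)                   # dark kernel boundaries
    if len(minima) < 2:
        return None
    chord = np.median(np.diff(minima))                # c: median kernel width (px)
    r = ear_width_px / 2.0                            # r: half the ear width
    a = 2.0 * np.arcsin(min(chord / (2.0 * r), 1.0))  # central angle of one kernel
    sector = a * r ** 2 / 2.0                         # A_s
    return np.pi * r ** 2 / sector                    # KRN = A_c / A_s = 2*pi/a
```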

    The estimated KRN values were further classified into three general categories useful for breeding using a K-means clustering approach. The categories represent materials with low KRN at 12 or 14 rows, medium KRN at 16 rows, and high KRN at 18 or more rows. Classification rate was determined by calculating the ratio of correctly to incorrectly classified KRN estimates.

    We sought to improve upon this method by leveraging information from other features generated by EarCV in addition to the EarCV KRN estimate. To do this, we implemented a gradient boosting ML approach using XGBoost (T. Chen & Guestrin, 2016). In total, all 38 features generated by EarCV were used to train the model (Supplemental Table 1). Model classification rate was assessed across a 10-fold cross-validation by calculating the ratio of correctly to incorrectly classified KRN estimates.
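    A sketch of this classifier, assuming a feature matrix X with EarCV's 38 outputs and hand-scored classes y (0 = low, 1 = medium, 2 = high); the hyperparameters are illustrative, and accuracy stands in for the classification rate.

```python
from xgboost import XGBClassifier
from sklearn.model_selection import cross_val_score


def krn_classification_rate(X, y):
    """Mean accuracy of a gradient-boosted KRN classifier over 10-fold CV."""
    model = XGBClassifier(n_estimators=200, eval_metric="mlogloss")
    scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    return scores.mean()
```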

    2.7 USDA Sweet corn grade standards

    The USDA provides predefined standards for husked sweet corn that correspond to grades "Fancy", "No. 1", and "Off-grade" (Table 1). These are based on length, tip fill, and damage or decay. A simple feature was implemented that scores length and tip fill based on these predefined standards (USDA-Agricultural Marketing Service, 2021); a minimal sketch follows Table 1.

    TABLE 1. Definition of ear quality grades based on tip fill and length

    Grade      Length (inch)   Length (cm)        Tip fill (%)
    Fancy      ≥5              ≥12.7              ≥87.5
    No. 1      5 > x > 4       12.7 > x > 10.16   87.5 > x > 79.2
    Off-grade  ≤4              ≤10.16             ≤79.2
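    The sketch referenced above is a direct translation of Table 1; the function name is an illustrative assumption.

```python
def usda_grade(length_cm, tip_fill_pct):
    """Score an ear against the USDA grade thresholds in Table 1."""
    if length_cm >= 12.7 and tip_fill_pct >= 87.5:
        return "Fancy"
    if length_cm > 10.16 and tip_fill_pct > 79.2:
        return "No. 1"
    return "Off-grade"
```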

    3 RESULTS

    3.1 Data and general workflow

    We developed an open-source package to automate measurements of sweet corn ear quality and yield component traits. Basic imaging considerations are as follows: (a) ears should have the shanks removed before photographing; (b) photographs can be taken with any device at a roughly perpendicular angle to the ears; and (c) images should not be severely under- or overexposed (Supplemental Figure 2) or have nonuniform lighting (Supplemental Figure 3). Our command-line, Python-based package is available for public use in the GitHub repository: https://github.com/Resende-Lab/EarCV. Image analysis can be done with default settings in a single command; however, if more control is desired for a specific function, each function can run with custom settings. Our workflow analyzed as many nontouching ears as fit within frame and segmented ears independent of background color and texture, assuming high contrast against the ears (e.g., black cloth, white poster board, grass, dirt). We included optional features to enable high-throughput analysis of images for breeding: a QR code printed on a white background was used to automatically attach field information to ears in the image; a color checker passport was used for color correction and as a size reference; and in the absence of a color checker passport, a square solid-colored piece of paper with known dimensions was used as a size reference.

    Four datasets comprising diverse phenotypes for traits of interest were evaluated: (a) a commercial hybrid trial (267 ears, 8 cultivars, Dataset 1) (Ribeiro da Silva et al., 2021); (b) an early-stage hybrid trial (2,780 ears, 531 genotypes, Dataset 2); (c) a diverse inbred sweet corn panel of 250 ears with no genotypic replication (Dataset 3) (Hu et al., 2021); and (d) the BGEM collection (1,408 ears, 182 genotypes, Dataset 4) (Vanous et al., 2018). These datasets capture a large phenotypic range to test the flexibility of the algorithm. We found EarCV implementation to be more efficient than manual phenotyping of our traits of interest. In our case, it takes at least 25 s to manually measure the length, width, and KRN of a single ear; in comparison, it takes about as much time to photograph one image with 20 ears in it. A basic personal computer can extract length, width, curvature, taper, color, and tip fill from >3,000 ears overnight at about 10 s per ear.

    3.2 Image analysis pipeline

    A grayscale image is a matrix of values that range from 0 to 255. A color image is a compilation of three grayscale matrices of identical dimensions (e.g., red, green, and blue channels produce an RGB color image). At its core, EarCV uses morphological transformations and color-space transformations (e.g., RGB to HSV) to filter out and describe (e.g., by area, aspect ratio, and solidity) regions of interest in an image matrix. EarCV integrates image and signal processing functions from PlantCV, OpenCV, numpy, scipy, and pyzbar, all open-source Python libraries (Bradski, 2000; Gehan et al., 2017; Harris et al., 2020; Natural History Museum of London, 2016/2020; Virtanen et al., 2020). The image analysis process is partitioned into QR code extraction, color normalization, pixels per metric calculation, ear segmentation, and feature extraction modules (Figure 1). After feature extraction, all measurements are normalized against the pixels per metric calculation to determine their unit length values. Each module has default settings and is customizable to fit the user's specific needs. Each module creates a proof and saves results in a database so the user can monitor performance.

    FIGURE 1. EarCV workflow. The input image is optionally scanned for quick-response (QR) code information, color corrected using a color checker, and measured using a size reference. Each ear in the image is first segmented and then analyzed individually. Outputs include image proofs to monitor performance of each module and a database of results. Orange icons represent image files, pink icons represent database outputs, green icons represent functions, and blue icons represent user-defined parameters. Dashed lines represent optional modules and solid lines represent default modules.

    3.3 Color normalization

    To test the performance of this algorithm, we performed color normalization on 12 different environments using a common color checker passport reference. The 12 environments were a combination of the following factors: photographs taken with a Nikon D750 or an iPhone 6; the use of a light box or lack thereof; a black cloth, asphalt, soil, or grass background; inside lighting; outside shadowy lighting; and outside full sun lighting at different angles (Figure 2c). The light box was a wooden frame covered with polyester diffusion fabric and did not include any added light source. The fabric helped diffuse location-dependent ambient light. Color normalization effectively reduced the root mean squared difference in color between the reference and target images as measured by RMSt. The RMSt decreased by almost 80 relative units in every image photographed with the Nikon D750 (Figure 2d). Color normalization underperformed when using a reference image taken with the Nikon to correct images taken with the iPhone and failed when there were uneven shadows cast on the color checker. RMSt decreased by only 5.03 in the image with uneven lighting cast on the color checker and by 60 or less for any image taken with an iPhone (Figure 2d).

    FIGURE 2. Optional features. Quick-response (QR) code scanner proof (a): the bounding box shows where the QR code was found, and the information contained in the QR code is printed in red font. Pixels per metric proof (b): the purple line shows the pixel length of the reference object that is used to calculate the pixels-per-metric conversion ratio (red font) based on a user-defined reference length. Examples of four environments used to test color normalization performance: outside, light box, uneven shade (c); inside, light box (d); outside, dirt, shadow (e); outside, grass, shadow (f). Color normalization performance across 12 environments as measured by mean root mean squared red–green–blue (RGB) distances averaged across 24 color checkers (g).

    3.4 Segmentation stability across environments

    Ear segmentation based on default K-means background segmentation and default size, aspect-ratio, and convexity thresholds worked for ∼85% of the 5,392 ears photographed in this project. Failed segmentation was easily corrected with custom thresholding settings. Figure 3a shows the default segmentation performance for an image with 10 ears. To test the performance of our segmentation algorithm in variable background and lighting conditions, we segmented the same six ears in 12 environments and measured their area. A wide range of exposure settings was used to capture ears in these environments: shutter speeds ranged from 1/30 to 1/12,500 s, and f-numbers ranged from f/2.2 to f/16 (Supplemental Table 4). The camera settings used can inform the functional range of image capture settings that gives a positive segmentation result. Overall, EarCV was able to segment the ears in all backgrounds tested, resulting in high correlations of ear area across environments (Figures 2c and 3d, Supplemental Table 4). The adaptive segmentation feature was able to segment ears that were very close or even touching (Figure 3c). The rotation feature used to align ears worked for every hybrid ear but failed for inbred ears that have a wider tip than base.

    FIGURE 3. Ear segmentation module. Input image and proof after default background segmentation and object filtering based on size, aspect ratio, and convexity (a). Results from the silk clean-up module based on change in convexity after morphological opening (b); the red lines show the width of the object before and after silk clean-up. Example of co-segmentation of ears that are connected by silks; the clean-up feature is triggered by the area coefficient of variation and addressed with morphological opening (c). Correlation matrix of ear area calculated by EarCV for 6 ears across 12 environments (d). Environments with the lowest correlations were caused by grass or sunlight interference as shown by ear proofs. Concrete 1, outside, concrete, shadow; Concrete 2, outside, concrete, direct sun; Dirt 1, outside, dirt, direct sun; Dirt 2, outside, dirt, shadow; Full Sun, outside, direct sun; Grass, outside, grass, shadow; Inside 1, inside, light box; Inside 2, inside, no light box; iPhone 1, iPhone, inside, light box; iPhone 2, iPhone, inside, no light box; Outside 1, outside, uneven shadow; Outside 2, outside, light box.

    3.5 Ear features

    3.5.1 Length & width

    Following grayscale background segmentation and the removal of debris and silks, regions of interest were converted into binary images for contour analysis to define the border of each ear. Length and width were extracted from the contour. Ground truth measurements were compared with the measurements derived from EarCV to analyze performance. Correlations were high for both inbreds and hybrids for all approaches, with R2 ranging from 0.88 to 0.99 for width and length, respectively (Figure 4a–d). Overall, length estimates were more accurate than width estimates. A least-squares means method was used to make pairwise comparisons for each method. These comparisons gave no significant differences at a p-value threshold of 0.05 (Supplemental Table 2).

    FIGURE 4. Length and width validation using box dimensions, maximum distance, and ellipse fit methods. Length correlations of dried diverse inbred ears (a). Width correlations of dried diverse inbred ears (b). Length correlations of fresh commercial hybrid ears (c). Width correlations of fresh commercial hybrid ears (d).

    3.5.2 Color analysis

    EarCV was used to extract color information in the RGB, HSV, and CIELAB color spaces, which can be useful as a breeding target and offers a quantitative approach to color measurement. The diverse inbred trial (Dataset 3) captured a wide range of colors within sweet corn, from pale white to deep orange (Figure 5a). The BGEM collection (Dataset 4) also contained a wide range of colors in field corn (Figure 5b). Principal component analysis was used to depict color variation in 2D space (Supplemental Table 2). In both cases, the RGB color space more effectively separated individuals based on color compared with the HSV and CIELAB color spaces.

    FIGURE 5. Dominant color principal component analysis in red–green–blue (RGB) color space. The numerical values correspond to the proportion of variance explained by each principal component. The color of each point in the scatter plot corresponds to the kernel red, green, and blue intensities extracted by EarCV. Diverse inbred dataset (Dataset 3) (a). BGEM dataset (Dataset 4) (b); each point represents the average color of all ears for each genotype.

    3.6 Curvature, taper, and tip fill

    Curvature, taper, and tip fill are features that other published tools cannot capture and are key yield component traits in fresh market sweet corn. These feature extraction methods were not validated using ground truth measurements because they are difficult to measure quantitatively. Instead, we used our datasets to survey the performance of these feature extractors by examining the resulting distributions. The extremes of the taper, curvature, and tip fill distributions in our diverse inbred dataset (Dataset 3) correspond to expected phenotypes. For example, a high taper score corresponds to a very pointy ear, a low curvature score corresponds to a straight ear, and a tip fill score close to one corresponds to a well-filled ear (Figure 6, Supplemental Figure 1).

    FIGURE 6. Distribution violin plots of taper, curvature, and tip fill for the diverse sweet corn inbred dataset. Examples of ears at the extremes of the distributions for each trait are shown.

    3.7 Classifying KRN

    To test our ability to predict KRN, we first scored the KRN of 250 ears by hand and then estimated KRN values for those same ears using EarCV. The ground truth KRN values were assigned to low, medium, and high categories. We found the classification rate between the EarCV KRN estimates and the ground truth KRN to be 0.56 (Figure 7b). We sought to improve upon this method by implementing a gradient boosting ML approach using XGBoost (T. Chen & Guestrin, 2016). The XGBoost model achieved a classification rate of 0.68 (Figure 7b), improving the classification rate by 12 percentage points. Nonetheless, we think that the classification rate is still limited by the amount of data used in the training set and more improvements can be achieved. We consider limitations of this approach in the discussion section.

    FIGURE 7. Kernel row number (KRN) prediction. Flattening the green channel intensities results in a 1D signal (blue), which is smoothed by fitting a fourth-degree polynomial within overlapping windows of 81 pixels (red) (a). Dark regions around the edges of the kernels correspond to local minima (green triangles) in the 1D array, whereas the bright yellow regions of the kernel correspond to the local maxima. Classification rates of computer vision-based and machine learning-based KRN predictions against ground truth (b). Orange circles correspond to correct calls and green circles correspond to incorrect calls. Circle size is proportionate to the number of calls in each group.

    3.8 Applications in breeding

    3.8.1 Early-stage hybrids evaluation

    The features extracted from real breeding datasets using EarCV were used to show its usefulness in a breeding program. The early-stage hybrid trial (Dataset 2) was derived from a set of 155 inbreds crossed using a North Carolina II scheme, resulting in a total of 531 crosses. One to three ears from each plot were photographed, and EarCV was used to extract features. To see how well EarCV captures genetic variation for sweet corn traits of interest, we calculated BLUPs in our experiment. Using a pedigree, broad-sense heritability was calculated for each feature, and hybrids were ranked based on the resulting BLUPs. Heritabilities are summarized in Table 2. BLUPs for ear width, average color intensity as described in the red channel, ear area, and tip fill are depicted in Figure 8a. Heritabilities for these quality yield component traits range from 0.01 to 0.66. These results indicate that EarCV may not robustly capture the genetic signal of color as explained by hue (h2 = 0.01). In addition, the results suggest that some traits may be highly dependent on environment, such as curvature (h2 = 0.10), where the physical position of ear emergence against the stem can greatly impact how the ear curves. Area, width, and RGB color descriptors had high heritabilities, whereas shape descriptors and curvature had low heritabilities. All heritabilities had a low standard error.

    TABLE 2. Broad-sense heritability for ear features derived from the early-stage hybrid trial (Dataset 2). Features extracted using EarCV
    Trait Broad sense heritability SE
    Krnl_Area 0.65886 0.01926
    Red 0.53762 0.02458
    Vol 0.52383 0.02483
    Blue 0.4819 0.02454
    B_chnnl 0.47691 0.02576
    Light 0.46786 0.02546
    Ear_Width 0.44456 0.02438
    Green 0.43539 0.02576
    A_chnnl 0.42413 0.02495
    Sat 0.41662 0.02566
    KRN_Grnd 0.40081 0.02444
    Ear_Area 0.31566 0.02482
    Convexity 0.30391 0.02507
    USDA_Grade_Len 0.30268 0.02509
    Tip_Fill 0.30151 0.02519
    Tip_Area 0.30046 0.02474
    Ear_Length 0.29668 0.02475
    Widths_Sdev 0.27842 0.02482
    Perimeter 0.27729 0.02441
    Taper_Convexity 0.26302 0.02533
    Taper 0.25468 0.02475
    Solidity 0.20723 0.02483
    USDA_Grade_Fill 0.17208 0.02357
    Median_Kernel_Width 0.15606 0.02205
    Curvature 0.10605 0.0198
    Hue 0.01536 0.00809
    • Note. Color traits (Blue, Red, Green, Hue, Sat, Vol, Light, A_chnnl, B_chnnl) are pixel intensity values of the kernel area; Bottom_Area, area of the bottom of the ear not including kernels; Bottom_Fill, ratio of bottom cob area over total ear area; Cents_Sdev, SD of the midpoint of 20 evenly spaced slices along the length of the ear; Convexity, ratio of ear perimeter over convex hull perimeter; Convexity_polyDP, ratio of smoothed ear perimeter over convex hull perimeter; Ear_Area, area of the entire ear; Ear_Box_Area, area of the smallest bounding box containing the ear; Ear_Box_Length, length of the smallest bounding box containing the ear; Ear_Box_Width, width of the smallest bounding box containing the ear; Ear_Number, ears enumerated from left to right; Krnl_Area, area of the kernel portion of the ear; Krnl_Fill, ratio of kernel area over total ear area; Krnl_Convexity, ratio of ear perimeter over convex hull perimeter containing the kernels; Kernel_Length, length of the smallest bounding box containing the kernels; Max_Width, width as measured at the widest part of the ear; Perimeter, perimeter of the ear contour; Solidity, ratio of ear area over the convex hull area; Taper, SD of the widths of 10 slices along the top half of the ear; Taper_Convexity, ratio of perimeter over convex hull perimeter of the top half of the ear; Taper_Convexity_polyDP, ratio of smoothed perimeter over convex hull perimeter of the top half of the ear; Taper_Solidity, ratio of area over the convex hull area of the top half of the ear; Tip_Area, area of the cob tip not including kernels; Tip_Fill, ratio of tip cob area over total ear area; Widths_Sdev, SD of the width of 20 evenly spaced slices along the length of the ear.
    FIGURE 8. EarCV used to analyze two real-world hybrid datasets. Early-stage hybrid trial (Dataset 2) best linear unbiased predictor distributions for select traits (a); heritabilities for each trait are reported in blue. Comparison of widths across eight elite hybrid cultivars (b); letters denote significance. Correlations of yield to EarCV features in the commercial cultivar experiment (c).

    3.8.2 Commercial hybrids evaluation

    Eight commercial hybrid cultivars (Dataset 1) were evaluated with EarCV and conventional yield measurements. Harvest yields among the eight cultivars ranged from 780 to 1,079 crates ha–1 (assuming 48 ears/crate), and the lowest yielding cultivar, Cultivar 3, was significantly different from the highest yielding, Cultivar 7. There was no significant difference in unhusked weight among the top seven cultivars (data not shown). As in the unhusked weight experiment, cultivars did not have a significant effect on most EarCV features (Supplemental Table 5). However, there were significant differences between cultivars for width, tip area, tip fill, and green channel intensities. For example, Cultivar 3 was significantly wider than Cultivars 2 and 7 (Figure 8b). In addition, features from EarCV were correlated to yield data from the same experiment after averaging by commercial cultivar. Ear length, ear area, curvature, and tip fill all had correlations to yield between 0.47 and 0.7 (Figure 8c).

    4 CONCLUSIONS AND FUTURE DIRECTIONS

    4.1 Algorithm implementation and segmentation performance

    Overall, EarCV is an open-source photometry tool to automate extraction of features specifically relevant to fresh-market sweet corn ears. EarCV can phenotype any number of nontouching ears without shanks, dried or fresh, in any orientation, in a wide range of backgrounds and lighting conditions. For example, the color normalization feature allows the user to photograph ears throughout the day while controlling for variation in lighting. The use of a QR code scanner and a pixel-to-metric conversion allows the user to easily compare thousands of images even if the images were taken at variable distances from the objects of interest. Although the tool can run on default settings by only providing an input path, each major step can be customized to fit the user's needs with custom flags. This tool may be reasonably applied to phenotype a variety of other objects, such as other fruits or vegetables, with the right custom settings. We evaluated three datasets (Datasets 1–3) comprising commercial hybrids, early-stage hybrids, and diverse inbreds for ear traits of interest, as well as an independent dataset from the BGEM repository (Dataset 4). EarCV successfully segmented and cleaned up most ears across all datasets. Images with unusually small ears, large pieces of debris, or dark-colored kernels required further processing. Nevertheless, custom background segmentation and filter settings successfully addressed these issues. The tool was designed with high-throughput application in mind by making it flexible and customizable while maintaining simplicity. We segmented the same six ears in different lighting and backgrounds. Ear area was highly correlated across lighting and background conditions, showing that segmentation performance was adaptable. The iPhone 6 and DSLR cameras did not have any white-balance issues. Segmentation performance dropped off when light at low angles cast long shadows and when the color checker had nonuniform lighting due to shadows. These problems were much more common when using low-budget point-and-shoot cameras (Supplemental Figures 2 and 3). As cameras on hand-held devices improve, we expect dependency on DSLR cameras for high-quality imaging to shrink. Features on newer phone cameras could help simplify the segmentation of ears from background, such as the use of 'portrait mode' on newer iPhone models. How automatic blurring of the background from the subject with portrait mode affects the use of EarCV was not explored. However, based on the principle of K-means segmentation, we expect images taken with portrait mode to easily integrate into the use of EarCV: blurred pixels of similar background colors should cluster together and away from sharp pixels contained in the ear.

    There are various segmentation solutions in the field of ML, including supervised and unsupervised methods. Unsupervised clustering methods for image segmentation can be categorized as hierarchical divisive, hierarchical agglomerative, partitional soft, and partitional hard (Mittal et al., 2021). Special consideration must be taken when budgeting the amount of work required to train supervised methods. We took a parsimonious approach in our segmentation method development by testing the simplest segmentation methods first. Segmentation is addressed within EarCV using the Otsu and K-means methods. The Otsu method is a robust yet simple heuristics-based approach to binary image segmentation and has been used in CV for over 40 yr (Otsu, 1975). The K-means approach is simple, relatively fast compared with other clustering segmentation methods, and always converges (Mittal et al., 2021). K-means is limited by the fact that the number of clusters must be known a priori. However, because our images always seek to separate the background from the subject, we always used two clusters. Another limitation of a K-means approach is sensitivity to color (Warman et al., 2021). We address this problem by performing color normalization using a color checker across images before performing ear segmentation. In short, we used simple and easy-to-deploy segmentation methods instead of more complicated ones. The performance of these methods in our testing of different backgrounds, lighting conditions, and diverse phenotypes did not merit exploring more complex segmentation algorithms. Nevertheless, a more robust progression of this tool would include the use of supervised ML methods. With the right training data, ML semantic segmentation models for ear, cob, kernel, shank, and ear damage could automate this tool such that the user would not need to optimize performance with custom flags. Current literature highlights the applicability of such methods for ear segmentation. For example, DeepCOB (Kienbaum et al., 2021) uses a convolutional neural network-based deep learning model to automate segmentation of diverse ears against uniform-colored backgrounds and extract basic morphological features. The bottleneck to this approach is the initial image labeling for training and the computing power required to train the models, with the benefit of automated segmentation for downstream images. It is important to consider that retraining of models may be required in cases where new input images are very different from the original images the algorithm was trained on. Overall, EarCV can tolerate variation in image background and lighting while still accurately segmenting ears using simple, proven segmentation methods.

    4.2 Feature extraction and performance

    Length, width, KRN, and quality grades were validated against manually derived ground-truth measurements in diverse, dried inbred ears and fresh commercial ears. We compared three methods for deriving length and width but found no significant difference in the resulting correlations against ground-truth measurements. However, based on average correlation performance across dried, fresh, inbred, and hybrid ears, we propose the maximum-distance method as the most robust approach. We also extracted complex features such as taper, curvature, tip fill, and color in the diverse inbred dataset. Based on the resulting trait distributions, we show that the standard deviation of the x-axis centroid along the length of the ear works well as a proxy for curvature. We also demonstrated that the standard deviation of the width along the top half of the ear works well as a measure of taper, and that tip fill can be quantitatively measured by segmenting the cob from the kernel region using an adaptive Otsu method. Our principal component analysis of color shows that different channels contribute different proportions of the observed variance, with little commonality between the datasets (Supplemental Table 3). This suggests that each dataset should be analyzed independently for color. These differences could be explained by the variable conditions in which the datasets were photographed, as well as by the fact that the BGEM collection (Dataset 4) is a field corn collection with mostly smooth kernels, whereas the diverse inbred dataset (Dataset 3) is dried sweet corn with textured, wrinkly kernels. Regardless, in both cases working in the RGB space most effectively separated individuals by color, and ear segmentation worked for both datasets.
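
    As a minimal sketch of the curvature and taper proxies just described, the fragment below computes both from a binary ear mask whose long axis is aligned vertically; the tip-at-top assumption and the exact row handling are simplifications rather than EarCV's implementation.

```python
import numpy as np

# Sketch of the curvature and taper proxies described above, computed on a
# binary ear mask whose long axis is vertical. A simplified reading of the
# text (tip assumed at the top), not EarCV's exact implementation.

def curvature_proxy(mask: np.ndarray) -> float:
    """Std. dev. of the x-axis centroid across rows along the ear's length."""
    centroids = [np.mean(np.flatnonzero(row)) for row in mask if row.any()]
    return float(np.std(centroids))

def taper_proxy(mask: np.ndarray) -> float:
    """Std. dev. of row widths over the top half of the ear."""
    widths = [np.count_nonzero(row) for row in mask if row.any()]
    return float(np.std(widths[: len(widths) // 2]))  # tip assumed at the top
```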

    4.3 Applications in breeding

    This high-throughput photogrammetric approach to phenotyping maize ears greatly increases efficiency without sacrificing accuracy in breeding programs. We demonstrated its application in breeding by evaluating an early-stage hybrid trial (Dataset 2). We applied a mixed linear model with pedigree information to the features extracted by EarCV to calculate trait heritability and rank genotypes based on BLUPs. Trait heritabilities for width and length fall within expected ranges; previous experiments in Citra, FL, using F4 sweet corn found broad-sense heritability between 0.09 and 0.24 for these traits (U. Kumar, unpublished data, 2018). Using EarCV, we quantitatively measured tip fill, curvature, and taper, and used these phenotypic data to calculate heritability for these traits. This analysis shows that EarCV can capture heritable variation in ear quality yield component traits. Currently, breeders bundle these traits into a single ear package score. This tool could instead guide selection directly on each of these ear quality yield component traits rather than indirectly through an 'ear package' quality score.
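
    For clarity, the snippet below shows the standard entry-mean broad-sense heritability calculation from variance components; in practice the genetic and error variances would come from the fitted mixed model, and the numbers shown are placeholders rather than estimates from our data.

```python
# Standard entry-mean broad-sense heritability from variance components:
# H^2 = V_G / (V_G + V_E / r), where r is the number of replicates. In
# practice V_G and V_E come from the fitted mixed model; the numbers below
# are placeholders, not estimates from our datasets.

def broad_sense_heritability(var_g: float, var_e: float, n_reps: int) -> float:
    return var_g / (var_g + var_e / n_reps)

print(broad_sense_heritability(var_g=0.8, var_e=2.4, n_reps=3))  # -> 0.5
```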

    In addition, we used EarCV in a yield experiment to compare eight commercial hybrid cultivars. There were significant differences in the number of ears and crates per hectare (Ribeiro da Silva et al., 2021). Our approach found significant differences between cultivars in width, tip fill, and color as described by the red channel, among other traits. Consistent with the yield weight and quality measures from the same field experiment, there were no statistical differences between cultivars for most other EarCV traits (Ribeiro da Silva et al., 2021). The lack of significant differences may be due to the small sample size of commercial material whose ear traits have already been nearly fixed to match strict commercial production expectations. Regardless, this analysis shows that EarCV can compare even similar commercial cultivars and score them quantitatively for traits that a routine yield experiment cannot account for. Correlations of ear length, tip fill, ear area, and curvature with yield show that these features are positively correlated with yield, further defining them as yield component traits. Because of its more efficient and quantitative approach to phenotyping, we expect this tool to enable faster and more accurate selections for quality and yield components in fresh market sweet corn. For estimating KRN, our peak-calling approach alone did not correlate with ground-truth KRN as well as other features did. However, the XGBoost implementation made use of image features, statistical moments, and chord slices to improve our KRN predictive ability. This approach was limited to well-filled ears because the textured surface of dried sweet corn ears dilutes the kernel signal in the 1D array. Other sources of noise in the 1D signal that impaired our ability to call kernel boundaries were kernel abortion, row merging, and cases where kernel rows twisted along the length of the ear. Furthermore, our XGBoost model was limited by the small dataset and the uneven representation of KRN classes to predict. At this time, KRN predictions are exploratory and not recommended for deployment in routine phenotyping efforts.
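
    To make the peak-calling idea concrete, the sketch below counts peaks in a smoothed 1D brightness profile sampled across the ear; the smoothing window, minimum peak distance, and prominence threshold are illustrative assumptions, not the tool's tuned values.

```python
import numpy as np
from scipy.signal import find_peaks

# Sketch of the peak-calling idea for KRN: kernel boundaries show up as
# periodic peaks in a 1D brightness profile sampled across the ear's width.
# The smoothing window and peak constraints are illustrative assumptions.

def count_visible_rows(profile: np.ndarray) -> int:
    """Count peaks in a smoothed brightness profile across the ear."""
    smoothed = np.convolve(profile, np.ones(5) / 5, mode="same")  # moving average
    peaks, _ = find_peaks(smoothed, distance=8, prominence=2.0)
    return len(peaks)  # visible rows only; part of the full KRN is out of view
```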

    4.4 Limitations and future work

    This tool is built upon basic CV principles using common algorithms, without the need to manually label thousands of data points or use expensive computing resources. Instead, EarCV uses basic descriptors such as area, aspect ratio, and solidity to segment out ears. For this reason, under default settings it cannot differentiate between an ear of corn and a similarly shaped object such as a banana. There are special cases where segmentation and feature extraction fail due to a lack of contrast between the regions of interest and the background. For example, segmentation can fail when ears do not contrast well against the background, and tip fill estimation can fail when cob color is similar to kernel color. In these cases, EarCV can be run with custom settings to circumvent segmentation issues. The algorithm does not segment individual kernels from each other, missing out on interesting yield component traits, such as kernel shape descriptors, that could improve KRN prediction or be used in yield prediction. The features extracted by our KRN estimation methods may include error due to the inherent texture of kernels, the bicolored nature of some kernels, and the fact that some ears have disorganized kernel rows. Improved classification accuracy from this approach may be obtained with a larger training dataset and a more even representation of materials across the different KRN categories.
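
    As an example of how such descriptors can filter ear-like blobs from debris, the sketch below scores a contour by area, elongation, and solidity; the cutoff values are hypothetical and would need tuning per dataset.

```python
import cv2
import numpy as np

# Sketch of filtering candidate blobs with the basic descriptors named
# above (area, aspect ratio, solidity). Cutoffs are hypothetical examples,
# not EarCV's defaults.

def looks_like_an_ear(contour: np.ndarray) -> bool:
    area = cv2.contourArea(contour)
    (_, (w, h), _) = cv2.minAreaRect(contour)      # rotation-invariant bounding box
    aspect = max(w, h) / max(min(w, h), 1e-6)      # elongation of the blob
    hull_area = cv2.contourArea(cv2.convexHull(contour))
    solidity = area / hull_area if hull_area > 0 else 0.0
    # Ears are large, elongated, and fairly convex; most debris is not.
    return area > 5000 and 1.5 < aspect < 8.0 and solidity > 0.8
```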

    EarCV can process ears with many diverse phenotypes under varying imaging conditions but still has some limitations. For example, most of the time required to photograph ears is spent dehusking, removing silks, and detaching shanks. Several researchers have used infrared radiation to phenotype kernel chemical composition, kernel fungal contamination, and viability (Agelet et al., 2012; Oury et al., 2021; Wang et al., 2015; Yao et al., 2013). Cameras that capture signals outside the visible spectrum could potentially be used to phenotype ears without removing husks. Infrared methods have been developed to segment leaf veins from the blade and to diagnose anthracnose infection (Wang et al., 2015). Moreover, near-infrared interactance spectroscopy has been used to detect internal necrosis in sweetpotatoes [Ipomoea batatas (L.) Lam.] (Kudenov et al., 2021). The cost of infrared or hyperspectral cameras would present a bottleneck for large-scale deployment, but this approach has the potential to circumvent husking in the phenotyping process.

    Although other maize ear phenotyping tools have been developed, none comprehensively address the key phenotypes important in sweet corn (Brichet et al., 2017; Kienbaum et al., 2021; Liang et al., 2016; Makanza et al., 2018; Miller et al., 2017; Warman et al., 2021). Unlike in field corn, sweet corn ear aesthetics such as color, curvature, taper, and tip fill are key determinants of hybrid performance and yield. Many of these methods focus on kernel phenotypes extracted from ears rather than on whole-ear phenotypes. Furthermore, deploying some of these tools is not feasible in our use case due to material expenses and/or inflexible image-capture requirements. For example, Kienbaum et al. (2021) developed a state-of-the-art semantic segmentation algorithm, deepCOB, using supervised ML methods; however, deepCOB does not capture taper or tip fill. In conclusion, EarCV and other CV and ML tools have the potential to address phenotyping as a major limitation toward genetic gain in plant breeding programs by increasing efficiency, the number of samples observed, accuracy, and reproducibility.

    ACKNOWLEDGMENTS

    This work was supported by the National Institute of Food and Agriculture (SCRI 2018-51181-28419 and AFRI 2019-05410 to Marcio F. R. Resende, Jr.) and by the University of Florida Plant Breeding Graduate initiative. We also thank Nicole Beisel and Christina Finegan for assisting with final manuscript reviews.

      AUTHOR CONTRIBUTIONS

    Juan M. Gonzalez: Conceptualization; Data curation; Formal analysis; Methodology; Software; Validation; Writing – original draft. Nayanika Ghosh: Software. Vincent Colantonio: Data curation; Methodology; Software; Validation; Writing – review & editing. Francielly de Cássia Pereira: Formal analysis; Resources. Ricardo A. Pinto, Jr.: Resources. Chase Wasson: Resources. Kristen A. Leach: Resources; Writing – review & editing. Marcio F. R. Resende, Jr.: Conceptualization; Investigation; Methodology; Project administration; Supervision; Writing – review & editing.

      CONFLICT OF INTEREST

      The authors declare no conflict of interest.