EarCV: An open-source, computer vision package for maize ear phenotyping
Assigned to Associate Editor Melanie Ooi.
Abstract
Fresh market sweet corn (Zea mays L.) is a row crop commercialized as a vegetable, resulting in strict expectations for ear size, color, and shape. Ear phenotyping in breeding programs is typically done manually and can be subjective, time-consuming, and unreliable. Computer vision tools have enabled an inexpensive, high-throughput, and quantitative alternative to phenotyping in agriculture. Here we present a computer vision tool using open-source Python and OpenCV to measure yield component and quality traits relevant to sweet corn from photographs. This tool increases accuracy and efficiency in phenotyping through high-throughput, quantitative feature extraction of traits typically measured qualitatively. EarCV worked in variable lighting and background conditions, such as under full sun and shade and against grass and dirt backgrounds. The package compares ears in images taken at varying distances and accurately measures ear length and ear width. It can measure traits that were previously difficult to quantify, such as color, tip fill, taper, and curvature. EarCV allows users to phenotype any number of ears, dried or fresh, in any orientation, while tolerating some debris and silk noise. The tool can categorize husked ears according to the predefined USDA quality grades based on length and tip fill. We show that the information generated from this computer vision approach can be incorporated into breeding programs by analyzing hybrid ears, capturing the heritability of yield component traits, and detecting phenotypic differences between cultivars that conventional yield measurements cannot. Ultimately, computer vision can reduce the cost and resources dedicated to phenotyping in breeding programs.
Abbreviations
- a*: axis relative to the green–red opponent colors
- b*: axis relative to the blue–yellow opponent colors
- BGEM: double haploid germplasm enhancement of maize
- BLUPs: best linear unbiased predictors
- CIELAB: International Commission on Illumination L*a*b* color space
- COV: coefficient of variance
- CV: computer vision
- DSLR: digital single-lens reflex camera
- HSV: hue, saturation, value
- KRN: kernel row number
- L*: lightness value
- ML: machine learning
- PCA: principal component analysis
- QR: quick-response
- RGB: red–green–blue
- RMSc: average root mean squared distance for a single color checker in RGB space
- RMSt: total root mean squared distance averaged over all 24 color checker patches
1 INTRODUCTION
Continuous yield increase of agricultural plants is imperative to sustainably meet the global demand for food by a growing population. Breeding programs have combined field experimentation, selection, and quantitative genetics to make genetic gains in most major crops, as exemplified in corn (Zea mays L.; Andorf et al., 2019), rice (Oryza sativa L.; Yu et al., 2020), and wheat (Triticum aestivum L.; Venske et al., 2019). In sweet corn, conventional field breeding programs are still the main method driving genetic gain. Genomic selection is expected to accelerate vegetable breeding as it has in row crops but will require efficient and robust phenotyping to calibrate models (Shakoor et al., 2019). One of the biggest challenges to increasing genetic gain in breeding programs is the ability to phenotype large populations, especially in vegetable breeding programs, where many phenotypes are jointly used to drive selection decisions. In large breeding programs it is often impractical, time-consuming, and expensive to have trained experts measure many plant features (Wu et al., 2019).
High-throughput plant phenotyping via computer vision (CV) and machine learning (ML) can be used to address a diversity of phenotyping challenges in plant sciences, including increasing plant breeding efficiency and understanding the molecular underpinnings of traits of interest (Gaillard et al., 2020; Gehan et al., 2017; Shakoor et al., 2019). In the last decade, many high-throughput plant phenotyping methods have improved phenotyping by increasing the number of individuals phenotyped, providing quantitative alternatives to previously qualitative approaches, increasing the speed at which features are measured, and reducing subjectivity, time, and labor (Yang et al., 2021). Deployment of high-throughput phenotyping applications can be split into collection, extraction, and modeling. Methods in each category have different barriers to entry based on cost, skill, generalizability, and sensitivity. Some examples of data collection systems include tower-based, gantry-based, ground mobile, low- and high-altitude aerial, and satellite-based systems (Jiang & Li, 2020). However, these methods can generate massive amounts of data, which must be efficiently stored, processed, and managed to maximize their utility (Coppens et al., 2017). Image data can vary by type, such as digital, near-infrared, fluorescence, thermal, multi/hyperspectral, and 3D imaging (Yang et al., 2021). There are many methods used for data extraction, all of which must be developed for a specific data type. Data collection, extraction, and modeling must balance resources while maximizing data volume, accuracy, customizability, and ease of use. For example, PlantCV is an image analysis software package that contains processing and normalization tools, leaf segmentation, morphometrics tools, and ML modules (Gehan et al., 2017). Machine learning and deep learning can be used to extract complex features from images, usually by training a model on a ground truth dataset; the model is then applied to new images to extract relevant information. For example, such models have been used for semantic segmentation of maize ears, Arabidopsis leaves, maize kernels, and rice foliar diseases (S. Chen et al., 2021; Hüther et al., 2020; Shakoor et al., 2019; Warman & Fowler, 2021; Yang et al., 2021). Although high-throughput plant phenotyping can address major phenotyping bottlenecks in plant science research, digital phenotyping comes with its own set of challenges that need to be addressed to obtain the desired information accurately and efficiently. Ultimately, these approaches can improve the speed and accuracy of phenotyping while reducing manual labor (Perez-Sanz et al., 2017; Shakoor et al., 2019).
Core Ideas
- EarCV enables high-throughput phenotyping of corn ear traits and yield components.
- Tip fill, taper, curvature, and color can be quantitatively measured.
- We analyzed 5,392 diverse inbred, early-stage hybrid, and commercial hybrid ears with EarCV.
- EarCV has demonstrable applications in quantitative genetics and breeding.
First developed by indigenous American peoples in pre-Columbian times, sweet corn is now consumed on all continents. Unlike field corn or popcorn, sweet corn is picked roughly 21 d after pollination, while the water content is high. One of its main markets is the commercialization of fresh ears with little to no processing, resulting in unique ear requirements (Hallauer, 2000). Ear length, ear width, tip fill, taper, curvature, color, and kernel row number (KRN) are key sweet corn traits that determine marketable yield and may affect the selling price for the grower (USDA-Agricultural Marketing Service, 2021). Measuring ear length, ear width, and KRN is possible but time-consuming, labor-intensive, and prone to errors, which constrains most breeding programs in their ability to perform large-scale evaluation for these traits. Interest in characterizing these traits in breeding programs of other types of corn is also growing, because they are component traits of total yield that generally have higher heritabilities than overall yield (Peng et al., 2011). Therefore, it is possible to characterize and select distinct lines with enhanced individual yield components before developing a combined commercial line (Miller et al., 2017). Using this principle, multiple groups have developed CV and ML approaches for ear and kernel phenotyping (Brichet et al., 2017; Kienbaum et al., 2021; Liang et al., 2016; Makanza et al., 2018; Miller et al., 2017; Warman et al., 2021). These tools have demonstrable advantages compared with manual phenotyping. However, the deployment of some of these tools is not feasible in our use case due to material expenses and/or inflexibility in the image capturing requirements. In addition, none of these available tools comprehensively covers feature extraction for phenotypes relevant to sweet corn. We did not explore tools with large initial costs such as drones or hyperspectral sensors, which are expensive and have significant data-processing bottlenecks (Shi et al., 2016). We avoided supervised ML methods because these require a large initial investment in labeling data (Kienbaum et al., 2021). We sought to develop a phenotyping tool able to make comparisons across images and environments, focused on ear traits that are relevant for fresh market sweet corn breeding. Our goal was a tool with a low barrier to entry in terms of cost and imaging requirements that balances ease of use, generalizability, and sensitivity.
Here we report the implementation of a newly developed package, EarCV, which uses the open-source CV library OpenCV and Python to automatically measure whole-ear traits for any number of nonoverlapping ears in a single image, regardless of background and ear orientation. Our objectives were to (a) develop a CV-based approach for efficient, reproducible, accurate, and objective whole-ear phenotyping; (b) validate package performance against manually acquired data; and (c) probe EarCV's application in a public sweet corn plant breeding program. First, we tested EarCV's performance in variable lighting and background conditions using a common set of sweet corn ears. Then, we validated EarCV-derived length and width estimates against ground-truth measurements. We justified complex feature extraction methods for color, tip fill, and curvature by exploring the resulting distributions of an inbred sweet corn ear dataset. We tested ML predictive ability on KRN. Lastly, we deployed the tool in applied breeding by using it to evaluate commercial hybrid sweet corn cultivars and early-stage hybrids. This algorithm is open-source and can be used by the public to accelerate their own phenotyping efforts.
2 MATERIALS AND METHODS
2.1 Image collection
A Nikon D750 camera with a Nikon DX lens (29.87 cm aperture, 52 mm diameter) at ISO 1200 was fixed to the ceiling using a double-jointed ball-head clamp, which allows the camera to be positioned over a table with a uniformly colored background. We linked the camera to a phone via a wireless connection for rapid picture taking. Images for Datasets 1 and 2 included a color checker for color normalization and/or a square piece of paper with known dimensions for pixel-to-centimeter or pixel-to-inch conversion. Images were saved in .raw and .jpeg formats with an image size of 2,586 × 3,235 pixels. An 8-megapixel iPhone 6 camera was used under standard automatic focus and exposure settings.
2.2 Datasets
We built three different sweet corn ear datasets: commercial hybrids (Dataset 1), early-stage hybrids (Dataset 2), and diverse inbreds (Dataset 3). In addition, to demonstrate the broad application of the algorithm, a collection of 182 publicly available photographs from the double haploid germplasm enhancement of maize (BGEM) inbred collection was downloaded and phenotyped using EarCV (Dataset 4) (Vanous et al., 2018).
The commercial hybrid trial (Dataset 1) consisted of a set of eight hybrid cultivars grown in two replicates in Hastings, FL, in the winter of 2020 using a randomized complete block design (Ribeiro da Silva et al., 2021). Each rep consisted of 16 blocks; each treatment comprised four 9 m rows of the same genotype, with 1 m spacing between rows and 15.50 cm spacing between plants. Blocks were repeated twice per genotype per rep for a total of 16 blocks. Only the middle two rows per block were harvested, at 24–28 days after anthesis. Five ears from each subplot, for a total of 247 ears, were phenotyped by hand for width, USDA fancy grade, and KRN. A subset of 120 ears was phenotyped for length. Length was measured using a traditional ruler, and the width at the middle of each ear was obtained using a digital caliper. KRN was counted by hand. Quality grade was assigned based on the published USDA guidelines (USDA-Agricultural Marketing Service, 2021). This dataset was phenotyped without drying to simulate grading for quality and fresh market consumption. An analysis of variance was used to determine whether there were significant differences in features, ear weights, and yield estimates between cultivars at a significance threshold of p < .05 in R, followed by mean separation using Tukey's honestly significant difference test with a confidence level of 0.95. To test how lighting, background color, and camera affect the performance of EarCV, a subset of six hybrid ears was photographed in 21 different lighting and background conditions using two cameras, a Nikon DSLR (digital single-lens reflex camera) and an iPhone 6.
The diverse inbred dataset (Dataset 3) is a subset of a recently established sweet corn inbred diversity population that expanded on previously developed sweet corn populations (Baseggio et al., 2019). The population consists of 693 diverse genotypes, most of which are inbred lines representing the sweet corn diversity in the United States (Hu et al., 2021). Diverse inbreds were grown in Citra, FL, in the spring of 2019 in rows of 12 plants 18 cm apart with 1 m row spacing. Two hundred fifty distinct inbred ears were dried for at least 3 wk after being harvested about 60 d after pollination. Ears were manually phenotyped for length and width and then photographed. Length was measured using a traditional ruler, and the width at the middle of each ear was obtained using a digital caliper.
2.3 Optional features: QR Code, color normalization, and pixels per metric
We developed three optional features to enable robust comparisons across images in a breeding context. The quick-response (QR) code extractor module is an implementation of the pyzbar.decode function applied to QR codes printed on white envelope stickers (Natural History Museum of London, 2016/2020). This module scans the image for the QR code, extracts the information from the code, and masks the sticker for downstream analysis (Figure 2a). The pixels per metric module uses a filtering approach based on size and aspect ratio to segment the single largest uniform-colored square from the image and measure its side length. The function takes a numerical argument that specifies the length units (e.g., centimeters or inches) and then translates the square's pixel length to unit length. This pixels per metric ratio is used downstream for ear feature extraction (Figure 2b).
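A minimal sketch of these two helpers is shown below, assuming pyzbar and OpenCV are installed; the function name, thresholds, and the 5 cm default square side are our illustrative choices, not EarCV's actual interface.

```python
# Illustrative sketch (not the EarCV source): decode a QR label and derive a
# pixels-per-metric ratio from a reference square of known side length.
import cv2
from pyzbar.pyzbar import decode  # pyzbar reads QR codes from image arrays

def read_label_and_scale(image_path, square_side_cm=5.0):  # hypothetical helper
    img = cv2.imread(image_path)

    # 1) QR code: pyzbar returns the decoded payload and the symbol's location.
    symbols = decode(img)
    label = symbols[0].data.decode("utf-8") if symbols else None

    # 2) Reference square: Otsu threshold, then keep the largest ~square contour.
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    best = None
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if 0.9 < w / float(h) < 1.1:             # roughly square aspect ratio
            if best is None or w * h > best[2] * best[3]:
                best = (x, y, w, h)

    # Pixels per centimeter: observed side length / known side length.
    px_per_cm = best[2] / square_side_cm if best else None
    return label, px_per_cm
```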
2.4 Finding, cleaning up, and orienting ears
Morphological opening is performed iteratively to remove noise, stopping when the change in convexity (Δconvexity) exceeds 4% or after 10 iterations. Thresholding the change in convexity means each ear is cleaned up relative to its own convexity score, capped at 10 iterations. Each iteration increases the strength of the opening operation by growing the kernel in proportion to the iteration number (e.g., if i = 3, the kernel is 3 × 3).
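A minimal sketch of this loop follows, assuming a binary (uint8) ear mask as input; the stopping rule is our reading of the description above, with convexity defined as in Supplemental Table 1 (ear perimeter over convex hull perimeter).

```python
# Sketch of the iterative clean-up (assumed logic, not the EarCV source).
import cv2
import numpy as np

def convexity(mask):
    # Convexity: contour perimeter divided by its convex hull perimeter.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    c = max(contours, key=cv2.contourArea)
    return cv2.arcLength(c, True) / cv2.arcLength(cv2.convexHull(c), True)

def clean_ear(mask, max_iter=10, max_delta=0.04):
    base = convexity(mask)
    cleaned = mask
    for i in range(1, max_iter + 1):
        kernel = np.ones((i, i), np.uint8)       # kernel grows with each iteration
        opened = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
        # Stop once opening changes this ear's own convexity by more than 4%.
        if abs(convexity(opened) - base) / base > max_delta:
            break
        cleaned = opened
    return cleaned
```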
Lastly, the rotation function is optional and useful when ears were photographed with their tips pointing in different directions in the same image. This simple function works by rotating each ear so that its length is always greater than its width. Once all the ears are arranged vertically, each is divided into three equal portions; if the tip portion is wider than the base portion, the object is rotated 180°. This approach is based on the assumption that ears have a wider base than tip, which may not hold true in very diverse germplasm or in fasciated ears. The final part of this module finds the smallest possible rectangle that encompasses the entire object, and each ear is cropped using these rectangular coordinates to generate and save the ear region of interest as a .png file.
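For illustration, the orientation logic might be sketched as follows; this is a simplified stand-in for the EarCV source, with mean row widths standing in for whatever width statistic the package actually compares.

```python
# Assumed orientation sketch: make the ear vertical, then flip it if the top
# third is wider than the bottom third (the wide end is assumed to be the base).
import cv2

def orient_ear(mask):
    h, w = mask.shape
    if w > h:                                    # make the long axis vertical
        mask = cv2.rotate(mask, cv2.ROTATE_90_CLOCKWISE)
        h, w = mask.shape
    third = h // 3
    top_width = (mask[:third] > 0).sum(axis=1).mean()      # mean width, top third
    bottom_width = (mask[-third:] > 0).sum(axis=1).mean()  # mean width, bottom third
    if top_width > bottom_width:                 # wide end on top: flip 180 degrees
        mask = cv2.rotate(mask, cv2.ROTATE_180)
    return mask
```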
2.5 Curvature, taper, shape descriptors, color, and tip fill
To analyze curvature, the ear is split into 20 regions along its length. For each region, EarCV calculates the center of gravity along the x axis; the curvature of an ear is proportional to the standard deviation of these centers along the x axis. Taper is calculated by measuring the object width within regions along the top half of the ear; the more these widths vary, the higher the taper. Convexity and solidity are used to describe the shape of the ear. For a detailed description of all ear features, see Supplemental Table 1.
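A sketch of the curvature and taper proxies follows, assuming a vertically oriented binary ear mask; the slice counts match the description above, while the remaining details are our assumptions.

```python
# Assumed sketch: slice the mask along its length, then take standard deviations
# of per-slice centroids (curvature) and per-slice widths (taper, top half only).
import numpy as np

def curvature_and_taper(mask, n_slices=20):
    h = mask.shape[0]
    bounds = np.linspace(0, h, n_slices + 1, dtype=int)
    centroids, widths = [], []
    for top, bottom in zip(bounds[:-1], bounds[1:]):
        ys, xs = np.nonzero(mask[top:bottom])
        if xs.size:
            centroids.append(xs.mean())          # x center of gravity of the slice
            widths.append(xs.max() - xs.min())   # slice width in pixels
    curvature = np.std(centroids)                # lateral drift of the midline
    taper = np.std(widths[: len(widths) // 2])   # width variation in the top half
    return curvature, taper
```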
Length and width were extracted from the contour. Ear length and width measurements are approximated in three different ways: (a) by drawing the smallest possible box enclosing the contour and using its dimensions, (b) by finding the longest possible line along the length and width of the contour, and (c) by fitting an ellipse to the contour. Hereafter these methods are referred to as box dimensions, maximum distance, and ellipse fit, respectively. These three approaches were compared with ground truth measurements for the diverse inbred trial and the commercial hybrid trial. To determine which of the three approaches yielded the best results, a multivariate model was fit with the CV methods as the response variables and the ground truth measurement as the predictor variable; here the regression coefficients are equal to the correlation coefficients.
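Using standard OpenCV contour functions, the three estimators can be sketched as follows; this is a simplified stand-in for the package code (the maximum-distance width is omitted for brevity, and cv2.fitEllipse requires a contour of at least five points).

```python
# Sketch of the box dimensions, maximum distance, and ellipse fit estimators.
import cv2
from scipy.spatial.distance import pdist

def ear_dimensions(contour):
    # (a) Box dimensions: smallest rotated rectangle enclosing the contour.
    (_, _), (bw, bh), _ = cv2.minAreaRect(contour)
    box_length, box_width = max(bw, bh), min(bw, bh)

    # (b) Maximum distance: longest segment between any two contour points.
    pts = contour.reshape(-1, 2).astype(float)
    max_length = pdist(pts).max()

    # (c) Ellipse fit: axes of the best-fit ellipse.
    (_, _), (ew, eh), _ = cv2.fitEllipse(contour)
    ellipse_length, ellipse_width = max(ew, eh), min(ew, eh)

    return (box_length, box_width), max_length, (ellipse_length, ellipse_width)
```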
EarCV measures dominant colors in the red–green–blue (RGB), HSV (hue, saturation, value), and CIELAB (International Commission on Illumination) color spaces using a K-means approach. Pixels are clustered into two groups, foreground (the ear excluding the cob) and background, and the average color of the foreground represents the dominant color of each ear.
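A minimal sketch of this step with OpenCV's K-means is shown below, assuming a BGR image and a binary ear mask; taking the larger of the two clusters as the kernel foreground is our simplifying assumption.

```python
# Assumed sketch: cluster masked pixels into two groups and report the mean
# color of the larger (kernel) cluster in RGB, HSV, and CIELAB.
import cv2
import numpy as np

def dominant_color(img, mask):
    pixels = img[mask > 0].astype(np.float32)    # (N, 3) BGR pixels inside the ear
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    _, labels, centers = cv2.kmeans(pixels, 2, None, criteria, 3,
                                    cv2.KMEANS_RANDOM_CENTERS)
    dominant = centers[np.bincount(labels.ravel()).argmax()]  # larger cluster mean
    swatch = np.uint8([[dominant]])              # 1x1 image for color conversion
    hsv = cv2.cvtColor(swatch, cv2.COLOR_BGR2HSV)[0, 0]
    lab = cv2.cvtColor(swatch, cv2.COLOR_BGR2LAB)[0, 0]
    b, g, r = dominant
    return (r, g, b), tuple(hsv), tuple(lab)
```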
2.6 KRN
EarCV estimates KRN by treating the ear cross-section as a circle on which the median spacing between detected kernel-boundary peaks forms a chord:

$$\mathrm{KRN}_{\mathrm{CV}} = \frac{2\pi}{2\arcsin\left(\frac{c}{2r}\right)}, \tag{10}$$

where chord $c$ = median distance between peaks and radius $r$ = half the width of the ear. The chord subtends a central angle of $2\arcsin(c/2r)$, so dividing the full circle by this angle yields the estimated number of kernel rows.
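A sketch of this calculation is shown below, under our reading of Equation 10; the peak calling is simplified to scipy's generic find_peaks on a hypothetical 1D intensity profile across the ear.

```python
# Assumed sketch of Equation 10: estimate KRN from the median peak-to-peak
# distance (chord c) and half the ear width (radius r).
import numpy as np
from scipy.signal import find_peaks

def estimate_krn(intensity_profile, ear_width_px):
    peaks, _ = find_peaks(intensity_profile)   # kernel row boundaries (simplified)
    c = np.median(np.diff(peaks))              # chord: median distance between peaks
    r = ear_width_px / 2.0                     # radius: half the width of the ear
    theta = 2 * np.arcsin(min(c / (2 * r), 1.0))  # central angle per kernel row
    return (2 * np.pi) / theta                 # rows needed to span the circle
```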
The estimated KRN values were further classified into three general categories useful for breeding using a K-means clustering approach. The categories represent materials with low KRN (12 or 14 rows), medium KRN (16 rows), and high KRN (18 or more rows). Classification rate was determined by calculating the ratio of correctly classified to incorrectly classified ears.
We sought to improve upon this method by leveraging information from the other features generated by EarCV in addition to the EarCV-predicted KRN. To do this, we implemented a gradient boosting ML approach using XGBoost (T. Chen & Guestrin, 2016). In total, all 38 features generated by EarCV were used to train the model (Supplemental Table 1). Model classification rate was assessed across a 10-fold cross-validation by calculating the ratio of correctly classified to incorrectly classified ears.
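A hedged sketch of this model follows; the file name and label column are placeholders, and the features table is assumed to hold one row per ear with an integer-coded KRN class (0 = low, 1 = medium, 2 = high).

```python
# Assumed sketch: XGBoost classifier on EarCV features, 10-fold cross-validation.
import pandas as pd
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

features = pd.read_csv("earcv_features.csv")   # hypothetical: 38 features per ear
X = features.drop(columns=["krn_class"])       # hypothetical label column name
y = features["krn_class"]                      # integer classes 0, 1, 2

model = XGBClassifier()                        # multiclass objective is inferred
scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
print(f"mean 10-fold classification rate: {scores.mean():.2f}")
```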
2.7 USDA Sweet corn grade standards
The USDA provides predefined standards for husked sweet corn that correspond to the grades "Fancy", "No. 1", and "Off-grade" (Table 1). These are based on length, tip fill, and damage or decay. A simple feature was implemented that scores length and tip fill against these predefined standards (USDA-Agricultural Marketing Service, 2021).
| Grade | Length (inch) | Length (cm) | Tip fill (%) |
|---|---|---|---|
| Fancy | ≥5 | ≥12.7 | ≥87.5 |
| No. 1 | 4 < x < 5 | 10.16 < x < 12.7 | 79.2 < x < 87.5 |
| Off-grade | ≤4 | ≤10.16 | ≤79.2 |
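As a worked illustration of Table 1, a minimal grading helper might look like the following; the function name and the use of centimeters are our choices, and the damage/decay criterion is omitted.

```python
# Illustrative grading helper based on the Table 1 thresholds (length in cm,
# tip fill as a percentage); an ear must meet both criteria for a grade.
def usda_grade(length_cm, tip_fill_pct):
    if length_cm >= 12.7 and tip_fill_pct >= 87.5:
        return "Fancy"
    if length_cm > 10.16 and tip_fill_pct > 79.2:
        return "No. 1"
    return "Off-grade"

print(usda_grade(13.1, 90.0))  # -> Fancy
```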
3 RESULTS
3.1 Data and general workflow
We developed an open-source package to automate measurements of sweet corn ear quality and yield component traits. Basic imaging considerations are as follows: (a) ears should have the shanks removed before photographing; (b) photographs can be taken with any device, at a roughly perpendicular angle to the ears; and (c) images should not be severely under- or overexposed (Supplemental Figure 2) or have nonuniform lighting (Supplemental Figure 3). Our command-line, Python-based package is available for public use in the GitHub repository: https://github.com/Resende-Lab/EarCV. Image analysis can be done with default settings in a single command; if more control is desired, each function can run with custom settings. Our workflow analyzes as many nontouching ears as fit within the frame and segments ears independent of background color and texture, assuming high contrast between ears and background (e.g., black cloth, white poster board, grass, dirt). We included optional features to enable high-throughput analysis of images for breeding: a QR code printed on a white background was used to automatically attach field information to ears in the image, a color checker passport was used for color correction and as a size reference, and in the absence of a color checker passport, a square solid-colored piece of paper with known dimensions was used as a size reference.
Four datasets comprising diverse phenotypes for traits of interest were evaluated: (a) a commercial hybrid trial (267 ears, 8 cultivars, Dataset 1) (Ribeiro da Silva et al., 2021), (b) an early-stage hybrid trial (2,780 ears, 531 genotypes, Dataset 2), (c) a diverse inbred sweet corn panel of 250 ears with no genotypic replication (Dataset 3) (Hu et al., 2021), and (d) the BGEM collection (1,408 ears, 182 genotypes, Dataset 4) (Vanous et al., 2018). These datasets capture a large phenotypic range to test the flexibility of the algorithm. We found EarCV implementation to be more efficient than manual phenotyping of our traits of interest. In our case, it takes at least 25 s to manually measure the length, width, and KRN of a single ear; in about the same time, one image containing 20 ears can be photographed. A basic personal computer can extract length, width, curvature, taper, color, and tip fill from >3,000 ears overnight, at about 10 s per ear.
3.2 Image analysis pipeline
A gray scale image is a matrix of values that range from 0 to 255. A color image is a compilation of three gray scale matrices of identical dimensions (e.g., red, green, and blue channels produce an RGB color image). At its core, EarCV uses morphological transformations and color-space transformations (e.g., RGB to HSV) to filter out and describe (e.g., by area, aspect ratio, and solidity) regions of interest in an image matrix. EarCV integrates image and signal processing functions from PlantCV, OpenCV, numpy, scipy, and pyzbar, all open-source libraries in Python (Bradski, 2000; Gehan et al., 2017; Harris et al., 2020; Natural History Museum of London, 2016/2020; Virtanen et al., 2020). The image analysis process is partitioned into QR code extraction, color normalization, pixels per metric calculation, ear segmentation, and feature extraction modules (Figure 1). After feature extraction, all measurements are normalized against the pixels per metric calculation to convert them to unit lengths. Each module has default settings and is customizable to fit the user's specific needs, and each module creates a proof image and saves results in a database so the user can monitor performance.
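As a toy example of these building blocks (the file name and the choice to threshold the saturation channel are ours, not the package's):

```python
# A color-space transform followed by simple shape descriptors of one region.
import cv2

img = cv2.imread("ears.jpg")                      # 3-channel BGR matrix
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)        # color-space transformation
_, mask = cv2.threshold(hsv[:, :, 1], 0, 255,     # Otsu threshold on saturation
                        cv2.THRESH_BINARY + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
c = max(contours, key=cv2.contourArea)
x, y, w, h = cv2.boundingRect(c)
solidity = cv2.contourArea(c) / cv2.contourArea(cv2.convexHull(c))
print(f"aspect ratio: {w / h:.2f}, solidity: {solidity:.2f}")
```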
3.3 Color normalization
To test the performance of this algorithm, we performed color normalization on 12 different environments using a common color checker passport reference. The 12 environments were a combination of the following factors: photographs taken with a Nikon D750 or an iPhone 6; the use of a light box or lack thereof; a black cloth, asphalt, soil, or grass background; inside lighting; outside shadowy lighting; and outside full sun lighting at different angles (Figure 2c). The light box was a wooden frame covered with polyester diffusion fabric and did not include any added light source; the fabric helped diffuse location-dependent ambient light. Color normalization effectively reduced the root mean squared difference in color between the reference and target images as measured by RMSt. The RMSt decreased by almost 80 relative units in every image photographed with the Nikon D750 (Figure 2d). Color normalization underperformed when using a reference image taken with the Nikon to correct images taken with the iPhone and failed when there were uneven shadows cast on the color checker. RMSt only decreased by 5.03 in the image with uneven lighting cast on the color checker and by 60 or less for any image taken with an iPhone (Figure 2d).
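For reference, the distance metric can be sketched as follows under an assumed formulation: Euclidean distance in RGB space between matching patches of the reference and target checkers, averaged over the 24 patches.

```python
# Assumed sketch of the RMS color distance between two color checkers.
import numpy as np

def rms_color_distance(reference_patches, target_patches):
    """Both inputs: (24, 3) arrays of mean RGB color per checker patch."""
    diffs = reference_patches.astype(float) - target_patches.astype(float)
    per_patch = np.linalg.norm(diffs, axis=1)   # distance for each patch pair
    return per_patch.mean()                     # RMSt-style average over patches
```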
3.4 Segmentation stability across environments
Ear segmentation based on default K-means background segmentation, size, aspect-ratio, and convexity thresholds worked for ∼85% of the 5,392 ears photographed in this project. Failed segmentation was easily corrected with custom thresholding settings. Figure 3a shows the default segmentation performance for an image with 10 ears. To test the performance of our segmentation algorithm in variable background and lighting conditions, we segmented the same six ears in 12 environments and measured their area. A wide range of exposure settings was used to capture ears in these environments: shutter speeds ranged from 1/30 to 1/12,500 s and f-numbers ranged from f/2.2 to f/16 (Supplemental Table 4). The camera settings used here indicate the functional range of image capture settings that give a positive segmentation result. Overall, EarCV was able to segment the ears in all backgrounds tested, resulting in high correlations of ear area across environments (Figures 2c and 3d, Supplemental Table 4). The adaptive segmentation feature was able to segment ears that were very close or even touching (Figure 3c). The rotation feature used to align ears worked for every hybrid ear but failed for inbred ears that have a wider tip than base.
3.5 Ear features
3.5.1 Length & width
Following gray scale background segmentation and the removal of debris and silk features, regions of interest were converted into binary images for contour analysis to define the border of each ear. Length and width were extracted from the contour, and ground truth measurements were compared with the EarCV-derived measurements to analyze performance. Correlations were high for both inbreds and hybrids across all approaches, with R2 values ranging from 0.88 (width) to 0.99 (length) (Figure 4a–d). Overall, length estimates were more accurate than width estimates. A least-squares means method was used to make pairwise comparisons between methods; these comparisons revealed no significant differences at a p value threshold of 0.05 (Supplemental Table 2).
3.5.2 Color analysis
EarCV was used to extract color information in the RGB, HSV, and CIELAB color spaces, which is useful as a breeding target and offers a quantitative approach to color measurement. The diverse inbred trial (Dataset 3) captured a wide range of colors within sweet corn, from pale white to deep orange (Figure 5a). The BGEM collection (Dataset 4) also contained a wide range of colors in field corn (Figure 5b). Principal component analysis was used to depict color variation in 2D space (Supplemental Table 2). In both cases, the RGB color space separated individuals based on color more effectively than the HSV and CIELAB color spaces.
3.6 Curvature, taper, and tip fill
Curvature, taper, and tip fill are features that other published tools cannot capture and are key yield component traits in fresh market sweet corn. These feature extraction methods were not validated against ground truth measurements because they are difficult to measure quantitatively. Instead, we used our datasets to survey the performance of these feature extractors by examining the resulting distributions. The extremes of the taper, curvature, and tip fill distributions in our diverse inbred dataset (Dataset 3) correspond to expected phenotypes. For example, a high taper score corresponds to a very pointy ear, a low curvature score corresponds to a straight ear, and a tip fill score close to one corresponds to a well-filled ear (Figure 6, Supplemental Figure 1).
3.7 Classifying KRN
To test our ability to predict KRN, we first scored the KRN of 250 ears by hand and then estimated KRN values for those same ears using EarCV. The ground truth KRN values were assigned to low, medium, and high categories. We found the classification rate between the EarCV-estimated KRN and the ground truth KRN to be 0.56 (Figure 7b). We sought to improve upon this method by implementing a gradient boosting ML approach using XGBoost (T. Chen & Guestrin, 2016). The XGBoost model achieved a classification rate of 0.68 (Figure 7b), improving the classification rate by 12 percentage points. Nonetheless, we think the classification rate is still limited by the amount of data used in the training set, and more improvement can be achieved. We consider limitations of this approach in the discussion section.
3.8 Applications in breeding
3.8.1 Early-stage hybrids evaluation
The features extracted from real breeding datasets using EarCV were used to show its usefulness in a breeding program. The early-stage hybrid trial (Dataset 2) was derived from a set of 155 inbreds crossed using a North Carolina II scheme, resulting in a total of 531 crosses. One to three ears from each plot were photographed, and EarCV was used to extract features. To see how well EarCV captures genetic variation for sweet corn traits of interest, we calculated BLUPs in our experiment. Using a pedigree, broad-sense heritability was calculated for each feature, and hybrids were ranked based on the resulting BLUPs. Heritabilities are summarized in Table 2. BLUPs for ear width, average color intensity as described by the red channel, ear area, and tip fill are depicted in Figure 8a. Heritabilities for these quality yield component traits ranged from 0.01 to 0.66. These results indicate that EarCV may not robustly capture the genetic signal of color as explained by hue (h2 = 0.01). In addition, the results suggest that some traits, such as curvature (h2 = 0.10), may be highly dependent on environment, as the physical position of ear emergence on the stem can greatly affect how the ear curves. Area, width, and RGB color descriptors had high heritabilities, whereas shape descriptors and curvature had low heritabilities. All heritabilities had low standard errors.
Trait | Broad sense heritability | SE |
---|---|---|
Krnl_Area | 0.65886 | 0.01926 |
Red | 0.53762 | 0.02458 |
Vol | 0.52383 | 0.02483 |
Blue | 0.4819 | 0.02454 |
B_chnnl | 0.47691 | 0.02576 |
Light | 0.46786 | 0.02546 |
Ear_Width | 0.44456 | 0.02438 |
Green | 0.43539 | 0.02576 |
A_chnnl | 0.42413 | 0.02495 |
Sat | 0.41662 | 0.02566 |
KRN_Grnd | 0.40081 | 0.02444 |
Ear_Area | 0.31566 | 0.02482 |
Convexity | 0.30391 | 0.02507 |
USDA_Grade_Len | 0.30268 | 0.02509 |
Tip_Fill | 0.30151 | 0.02519 |
Tip_Area | 0.30046 | 0.02474 |
Ear_Length | 0.29668 | 0.02475 |
Widths_Sdev | 0.27842 | 0.02482 |
Perimeter | 0.27729 | 0.02441 |
Taper_Convexity | 0.26302 | 0.02533 |
Taper | 0.25468 | 0.02475 |
Solidity | 0.20723 | 0.02483 |
USDA_Grade_Fill | 0.17208 | 0.02357 |
Median_Kernel_Width | 0.15606 | 0.02205 |
Curvature | 0.10605 | 0.0198 |
Hue | 0.01536 | 0.00809 |
- Note. All colors (blue, red, green, hue, saturation, volume, light, a channel, b channel), pixel intensity value of the kernel area; Bottom_Area, area of the bottom of the ear not including kernels; Bottom_Fill, ratio of bottom cob area over total ear area; Cents_Sdev, SD of the midpoint of 20 evenly-spaced slices along the length of the ear; Convexity, ratio of ear perimeter over convex hull perimeter; Convexity_polyDP, ratio of smoothed ear perimeter over convex hull perimeter; Ear_Area, area of the entire ear; Ear_Box_Area, area of the smallest bounding box containing the ear; Ear_Box_Length, length of the smallest bounding box containing the ear; Ear_Box_Width, width of the smallest bounding box containing the ear; Ear Number, ears enumerated from left to right; Krnl_Area, area of the kernel portion of the ear; Krnl_Fill, ratio of kernel area over total ear area; Krnl_Convexity, ratio of ear perimeter over convex hull perimeter containing the kernels; Kernel_Length, length of the smallest bounding box containing the kernels; Max_Width, width as measured at the widest part of the ear; Perimeter, perimeter of the ear contour; Solidity, ratio of ear area over the convex hull area; Taper, SD of the 10 slices along the top half of the ear; Taper_Convexity, ratio of perimeter over convex hull perimeter of the top half of the ear; Taper_Convexity_polyDP, ratio of smoothed perimeter over convex hull perimeter of the top half of the ear; Taper_Solidity, ratio of area over the convex hull area of the top half of the ear; Tip_Area, area of the cob tip not including kernels; Tip_Fill, ratio of tip cob area over total ear area; Widths_Sdev, SD of the width of 20 evenly spaced slices along the length of the ear.
3.8.2 Commercial hybrids evaluation
Eight commercial hybrid cultivars (Dataset 1) were evaluated with EarCV and conventional yield measurements. Harvest yields among the eight cultivars ranged from 780 to 1,079 crates ha–1 (assuming 48 ears/crate), and the lowest yielding cultivar (Cultivar 3) was significantly different from the highest yielding (Cultivar 7). There was no significant difference in unhusked weight among the top seven cultivars (data not shown). As in the unhusked weight comparison, cultivars did not have a significant effect on most EarCV features (Supplemental Table 5). However, there were significant differences between cultivars for width, tip area, tip fill, and green channel intensities. For example, Cultivar 3 was significantly wider than Cultivars 2 and 7 (Figure 8b). In addition, features from EarCV were correlated with yield data from the same experiment after averaging by commercial cultivar. Ear length, ear area, curvature, and tip fill all had correlations with yield between 0.47 and 0.70 (Figure 8c).
4 CONCLUSIONS AND FUTURE DIRECTIONS
4.1 Algorithm implementation and segmentation performance
Overall, EarCV is an open-source photometry tool that automates the extraction of features specifically relevant to fresh-market sweet corn ears. EarCV can phenotype any number of nontouching ears without shanks, dried or fresh, in any orientation, in a wide range of backgrounds and lighting conditions. For example, the color normalization feature allows the user to photograph ears throughout the day while controlling for variation in lighting. The QR code scanner and the pixel-to-metric conversion allow the user to easily compare thousands of images even if the images were taken at variable distances from the objects of interest. Although the tool can run on default settings given only an input path, each major step can be customized to fit the user's needs with custom flags. With the right custom settings, this tool may reasonably be applied to phenotype a variety of other objects, such as other fruits or vegetables. We evaluated three datasets (Datasets 1–3) comprising commercial hybrids, early-stage hybrids, and diverse inbreds for ear traits of interest, as well as an independent dataset from the BGEM repository (Dataset 4). EarCV successfully segmented and cleaned up most ears across all datasets. Images with unusually small ears, large pieces of debris, or dark colored kernels required further processing; nevertheless, custom background segmentation and filter settings successfully addressed these issues. The tool was designed with high-throughput application in mind, flexible and customizable while maintaining simplicity. We segmented the same six ears in different lighting and backgrounds. Ear area was highly correlated across lighting and background conditions, showing that segmentation performance was adaptable. The iPhone 6 and DSLR cameras did not have any white-balance issues. Segmentation performance dropped off when light at low angles cast long shadows and when the color checker had nonuniform lighting due to shadows. These problems were much more common when using low-budget point-and-shoot cameras (Supplemental Figures 2 and 3). As cameras on hand-held devices improve, we expect dependency on DSLR cameras for high-quality imaging to shrink. Features on newer phone cameras, such as 'portrait mode' on newer iPhone models, could help simplify the segmentation of ears from background. We did not explore how portrait mode's automatic blurring of the background affects EarCV. However, based on the principle of K-means segmentation, we expect images taken with portrait mode to integrate easily: blurred pixels of similar background colors should cluster together and away from sharp pixels contained in the ear.
There are various segmentation solutions in the field of ML, including supervised and unsupervised methods. Unsupervised clustering methods for image segmentation can be categorized as hierarchical divisive, hierarchical agglomerative, partitional soft, and partitional hard (Mittal et al., 2021). Special consideration must be taken when budgeting the amount of work required to train supervised methods. We took a parsimonious approach in our segmentation method development by testing the simplest segmentation methods first. Segmentation is addressed within EarCV using the Otsu and K-means methods. The Otsu method is a robust yet simple heuristics-based approach to binary image segmentation and has been used in CV for over 40 yr (Otsu, 1975). The K-means approach is simple, relatively fast compared with other clustering segmentation methods, and always converges (Mittal et al., 2021). K-means is limited by the fact that the number of clusters must be known a priori; however, because our images always seek to separate the background from the subject, we always used two clusters. Another limitation of a K-means approach is sensitivity to color (Warman et al., 2021). We address this problem by performing color normalization with a color checker across images before ear segmentation. In short, we used simple, easy-to-deploy segmentation methods instead of more complicated ones. The performance of these methods in our testing of different backgrounds, lighting conditions, and diverse phenotypes did not merit exploring more complex segmentation algorithms. Nevertheless, a more robust progression of this tool would include the use of supervised ML methods. With the right training data, ML semantic segmentation models for ear, cob, kernel, shank, and ear damage could automate this tool such that the user would not need to optimize performance with custom flags. Current literature highlights the applicability of such methods for ear segmentation. For example, deepCOB (Kienbaum et al., 2021) uses a convolutional neural network-based deep learning model to automate segmentation of diverse ears against uniform colored backgrounds and extract basic morphological features. The bottleneck to this approach is the initial image labeling for training and the computing power required to train the models, with the benefit of automating segmentation for downstream images. It is important to consider that retraining may be required when new input images differ greatly from those the algorithm was trained on. Overall, EarCV can tolerate variation in image background and lighting while still accurately segmenting ears using simple, proven segmentation methods.
4.2 Feature extraction and performance
Length, width, KRN, and quality grades were validated against manually derived ground-truth measurements in diverse, dried inbred ears and fresh commercial ears. We compared three methods for deriving length and width but found no significant difference in the resulting correlations against ground-truth measurements. However, based on average correlation performance for dried, fresh, inbred, and hybrid ears, we propose the maximum distance method as the most robust approach. We also extracted complex features such as taper, curvature, tip fill, and color in the diverse inbred dataset. Based on the resulting trait distributions, we show that calculating the x axis centroid standard deviation along the length of the ear works well as a proxy for curvature, that measuring the width standard deviation along the top half of the ear works well as a measure of taper, and that tip fill can be quantitatively measured by segmenting the cob from the kernel region using an adaptive Otsu method. Our principal component analysis of color shows that different channels contribute to different proportions of the observed variance, with little commonality between the datasets (Supplemental Table 3). This suggests that each dataset should be analyzed independently for color. These differences could be explained by the variable conditions in which the datasets were photographed, as well as the fact that the BGEM collection (Dataset 4) is a field corn collection with mostly smooth kernels, whereas the diverse inbred dataset (Dataset 3) is dried sweet corn with textured, wrinkly kernels. Regardless, in both cases working in the RGB space most effectively separated individuals based on color, and ear segmentation worked for both datasets.
4.3 Applications in breeding
This high-throughput photogrammetric approach to phenotyping maize ears greatly increases efficiency without sacrificing accuracy in breeding programs. We demonstrated its application in breeding by evaluating an early-stage hybrid trial (Dataset 2). We applied a mixed linear model with pedigree information to the features extracted by EarCV to calculate trait heritabilities and rank genotypes based on BLUPs. Trait heritabilities for width and length fall within expected ranges; previous experiments in Citra, FL, using F4 sweet corn found broad-sense heritabilities between 0.09 and 0.24 for these traits (U. Kumar, unpublished data, 2018). Using EarCV, we quantitatively measured tip fill, curvature, and taper and used these phenotypic data to calculate heritabilities for these traits. This analysis shows that EarCV can be used to capture heritable variation in ear quality yield component traits. Currently, breeders bundle these traits into an ear package score. This tool could be used to guide selections directly for each of these ear quality yield component traits instead of indirectly through an 'ear package' quality score.
In addition, we used EarCV in a yield experiment to compare eight commercial hybrid cultivars. There were significant differences in the number of ears and crates per hectare (Ribeiro da Silva et al., 2021). Our approach found significant differences between cultivars in width, tip fill, and color as described by the red channel, among other traits. As with the yield weight and quality measures on the same field experiment, there were no statistical differences for most other EarCV traits between cultivars (Ribeiro da Silva et al., 2021). The lack of significant differences may be due to the small sample size of commercial material whose ear traits have already been nearly fixed to match strict commercial production expectations. Regardless, this analysis shows that EarCV can be used to compare even similar commercial cultivars and score them quantitatively for traits that a routine yield experiment cannot account for. Correlations of ear length, tip fill, ear area, and curvature with yield show that these features are positively associated with yield, further defining them as yield component traits. Because of its more efficient and quantitative approach to phenotyping, we expect this tool to enable faster and more accurate selections for quality and yield components in fresh market sweet corn. For estimating KRN, our peak calling approach alone did not correlate with the ground truth KRN as well as other features. However, the XGBoost implementation made use of image features, statistical moments, and chord slices to improve our KRN predictive ability. This approach was limited to well-filled ears because the textured surface of dried sweet corn ears dilutes the kernel signal in the 1D array. Other sources of noise in the 1D signal that impaired our ability to call kernel boundaries were kernel abortion, row merging, and cases where kernels twisted along the length of the ear. Furthermore, our XGBoost model was limited by the small dataset and the uneven representation of KRN classes. At this time, KRN predictions are exploratory and not recommended for deployment in routine phenotyping efforts.
4.4 Limitations and future work
This tool is built upon basic CV principles using common algorithms, without the need to manually label thousands of datapoints or use expensive computing resources. Instead, EarCV uses basic descriptors such as area, aspect ratio, and solidity to segment out ears; for this reason, it cannot differentiate between an ear of corn and a similar object such as a banana using default settings. There are special cases where segmentation and feature extraction fail due to a lack of contrast between the regions of interest and the background. For example, segmentation can fail when ears do not contrast well against the background, and tip fill estimation can fail when cob color is similar to kernel color. In these cases, EarCV can be run with custom settings to circumvent segmentation issues. The algorithm does not segment individual kernels from each other, missing out on interesting yield component traits such as kernel shape descriptors that could improve KRN prediction or be used in yield prediction. The features extracted by our KRN estimation methods may include error due to the inherent texture of kernels, the bicolored nature of some kernels, and the fact that some ears have disorganized rows of kernels. Improved classification accuracies may be obtained with a larger training dataset and more even representation of materials across the different KRN categories.
EarCV can process ears with many diverse phenotypes in varying imaging conditions but has some limitations. For example, most of the time required to photograph ears is spent dehusking, removing silks, and detaching shanks. Several researchers have used infrared radiation to phenotype kernel chemical composition, kernel fungal contamination, and viability (Agelet et al., 2012; Oury et al., 2021; Wang et al., 2015; Yao et al., 2013). Using cameras to capture signals outside the visible spectrum has the potential to enable phenotyping of ears without removing husks. Infrared radiation methods have been developed to segment leaf veins from blade and to diagnose Anthracnose infection (Wang et al., 2015). Moreover, near-infrared interactance spectroscopy has been used to detect internal necrosis in sweetpotatoes [Ipomoea batatas (L.) Lam.] (Kudenov et al., 2021). The cost of infrared or hyperspectral cameras would present a bottleneck for large-scale deployment of this approach, but it has the potential to circumvent husking in the phenotyping process.
Although other maize ear phenotyping tools have been developed, none of them comprehensively address key phenotypes important in sweet corn (Brichet et al., 2017; Kienbaum et al., 2021; Liang et al., 2016; Makanza et al., 2018; Miller et al., 2017; Warman et al., 2021). Unlike field corn, sweet corn ear aesthetics such as color, curvature, taper, and tip fill are key determinants of hybrid performance and yield. Many of these methods focus on kernel phenotypes extracted from ears instead of whole-ear phenotypes. Furthermore, the deployment of some of these tools is not feasible in our use case due to material expenses and/or inflexibility in the image capturing requirements. For example, Kienbaum et al. (2021) developed a state-of-the-art semantic segmentation algorithm, deepCOB, using supervised ML methods; however, deepCOB does not capture taper or tip fill. In conclusion, EarCV and other CV and ML tools have the potential to address phenotyping as a major limitation toward genetic gain in plant breeding programs by increasing efficiency, the number of samples observed, accuracy, and reproducibility.
ACKNOWLEDGMENTS
This work was supported by the National Institute of Food and Agriculture (SCRI 2018-51181-28419 and AFRI 2019–05410 to Marcio F. R. Resende, Jr.) and by the University of Florida Plant Breeding Graduate initiative. We also thank Nicole Beisel and Christina Finegan for assisting with final manuscript reviews.
AUTHOR CONTRIBUTIONS
Juan M. M. Gonzalez: Conceptualization; Data curation; Formal analysis; Methodology; Software; Validation; Writing – original draft. Nayanika Ghosh: Software. Vincent Colantonio: Data curation; Methodology; Software; Validation; Writing – review & editing. Francielly de Cássia Pereira: Formal analysis; Resources. Ricardo A. Pinto, Jr.: Resources. Chase Wasson: Resources. Kristen A. Leach: Resources; Writing – review & editing. Marcio F. R. Resende, Jr.: Conceptualization; Investigation; Methodology; Project administration; Supervision; Writing – review & editing.
CONFLICT OF INTEREST
The authors declare no conflict of interest.