ImageBreed: Open‐access plant breeding web–database for image‐based phenotyping

High‐throughput image‐phenotyping promises to accelerate the rate of genetic improvement in plant breeding through varietal selections informed by longitudinal growth models. To facilitate routine analyses and to drive breeding decisions, data integration is critical for effective management of germplasm, field experiment design, phenotyping, tissue sampling, genotyping, aerial‐phenotyping campaigns, image files, and geo‐spatial information. To this end, ImageBreed provides a software solution for end‐to‐end image‐based phenotyping integrated into the Breedbase plant breeding system. ImageBreed provides open‐source orthophotomosaic construction for raw image captures from standard color cameras and from the MicaSense RedEdge multispectral camera. Additionally, previously assembled orthophotomosaic raster images can be uploaded. Orthophotomosaic images allow for streamlined extraction of plot‐polygon images; however, ImageBreed plot‐polygon images can also be extracted directly from raw aerial image captures. A web–database interface streamlines assignment of plot‐polygon images from the orthophotomosaic or raw aerial‐captures to the field experiment design. Image processes spanning Fourier‐transform filtering, thresholding, and vegetation index masking are applied to reduce noise in extracted phenotypes. Summary‐statistic phenotypic values are extracted for every observed plot‐polygon image using a structured ontology. Plot‐polygon images are queryable against genotypic, phenotypic, and experimental design information for training of machine learning models and for driving breeding decisions in varietal advancement. ImageBreed is publicly available at http://imagebreed.org and built on the open‐source Breedbase system (https://github.com/solgenomics/sgn); all image‐processing scripts are available at https://github.com/solgenomics/DroneImageScripts and via a Docker image. 
All data deposited in http://imagebreed.org are publicly available for longitudinal model training and for driving future breeding decisions.


INTRODUCTION
The emergence of high-throughput aerial-phenotyping allows for near real-time evaluation of large numbers of genotypes, significantly impacting the field of plant phenomics (Ninomiya, Baret, & Cheng, 2019). Several studies have demonstrated the potential benefit of aerial-phenotyping in plant breeding across a range of crop species (Thorp, Thompson, Harders, French, & Ward, 2018; Krause et al., 2019). However, routine use for breeding decisions requires rapid data turnaround, and image processing remains a significant bottleneck in breeding programs.
Unoccupied aerial vehicles (UAVs) and occupied aircraft mounted with multispectral and color-image cameras now frequently fly over agricultural field experiments. For convenience in plot-polygon extraction, the raw image captures can be assembled into an orthophotomosaic image (American Society of Civil Engineers, 1994; Shi et al., 2016). Open-source software, such as the OpenCV computer-vision library, and commercial products are available to perform the orthophotomosaic assembly; however, these methods are either highly technical or expensive (Bradski, 2000; Culjak, Abram, Pribanic, Dzapo, & Cifrek, 2012; Rublee, Rabaud, Konolige, & Bradski, 2011). Alternatively, ImageBreed performs orthophotomosaic assembly in a free, streamlined interface designed for phenotypic value extraction in the context of plant breeding. Furthermore, ImageBreed allows researchers to bypass orthophotomosaic assembly and extract plot-polygons directly from raw aerial image captures. Figure 1 provides an overview of the primary dashboard interface. A user guide for Breedbase is available at https://solgenomics.github.io/sgn/; information regarding ImageBreed is in the "Managing Image Data for Phenotyping" section and in the supplemental information.

MATERIALS AND METHODS
ImageBreed is implemented within the codebase of Breedbase, which is an open-source web-database for managing germplasm, field experiment design, tissue sampling, phenotypic, and genotypic information; it is currently used by many plant breeding communities, including https://cassavabase.org and https://solgenomics.net (Fernandez-Pozo, Menda, & Edwards, 2014). Breedbase employs the Chado database schema, the Natural Diversity (ND) module, and controlled vocabulary driven data models, allowing for a highly extensible system (Jung et al., 2011; Mungall, Emmert, & The FlyBase Consortium, 2007). Figure 2 presents the primary relational database schema. Ontologies are used for annotating phenotypic values and images within Breedbase (Shrestha et al., 2012). The supplemental information contains the image-phenotyping ontologies.
Breedbase, and subsequently ImageBreed, is written in Perl and connects to a PostgreSQL database; the software is open source (https://github.com/solgenomics/sgn). Development of Breedbase is ongoing and uses GitHub as a repository for tracking issues and new features. RESTful endpoints are constructed using the Catalyst web-framework and constitute the primary means of communication between Breedbase's JavaScript-based web-interface and the database. The supplemental information contains the complete ImageBreed web-application programming interface (API) specification. All image processes are performed using Python scripts that interface with the OpenCV library (https://github.com/opencv/opencv); MicaSense open-source scripts are used for orthophotomosaic assembly of RedEdge camera 5-band multispectral captures (https://github.com/micasense/imageprocessing). The image-processing scripts are available with a Docker image for standalone use (https://github.com/solgenomics/DroneImageScripts).

Field experiments
Prior to processing aerial images, the field experiment must first be saved in the database. The field experiment represents the design with which accessions are distributed among experimental field plots in a given field location. More information on field experiments is available in the Breedbase documentation (https://solgenomics.github.io/sgn/) in the "Managing Field Trials" section. Importantly, Breedbase supports the Plant Breeding API (BrAPI) standard for representing field experiments and associated metadata, including phenotypic records (Selby et al., 2019). Once the field experiment is saved, aerial imaging campaigns can be uploaded for the field experiment, and plot-polygon images can then be associated with their respective field experiment plots.

Image input
The starting point in ImageBreed is either (a) upload of raw image captures (.tiff) from standard color cameras or from the MicaSense RedEdge 5-band multispectral camera, or (b) upload of previously stitched raster images (.PNG, .JPG). When starting with raw image captures, a compressed (.zip) archive is uploaded; the maximum upload size is currently 2 GB. Depending on user input, either the raw image captures are assembled into an orthophotomosaic and saved as a PNG raster image, or the raw image captures are saved as PNG images and used directly for plot-polygon assignment. The use of previously stitched raster images allows any camera instrument to be used, provided the image is smaller than 1 GB and meets the spectral category conditions defined below.
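The upload constraints above can be checked before submission. The following is a minimal sketch using only the Python standard library; the function and constant names are hypothetical, a gigabyte is assumed to mean 1024^3 bytes, and the acceptance of ".tif" and ".jpeg" spellings is an assumption. It is not taken from the ImageBreed codebase.

```python
import os
import zipfile

# Limits quoted in the text; names and the 1024**3 definition of a GB are assumptions.
MAX_RAW_ARCHIVE_BYTES = 2 * 1024**3   # zipped raw captures: at most 2 GB
MAX_STITCHED_BYTES = 1 * 1024**3      # stitched raster: smaller than 1 GB
RAW_EXTENSIONS = {".tif", ".tiff"}                 # ".tif" accepted as an assumption
STITCHED_EXTENSIONS = {".png", ".jpg", ".jpeg"}    # ".jpeg" accepted as an assumption


def check_upload(path):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    size = os.path.getsize(path)
    ext = os.path.splitext(path)[1].lower()

    if ext == ".zip":
        if size > MAX_RAW_ARCHIVE_BYTES:
            problems.append("archive exceeds 2 GB")
        with zipfile.ZipFile(path) as archive:
            for name in archive.namelist():
                if name.endswith("/"):
                    continue  # skip directory entries
                if os.path.splitext(name)[1].lower() not in RAW_EXTENSIONS:
                    problems.append(f"unexpected file in archive: {name}")
    elif ext in STITCHED_EXTENSIONS:
        if size >= MAX_STITCHED_BYTES:
            problems.append("stitched raster is not smaller than 1 GB")
    else:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    return problems
```

Calling `check_upload("flight.zip")` on a local archive returns the list of issues found, so a user can correct the archive before a large upload is attempted.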
Imaging campaigns are saved and denoted by spectral type as either blue (450-520 nm), green (515-600 nm), red (600-690 nm), red-edge (690-750 nm), near-infrared (NIR) (780-3,000 nm), mid-infrared (MIR) (3,000-50,000 nm), far-infrared (FIR) (50,000-1,000,000 nm), or thermal-infrared (thermal IR) (9,000-14,000 nm) (ISO, 2007). If the image spectrum is not known precisely, options exist for black-and-white or RGB color image. Each spectral category should only be used once in a given aerial imaging campaign; for example, when uploading imaging campaigns for the MicaSense RedEdge 5-band camera, the images should be uniquely tagged as blue, green, red, red-edge, and near-infrared. In addition to assigning a spectral category, users must provide a name and a description for each uploaded imaging campaign. As the system matures, it is expected that this information can be automatically extracted from the uploaded image headers to facilitate the upload process. Note that uploading several previously stitched orthophotomosaic image bands requires that the bands are perfectly superimposable and of equal size; orthophotomosaic stitching software generally accomplishes this by default.
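The spectral categories and the one-use-per-campaign rule described above can be expressed as a small lookup. This is an illustrative sketch: the dictionary keys and function names are hypothetical labels, not necessarily the identifiers ImageBreed stores.

```python
from collections import Counter

# Wavelength ranges in nm, as listed in the text. Keys are illustrative labels.
SPECTRAL_RANGES_NM = {
    "blue": (450, 520),
    "green": (515, 600),
    "red": (600, 690),
    "red-edge": (690, 750),
    "NIR": (780, 3_000),
    "MIR": (3_000, 50_000),
    "FIR": (50_000, 1_000_000),
    "thermal IR": (9_000, 14_000),
}


def categories_for_wavelength(nm):
    """Return every spectral category whose range contains the wavelength."""
    return [
        name
        for name, (low, high) in SPECTRAL_RANGES_NM.items()
        if low <= nm <= high
    ]


def duplicated_categories(band_tags):
    """Return the categories used more than once within one imaging campaign."""
    counts = Counter(band_tags)
    return sorted(tag for tag, n in counts.items() if n > 1)


# A band centered at 840 nm falls only in the NIR range.
print(categories_for_wavelength(840))
# A five-band campaign tagged once per category has no duplicates.
print(duplicated_categories(["blue", "green", "red", "red-edge", "NIR"]))
```

Because some listed ranges overlap (for example, thermal IR lies within MIR), a wavelength can match more than one category, which is one reason the category is chosen by the user at upload.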

Image processing
Core Ideas
• A pipeline for phenotype extraction from aerial images in agricultural experiments.
• ImageBreed's web-database interface allows for data standardization and sharing.
• Ontology based phenotype and image annotation allows for stable data representation.
• Breedbase stores aerial images alongside experimental phenotypic and genotypic data.
• Open-source orthophotomosaic construction for multispectral and standard imagery.

Noise in aerial-phenotyping imagery can stem from the camera and environmental conditions. To minimize camera and software-induced white noise, ImageBreed applies non-local means denoising (Buades, Coll, & Morel, 2011). In signal processing, the Fourier transform (FT) can remove noise via high-pass filters (FT-HPF) or low-pass filters (FT-LPF) (Shaikh, Choudhry, & Wadhwani, 2016). ImageBreed performs FT-HPF to remove the lowest 20, 30, and 40 frequencies from images. ImageBreed performs magnitude thresholding on the high and low tails of the pixel distribution to remove noise pixels from images (Pandian, Ciulla, Mark Haacke, Jiang, & Ayaz, 2008). Furthermore, ImageBreed minimizes background soil pixels present in plot-polygon images by applying the following vegetation indices (VIs) as masks: normalized difference vegetation index (NDVI), triangular greenness index (TGI), visible atmospherically resistant index (VARI), and normalized difference red-edge vegetation index (NDRE) (Hunt et al., 2013; Robinson et al., 2017). Custom VIs, such as the soil-adjusted vegetation index (SAVI) (Huete, 1988), can be modularly added in the source code. All images are annotated using controlled vocabularies to account for the specific combination of processes applied. Application of denoising, thresholding, FT-HPF, and VI calculation for a 5,000 × 6,000 pixel orthophotomosaic image containing 500 experimental field plots takes approximately 2 min on an E5-2660v2 2.2 GHz workstation with a Quadro K5000 GPU and 256 GB RAM; however, plot-polygon processing can take approximately 30 min to complete. Plot-polygon processing crops and saves individual plot-images for all uploaded image bands to ensure fast queries during downstream analyses. The hardware specifications above are used to run the public ImageBreed instance http://imagebreed.org locally; however, a GPU is not required, and a minimum of 8 GB RAM ensures performant functionality. Disk storage space is required for the database, saved images, and archived files; storage requirements are dependent on the scale of the project.
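To illustrate how a vegetation index can act as a soil mask, the sketch below computes the four named indices per pixel from their commonly published forms and zeroes out NIR pixels with low NDVI. It uses plain Python lists of reflectance values; the function names and the 0.3 NDVI cut-off are assumptions for illustration and are not taken from the ImageBreed scripts, which operate on images through OpenCV.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - R) / (NIR + R)."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def ndre(nir, red_edge):
    """Normalized difference red-edge index: (NIR - RE) / (NIR + RE)."""
    total = nir + red_edge
    return (nir - red_edge) / total if total else 0.0


def vari(red, green, blue):
    """Visible atmospherically resistant index: (G - R) / (G + R - B)."""
    denom = green + red - blue
    return (green - red) / denom if denom else 0.0


def tgi(red, green, blue):
    """Triangular greenness index, simplified form: G - 0.39 R - 0.61 B."""
    return green - 0.39 * red - 0.61 * blue


def mask_with_ndvi(nir_band, red_band, threshold=0.3):
    """Set NIR pixels to 0 where NDVI is below the threshold (assumed cut-off)."""
    masked = []
    for nir_row, red_row in zip(nir_band, red_band):
        masked.append([
            nir if ndvi(nir, red) >= threshold else 0.0
            for nir, red in zip(nir_row, red_row)
        ])
    return masked


# Two vegetation pixels (left column) and two soil-like pixels (right column).
nir_band = [[0.8, 0.3], [0.7, 0.25]]
red_band = [[0.1, 0.25], [0.1, 0.2]]
print(mask_with_ndvi(nir_band, red_band))  # [[0.8, 0.0], [0.7, 0.0]]
```

Summary statistics computed on the masked band then reflect mainly vegetation pixels, provided the zeroed pixels are excluded from the calculation.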

Phenotype extraction
[Figure caption, continued] Shown is an image band tagged as near-infrared (780-3,000 nm). All shown images originate from this uploaded image band. (6) Resulting images from standard process rotation and cropping of the relevant field from the uploaded image band are listed in display. (7) Resulting plot-polygon images from standard process are shown within collapsible sections. Highlighting 500 NIR plot-polygon images in an expanded window. Also shown are 500 thresholded NIR plot-polygon images. (8) Export phenotypic values from plot-polygon images for analyses and model training. Current and future ImageBreed features provide prediction of end-of-season traits with statistical and machine learning models.

Extraction of phenotypic values from aerial imagery relies on assigning geo-spatial plot-polygons to each plot in the field experiment design. The assigned plot-polygons are generally squares or rectangles of equal size and shape for all experimental field plots. Software such as QGIS can create plot-polygon representations (Andrade-Sanchez et al., 2014; QGIS Development Team, 2017); however, ImageBreed provides a standardized, manual interface for performing the plot-polygon assignment by clicking the four corners of the field experiment and minimally specifying the number of experimental plot grid rows and columns. To be as flexible as possible, ImageBreed makes no assumptions on whether the uploaded raw image captures or orthophotomosaic are georeferenced. Future work will allow previously used plot-polygon templates to be rescaled and applied onto new aerial imaging campaigns through a point-and-click interface. ImageBreed provides point-and-click functions for copy-pasting plot-polygon templates and for removing specific polygons from consideration, allowing flexibility for templating even the most irregularly oriented field layouts.
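The idea of deriving a grid of plot-polygons from four clicked corners plus row and column counts can be sketched as follows. This version uses bilinear interpolation between the corners, which is an assumption chosen for illustration; the ImageBreed interface may compute the template differently, and all function names here are hypothetical.

```python
def lerp(p, q, t):
    """Linearly interpolate between 2D points p and q at fraction t."""
    return (p[0] + (q[0] - p[0]) * t, p[1] + (q[1] - p[1]) * t)


def bilinear(corners, u, v):
    """Map unit-square coordinates (u, v) into the clicked quadrilateral.

    corners are pixel coordinates ordered: top-left, top-right,
    bottom-right, bottom-left.
    """
    top_left, top_right, bottom_right, bottom_left = corners
    top = lerp(top_left, top_right, u)
    bottom = lerp(bottom_left, bottom_right, u)
    return lerp(top, bottom, v)


def plot_polygon_grid(corners, n_rows, n_cols):
    """Return {(row, col): [four polygon corners]} for an equal-sized grid."""
    polygons = {}
    for r in range(n_rows):
        for c in range(n_cols):
            u0, u1 = c / n_cols, (c + 1) / n_cols
            v0, v1 = r / n_rows, (r + 1) / n_rows
            polygons[(r, c)] = [
                bilinear(corners, u0, v0),
                bilinear(corners, u1, v0),
                bilinear(corners, u1, v1),
                bilinear(corners, u0, v1),
            ]
    return polygons


# A 100 x 50 pixel field divided into 2 rows and 4 columns of plots.
field_corners = [(0, 0), (100, 0), (100, 50), (0, 50)]
grid = plot_polygon_grid(field_corners, n_rows=2, n_cols=4)
print(grid[(0, 0)])  # [(0.0, 0.0), (25.0, 0.0), (25.0, 25.0), (0.0, 25.0)]
```

Because the corners are plain pixel coordinates, the same approach works whether or not the image is georeferenced, and it tolerates a field that appears rotated or slightly skewed in the image.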
Plot-polygon images are then cropped out from the orthophotomosaic, annotated with the type of image and process applied, and stored as an entry in a relational database with association to the respective experimental plot in the field design and to the originating aerial-phenotyping event. For a detailed walk-through, please consult the "Managing Image Data for Phenotyping" section in the Breedbase documentation (https://solgenomics.github.io/sgn) or the supplemental information.
FIGURE 2 Relational database schema of Breedbase as used by ImageBreed. The schema can be divided into the following subdivisions: genotyping, phenotyping, stocks, projects, and images, with connector tables from the Natural Diversity (ND) module of the Chado database. Note that image files are stored in the file system and only filenames and metadata are stored in the database.

ImageBreed extracts and annotates phenotypic values from plot-polygon images using ontologies. Table 1 summarizes the extracted phenotypic traits and image processes applied; furthermore, the supplemental information contains the complete image-phenotyping ontologies. Note that in Breedbase phenotypic annotations can be composed as combinations of ontologies; for instance, 'Mean Pixel Value|NIR (780-3,000 nm)|NIR Denoised Original Image|day 105' is a phenotypic annotation describing a mean pixel value from a denoised original NIR image taken 105 d after planting. Composing annotations in this way ensures uniqueness and queryability of phenotypic values in the database. All plot-polygon images and phenotypic values stored within ImageBreed are public domain, allowing researchers to download and build upon existing datasets; currently http://imagebreed.org contains more than 500,000 plot-polygon images of maize (Zea mays L.) and barley (Hordeum vulgare L.). Aggregated data allows for training predictive longitudinal growth models across crop species and field environments, driving breeding decisions in varietal advancement (van Eeuwijk et al., 2019; Xavier, Hall, Hearst, Cherkauer, & Rainey, 2017). Current and future work is focused on developing pipelines for training models to predict end-of-season traits from plot-level images, extracted phenotypes, and genetic relationships in Breedbase. Both longitudinal linear models and convolutional neural networks (CNNs) in Keras with TensorFlow 2.0 are currently being developed.
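The composed annotation format quoted above ('Mean Pixel Value|NIR (780-3,000 nm)|NIR Denoised Original Image|day 105') can be built and split with a few lines of code. The sketch below assumes exactly four pipe-separated components in that order; the function names and dictionary keys are hypothetical and do not come from the Breedbase codebase.

```python
COMPONENT_SEPARATOR = "|"


def compose_annotation(trait, spectral_type, image_type, day):
    """Join ontology terms into one composed annotation string."""
    parts = [trait, spectral_type, image_type, f"day {day}"]
    for part in parts:
        if COMPONENT_SEPARATOR in part:
            raise ValueError(f"component may not contain '|': {part!r}")
    return COMPONENT_SEPARATOR.join(parts)


def parse_annotation(annotation):
    """Split a composed annotation back into its four components."""
    trait, spectral_type, image_type, day_label = annotation.split(
        COMPONENT_SEPARATOR
    )
    return {
        "trait": trait,
        "spectral_type": spectral_type,
        "image_type": image_type,
        "day": int(day_label.split()[-1]),  # "day 105" -> 105
    }


label = compose_annotation(
    "Mean Pixel Value",
    "NIR (780-3,000 nm)",
    "NIR Denoised Original Image",
    105,
)
print(label)
print(parse_annotation(label)["day"])  # 105
```

Keeping each component a separate ontology term is what allows the same trait to be queried across spectral types, image processes, and time points.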

SOFTWARE AVAILABILITY
ImageBreed is publicly available at http://imagebreed.org and built on the open-source Breedbase system (https://github.com/solgenomics/sgn); all image processing scripts are available at https://github.com/solgenomics/DroneImageScripts and via a Docker image.

TABLE 1 List of image processes applied to each plot-polygon image as well as a list of the extracted phenotypic values that ImageBreed currently supports. Phenotype annotations are composed as combinations of these ontologies; for instance, 'Mean Pixel Value|NIR (780-3,000 nm)|NIR Denoised Original Image|day 105' is a phenotypic annotation describing a mean pixel value from a denoised original NIR image taken 105 d after planting. In this way the annotations ensure uniqueness and queryability across the database. Consult the supplemental information for the complete ontologies used.

Processes for phenotype extraction Methods and traits for phenotype extraction
Image processes applied Non-local means denoising