Images carried before the fire: The power, promise, and responsibility of latent phenotyping in plants

Understanding the genetic basis of plant traits requires comprehensive and quantitative descriptions of the phenotypic variation that exists within populations. Cameras and other sensors have made high‐throughput phenotyping possible, but image‐based phenotyping procedures involve a step where a researcher selects the traits to be measured. This feature selection step is inherently prone to human biases. Recently, a set of phenotyping approaches, which are referred to collectively as latent phenotyping techniques, have arisen in the literature. Latent phenotyping techniques isolate a latent source of variance in the data, such as stress or genotype, and then quantify the effect of this latent source of variance using latent variables without defining any conventional traits. In this review, we discuss the differences between, and challenges of, both traditional and latent phenotyping.


INTRODUCTION
Affordable cameras, data storage, and computational resources have drastically simplified the process of imaging plants in greenhouse or field settings (Chitwood & Topp, 2015;Gehan & Kellogg, 2017;Granier et al., 2006;Horgan, 2001;Ke et al., 2013;Li et al., 2014;Manacorda & Asurmendi, 2018;Merieux et al., 2021;Omori & Iwata, 1998;Rymaszewski et al., 2018;van der Heijden et al., 2012;Vasseur et al., 2018;Vőfély et al., 2019;York & Lobet, 2017). Although image acquisition is relatively straightforward, methods for processing image data lag in usability and generalizability. When a research group develops an image-based phenotyping tool to study particular traits within a specific organism, adoption of that tool by other groups can be hindered by differences in image acquisition, population type, parameter settings (e.g. thresholds), tissue specificity, and curated metadata (Granier et al., 2006;Lobet, 2017;van der Heijden et al., 2012;Zhou et al., 2018). As a result, publicly available image-based phenotyping tools can prove to be ad-hoc solutions with limited utility beyond the group that developed them (Lobet, 2017). A recent study, in which different groups were given the same image dataset and asked to test the same set of hypotheses, observed sizable variation in the statistical results obtained by each group (Botvinik-Nezer et al., 2020;Pieruschka & Schurr, 2019;Zhou et al., 2018). In this review, we discuss the potential benefits and shortcomings of latent phenotyping, a family of machine learning techniques that presents opportunities to overcome issues related to selection bias, consistency, and broader applicability often encountered with image or sensor-based phenotyping. 
Further, we discuss how current dogmatic thought about complex traits may obscure the ability to capture a holistic view of true variability, emphasizing the need for continued development of quantitative and mathematical approaches which will help to decipher genetic mechanisms underlying trait variation and to support crop improvement efforts. The ideas presented in this review share parallels with the allegory of Plato's cave: Prisoners trapped in a cave perceive shadows projected by flames onto the cave wall as reality, despite the truth that these shadows do not provide an accurate representation of the outside world. In the case of plant phenotyping, researchers view limited representations of the true plant phenotype via conventional metrics and/or latent space representations.
For those whose life work involves looking at plants, immediate identification of unique and useful variation needs only a quick glance; indeed, the 'breeder's eye' is often spoken of as an undefinable instinct for selecting superior varieties (Bernardo, 2001, 2016, 2020;Dillmann & Guérin, 1998;Van Ginkel et al., 2008;Zhou et al., 2018). Though the plant phenotype is infinite (Chitwood & Topp, 2015) and can never be quantified in its entirety, images and sensor data that describe plants at high resolution provide an avenue to integrate multiple subtle signals and perspectives into a more comprehensive description of plant form and function (Das et al., 2015;Fahlgren, Feldman, et al., 2015; 2017; Herritt et al., 2020;Liu et al., 2020;Seethepalli et al., 2020;Tovar et al., 2018;Willis et al., 2017;York & Lobet, 2017). Over the last 20 years, researchers have applied methods for comprehensively describing plant form with varying strategies and input requirements. These strategies include supervised morphometric methods and more abstract concepts that rely upon unsupervised methods, e.g., latent space phenotyping, persistent homology (Table 1). Many of the strategies that the community has drawn upon originated in disparate fields of study, from psychology, e.g., generalized Procrustes analysis and structural equation modeling (Bookstein, 1997;Chitwood, 2020;Gower, 1975;Klingenberg, 2015;Klingenberg & McIntyre, 1998;Rohlf, 1999;Slice & Stitzel, 2004) to computer vision and machine learning, e.g., eigenshapes, convolutional neural networks (CNNs) (Horgan, 2001;Horgan et al., 2001;Sirovich & Kirby, 1987;Turk & Pentland, 1991;Wang et al., 2020). In fact, early work in latent phenotyping closely mirrored approaches used for human face recognition, but applied them instead to capture internal color pattern variation in carrot roots (Horgan, 2001;Horgan et al., 2001).
More recently, Chitwood & Topp (2015) discussed the idea of a plant 'cryptotype': a function of discrete trait values that can be used to discriminate between a priori defined groups of individuals. Practically, developing a cryptotype function depends on training a supervised model with pre-existing features and labels. Ubbens et al. (2020) developed the supervised method Latent Space Phenotyping, which uses CNNs to differentiate between treatments based on longitudinal image data. Images are first embedded in a low-dimensional latent space, and their paths through that space over time are used to assess temporal change in plant responses to treatment. In a similar but unsupervised fashion, Gage et al. (2019) used both CNNs and principal component analysis (PCA) on flattened lidar point clouds to produce unsupervised latent phenotypes from a single time point evaluation of field-grown maize hybrids. Further, several studies have demonstrated that PCA is a powerful, unsupervised tool for quantifying plant organ shape, including in carrot (Turner et al., 2018), radish (Iwata et al., 2000, 2004), strawberry (Feldmann et al., 2020;Li et al., 2020;Zingaretti et al., 2021), apple (Lv et al., 2019;Migicovsky, Li, et al., 2017), grape (Chitwood et al., 2014;Yuan et al., 2016), maize (Warman et al., 2021), and rice (Iwata et al., 2015;Suzuki & Hirata, 2011). Other unsupervised methods include persistent homology (Li et al., 2017;Schlautman et al., 2020) and morphometric approaches (Chitwood & Otoni, 2017;Falk et al., 2020;Gupta et al., 2020;Iwata, 2011;Iwata et al., 2015;Klein et al., 2017;Manacorda & Asurmendi, 2018;Migicovsky, Li, et al., 2017;Sainin et al., 2016).
Undoubtedly, the most common latent phenotype in plants is "shape", which is due, in part, to the access to inexpensive cameras and computers (Tovar et al., 2018), the availability of user-friendly software (Bonhomme et al., 2014), and the ability to visually interpret results (Chitwood et al., 2012;Horgan et al., 2001;Langlade et al., 2005;Manacorda & Asurmendi, 2018;Turner et al., 2018;Vőfély et al., 2019;Xu & Bassel, 2020). Though methods such as the ones mentioned above are common in plant phenotyping, latent phenotyping approaches are only beginning to gain traction for the description of other traits in plants.
The following sections provide more detail on conventional image- and sensor-based phenotyping methods, latent phenotyping techniques, and the technical considerations, benefits, and drawbacks of latent phenotyping methods compared to their conventional alternatives.

Conventional phenotyping from images and sensors
Historically, conventional image-based phenotyping provided an efficient strategy to measure key traits that are easy to define and interpret (Anderson et al., 2020;Demir et al., 2018;Li et al., 2014;Poland & Nelson, 2011;Tao et al., 2020;Wang et al., 2018, 2019;Zhou et al., 2020). These traits do not reflect the plant's full complexity, or even the complexity captured in an image. Like the shadows seen by prisoners in Plato's Cave, conventional phenotyping methods provide a limited representation of a plant's status. Methodologically, conventional image-based plant phenotyping aims to condense high-dimensional image data into tabular trait values which quantify some relevant characteristics of the individual (Das et al., 2015;Li et al., 2014;Seethepalli et al., 2020;Tabb & Medeiros, 2017;Walter et al., 2015;Zingaretti et al., 2021). For example, it is common to reduce an image of a plant into a single value specifying the number of vegetation pixels to estimate biomass, or to measure plant height as the number of contiguous pixels in the vertical axis. Often, a collection of these component traits is used to summarize some higher-level abstract concept about the individual and the population, such as the plant's level of stress, its general architecture, or other physiologically and agronomically relevant characteristics (York & Lobet, 2017;York, 2019). Table 1 presents some abstract physiological concepts and the corresponding image-based traits measured to proxy them.
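As a minimal sketch of the two pixel-based traits mentioned above, the following Python example computes a vegetation pixel count and a vertical extent from a small binary mask. The mask, the function names, and the use of the top-to-bottom row span as a stand-in for "contiguous pixels in the vertical axis" are illustrative assumptions, not a procedure taken from any of the cited tools.

```python
# Hypothetical 6 x 5 binary mask: 1 = vegetation pixel, 0 = background.
# Row 0 is the top of the image.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]


def vegetation_pixels(mask):
    """Total number of vegetation pixels, a simple proxy for biomass."""
    return sum(sum(row) for row in mask)


def height_pixels(mask):
    """Rows spanned between the top-most and bottom-most vegetation pixel."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    return rows[-1] - rows[0] + 1 if rows else 0


area_px = vegetation_pixels(mask)
height_px = height_pixels(mask)
print(area_px, height_px)
```

Converting either value to physical units (e.g., cm) would additionally require a calibration factor for the imaging setup, which is not modeled here.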
The extraction of conventional trait values from images is a highly designed procedure, whether it is specified programmatically in the steps of an image processing pipeline (Das et al., 2015;Seethepalli et al., 2020;Zingaretti et al., 2021), or by human annotation of training data for machine learning techniques (Ishikawa et al., 2018;Stewart et al., 2019;Visa et al., 2014;Zhou et al., 2018). In fact, these human-designed traits are based on the observations of researchers as to how different abstract concepts manifest in the visual appearance of the plants. This human-designed nature means that these measurements are interpretable and concretely related to one, or more, of these observations.
User-defined traits are typically easy to interpret and have meaningful units associated with them (e.g., cm, g, mL, etc.), which is an understated advantage. Being able to interpret and communicate results is a major benefit of conventional phenotyping methodologies. Along these lines, standard units of measure are used across all fields of study and thus can be more effectively and clearly delivered to a diverse and nonexpert audience. While the importance or value of a plant height difference of 10 cm may be field specific, the meaning of the measurement will be clear to anyone familiar with a centimeter. Thanks to their interpretability, the use of conventional image-based traits has seen a boom in publication and application over recent years.
However, conventional phenotyping methods are potentially non-unique, not comprehensive, and compromised by human biases. It takes multiple, and potentially many, descriptors to accurately describe canonical shapes (i.e., unit circle and square), let alone more complex structures (e.g., root and shoot systems). For instance, a circle and a square can have identical aspect ratios ($\text{aspect ratio} = \text{width}/\text{length}$), so other descriptors that capture information about the curvature of the shape, the number of corners, or the variation in interior angles are needed to discriminate between these two shapes. In general, as the complexity of a biological phenomenon increases, a greater number of conventional traits are needed to convey its meaning accurately and precisely. Users therefore need to make educated decisions regarding which conventional traits are required to capture the most salient and biologically relevant information. Finally, many conventional traits are scale invariant ratios (e.g., aspect ratio and $\text{circularity} = 4\pi \cdot \text{area}/\text{perimeter}^2$) and can result in multiple shapes returning the same value despite having visible differences in scale (Schindelin et al., 2012;Schneider et al., 2012). The user-driven decision of which traits to measure can introduce biases that, along with challenges in data collection, can lead to substantial ascertainment bias towards populations, germplasm, and variability. For example, in one strawberry population, fruit shape may be adequately described by the aspect ratio. However, aspect ratio may not be a discriminating descriptor in a second population. This could bias downstream analyses to think that the second population has no variation in shape, although this may be an artifact of the chosen perspective. When many of these characteristics are collected, it can be tempting to apply data reduction techniques to summarize plant phenotypes based on the variability in that collection of traits.
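The circle-versus-square argument can be checked with a short worked example. The sketch below assumes the standard definitions of aspect ratio (width divided by length) and circularity (4π times area divided by squared perimeter); the helper names are hypothetical and chosen only for illustration.

```python
import math


def aspect_ratio(width, length):
    return width / length


def circularity(area, perimeter):
    return 4 * math.pi * area / perimeter ** 2


def circle(radius):
    """Width, length, area, and perimeter of a circle."""
    d = 2 * radius
    return d, d, math.pi * radius ** 2, 2 * math.pi * radius


def square(side):
    """Width, length, area, and perimeter of a square."""
    return side, side, side ** 2, 4 * side


def describe(shape):
    width, length, area, perimeter = shape
    return aspect_ratio(width, length), circularity(area, perimeter)


unit_circle = describe(circle(1.0))   # aspect ratio 1, circularity 1
big_circle = describe(circle(10.0))   # same values: both descriptors ignore scale
unit_square = describe(square(1.0))   # aspect ratio 1, circularity pi / 4

for name, values in [("unit circle", unit_circle),
                     ("large circle", big_circle),
                     ("unit square", unit_square)]:
    print(name, values)
```

Aspect ratio alone cannot separate the circle from the square, circularity can, and neither descriptor distinguishes the unit circle from one ten times larger.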
However, the use of PCA or other common dimensionality reduction techniques on a matrix of individually selected, univariate attributes requires caution. One concern is that a dimension reduction technique may unintentionally amplify biases in the data, leading to a more biased analysis. For example, length, width, aspect ratio, area, circularity, $\text{roundness} = 4 \cdot \text{area}/(\pi \cdot \text{major axis}^2)$, $\text{solidity} = \text{area}/\text{convex area}$, etc. are extracted individually, yet several of them are ratios of other variables. If PCA is conducted on these traits, the analysis is exposed to area, length, and width multiple times (Schindelin et al., 2012). Whereas a change in length or width necessarily causes a change in aspect ratio, and vice versa, traits resulting from multivariate analyses do not share this limitation. As opposed to ratios like circularity and solidity, the harmonics of elliptical Fourier analyses, which are used to describe the outlines of 2D shapes, are not functions of each other and each harmonic can be modified independently without changing any others (Bonhomme et al., 2014;Chitwood & Otoni, 2017;Migicovsky, Li, et al., 2017;Visa et al., 2014). There are also more intricate scenarios involving arrays of correlated traits, such as leaf angle and inflorescence branching angle (Rice et al., 2020). In these examples, dimension reduction enhances the biological signal of "branching angle of lateral organs" while not explicitly double- or triple-counting the same trait. Similarly, researchers have implicitly posited that latent traits also include the plant ionome (the profile of ions in the plant, Fikas et al., 2019), the defense metabolome (the profile of specialized metabolites that participate in herbivory and pathogen responses, Katz et al., 2021), and the composition of seed fatty acids (Carlson et al., 2019).
In general, conventional image- and sensor-based phenotyping strategies are highly interpretable, can reduce time and cost relative to their hand-measured counterparts, and enable opportunities for greater reproducibility. Conventional image analysis tends to follow the common theme of measuring many single compounds and then using PCA to capture global patterns of correlated effects in genome-wide association studies or quantitative trait locus mapping. However, given the quantity of conventional traits produced from images and sensors, there is a greater potential for implicit biases during data acquisition or dimensionality reduction. Due to challenges with multicollinearity, using many conventional traits to describe a plant's phenotype provides diminishing returns and a greater likelihood of bias or reduced generalizability. The following section elaborates on how latent phenotyping methods address some of these issues.

Latent phenotyping from images and sensors
Latent variable models explain patterns in observed data by modeling unobserved latent variables (Blei, 2014). The concept of latent variables (also called latent features or latent factors) is widespread in statistics, and statistical models incorporating latent variables have a rich set of applications in the biological sciences (Grotzinger et al., 2019;Runcie & Mukherjee, 2013;Tardieu et al., 2017;Vőfély et al., 2019). As an example, partial least squares regression (PLSR) is commonly applied to high-dimensional biological datasets that contain a high degree of multicollinearity, such as -omics data (Edlich-Muth et al., 2016;Kleinbaum et al., 2013;Lane et al., 2020;Rincent et al., 2018;Runcie & Crawford, 2019;Runcie & Mukherjee, 2013).

FIGURE 1 A schematic of the latent phenotyping procedure. (a) The first step is to collect sensor data on a population of interest. (b) The second step is to process that data into a format that will be usable in downstream analyses and to make the target phenomenon more salient by reducing noise or highlighting specific features. (c) The third step is optional and depends on the study at hand. For traditional approaches, this step is skipped and for latent approaches this step further accentuates the phenomenon of interest within the data. (d) The fourth step is to quantify the traits, or latent traits, of interest from the cleaned data. (e) The final step is to try to either make biological sense of the quantified latent traits or to use them for a specific application, e.g., phenomic prediction. The preprocessing methodologies listed here are by no means exhaustive and represent disparate fields of research with many individual contributions that present their own assumptions and biases
The PLSR approach and related models, such as partial least squares discriminant analysis, accomplish dimensionality reduction by projecting the raw data onto lower-dimensional latent variables, which retain the ability to predict the response variables. This property of reducing the input data to a set of lower-dimensional features is shared by many familiar and commonly applied models, which include PLSR, partial least squares discriminant analysis, PCA, independent component analysis, latent Dirichlet allocation, linear discriminant analysis, factor analysis, and others. A nonexhaustive list of methodological options is provided for these types of analyses in Figure 1. The choice of which method to employ will vary substantially depending on the data acquisition strategy (e.g., red, green, blue [RGB] or lidar images), data cleaning and curation choices (e.g., background or outlier removal), pre-processing to highlight the phenomena of interest (e.g., elliptical Fourier analysis or wavelet transform), quantification of the latent space (e.g., PCA), and targeted interpretation of the latent space (e.g., correlation or genetic analyses). Such representations also appear in machine learning models, such as autoencoders (Figure 2), where a latent space is derived from the lower-dimensional activations of a layer in the center of a neural network. In Figure 2, image data of Brassica napus plants were analyzed with a convolutional autoencoder neural network with two latent dimensions. The goal of an autoencoder, broadly, is to learn a lower-dimensional representation, two variables in this case, that can be used to approximately reconstruct the original input data.
In machine learning, discovering this transformation from the input space to another, typically lower-dimensional form which is more appropriate for a learning task is known broadly as 'representation learning' (Bengio et al., 2013). In each case, the set of latent variables, or learned representation, is learned from the raw data, as opposed to being defined beforehand. Thus, each of these features does not directly represent any one input feature, but rather encodes some linear or nonlinear combination of the many input features into a single dimension. Just as the latent variables in these models correspond to some learned abstract representation of the processes generating the input data, latent phenotypes are defined by some learned abstract representation of the latent source of variance present in the dataset. This variability is typically measured across the full input. For example, in the case of images, the input data will correspond to the values of each individual pixel of each image, producing an extremely high-dimensional dataset where each color channel, e.g., RGB for most standard digital cameras, in each pixel adds an additional input dimension. Compared to conventional traits, which are based on measurements made by researchers, latent traits are inferred. Consequently, latent phenotypes reflect patterns present in the data, not measurements made by a human observer, and are, ideally, less susceptible to user-imposed biases and assumptions.

FIGURE 2 A convolutional autoencoder trained on a dataset of mature Brassica napus plants. In this example, a high-dimensional image is encoded as a point in a two-dimensional latent space, and the decoder attempts to recreate the original image using only this information. The encoder needs to pack as much pertinent information as possible into these two dimensions to help the decoder produce accurate outputs
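The mapping from pixels to input dimensions, and from input dimensions to latent variables, can be sketched with PCA on a synthetic image set. Everything in this example is an assumption made for illustration: the images are simulated from a single hidden variable (a stand-in for a latent source of variance such as stress), the image size is arbitrary, and PCA is computed with a singular value decomposition in NumPy rather than with any tool from the cited studies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for an image set: 40 RGB images of 8 x 8 pixels.
n_images, height, width, channels = 40, 8, 8, 3
latent = rng.normal(size=n_images)                     # hidden source of variance
pattern = rng.normal(size=(height, width, channels))   # how it appears in pixel values
noise = 0.05 * rng.normal(size=(n_images, height, width, channels))
images = 0.5 + latent[:, None, None, None] * pattern + noise

# Each color channel of each pixel becomes one input dimension.
X = images.reshape(n_images, -1)                       # shape (40, 192)

# PCA via singular value decomposition of the column-centered data.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * S                                         # latent variables, one row per image
explained = S ** 2 / np.sum(S ** 2)                    # proportion of variance per component

# The first component should track the hidden variable (up to sign).
r = np.corrcoef(scores[:, 0], latent)[0, 1]
print(X.shape, round(float(explained[0]), 3), round(float(abs(r)), 3))
```

No trait was defined in advance here: the first principal component is inferred from the pixel values alone, and it recovers the simulated source of variance only because that source dominates the data.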
While conventional phenotyping seeks to quantify a particular concept by summarizing it as a collection of one or more measurable traits, latent phenotyping instead seeks to indirectly quantify the latent source of variance in the high-dimensional raw data, whether that source is stress, genotype, or otherwise.
For the purposes of defining latent phenotyping, statistical methods which assume the presence of underlying latent factors that explain correlations in the observations, such as factor analysis, as well as factor models such as PCA, independent component analysis, etc., are included. For example, because environment (E) and management (M) can be controlled or, at least, accounted for using appropriate experimental designs and sampling strategies, genotype (G), interactions with genotype (i.e., GxE, GxM, GxExM), and irreducible error can be isolated as the sole sources of variance (Lane & Murray, 2021;Piepho et al., 2012;Schulz-Streeck et al., 2013). Thus, methods such as PCA which seek to summarize the major axes of variance are likely to find heritable components, which implies a latent effect from genotype. In cases where there are many sources of variance, a more computationally sophisticated latent phenotyping technique, e.g., latent space phenotyping (Ubbens et al., 2020), may be required to disentangle the sources of variance and uncover the genetic signal in more complex latent traits, e.g., whole plant response to abiotic stress.
As with conventional phenotyping approaches, the specific path a user takes to acquire latent phenotypes will depend on the structure and type of raw data, e.g., RGB, hyperspectral sensors, lidar, etc., and the underlying biology of the target phenomenon the user wishes to capture. Both the beauty and the danger of this framework are that there are nearly limitless approaches that a researcher could take (Backhaus et al.  Xu & Bassel, 2020;Zingaretti et al., 2021). The most common approaches to date are PCA and autoencoders, which both aim to compress data into a smaller number of axes based on a specific set of rules. Examples include the use of PCA to describe internal carrot color variation (Horgan, 2001), strawberry fruit shape (Feldmann et al., 2020), and variation in carrot shape and biomass distributions (Turner et al., 2018); the use of PCA and autoencoders to describe plant architecture from heatmaps of maize plots (Gage et al., 2019); and the use of a latent embedding emitted by deep neural networks (a distinct multi-stage technique called 'latent space phenotyping') to capture the genetic variation attributable to drought stress in the genus Setaria and the visual drought response in Brassica napus (Ubbens et al., 2020). Importantly, these approaches can be deployed for both classification and prediction of target traits, as well as to capture biologically interesting aspects of plant growth and development.
Notable advantages of latent phenotyping include the ability to describe complex, multi-dimensional phenotypes or biological processes at the whole plant level and the reduction of direct human input in the measurement process. Most strikingly, latent traits are likely to be more independent of explicit human biases in the choice of what aspect of the data to measure, potentially allowing them to learn complex, nonlinear visual concepts instead of tying the measurement to a rigid, human-defined concept or pre-existing hypotheses (Baxter, 2020). This does not mean that latent phenotyping approaches are free from other types of user-imposed biases during model selection, implicit biases accrued during data collection, and over-generalized interpretations. Furthermore, compared to conventional phenotyping strategies, latent phenotyping techniques typically require less pre-processing, such as noise removal or segmentation, and can be applied to most types of input data. Although the adoption and active development of these techniques in plant phenotyping is relatively new, it seems likely that latent phenotyping will become increasingly important and widespread in the future due to these key advantages.

Technical considerations of latent phenotyping
As with all attempts to model data, many techniques can be used to model latent phenotypes, and awareness of differences among these techniques is critical to understand how specific methodologies will influence results and the interpretation of those results. For example, PCA is only capable of modeling linear aspects of the input variables. As a result, nonlinear patterns in the data will be represented differently when using PCA versus other techniques that are capable of modeling nonlinear patterns, such as autoencoders. A hazard with nonlinear models is that the embeddings they provide exist on a nonlinear manifold. This means that they cannot be interpreted in subsequent steps, such as linear regression, as if they were on a linear scale, e.g. a point which embeds on a particular axis in latent space at the value 4 is not, in any meaningful sense, "twice as dissimilar" to an embedding at 8 as it is to a different embedding at 6. In fact, any comparison of Euclidean distance between points in these encoded variables as a measure of similarity is not reasonable because of the non-linear nature of the analysis. In contrast to other nonlinear models, variational autoencoders explicitly learn an approximation to a normal distribution, and therefore impose this distribution over the latent variables (Kingma & Welling, 2013). Forcing the latent variables to be normally distributed allows for tractable sampling from the posterior distribution, which is the original intent behind the variational autoencoder, but the utility of this arbitrarily distributed latent variable for a downstream phenotyping task is questionable. For these reasons and others, care must be taken when selecting a model, and the researcher should be informed about the model's assumptions and its capabilities prior to implementation.
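The warning about distances in a nonlinear latent space can be illustrated with a toy calculation. The sketch rests on one assumption that should be stated plainly: a sufficiently flexible nonlinear decoder could, in principle, absorb the inverse of a monotone reparameterization of the latent axis, so two encodings related by such a transformation could describe the same images equally well. The cubic function below is a hypothetical example of such a reparameterization.

```python
def distance_ratio(codes):
    """Ratio of the distance first-to-third over the distance first-to-second."""
    first, second, third = codes
    return abs(third - first) / abs(second - first)


def reparameterize(z):
    """Hypothetical monotone, nonlinear relabeling of the same latent axis."""
    return z ** 3


codes = [4.0, 6.0, 8.0]                       # three embeddings on one latent axis
warped = [reparameterize(z) for z in codes]   # same ordering, different spacing

ratio_original = distance_ratio(codes)   # 2.0
ratio_warped = distance_ratio(warped)    # 448 / 152, about 2.95
print(ratio_original, ratio_warped)
```

The ordering of the three points is preserved, but the statement "the point at 8 is twice as far from 4 as the point at 6" holds in only one of the two equally arbitrary coordinate systems, which is why Euclidean distances in such spaces should not be read as a linear measure of similarity.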
Using learned latent traits instead of conventional traits comes with the substantial drawback of losing the inherent interpretability that comes with conventional measurements. For example, consider a conventional phenotyping result that plants of a breeding line were taller by an average of 36 mm. This makes intuitive sense, as plant height is something experienced by an observer. In contrast, a study which makes use of latent phenotyping requires extra steps towards interpretation. The raw trait values, such as component scores on principal components, are unitless scalar values which only have meaning if mapped back through a particular function, such as a trained decoder network. Various studies have attempted to visualize these features using eigenshapes, the visualization of eigenvectors, and saliency mapping (Feldmann et al., 2020;Gage et al., 2019;Migicovsky, Li, et al., 2017;Turner et al., 2018;Ubbens et al., 2020;Vőfély et al., 2019). This requires users to define their questions more rigorously before using latent phenotyping approaches, since these approaches may distort an otherwise simple question regarding plant height, yield, or any other easily defined, conventional trait.
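One way to map unitless component scores back to something visible is to reconstruct the shape implied by a given score, in the spirit of the eigenshape visualizations cited above. The following sketch is a simplified assumption-laden example: the outlines are synthetic ellipses sampled at 16 angles, the data vary only in width, and PCA is computed directly with NumPy rather than with any morphometrics package.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic outlines: ellipses of unit half-length and varying half-width,
# each stored as 16 x-coordinates followed by 16 y-coordinates.
angles = np.linspace(0, 2 * np.pi, 16, endpoint=False)
half_widths = rng.uniform(0.5, 1.5, size=30)
outlines = np.stack([
    np.concatenate([w * np.cos(angles), np.sin(angles)]) for w in half_widths
])                                                    # shape (30, 32)

mean_shape = outlines.mean(axis=0)
U, S, Vt = np.linalg.svd(outlines - mean_shape, full_matrices=False)
scores = U * S                                        # unitless PC scores
pc1_axis = Vt[0]                                      # eigenvector for PC1
pc1_sd = scores[:, 0].std(ddof=1)


def shape_at(k):
    """Outline implied by a PC1 score k standard deviations from the mean."""
    coords = mean_shape + k * pc1_sd * pc1_axis
    return coords[:16], coords[16:]                   # x and y coordinates


x_lo, y_lo = shape_at(-2)
x_hi, y_hi = shape_at(+2)
width_lo = x_lo.max() - x_lo.min()
width_hi = x_hi.max() - x_hi.min()
print(round(float(width_lo), 2), round(float(width_hi), 2))
```

Plotting the two reconstructed outlines would show that PC1 corresponds to fruit-like width in this toy dataset; note that the sign of a principal component is arbitrary, so which end of the axis is "wide" can differ between runs or software.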
Data dependency is a critical issue that is not only affected by the amount of observed data, but the saliency of the biological phenomenon and the reliability of data acquisition strategy. Latent phenotyping approaches such as PCA and autoencoders will map noise when only noise is present. A possible consequence of this is that the resultant principal component scores and encodings will have low repeatability or heritability, which may have more to do with data acquisition or model misuse than any biologically relevant phenomenon (Feldmann et al., 2020;Turner et al., 2018). Latent phenotyping approaches may be able to reveal whether data is of poor quality, but they are not able to "fix" overly noisy, unreliable data. However, if data is collected in a careful, considered manner, these approaches can deliver powerful insights for both prediction and inference. Thus, the data acquisition and model selection strategies are just as relevant to latent phenotyping as they are to conventional phenotyping from images and sensors.
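The point that PCA will return components even when only noise is present can be illustrated by comparing the repeatability of PC1 scores across replicates. This is a simulation under stated assumptions: two replicate measurements per genotype, Gaussian noise, a single simulated genotype effect, and the correlation between replicate scores used as a crude stand-in for repeatability.

```python
import numpy as np

rng = np.random.default_rng(2)
n_genotypes, n_features = 50, 30


def pc1_repeatability(genotype_signal, noise_sd=1.0):
    """Correlation of PC1 scores between two replicates of each genotype."""
    rep1 = genotype_signal + noise_sd * rng.normal(size=genotype_signal.shape)
    rep2 = genotype_signal + noise_sd * rng.normal(size=genotype_signal.shape)
    X = np.vstack([rep1, rep2])               # PCA is fit on both replicates jointly
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    pc1 = Xc @ Vt[0]
    return np.corrcoef(pc1[:n_genotypes], pc1[n_genotypes:])[0, 1]


# Case 1: a genuine genotype effect along one direction in feature space.
effect = rng.normal(size=n_genotypes)
direction = rng.normal(size=n_features)
r_signal = pc1_repeatability(3.0 * np.outer(effect, direction))

# Case 2: no genotype effect at all; PCA still returns a "first component".
r_noise = pc1_repeatability(np.zeros((n_genotypes, n_features)))

print(round(float(r_signal), 2), round(float(r_noise), 2))
```

In both cases PCA produces scores, but only in the first case do the scores agree between replicates, which is why repeatability or heritability estimates are a useful check before interpreting a latent trait.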
Standard operating procedures and common best practices for latent phenotyping approaches might be obscure to biological researchers, leading to naive applications of such tools. As an example, linear regression has a set of assumptions that make its use appropriate and interpretable given the assumptions are met. Similarly, PCA, PLSR, CNNs, and autoencoders have unique sets of assumptions that may render them useless if these underlying assumptions are violated. Latent phenotyping will not tell the user when a core assumption has been irreparably violated unless the user knows how to probe the data and the model with consideration of those assumptions. It is unlikely that every biological researcher with an interest in latent phenotyping will have the time or capacity to learn all the various assumptions and best practices themselves, thereby emphasizing the necessity to partner with experts in the application of these methods who can help to safeguard against nonsensical or biased results.

Practical considerations for latent phenotyping
When embarking on a study, researchers consider the purpose of the study, the context of that purpose, and whether there are meaningful takeaways for general crop improvement (prediction) and/or basic biology (inference). This dichotomy in the philosophy of science, while often conflated, can be useful for discussing how latent phenotyping approaches may be valuable in practice.
If the aim is prediction, it is sufficient to identify latent traits that are correlated with, and can be used to predict, a trait or whole-plant phenotype of interest, such as yield, biomass, or fruit quality; an underlying causal relationship between predictors and response is not necessary. Such predictions can be made using PLSR, or a similar approach, and may rely on tens, hundreds, or even thousands of predictor traits. This process, termed "phenomic prediction," uses phenotypic and/or latent traits as independent variables (analogous to markers in genomic prediction) to predict the value of some dependent variable (Edlich-Muth et al., 2016;Fernandes et al., 2018;Krause et al., 2018;Lane et al., 2020;Momen et al., 2019;Rincent et al., 2018;Sandhu et al., 2021). Phenomic prediction has the potential both to improve genomic prediction by incorporating more reliable, correlated traits and to reduce the need for or extent of genotyping (Fernandes et al., 2018;Lane & Murray, 2021;Lane et al., 2020;Rincent et al., 2018). These approaches, like other high-dimensional regression approaches, are subject to overfitting and multicollinearity.
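The overfitting risk in phenomic prediction can be sketched with a small simulation. The example below swaps in ridge regression for PLSR, purely because its closed form is short enough to write out with NumPy; the simulated trait matrix, the number of informative traits, and the penalty values are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n_train, n_test, n_traits = 40, 40, 200

# Hypothetical latent-trait matrix; only the first 5 traits affect the target.
X = rng.normal(size=(n_train + n_test, n_traits))
beta = np.zeros(n_traits)
beta[:5] = 1.0
y = X @ beta + rng.normal(size=n_train + n_test)

X_tr, X_te = X[:n_train], X[n_train:]
y_tr, y_te = y[:n_train], y[n_train:]


def ridge_fit(X, y, lam):
    """Closed-form ridge regression on centered data: (coefficients, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef


def mse(X, y, coef, intercept):
    return float(np.mean((X @ coef + intercept - y) ** 2))


# Map each penalty to (training error, held-out error).
results = {}
for lam in (1e-3, 10.0, 100.0):
    coef, intercept = ridge_fit(X_tr, y_tr, lam)
    results[lam] = (mse(X_tr, y_tr, coef, intercept),
                    mse(X_te, y_te, coef, intercept))

for lam, (train_err, test_err) in results.items():
    print(lam, round(train_err, 4), round(test_err, 4))
```

With far more predictor traits than training individuals and almost no penalty, the training error is close to zero while the held-out error is not, so prediction accuracy should always be reported on individuals that were not used to fit the model.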
On the other hand, when the goal is to understand an underlying mechanism or sources of variability for a target trait, methods that rely on statistical associations have limited utility unless paired with the necessary biological context and knowledge of the underlying assumptions for a chosen specific analysis (Bernardo, 2001, 2016, 2020). This makes the application of latent phenotyping more challenging for basic biological questions than for prediction. Despite these challenges, latent phenotyping approaches have been used to successfully identify genetic loci linked with tomato fruit and leaves (Chitwood et al., 2012;Wang et al., 2019), rice grains (Iwata et al., 2015), the maize, rice, and soybean ionomes (Chu et al., 2016;Fikas et al., 2019;Liu et al., 2021), the Brassica defensive metabolome (D'Oria et al., 2021;Katz et al., 2021), oat seed fatty acid concentrations (Carlson et al., 2019), inflorescence development in maize and sorghum (Leiboff & Hake, 2019;Rice et al., 2020), carrot shoot and roots (Turner et al., 2018), response to drought in Setaria (Ubbens et al., 2020), and strawberry fruit shape (Nagamatsu et al., 2021). Some of these successful examples have relied on prior information from preceding univariate analyses to validate the loci discovered using latent phenotyping and to aid their interpretation of those newly discovered loci (Fikas et al., 2019;Katz et al., 2021;Ubbens et al., 2020). Notably, latent phenotyping approaches have yet to be applied to biotic stress resistance in plants (Table 1). It could be that reasonably powered studies require unreasonable sample sizes because of inconsistent symptom development (e.g., chlorosis, wilting, spotting), or that subtle changes in plant status during disease onset and development are of lesser interest than the binary difference between being alive or dead, which is simpler to assess.
Regardless of the experimental goals, providing intuitive presentation and interpretation of results is critical to maximize the impact of empirical findings. This is especially true when measurements are acquired using methods that rely on abstract concepts, as is the case for latent phenotypes. Strategies to relate latent phenotypes to observable quantities are often as straightforward as a statistical comparison between latent phenotypes and human-understandable features, e.g., classification based on a categorical scale or regression against a numerical scale (Casanova et al., 2017;Chitwood, 2020;Clark et al., 2015;Ishikawa et al., 2018;Kadir, 2015;Lane et al., 2020;Nascimento et al., 2021;Neto et al., 2006;Stewart et al., 2019;Zingaretti et al., 2021). However, it is also possible that latent phenotypes capture aspects governing a trait that are otherwise difficult to quantify, and therefore have complex, non-linear relationships with or cannot be directly related back to a familiar concept (Migicovsky, Li, et al., 2017;Rice et al., 2020). When there is no clear observable counterpart for latent phenotypes, visual aids can fill the gap by providing human-interpretable representations of the data. Effective examples of this approach in practice include the presentation of PCA results alongside a morphospace of theoretical shapes based on eigenvectors (Bonhomme et al., 2014;Iwata, 2011;Iwata et al., 2015;Migicovsky, Li, et al., 2017;Miller et al., 2017) and the direct comparison of raw input images with reconstructions produced from an embedding (Figure 2).
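Both visual aids mentioned above, theoretical shapes along an eigenvector and reconstructions from an embedding, reduce to the same operation: mapping latent scores back into the input space. The following is a minimal sketch under stated assumptions. The outlines are simulated ellipses standing in for real landmark or contour data, and all function names and parameter values are hypothetical rather than taken from the cited studies.

```python
import numpy as np

def simulate_outlines(n_shapes=40, n_points=32, seed=0):
    """Hypothetical outlines: ellipses whose aspect ratio varies among plants."""
    rng = np.random.default_rng(seed)
    theta = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    aspect = rng.uniform(0.5, 2.0, size=n_shapes)
    x = np.cos(theta)[None, :] * aspect[:, None]
    y = np.sin(theta)[None, :] / aspect[:, None]
    shapes = np.concatenate([x, y], axis=1)   # one row per outline: all x, then all y
    return shapes + 0.01 * rng.normal(size=shapes.shape)

def fit_pca(shapes, n_components):
    """Return the mean shape, the leading eigenvectors, and the SD of each PC."""
    mean = shapes.mean(axis=0)
    _, s, vt = np.linalg.svd(shapes - mean, full_matrices=False)
    sd = s[:n_components] / np.sqrt(len(shapes) - 1)
    return mean, vt[:n_components], sd

def embed(shapes, mean, axes):
    """Project outlines onto the PCs to obtain latent scores."""
    return (shapes - mean) @ axes.T

def reconstruct(scores, mean, axes):
    """Map latent scores back to outline coordinates."""
    return mean + scores @ axes

def morphospace_shapes(mean, axes, sd, pc=0, steps=(-2, -1, 0, 1, 2)):
    """Theoretical shapes at fixed multiples of the SD along one PC."""
    scores = np.zeros((len(steps), len(axes)))
    scores[:, pc] = np.array(steps) * sd[pc]
    return reconstruct(scores, mean, axes)

shapes = simulate_outlines()
mean, axes, sd = fit_pca(shapes, n_components=2)
theoretical = morphospace_shapes(mean, axes, sd, pc=0)
rebuilt = reconstruct(embed(shapes, mean, axes), mean, axes)
rmse = np.sqrt(np.mean((shapes - rebuilt) ** 2))
print(f"theoretical shapes along PC1: {theoretical.shape}")
print(f"reconstruction RMSE with 2 PCs: {rmse:.3f}")
```

Plotting each row of `theoretical` as an outline gives a one-axis morphospace, and plotting rows of `shapes` beside the matching rows of `rebuilt` shows what the retained components do and do not capture. For a non-linear embedding such as an autoencoder, the decoder would take the place of `reconstruct`.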
Irrespective of whether measurements are easily observable or more abstract, perhaps the most critical set of questions a researcher can ask revolves around whether results meet certain standards of replicability and variability (Bernardo, 2020;Moehring et al., 2014). Similarly, thought should be given to the scope of diversity included in the study (e.g., diversity panels versus elite breeding populations) and its relevance to the question(s) of interest, i.e., whether useful or general takeaways remain when entries are more or less phenotypically diverse (Berro et al., 2019;Brandariz & Bernardo, 2019;Campbell et al., 2017).
A final consideration is whether there are any large-scale uses for latent phenotyping results. From a logistical perspective, applied use will depend on whether a given method is scalable, cost-effective, and broadly applicable to diverse study systems or at least within a target system. By defining plant phenotypes through a more comprehensive set of traits, latent phenotyping approaches have the potential to change the way researchers practice breeding and deploy marketing strategies. For instance, these comprehensive traits could conceptually be used for variety identification and to facilitate intellectual property (IP) applications and protections (e.g., plant patents, plant variety protections, and plant breeders' rights) by including more quantitative descriptors, which have been shown to be predictive of both variety and species in multiple studies (Chitwood, 2020;Chitwood & Otoni, 2017;Ishikawa et al., 2018;Pereira et al., 2019). It is unlikely that latent or conventional traits will replace DNA forensics for the enforcement of IP laws, but they may aid plant-specific IP applications by providing detailed and nuanced descriptions of plant properties. Regardless of downstream goals, latent phenotyping and other machine learning methods are undoubtedly an exciting area of research, but not a panacea in the pursuit of understanding phenotypic variation (Bernardo, 2016).

CONCLUSIONS
In recent years, the growth of plant phenomics, the availability of powerful open-source software, and the competency to acquire complex datasets have elicited the need for a thorough discussion of the state of the art. As with all statistical techniques and applications, careful consideration of the hypothesis, experimental procedure, and data structure is required to deploy latent phenotyping approaches. All the challenges inherent to traditional methods still exist; the data must be collected systematically and in such a way as to capture relevant information for the questions and hypotheses related to the specific study. While sensors are much less subject to explicit biases than are humans, implicit and explicit biases may still be reflected in the quantification of sensor data and exacerbated by the naive application of sophisticated tools.
To support reproducibility and the adoption of best practices across labs with varying levels of experience, there is a need to report latent phenotyping analyses and protocols precisely and accurately in the literature, alongside relevant code repositories (Fanelli, 2018;Harris, 2018;Hutson, 2018;Miyakawa, 2020;Peng, 2015;Stoddart, 2016). Furthermore, it is important to note that not all research questions require a latent phenotyping approach (for example, those concerning conventional disease traits, yield, and time-to-flowering), and that the naive application of these methods can lead to nonsensical results and biased interpretations. These methods rely on mathematical and statistical models that all have unique and diverse assumptions about the input, which may render certain methods irrelevant or inappropriate for a given data set. Successful application of latent phenotyping in practice will require communication among developers, end users, and stakeholders to address important questions in crop improvement and basic biology. True validation of these approaches in independent populations managed by independent groups with similar objectives is the final frontier for both conventional and latent phenotyping. While the promise of latent phenotyping is great, it is still critical to plan sampling procedures that match experimental goals, to pay close attention to methodological assumptions, and to enable the repeatability of analyses and reproducibility of results.

ACKNOWLEDGMENTS
The authors thank Daniel Chitwood and Brandon Hurr for discussing, reviewing, and improving the manuscript. The authors also thank the anonymous reviewers and Seth Murray for their comments, which greatly improved the quality of the manuscript. This material is based upon work supported by the NSF Postdoctoral Research Fellowship in Biology under Grant No. IOS-1906619. This research was undertaken thanks in part to funding from Canada First Research Excellence Fund.

CONFLICTS OF INTEREST
The authors declare no conflicts of interest.