
Volume 107, Issue 2
Symposium: Statistical Concepts
Open Access

Nonlinear Regression Models and Applications in Agricultural Research

Sotirios V. Archontoulis

Dep. of Agronomy, Iowa State Univ., 1206 Agronomy Hall, Ames, IA, 50011

Fernando E. Miguez

Corresponding author (E-mail address: femiguez@iastate.edu)

Dep. of Agronomy, Iowa State Univ., 1206 Agronomy Hall, Ames, IA, 50011
First published: 01 March 2015

All rights reserved. No part of this periodical may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher.

Supplemental material available online

Available freely online through the author‐supported open access option.

Abstract

Nonlinear regression models are important tools because many crop and soil processes are better represented by nonlinear than linear models. Fitting nonlinear models is not a single‐step procedure but an involved process that requires careful examination of each individual step. Depending on the objective and the application domain, different priorities are set when fitting nonlinear models; these include obtaining acceptable parameter estimates and a good model fit while meeting standard assumptions of statistical models. We propose steps in fitting nonlinear models as described by a flow diagram and discuss each step separately providing examples and updates on procedures used. The following steps are considered: (i) choose candidate models, (ii) set starting values, (iii) fit models, (iv) check convergence and parameter estimates, (v) find the “best” model among competing models, (vi) check model assumptions (residual analysis), and (vii) calculate statistical descriptors and confidence intervals. The associated feedback mechanisms are also addressed (i.e., model variance homogeneity). In particular, we emphasize the first step (choose candidate models) by providing an extensive library of nonlinear functions (77 equations with the associated parameter meanings) and examples of typical applications in agriculture. We hope that this contribution will clarify some of the difficulties and confusion with the task of using nonlinear models.

In data analysis, we often ask the following questions: Which is the best model to describe our data? Which is the best statistical index to judge the goodness of fit? How do we choose among competing models? There are no simple answers to these questions. Here we attempt to provide agronomists with a general framework on how to approach these questions appropriately. Our specific objectives are: (i) to provide a succinct overview of nonlinear models and to develop a guideline to understand the family of functions used in agricultural applications; (ii) to indicate techniques to modify nonlinear models and how to cope with multiple nonlinear models; (iii) to discuss key methodological issues on parameter estimation, model performance, and comparison; and (iv) to demonstrate step‐by‐step analysis of experimental data using a nonlinear regression model. The structure follows the flow diagram in Fig. 1. We start with the definition of nonlinear regression models and discuss their main advantages and disadvantages. Then we present 77 nonlinear functions (including those in supplemental tables) with references to applications in agriculture. We offer an updated overview of methodologies to fit models, choose starting values, assess goodness of fit, select the best models, and evaluate residuals. Finally, we reanalyze experimental data on biomass growth with time (Danalatos et al., 2009).

Fig. 1. Suggested work flow in the nonlinear regression analysis. Thick arrows indicate major steps, thin arrows indicate substeps, and dashed arrows indicate feedback in nonlinear regression. The shaded part is optional and can be ignored in simple cases. (Abbreviations: GLMs, generalized linear models; LRT, likelihood ratio test; AIC, Akaike information criterion; BIC, Bayesian information criterion.)

NONLINEAR REGRESSION MODELS

Definition

In general, statistical models used in agricultural applications can be described with the following notation:
y = f(x, θ) + ε
where y is the response variable, f is the function or model, x are the inputs, θ denotes the parameters to be estimated, and ε is the error. Each parameter can be evaluated for whether it is linear or not: if the second derivative of the function with respect to a parameter is not equal to zero, then the parameter is nonlinear. Thus a given function (f) can have a mix of linear and nonlinear parameters.
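To make this criterion concrete, here is a minimal sketch in base R (the language used later for the worked example): the symbolic second derivative of the logistic model (Eq. [2.1]) with respect to each parameter shows that Yasym enters linearly while k does not. Only the base function D() is used.

```r
# A parameter is linear if the second derivative of the model with respect to
# it is identically zero (the criterion stated above); logistic model, Eq. [2.1].
logis <- expression(Yasym / (1 + exp(-k * (t - tm))))
D(D(logis, "Yasym"), "Yasym")  # returns 0            -> Yasym enters linearly
D(D(logis, "k"), "k")          # returns a nonzero call -> k is nonlinear
```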

Why Should We Use Nonlinear Models?

The main advantages of nonlinear models are parsimony, interpretability, and prediction (Bates and Watts, 2007). In general, nonlinear models are capable of accommodating a vast variety of mean functions, although each individual nonlinear model can be less flexible than linear models (i.e., polynomials) in terms of the variety of data they can describe; however, nonlinear models appropriate for a given application can be more parsimonious (i.e., there will be fewer parameters involved) and more easily interpretable. Interpretability comes from the fact that the parameters can be associated with a biologically meaningful process. For example, one of the most widely used nonlinear models is the logistic equation (Eq. [2.1] in Table 1). This model describes the pervasive S‐shaped growth curve. The parameters have a clear meaning (see Table 1) and units associated with their definition. The asymptotic parameter (Yasym) has units equal to the response variable (Y), the inflection point (tm) has units equal to the independent variable (t), and the parameter that determines the steepness of the curve (k) has units of 1/t. Its reciprocal, 1/k, can be interpreted as the time (when t is time) that it takes to move from the inflection point to approximately 0.73 of the asymptotic value. A competing polynomial model used to describe the same data would have the disadvantages that more parameters would be needed (more than just three) and that the parameters would not be easily interpretable (Pinheiro and Bates, 2000). For example, what would be the interpretation of the parameters in a five‐degree polynomial?
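A quick numeric check of this interpretation (a sketch with arbitrary illustrative parameter values, not data from the paper): moving 1/k time units past the inflection point brings the logistic curve to about 73% of its asymptote.

```r
# Logistic model, Eq. [2.1], evaluated 1/k time units after the inflection point.
logistic <- function(t, Yasym, k, tm) Yasym / (1 + exp(-k * (t - tm)))
Yasym <- 30; k <- 0.1; tm <- 200
logistic(tm + 1 / k, Yasym, k, tm) / Yasym  # 1/(1 + exp(-1)) = 0.731
```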

Table 1. Nonlinear regression models. For example fits, see supplemental figures.
Eq. Name and/or reference Form Parameter definition
Group I—Exponential
[1.1] Exponential decay Y = Yoexp(–kt) Y is the response variable (e.g., soil organic matter), t is the explanatory variable (e.g., time), Yo is the initial or the maximum Y value, k is a rate constant that determines the steepness of the curve
[1.2] Exponential gives rise to maximum Y = Yo[1 – exp(–kt)]
Group II—Sigmoid functions
[2.1] Logistic (Verhulst, 1838) Y = Yasym/{1 + exp[–k(ttm)]} Y is the response variable (e.g., biomass), t is the explanatory variable (e.g., time), Yasym or Ymax is the asymptotic or the maximum Y value, respectively, tm is the inflection point at which the growth rate is maximized, k controls the steepness of the curve, v deals with the asymmetric growth (if v = 1, then Richards' equation becomes logistic), a and b are parameters that determine the shape of the curve, te is the time when Y = Yasym, tc is the critical time for a switch‐off to occur (e.g., critical photoperiod), n is a parameter that determines the sharpness of the response
[2.2] Richards (1959) Y = Yasym/{1 + v exp[–k(ttm)]}1/v
[2.3] Gompertz (1825) Y = Yasymexp{–exp[–k(ttm)]}
[2.4] Weibull (1951) Y = Yasym[1 – exp(–atb)]
[2.5] Beta (Yin et al., 2003a) Y = Ymax[1 + (te – t)/(te – tm)](t/te)^[te/(te – tm)]
[2.6]§ Hill (switch‐off) function Y = tcn/(tcn + tn)
Group III—Photosynthesis
[3.1] Blackman (1905) Y = min(Yasym,aI) – Rd Y is the response variable (net photosynthesis), I is the explanatory variable (irradiance), Yasym is the asymptotic Y value, a is the initial slope of the curve at low I levels, Rd is the dark respiration, θ is a dimensionless curvature parameter (when θ = 1, Eq. [3.4] is equivalent to Eq. [3.1], and when θ → 0, Eq. [3.4] is equivalent to Eq. [3.3])
[3.2] Asymptotic exponential Y = Yasym[1 – exp(–aI/Yasym)] – Rd
[3.3] Rectangular hyperbola Y = aIYasym/(Yasym + aI) – Rd
[3.4]# Nonrectangular hyperbola Y = {aI + Yasym – √[(aI + Yasym)² – 4θaIYasym]}/(2θ) – Rd
[3.5] Modified logistic (Sinclair and Horie, 1989) Y = Yasym(2/{1 + exp[–k(NNmin)]} – 1) Y is the response variable (light‐saturated net photosynthesis), N is the explanatory variable (leaf N), Yasym is the asymptotic Y value, k determines the curvature of the curve, Nmin is the N value at or below which Y = 0
[3.6]†† Farquhar et al. (1980) Y = min{Vcmax(Ci – Γ*)/[Ci + Kmc(1 + O/Kmo)], J(Ci – Γ*)/(4Ci + 8Γ*)} – Rday Y is the response variable (net photosynthesis), Ci is the explanatory variable (intercellular CO2 concentration), Vcmax is the maximum carboxylation capacity, Γ* is the CO2 compensation point in the absence of Rd, Kmc and Kmo are Michaelis–Menten coefficients of Rubisco for CO2 and O2, respectively, O is the partial pressure of O2 (= 21 kPa), J is the photosystem II electron transport rate, Rday is the dark respiration occurring in the light
Group IV—Temperature dependencies
[4.1] van't Hoff (1898) (known as the Q10 function) Y = Q10^[(T – Tref)/10] Y is the response variable (e.g. respiration), T is the explanatory variable (temperature), Tref is a reference temperature at which Y = 1, Q10 is the factor by which the rate of a process (respiration) increases for each 10°C temperature increase, E is the activation energy that determines the increase in temperature response, R is the universal gas constant (= 8.314 J K−1 mol−1), D is the deactivation energy that determines the decrease in the temperature response, S is the entropy term that determines the transition state of the curve, Eo is an activation‐energy‐like parameter that is temperature adjusted, Tx is a fitted temperature parameter (in K), Tmin is the base or minimum temperature for Y = 0
[4.2] Arrhenius (1889) Y = exp{E/R[1/(Tref + 273) – 1/(T + 273)]}
[4.3]‡‡ Modified Arrhenius Y = exp{E/R[1/(Tref + 273) – 1/(T + 273)]} × {1 + exp[(S(Tref + 273) – D)/(R(Tref + 273))]}/{1 + exp[(S(T + 273) – D)/(R(T + 273))]}
[4.4] Lloyd and Taylor (1994) Y = exp{Eo[1/(Tref + 273 – Tx) – 1/(T + 273 – Tx)]}
[4.5] Ratkowsky et al. (1982) Y = (TTmin)2/(TrefTmin)2
Group V—Peak or bell‐shaped curves
[5.1] Beta (Yin et al. 1995) Y = {[(T – Tb)/(To – Tb)][(Tc – T)/(Tc – To)]^[(Tc – To)/(To – Tb)]}^c Y is the response variable (e.g., rate of development), T is the explanatory variable (temperature), To is the optimum temperature for maximum Y, Tb is the base or minimum temperature for Y = 0, Tc is the ceiling or maximum temperature for Y = 0, c is a curvature parameter (default c = 1)
[5.2]§§ Bell curve Y = Yasymexp[a(XXo)2 + b(XXo)3] Y is the response variable, X is the explanatory variable, Yasym is the asymptotic maximum Y value, Xo is the position of the center of the peak (Yasym), a (default = 0.5 for the Gaussian function), and b are coefficients controlling the width of the bell
[5.3] Gaussian function Y = Yasymexp{–0.5[(XXo)/b]2}
Group VI—Other nonlinear equations
[6.1] Power Y = aXb Y is the response variable, X is the explanatory variable, a and b are parameters that define the shape of the curve and the magnitude of the Y value
[6.2] Modified hyperbola Y = aX/(1 + bX)
[6.3] Michaelis–Menten Y = μX/(X + Csat) Y is the response variable (e.g., denitrification rate), X is the explanatory variable—the substrate (e.g., NO3), μ is the rate constant, Csat is the half‐saturation constant
[6.4]¶¶ Rational function urn:x-wiley:00021962:agj2agronj20120506:equation:agj2agronj20120506-math-0008 Y is the response variable, X is the explanatory variable, a1 and a3 are parameters defining the magnitude of the Y value, a2 and a4 are parameters defining the shape of the curve (if a2 = a4 + 1, then the equation shows a near‐linear response; if a2 = a4, then the equation becomes a hyperbola; if a2 < a4, then the equation takes a bell shape; if a2 > a4 or a4 = 0, then the equation becomes exponential; if a2 = 0, then the equation becomes exponential decay
[6.5] Ricker curve Y = a1X exp(–a2X) Y is the response variable, X is the explanatory variable, a1 and a2 are parameters that control both the height and the width of the right skew of the “bell”

The final advantage of using nonlinear regression models is that their predictions tend to be more robust than those of competing polynomials, especially outside the range of observed data (i.e., extrapolation). Nonlinear regression models, however, come at a cost. Their main disadvantages are that they can be less flexible than competing linear models and that generally there is no analytical solution for estimating the parameters. A consequence of the first point is that the choice of model is crucial. It is tempting to try a large library of functions and choose the model with the lowest error; however, it is almost always better to choose a model based on whether it has been used successfully in similar applications and has biologically meaningful parameters (e.g., Table 1).

The lack of an analytical solution has two practical consequences. First, a numerical method needs to be used to find estimates for the parameters, and this implies that convergence of the algorithm needs to be checked (Fig. 1). A lack of convergence often results from the second consideration, which is that these numerical methods require starting values. Choosing a model with biologically meaningful parameters makes the process of choosing starting values easier because the starting values can usually be easily determined from visual inspection of the data (see below).

TYPICAL NONLINEAR MODELS AND APPLICATION EXAMPLES

Choosing competing models for an application is not always a simple task. We have developed a reference table as a guideline to understand the family of functions used in agricultural applications. Table 1 presents 27 common nonlinear equations, and Supplementary Tables S2 to S6 supplement these with 45 additional equations. We classified the equations into six groups based on a combination of statistical form and use in the agricultural domain. All equations have been used in agricultural applications, and most of the parameters have an interpretable meaning (see supplementary figures also). The variety of equations presented in Table 1 reflects well the fact that one equation does not suit all processes.

Group I—Exponential

The exponential decay and exponential gives rise to maximum functions (Eq. [1.1] and [1.2], Table 1) find applications in a wide spectrum of soil and plant sciences. They are commonly used to describe light and N vertical distributions within plant canopies (Monsi and Saeki, 2005), N2O emission response to N fertilizer (e.g., Hoben et al., 2011), cumulative soil respiration (e.g., Gillis and Price 2011), photoperiodic sensitivity (e.g., Wang and Engel, 1998), temperature or moisture responses to nitrification (e.g., Ma and Shaffer, 2001), water infiltration rate (Horton, 1940), and first‐order kinetics. They are simple equations with one major unknown, the rate constant (k), which is also termed the extinction coefficient in crop physiology. The ratio ln(2)/k is of importance in soil science because it denotes the mean residence time (e.g., soil organic matter). Equation [1.1] provided the starting point to develop case‐specific nonlinear functions. Yin et al. (2000, 2003b) established a nonlinear function to describe leaf area index development as a function of canopy N content (Eq. [1.8] in Supplementary Table S1). Johnson et al. (2010) developed a flexible nonlinear function for the protein (N) vertical distribution within plant canopies (Eq. [1.9] in Supplementary Table S1). In soil science, Andren and Paustian (1987) and Gillis and Price (2011) extended Eq. [1.1] to better describe the decomposition of straw residue and biochar, respectively (see Supplementary Table S1). Lastly, Eq. [1.1] as well as expolinear functions (viz. Eq. [2.16] and [2.15] in Supplementary Table S2) were also applied to describe the initial parts of growth curves but not the entire growth curves because the growth profile often reaches an asymptotic value.
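As an illustration of this group (a sketch on simulated, hypothetical data, not a data set from the paper), Eq. [1.1] can be fitted with R's nls() and the mean residence time ln(2)/k recovered from the estimated rate constant:

```r
# Fit the exponential decay model, Eq. [1.1], to simulated data and report
# the mean residence time ln(2)/k discussed above.
set.seed(1)
t <- 0:20
y <- 100 * exp(-0.15 * t) + rnorm(length(t), sd = 2)     # hypothetical decay data
fit <- nls(y ~ Yo * exp(-k * t), start = list(Yo = 90, k = 0.1))
coef(fit)                  # estimates of Yo and k
log(2) / coef(fit)["k"]    # mean residence time, in the units of t
```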

Group II—Sigmoid Curves

Sigmoid curves (mathematical functions having an S shape) are another important group of nonlinear models. These models are often applied to describe plant height, weight, leaf area index, or seed germination as a function of time, N application rate, herbicide dose, etc. (e.g., Gan et al., 1996; Miguez et al., 2008). Sigmoid equations are also used as 0–1 modifiers in process‐based models to incorporate moisture availability or soil pH, etc., effects on soil N transformation processes (e.g., McGechan and Wu, 2001) and also as a switch‐off function in studies assessing plant photoperiodic sensitivity (e.g., Amaducci et al., 2008). Table 1 presents common sigmoid functions and Supplementary Table S2 provides additional sigmoidal equations, providing increased flexibility (e.g., when maximum growth or the inflection point is achieved at the start or the end of growth period). Additional equations can be found in Zwietering et al. (1990), Zeide (1993), Leduc and Goelz (2009), and many statistical textbooks or software manuals (e.g., SigmaPlot, JMP, TableCurve).

In general, how well a sigmoid equation can estimate the maximum rate of increase, or the x level at which the y value is maximized, is an important part of its usefulness (Birch, 1999; Yin et al., 2003a). Each function has its advantages and disadvantages (for a discussion, see Birch, 1999; Yin et al., 2003a), and it is up to the researcher to select the most appropriate one for the experimental data. The logistic equation, Eq. [2.1], describes symmetric growth with an inflection point at half the final size. The Gompertz equation, Eq. [2.3], has a fixed inflection point at about one-third of the asymptotic value (1/e = 0.3679), whereas others such as the Richards, Weibull, or beta functions have more flexibility in dealing with asymmetric growth (the inflection point can occur at any x value).

Having a flexible inflection point is another important feature of a sigmoid curve. For that reason, Birch (1999), for example, modified the logistic equation (Eq. [2.1]) to deal with asymmetric growth by adding an extra shape parameter. When growth is known to decrease after a certain period of time, the beta function (Eq. [2.5]) might be a better option (see supplementary figures). On the other hand, Eq. [2.5] might not accurately predict initial growth, and in cases when the initial phase is very important, different versions of the beta function should be used (see Eq. [2.11] in Supplementary Table S2 and the example below). It is important to note that all sigmoid equations presented in Table 1 (except Eq. [2.4] and [2.5]) assume an initial Y value close to zero at time zero, which is reasonable in most cases; e.g., at planting, the biomass weight is very close to zero.

Group III—Photosynthesis

Photosynthesis is the most important biological process involved in plant growth, and its rate is influenced by irradiance, temperature, N availability, the vapor pressure deficit, and CO2 concentration. Different nonlinear functions have been developed to describe the photosynthesis response to different environmental variables. Functions to describe the photosynthesis response to irradiance have been researched the most (Jassby and Platt, 1976; Goudriaan, 1979). All equations assume that dark respiration (Rd) is independent of the light level. Among the equations presented in Table 1, Blackman (Eq. [3.1]) is the simplest one, and the asymptotic exponential (Eq. [3.2]) and the nonrectangular hyperbola (Eq. [3.4]) are the most common. The rectangular hyperbola (Eq. [3.3], also termed the Michaelis–Menten equation) is used less frequently because it reaches saturation faster than photosynthesis actually does.

Currently, the scientific discussion on the photosynthetic capacity (Yasym) and efficiency (a) of different plant species is based on the comparison of nonlinear regression estimates; for this reason, caution should be exercised because similar estimates from different equations can result in different responses (Fig. 2). Equation [3.2] is a simple three‐parameter equation widely used in light‐driven process‐based models like SUCROS and Hybrid‐maize (Goudriaan and van Laar, 1994; Yang et al., 2004). Equation [3.4] offers more flexibility and is more accurate than Eq. [3.2] at the cost of one extra parameter (i.e., θ, the curvature parameter). When θ = 1, Eq. [3.4] becomes the Blackman equation (Eq. [3.1]), and when θ approaches zero, Eq. [3.4] becomes the rectangular hyperbola equation (Eq. [3.3]). Equation [3.4] is the reference equation when the biochemical model of Farquhar et al. (1980) or Collatz et al. (1992) is used in modeling studies. New equations are still being developed and tested (e.g., Eq. [3.7] in Supplemental Table S3).
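A small numeric sketch of these limiting cases (assuming the standard form of the nonrectangular hyperbola reproduced in Table 1, with Rd omitted for simplicity and arbitrary parameter values):

```r
# Nonrectangular hyperbola, Eq. [3.4] without Rd; theta near 1 approaches the
# Blackman response (Eq. [3.1]) and theta near 0 approaches the rectangular
# hyperbola (Eq. [3.3]).
nrh <- function(I, Yasym, a, theta)
  (a * I + Yasym - sqrt((a * I + Yasym)^2 - 4 * theta * a * I * Yasym)) / (2 * theta)
I <- 500; Yasym <- 30; a <- 0.05
nrh(I, Yasym, a, theta = 0.999)     # ~24.9, close to min(a*I, Yasym) = 25
nrh(I, Yasym, a, theta = 1e-6)      # ~13.6, close to the rectangular hyperbola
a * I * Yasym / (Yasym + a * I)     # 13.6
```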

Fig. 2. Nonlinear models for describing photosynthesis response to irradiance (left) and respiration response to temperature (right). Equations are given in Table 1. The following parameter values were used for these plots: asymptotic maximum response variable (Yasym) = 30 μmol CO2 m−2 s−1, initial curve slope (a) = 0.05 mol CO2 mol−1 photons, curvature parameter (θ) = 0.7, dark respiration (Rd) = 2 μmol CO2 m−2 s−1; increase in respiration for each 10°C temperature increase (Q10) = 2, reference temperature (Tref) = 20°C, universal gas constant (R) = 8.314 J K−1 mol−1, activation energy (E) = 65,000 J mol−1, entropy (S) = 650 J K−1 mol−1, deactivation energy (D) = 207,000 J mol−1, temperature‐adjusted activation‐energy‐like parameter (Eo) = 350 K, fitted temperature parameter (Tx) = 225 K, and minimum temperature (Tmin) = 0°C. Note that at the reference temperature of 20°C, respiration = 1. The optimum temperature for the modified Arrhenius equation is Topt = D/{S – R ln[E/(D – E)]} – 273 = 42.3°C.

The photosynthesis response to CO2 has been quantified empirically using a nonrectangular hyperbola (Goudriaan, 1979; Johnson et al., 2010) and mechanistically using a biochemical model (Farquhar et al., 1980). The biochemical model is based on Michaelis–Menten kinetics for substrate‐limited growth and the law of minimum between carboxylation and electron transport rates (Eq. [3.6], Table 1). Although its computation is laborious, this model has found large acceptance. For more details on that model, see the original publications (Farquhar et al., 1980; von Caemmerer and Farquhar, 1981) and model application studies (Medlyn et al., 2002; Archontoulis et al., 2012). The photosynthesis response to leaf N, which is strongly related to the Rubisco content, can be modeled using a modified logistic equation proposed by Sinclair and Horie (see Eq. [3.5], Table 1), while alternatives exist (Eq. [3.8] in Supplemental Table S3). The photosynthesis response to water stress is usually described by sigmoid functions at the leaf level. For instance, Vico and Porporato (2008) utilized a Weibull‐type curve (Eq. [3.9] in Supplemental Table S3). The photosynthesis response to a vapor pressure deficit has been described by an exponential decay function (e.g., Osório et al., 2006) but usually more sophisticated approaches have been used (Collatz et al., 1992; Yin and Struik, 2009). The photosynthesis response to temperature is discussed below.

Group IV—Temperature Dependence

A multitude of nonlinear regression models have been proposed and tested for modeling the temperature dependence of various soil and plant processes (Lloyd and Taylor, 1994; Kätterer et al., 1998; Davidson et al., 2006; Shibu et al., 2006; Portner et al., 2010). These include power, logarithmic, exponential, sigmoid, and bell‐shape functions (Table 1; Supplemental Table S4). The van't Hoff or Q10 function (Q10 is the factor by which the rate of a process increases for each 10°C temperature increase) has found application in many studies, particularly in those addressing leaf or soil respiration rates. A Q10 of 1 indicates no temperature effect. The Q10 value commonly ranges from 1.4 to 4.9 (Tjoelker et al., 2001; Atkin et al., 2005). In the Arrhenius equation, the Q10 term has been replaced by the activation energy. Both equations are equivalent, producing similar temperature responses (Fig. 2); however, it should be noted that both Q10 and E coefficients are temperature‐range dependent. Usually, narrow temperature measurement ranges result in high and sometimes unrealistic Q10 or E estimates. Lloyd and Taylor (1994) noticed limitations of these two functions (i.e., the rate of reaction is not constant across temperatures) and developed a new equation (see Eq. [4.4]) to fit extensive literature data.
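A short sketch of the two responses using the parameter values listed in the Fig. 2 caption (both are scaled to 1 at the 20°C reference temperature):

```r
# van't Hoff (Q10) and Arrhenius temperature responses, Eq. [4.1] and [4.2],
# with the Fig. 2 parameter values; temperatures in degrees Celsius.
q10_fun <- function(Temp, Q10 = 2, Tref = 20) Q10^((Temp - Tref) / 10)
arrh <- function(Temp, E = 65000, R = 8.314, Tref = 20)
  exp(E / R * (1 / (Tref + 273) - 1 / (Temp + 273)))
q10_fun(c(10, 20, 30))   # 0.50, 1.00, 2.00
arrh(c(10, 20, 30))      # ~0.39, 1.00, ~2.41 (similar shape; see Fig. 2)
```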

The above temperature functions describe a monotonic increase (Fig. 2). In reality, the rate of a process typically increases to an optimum temperature and then drops (although, due to the lack of appropriate data, the drop is not always apparent). New equations or modifications of existing models have been developed to account for this. For example, the modified Arrhenius function, compared with the Arrhenius equation, includes an additional two-parameter term (see D and S in Eq. [4.3] and Fig. 2) to capture the decline in the rate of a process (e.g., the electron transport rate) at very high temperature. If one of the two additional parameters is set to zero, then Eq. [4.3] reduces to Eq. [4.2] (see also supplemental figures). This equation is “fragile” and requires careful parameterization (Medlyn et al., 2002; Archontoulis et al., 2012).
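The optimum implied by these two extra parameters can be checked numerically; with the parameter values in the Fig. 2 caption, the formula quoted there gives about 42°C:

```r
# Optimum temperature of the peaked (modified) Arrhenius response, Eq. [4.3],
# using the parameter values listed in the Fig. 2 caption.
E <- 65000; D <- 207000; S <- 650; R <- 8.314
D / (S - R * log(E / (D - E))) - 273   # ~42.3 (degrees C)
```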

Johnson et al. (2010) argued that temperature functions based on the activation energy of chemical reactions are quite complex and difficult to apply routinely. They used a modified beta function to describe the photosynthesis response to temperature (Eq. [4.10] in Supplemental Table S4). Kirschbaum (1995) used a modified exponential temperature function (see Eq. [4.9] in Supplemental Table S4) that provides a peak pattern to fit soil organic matter decomposition data. Additional (but difficult to interpret) peak temperature response functions were reported in Portner et al. (2010).

Group V—Bell Curves

In addition to the temperature dependence of photosynthesis, bell-shaped or peak functions have been applied in agricultural science to describe the rate of phenological development as a function of temperature (e.g., Yin et al., 1995), the size of a leaf as a function of its rank in a plant (e.g., Hammer et al., 2009), or soil moisture effects on N2O emissions (e.g., Rafique, 2011). Table 1 lists three important equations. More application examples and different types of bell-curve equations can be found in Ma and Shaffer (2001) and in Supplementary Table S5. In process-based simulation models, researchers have approximated a bell-shaped response (viz. rate of development) with two-, three-, or four-segment (broken) linear regression models (e.g., APSIM; Keating et al., 2003). These segmented models should typically be fitted using nonlinear methods as well.

Group VI—Others

In allometric studies, the relations that exist among the growth rates of different plant components are quantified by means of regression analysis. Given the large variability that exists among plant species and plant components, numerous nonlinear models have been utilized including power (Eq. [6.1], e.g., plant N concentration vs. biomass weight), hyperbolic (Eq. [6.2]), and sigmoid curves (e.g., Eq. [3.1]). For application examples, see Vega et al. (2000), Vega and Sadras (2003), and Archontoulis et al. (2010). The Michaelis–Menten equation (Eq. [6.3]) is well known and routinely applied to quantify the rate of a process (i.e., denitrification) that is dependent on the substrate (i.e., NO3). In contrast, Eq. [6.4] is not as common in agronomy, but it appears to be very flexible, taking many forms from linear to exponential and bell curved (see supplemental figures). It was applied to model temperature effects on soil N mineralization (Bril et al., 1994). The last equation in Table 1 is the Ricker function (Eq. [6.5]), an option for hump‐shaped patterns that are skewed to the right (Bolker, 2008).

Manipulating or Combining Nonlinear Functions

Sometimes there is a need to modify a “standard” nonlinear function to fit a set of data. This has led to the development of numerous versions of a standard equation (e.g., Birch, 1999; Tsoularis, 2001; Supplemental Tables S1–S3). Using the simplest form of the Michaelis–Menten hyperbolic function (see Eq. [7.1] in Fig. 3), we illustrate simple modification techniques. Equation [7.1] starts at zero when x = 0 and increases up to an asymptotic value of 1 as x increases. We can change the horizontal scale of this function by multiplying the variable x by a constant parameter, b, which is called a scale parameter (Bolker, 2008). If b > 1, then y saturates faster, and if 0 < b < 1, then y saturates more slowly (Eq. [7.2] in Fig. 3). We can change the vertical scale of the function by introducing a new parameter, a (Eq. [7.3] in Fig. 3). In this case, the asymptote moves from 1 to a. We can shift the whole curve to the right or the left by subtracting or adding a new parameter, c, to the x variable (Eq. [7.4] in Fig. 3), which is called the location parameter (Bolker, 2008). Similarly, we can shift the whole curve upward or downward by adding or subtracting a new constant value, d (Eq. [7.5] in Fig. 3). Lastly, we can replace x with xk (x raised to the power k), where k is a shape parameter, and then the equation takes many forms (exponential, sigmoid, etc.; not shown). A close analog of the last modification is Eq. [2.6] in Table 1. When we modify nonlinear functions, we should add parameters that have an interpretable meaning.
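A compact sketch of these modifications (the functions below simply restate Eq. [7.1]–[7.5] as described in the text; the cumulative order of the added parameters follows Fig. 3):

```r
# Step-by-step modification of the simple hyperbola y = x/(1 + x), Eq. [7.1].
f1 <- function(x) x / (1 + x)                                          # Eq. [7.1]
f2 <- function(x, b) (b * x) / (1 + b * x)                             # horizontal scale
f3 <- function(x, a, b) a * (b * x) / (1 + b * x)                      # asymptote moves to a
f4 <- function(x, a, b, c) a * b * (x - c) / (1 + b * (x - c))         # shift along x
f5 <- function(x, a, b, c, d) a * b * (x - c) / (1 + b * (x - c)) + d  # shift along y
curve(f1(x), from = 0, to = 10, ylim = c(0, 2.5), ylab = "y")
curve(f3(x, a = 2, b = 1), add = TRUE, lty = 2)                        # vertical rescaling
```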

Fig. 3. Example of a nonlinear model modification. Starting with Eq. [7.1], the parameters a, b, c, and d were added step by step to Eq. [7.1], resulting in four new equations: Eq. [7.2–7.5]. Horizontal or vertical arrows in the figure panel indicate how the additional parameters affected the model.

When nonlinear functions are extended or combined to describe a phenomenon, we should be aware that there is an upper limit in the number of parameters that can be estimated from standard nonlinear regression analysis. This depends on the complexity of the model and the number of data points. For example, to fit growth curves with three parameters, we need at least four data points. When a process is described by a combination of nonlinear models (e.g., Farquhar model of photosynthesis or generic simulation crop models), then a stepwise parameterization method is usually applied. For application examples, see Miguez et al. (2009) and Archontoulis et al. (2012).

FITTING NONLINEAR MODELS

Presently there are many statistical software packages available for fitting nonlinear models (e.g., SAS, R, JMP, GenStat, MatLab, SigmaPlot, OriginLab, and SPSS). Nonlinear parameter estimates can be obtained using different methods (Bates and Watts, 2007); the most common are: (i) ordinary least squares, which minimizes the sum of squared errors between observations and predictions, and (ii) maximum likelihood, which seeks the parameter values that make the observed data most likely under an assumed probability distribution. For non-normal data such as binomial or count data, generalized (non)linear models should be used (Lindsey, 2001; Huet et al., 2003; Gbur et al., 2012). Most problems encountered when using standard nonlinear regression software are due to a poor choice of candidate model, an incorrectly specified equation, or poor starting values (Fig. 1). The choice of estimation method can affect the parameter estimates (Ruppert et al., 1989), but in general, least squares and maximum likelihood estimates are approximately identical when the data follow a normal distribution and tend to differ only when they do not (Myung, 2003).

Choosing Starting Values

All the procedures for nonlinear parameter estimation require initial values. The choice of values will influence the convergence of the estimation algorithm, in the worst case yielding no convergence and in the best case convergence in a few iterations (Ritz and Streibig, 2005); however, there is no standard procedure for getting initial estimates. We indicate five practical methods:
  1. If the model has parameters with biological meaning, then use information from the literature.

  2. Use graphical exploration (see the example below and Fig. 4).

  3. Transform the nonlinear model into a linear model. For instance, logarithmic transformation of Eq. [1.1] yields a linear equation (viz. ln Y = ln Yo – kt) from which rough estimates of the parameter values can be easily obtained by linear regression (see the short R sketch after this list). This method is recommended for getting initial estimates and for detecting deviations from linearity, but these estimates may also be used as the final estimates (Ruppert et al., 1989). For more transformation examples, see Zeide (1993), Singh (2006), and Portner et al. (2010).

  4. In the case where no clear guidelines exist for choosing starting values, the recommendation is to use a grid search or “brute force” approach (e.g., PROC NLIN in SAS or the nls2 package in R). This grid search can be done by generating an extensive coverage of possible parameter values (and their combinations) and then evaluating the model at each one of these parameter combinations. The numerical method can then be used starting with the combination that resulted in the best fit (lowest mean squared error). The hope is that an extensive enough coverage of the parameter space will provide a combination of parameters that will result in an adequate fit.

  5. Use prespecified algorithms. This approach is specific to a given equation and can be used to calculate starting values for a given data set (e.g., Pinheiro and Bates, 2000; Ritz and Streibig, 2008).
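A minimal sketch of method 3 above on simulated data (hypothetical values, not from the paper): a log-linear fit supplies rough starting values that are then passed to nls() for the nonlinear fit.

```r
# Starting values for Eq. [1.1] from a log-linear regression, refined with nls().
set.seed(2)
t <- 0:15
y <- 80 * exp(-0.2 * t) * exp(rnorm(length(t), sd = 0.05))  # hypothetical data
lin <- lm(log(y) ~ t)                                       # ln y = ln Yo - k t
start <- list(Yo = unname(exp(coef(lin)[1])), k = unname(-coef(lin)[2]))
fit <- nls(y ~ Yo * exp(-k * t), start = start)
coef(fit)
```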

Fig. 4. Biomass accumulation with time for three crops—maize (M), fiber sorghum (F), and sweet sorghum (S)—at high and low levels of agricultural inputs, collected in Greece in 2008.

Checking Algorithm Convergence

After the initial attempt at fitting a nonlinear model, we recommend evaluating algorithm convergence (Fig. 1). Convergence is achieved when a measure (such as the relative offset or the maximum change among parameter estimates; Bates and Watts, 2007) falls below a certain threshold value (e.g., 10−5), meaning that the algorithm has found a “best” solution (Fig. 1). If convergence is not achieved, the most likely causes are a poor choice of starting values or a model that is not well suited to describe the data. If convergence is achieved, the next step is to evaluate whether the parameter estimates are within a reasonable range. This requires evaluating not only the point estimates but also their standard errors. Unusually large standard errors are a sign of convergence problems, even if convergence was apparently achieved in the previous step. If no problems were encountered up to this point, the analysis can continue by assessing model assumptions and simplifying the model.
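In R, convergence information and the parameter estimates with their standard errors can be inspected directly from the fitted object (a sketch continuing the nls() example shown after the list of starting-value methods):

```r
# Convergence diagnostics and estimates for an nls() fit (object `fit` above).
summary(fit)          # estimates, standard errors, residual standard error
fit$convInfo$isConv   # TRUE if the algorithm reported convergence
fit$convInfo$finIter  # number of iterations used
fit$convInfo$finTol   # achieved convergence tolerance
```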

Evaluating Model Assumptions

When we are dealing with one model, the next step is to evaluate key model assumptions: normally distributed errors, independent errors, and homogeneous variance for the errors (Fig. 1). This step and the following steps are not unique to nonlinear models but are common to all linear models. Substantial deviations from the assumptions could result in bias (inaccurate estimates), distorted standard errors, or both (Ritz and Streibig, 2008). Violations of these assumptions can be detected from an analysis of the residuals by means of graphical procedures and formal statistical tests. For a thorough analysis, see Ritz and Streibig (2008).

Briefly, to check whether the distribution of the measurement errors is approximately normal, a plot of the standardized residuals is commonly used (Pinheiro and Bates, 2000; see also the example later and Fig. 5). Outliers and many extreme values are common causes of deviations from normality (Fig. 1). Heterogeneity of variance can be detected by plotting the residuals against the fitted values (using absolute residuals, which are raw residuals stripped of their sign, or standardized residuals, which are raw residuals scaled by their estimated standard deviation; see the example below).
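A sketch of these graphical checks for an nls() fit (using the `fit` object from the earlier sketch; dividing by the residual standard error is only a rough standardization):

```r
# Basic residual diagnostics: spread vs. fitted values, normality, independence.
res <- residuals(fit) / summary(fit)$sigma   # roughly standardized residuals
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Standardized residuals")
abline(h = 0, lty = 2)                       # look for trends or funnel shapes
qqnorm(res); qqline(res)                     # check normality
acf(res)                                     # check independence (autocorrelation)
```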

Fig. 5. Standardized residuals from individual fits to all experimental units: Eq. [2.5] from Table 1 (left) and Eq. [2.11] from Supplementary Table S2 (right). The left panel contains fewer points because Eq. [2.5] converged for only 10 of the 24 experimental units.

When the residual errors show a trend (e.g., increasing variability as the explanatory variable increases, Fig. 4 and 5), this can be addressed by modeling the variance as a function of the independent variable or the fitted values (Fig. 1 and 6). This is the case in our example (see discussion below). If variance heterogeneity is ignored, the parameter estimates might not be influenced much, but this may result in severely misleading confidence and prediction intervals (Carroll and Ruppert, 1988). The residuals are assumed to be independent, and when this assumption is violated it is visually evident in a plot of correlations of residuals against “lag” (or units of separation in time or space). Typically, variables measured with time on the same subject (e.g., plant, animal, or soil sample) tend to result in autocorrelated residuals that need to be accounted for by modeling the variance–covariance matrix.
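One way to model such a variance trend (a hedged sketch on simulated data, not the authors' supplemental code) is a power-of-the-mean variance function, available for example through gnls() and varPower() in the nlme package:

```r
# Heteroscedastic nonlinear fit: logistic mean function with the residual
# variance modeled as a power of the fitted values (simulated data whose
# spread grows with the mean).
library(nlme)
set.seed(3)
dat <- data.frame(t = rep(1:15, each = 3))
dat$mu <- 40 / (1 + exp(-(dat$t - 8) / 1.5))
dat$y <- dat$mu + rnorm(nrow(dat), sd = 0.05 * dat$mu)
fit_h <- gnls(y ~ SSlogis(t, Asym, xmid, scal), data = dat,
              weights = varPower(form = ~ fitted(.)))
fit_h$modelStruct$varStruct   # estimated power of the variance function
```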

Fig. 6. Observed vs. predicted biomass values. The root mean squared error (RMSE) was used as a measure of the goodness of fit. Given that the variability is increasing along with the biomass weight, three RMSE values were calculated for biomass ranges indicated by the vertical dashed lines (0–10, 10–20, and >20 Mg ha−1).

MODEL SELECTION CRITERIA

When we are dealing with multiple models, the question is how to find the best model among competing models. Depending on the structure of the models, different statistical criteria can be used to find the best model: F test, Akaike information criterion (AIC), Bayesian information criterion (BIC), or the likelihood ratio test (Zucchini, 2000; Burnham and Anderson, 2002; Hoffmann, 2005; Ritz and Streibig, 2008; Lewis et al., 2011). When models are nested (one model is a special case of another), any of these criteria are applicable (Fig. 1). When models are non‐nested (models having different structures, e.g., Eq. [2.1] vs. Eq. [2.2]), typically the AIC and the BIC criteria are used (Fig. 1). From a practical point of view, however, one model might be preferred over another based on interpretability and specific objectives. There needs to be a balance between statistical model performance and how effectively the model answers research questions.

For two nested models, one with two parameters (reduced, e.g., Eq. [2.9] in Supplementary Table S2) and one with four parameters (full, e.g., Eq. [2.7] in Supplementary Table S2), we can check whether the additional parameters make a statistically significant contribution to model performance using the F test:

F = [(SSreduced – SSfull)/(dfreduced – dffull)]/(SSfull/dffull)

where SSfull and SSreduced are the residual sums of squares for the full and reduced models, respectively, and dffull and dfreduced are the residual degrees of freedom for the full and reduced models, respectively. The P value is obtained from an F distribution with dfreduced – dffull numerator and dffull = n – p denominator degrees of freedom (2 and n – 4 in this example), where p is the number of parameters of the full model and n is the number of observations, and a decision can be made. This test is sometimes referred to as the extra-sum-of-squares or multiple partial F test (Hoffmann, 2005; Ritz and Streibig, 2008). The F test is used when the ordinary least squares method is used to fit the data (see Fitting Nonlinear Models above). When the maximum likelihood method is used to fit the data, the likelihood ratio test statistic (Q) is computed instead to compare nested models:
Q = –2 ln(Lreduced/Lfull) = 2[ln(Lfull) – ln(Lreduced)]

where Lfull and Lreduced are the likelihood functions for the full and reduced models, respectively. These functions are closely related to the residual sum of squares (see Eq. [2.4] in Ritz and Streibig, 2008). Under the reduced model, Q is approximately χ2 distributed with degrees of freedom equal to the difference in the number of parameters between the full and reduced models (for details, see Ritz and Streibig, 2008). Another approach for model selection involves calculating the AIC and BIC values for each model separately:
AIC = –2 ln(Li) + 2pi

BIC = –2 ln(Li) + pi ln(n)
where Li and pi are the likelihood and the number of parameters for each model, and n is the number of observations. For both statistical criteria, a smaller value indicates a preferable model. The BIC differs from the AIC only in the second term, which depends on n. Clearly as n increases, the BIC favors the simpler models (fewer parameters). This explains why sometimes the AIC and BIC indices disagree. For more information about these indices, see Burnham and Anderson (2002). Note that the likelihood ratio test, AIC, and BIC are all designed to compare the performance of models that have been fitted to data via maximum likelihood estimation (or for any model for which the likelihood can be calculated).
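A sketch of these comparisons for two nested nls() fits (simulated data; the reduced model is Eq. [1.1] and the full model adds a plateau parameter):

```r
# Nested-model comparison: extra-sum-of-squares F test via anova(), plus
# AIC and BIC (the latter two also apply to non-nested fits).
set.seed(4)
t <- 0:20
y <- 10 + 80 * exp(-0.25 * t) + rnorm(length(t), sd = 2)
reduced <- nls(y ~ Yo * exp(-k * t), start = list(Yo = 80, k = 0.2))
full <- nls(y ~ c0 + Yo * exp(-k * t), start = list(c0 = 5, Yo = 80, k = 0.2))
anova(reduced, full)        # F test for the extra (plateau) parameter
AIC(reduced, full)
BIC(reduced, full)
```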

Goodness of Fit

There is no single method or index to best assess the goodness of fit, but there are many different methods (graphical and numerical) that highlight different features of the data and the model. Graphical comparison provides a quick visual assessment of the goodness of fit. Numerical statistical indices like R2, adjusted R2 (R2adj), bias, mean squared error, root mean squared error (RMSE), modeling efficiency (ME), concordance correlation, and others (Wallach, 2006) provide the additional detail needed to assess the goodness of fit. Some indices measure the absolute error (includes units) and some others the relative error (excludes units). Depending on the data type, a combination of these indices can be used. For example, the relative term is more meaningful than the absolute when comparing errors based on different data sets. An important aspect of the statistical descriptors is that some simple and very common indices like r2 and bias do not account for the number of parameters. The following numerical indices are commonly used in model evaluation:
bias = (1/n)Σ(Yi – Ŷi)

R2 = SSregression/SStotal = Σ(Ŷi – Ȳ)²/Σ(Yi – Ȳ)²

R2adj = 1 – [(n – 1)/(n – p)](1 – R2)

RMSE = √[(1/n)Σ(Yi – Ŷi)²]

ME = 1 – SSresidual/SStotal = 1 – Σ(Yi – Ŷi)²/Σ(Yi – Ȳ)²

where bias, R2, R2adj, RMSE, and ME are numerical statistical indices, n is the number of data points, Yi and Ŷi are the observed and predicted values, respectively, Ȳ is the mean observed value, p is the number of model parameters, and SSregression, SSresidual, and SStotal are the sums of squares for the regression model, the residual, and the total, respectively.
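These indices can be computed in a few lines from observed and predicted values; the helper below is a sketch that follows the definitions above (it is not code from the paper or its supplement):

```r
# Goodness-of-fit indices as defined above; p is the number of model parameters.
gof <- function(obs, pred, p) {
  n  <- length(obs)
  r2 <- sum((pred - mean(obs))^2) / sum((obs - mean(obs))^2)   # SSreg/SStot
  c(bias  = mean(obs - pred),
    R2    = r2,
    R2adj = 1 - (n - 1) / (n - p) * (1 - r2),
    RMSE  = sqrt(mean((obs - pred)^2)),
    ME    = 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2))
}
# e.g., gof(obs = y, pred = fitted(fit), p = length(coef(fit)))
# (y and fit stand for whatever data and fitted model are at hand)
```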

Although often used, R2 is not a good metric of model performance for nonlinear models. It has several limitations (e.g., it does not account for the number of parameters), and other measures of agreement (or combinations of them) should be used (Wallach, 2006). A key limitation is that a nonlinear model does not necessarily contain the simpler, single-parameter (intercept-only) model as a special case, as is guaranteed for linear models, so the usual partitioning of the total sum of squares into regression and residual components does not hold.

The numerical statistical descriptors indicate the average performance of the model across the sample. When the variability is not constant throughout the sample (e.g., biomass increasing with time), these indices do not capture the fact that the uncertainty differs across magnitudes of the response variable (see the example below; Fig. 4 and 6). Often we are also concerned with the predictive ability of the model; for this, cross-validation techniques can be used, and the mean squared error of prediction is a more appropriate measure (Wallach, 2006).

EXAMPLE APPLICATION

The example follows the workflow illustrated in Fig. 1.

Data: We used data from Danalatos et al. (2009), which represent destructive measurements of aboveground biomass accumulation with time for three crops: fiber sorghum (F), sweet sorghum (S), and maize (M), growing in a deep fertile loamy soil of central Greece under two management practices: high and low input conditions, in 2008. High refers to weekly irrigation (to match 100% of maximum evapotranspiration) and application of 200 kg N ha−1 and low input refers to biweekly irrigation (approximately 50% of maximum evapotranspiration) and application of 50 kg N ha−1. The experiment was a 2 × 3 factorial completely randomized in four blocks. For more details, see Danalatos et al. (2009). With such data, many questions are possible. We will concentrate on three: (i) what is the maximum biomass accumulated by these crops, (ii) at what point in time was this biomass achieved, and (iii) are there significant treatment effects and/or interactions. This requires statistical determination of the effects of crop type on the function parameters and also the effect of input level (i.e., high or low). These questions are approachable through the use of a nonlinear model that captures the mean function and the structure of the data.

Graphs: Visually (Fig. 4), the sorghums have greater biomass than maize, and the maximum biomass occurs later in the season. No outliers were detected at this point. Without a statistical analysis, however, it is difficult to make sound statements based solely on data visualization.

Choose candidate model: Danalatos et al. (2009) analyzed the data using the beta growth function (Yin et al., 2003a; Eq. [2.5] in Table 1). That model was selected because it captures the decline of biomass toward the end of the growing season (Fig. 4 and supplementary figure for the beta growth function). Also, the parameters have clear meaning and are very suitable to answer the research questions.

Starting values: Because for this function the parameters have a straightforward interpretation, starting values can be determined by visual inspection of Fig. 4. In this example, however, we used a prespecified algorithm that chooses the initial starting values automatically (see details in supplementary materials).

Fit model and convergence: Model fitting was performed in R with the nls function, using the ordinary least squares estimation method. There are three crops, two levels of agronomic input, and four blocks, which results in 24 possible combinations (experimental units). The model was fitted to every experimental unit separately, and apparent convergence was obtained for only 10 experimental units. This indicates that some modifications are needed (see below). Checking model assumptions can be useful for diagnosing the problem (Fig. 5). In this case, it stands out that there is a concentration of points at low fitted values, which indicates overprediction (i.e., bias) at low values (Fig. 5), suggesting that a different function might work better.
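A hedged sketch of this per-unit fitting step (the authors' actual code is in the supplemental material and uses the beta growth function; here simulated data and the self-starting SSlogis model stand in only to show the mechanics with nlsList() from the nlme package):

```r
# Fit one nonlinear curve per experimental unit and see which units converge.
library(nlme)
set.seed(5)
sim <- expand.grid(doy = seq(150, 280, by = 10), eu = factor(1:4))
sim$yield <- 35 / (1 + exp(-(sim$doy - 220) / 12)) + rnorm(nrow(sim), sd = 1.5)
fits <- nlsList(yield ~ SSlogis(doy, Asym, xmid, scal) | eu, data = sim)
coef(fits)        # one row of parameter estimates per experimental unit
summary(fits)
```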

Revise the mean model: We selected a modified beta growth function (see Eq. [2.11] in Supplementary Table S2), which was designed to capture the initial growth phase more efficiently at the cost of two extra parameters. Equation [2.11] allows an offset along the x axis (tb) and an offset along the y axis (Yb). We did not fit these parameters but rather kept them fixed: tb is the planting date, Day of the Year (DOY) 141, and Yb is the biomass weight at sowing, which is zero. As a first step, the fitting process was repeated as above, with starting values determined visually this time as 30 for Ymax, 240 for tm, and 280 for te (Fig. 4). Apparent convergence was obtained for all the experimental units. The final revised mean model was fitted to the entire data set, and at this step the model included the effect of crop type, agronomic input level, and their interaction for each parameter.

Check model assumptions: Visual inspection of the standardized residuals (Fig. 5) was used to evaluate the assumptions of appropriate mean function and normally distributed errors with homogeneous variance. Figure 5 indicates that Eq. [2.11] (modified beta growth function) alleviated the overprediction at low values, but this bias did not disappear completely. The major argument for choosing Eq. [2.11] over Eq. [2.5] is that convergence was achieved for all experimental units (24 vs. only 10).

Model variance homogeneity: The residual variance was modeled with a power function, and different power parameters were used for the three crops. This function is s2(v) = s2|v|2θ, where v is the variance covariate (the fitted values in this case) and the power θ depends on the crop (0.7, 0.86, and 0.89 for maize, fiber sorghum, and sweet sorghum, respectively). More details about the fitting process can be obtained from the supplemental material.

Determine parameter estimates and standard errors: Table 2 provides the estimates and the corresponding standard errors of the model. These values are final and account for modeling the residual variance.

Table 2. Estimates of the beta growth model (Eq. [2.11] in Supplementary Table S2) used to fit the biomass data reported by Danalatos et al. (2009); P values < 0.05 indicate a significant effect of input levels (high or low). Note: in Eq. [2.11] the parameters biomass weight at sowing (Yb) and sowing date (tb) were fixed at 0 Mg ha−1 and Day of the Year (DOY) 141, respectively.
Maize Fiber sorghum Sweet sorghum
Parameter High Low P High Low P High Low P
Ymax 21.2 (0.99) 15.4 (2.27) <0.00 38.6 (2.24) 31.8 (5.18) 0.02 43.2 (2.83) 33.9 (6.48) 0.01
tm 215.7 (1.30) 217.1 (3.33) 0.50 234.5 (1.61) 235.8 (4.11) 0.61 239.4 (1.53) 240.0 (3.81) 0.79
te 248.0 (1.79) 248.8 (4.51) 0.76 277.2 (2.03) 279.4 (5.38) 0.50 278.6 (1.99) 279.0 (4.87) 0.89
  • Ymax, maximum biomass (Mg ha−1); tm, DOY when the crop growth rate is maximized; te, DOY when biomass is maximized.
  • Standard errors in parentheses.

Calculate statistical descriptors: Given that the biomass had low initial values and high values at the end of the season (Fig. 6), the use of the average RMSE (here 4.1 Mg ha−1) is misleading because it overestimates the error at the initial stages (biomass of 0–10 Mg ha−1), and it underestimates it at advanced stages (biomass of 30–40 Mg ha−1). Therefore, different RMSE values were calculated for different biomass ranges (see Fig. 6). Regarding the relative indices (no units), use of the modeling efficiency (viz. 0.88, scale 0–1) is somewhat better than the RMSE in this case, but it still expresses the average model performance across the sample and therefore is not recommended.
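Range-specific RMSE values of this kind can be computed with a few lines (a sketch; the cut points are those shown in Fig. 6, and the object names in the usage comment are hypothetical):

```r
# RMSE computed within ranges of the observed response (e.g., biomass classes).
rmse_by_range <- function(obs, pred, breaks = c(0, 10, 20, Inf)) {
  grp <- cut(obs, breaks = breaks, include.lowest = TRUE)
  sapply(split(seq_along(obs), grp),
         function(i) sqrt(mean((obs[i] - pred[i])^2)))
}
# e.g., rmse_by_range(obs = dat$biomass, pred = fitted(final_fit))  # hypothetical
```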

Interpret results and draw conclusions: According to the model predictions, the maximum estimated biomass was obtained for sweet sorghum under high inputs, which reached a total of 43 Mg ha−1 on DOY 279 (Fig. 7; Table 2). At the other extreme, maize under high inputs reached its maximum biomass of 21 Mg ha−1 on DOY 248. The maximum biomass (Ymax) and the time when it was reached (te) were significantly affected by the crop × input interaction (see supplemental materials). In practice, the most meaningful result might be accurately representing treatment differences and their significance level (P value) and having a model capable of producing robust predictions within the range of observed values (i.e., interpolation) and, with more caution, outside the range of observed values (i.e., extrapolation).

Fig. 7. Observed data and fit for the final model for three crops: maize (M), fiber sorghum (F), and sweet sorghum (S). Vertical bars indicate confidence intervals of observations.

SUMMARY

The step that most clearly distinguishes nonlinear from linear modeling is the choice of the mean function, and this choice can be difficult without appropriate guidance. We have presented an extensive library of nonlinear functions (77 equations with the associated parameter definitions) and typical applications that, we hope, will make the task of choosing candidate models easier. Our review of nonlinear equations is necessarily incomplete because there are countless potential functions (Ratkowsky, 1990) and ad hoc modifications. We have also contributed a suggested work flow (Fig. 1) that should provide the necessary structure to avoid common errors in the use of nonlinear regression models.

ACKNOWLEDGMENTS

We would like to thank Ken Moore, Philip Dixon, and an anonymous reviewer for their helpful comments and suggestions that improved the manuscript.
