Challenges with effective representations of heterogeneity in soil hydrology based on local water content measurements

The description of soil water movement at the field scale requires soil hydraulic material properties. These cannot be measured directly and exhibit an intrinsic multiscale heterogeneity. The estimation of soil hydraulic properties through inverse modeling or data assimilation requires measurements of the state of the system. However, these measurements are typically scarce and do not resolve soil heterogeneity leading to effective material properties with limited predictive capabilities. In a synthetic study at the scale of soil profiles, we explore the possibility to estimate an effective reduced one‐dimensional representation based on only four local water content measurements in a vertical profile located in one‐ and two‐dimensional heterogeneous media. We allow the effective material properties to deviate locally to consider the impact of the small‐scale heterogeneity on the water content measurements. In the synthetic experiments, the estimated reduced one‐dimensional representations predict soil water content and fluxes sufficiently well but can fail to predict the accurate timing and magnitude of infiltration fronts. Cumulative fluxes below the water content measurements were predicted correctly even on short time scales, if evaporation was estimated accurately. However, this proved to be challenging if the local evaporation fluxes above the measurement profile deviated strongly from the average evaporation fluxes of the heterogeneous medium. The one‐dimensional estimation leads to soil hydraulic parameters describing effective material properties that differ from the true material properties to compensate the missing representation of heterogeneity and two‐dimensional flow.


INTRODUCTION
The quantitative description of soil water movement on scales from meters to entire landscapes remains an open challenge. A critical reason is the heterogeneity of the porous soil structure at all scales, which strongly influences the water flow.
On the smaller scale of soil profiles, which is considered to be the critical scale for a reliable representation of soil hydrology (Vogel, 2019), soil water movement is described by the Richards equation, which requires hydraulic material properties of the porous structure. These properties are typically parameterized, commonly by the Mualem-van Genuchten parameterization. The required soil hydraulic parameters cannot be measured directly with sufficient accuracy or transferred reliably from the laboratory to the field. The unknown parameters are a priori often the source for the largest uncertainties in soil hydrologic modeling.
The issue of unresolved heterogeneity is well known in soil hydrology and has been addressed by describing the heterogeneous soil through effective material properties. For example, Wildenschild and Jensen (1999) performed a laboratory experiment with two-dimensional flow in a soil with small-scale heterogeneity. Based on detailed information about their experimental setup, they were able to find effective homogeneous material properties that could describe their specific experiment adequately. In contrast, an estimation based on local measurements of potentials through inverse modeling yielded dissatisfying results. However, effective material properties not only depend on the heterogeneous distribution but also on boundary conditions, and due to the nonlinear dynamics, simple averaging or first-order upscaling is unable to fully reproduce upscaled fluxes (Vereecken, Kasteel, Vanderborght, & Harter, 2007).
One possibility to address the challenge of heterogeneity is to fully resolve the heterogeneity through distributed measurements. This has been demonstrated for large-scale heterogeneity with extension of tens of meters in a synthetic study on the field scale by Chaudhuri, Franssen, and Sekhar (2018), who used data assimilation to estimate three-dimensional parameter fields. However, fully resolving heterogeneity through measurements is already challenging for large-scale heterogeneities where only a few measurement infrastructures exist that allow the necessary measurement resolution through distributed measurement profiles, but it is typically unfeasible in practical Core ideas • We estimated effective material properties based on local water content measurements. • Estimated effective soil hydraulic material properties differ from true material properties. • Accurate description of evaporation is crucial for flux prediction.
field applications to resolve smaller submeter heterogeneity for the entire field scale. Alternatively, prior structural information about the soil hydrological properties needs to be added to the inverse estimation (Vereecken et al., 2007), which is again difficult to provide for small-scale heterogeneity on the field scale.
On the scale of soil profiles, Song, Shi, Ye, Yang, and Navon (2014) estimated a two-dimensional field of the saturated hydraulic conductivity based on two measurement profiles close to each other in a synthetic study. In many cases, however, only one-dimensional heterogeneity can be considered in parameter estimation based on information from local measurement profiles. For example, Zha, Zhu, Zhang, Mao, and Shi (2019) directly estimated a one-dimensional parameter field, and Brandhorst, Erdal, and Neuweiler (2017), Erdal, Neuweiler, and Wollschläger (2014), and Zhang et al. (2019) estimated an additional bias to the state to consider model structural error, which can also stem from unresolved small-scale heterogeneity.
Heterogeneity within soil layers that are typically considered homogeneous has been reported and addressed in real-world applications as well. For example, Bauser, Jaumann, Berg, and Roth (2016) found relevant heterogeneity within individual soil layers based on water content measurements in a soil profile and estimated a simplified onedimensional heterogeneity by employing Miller scaling. In artificial hillslopes at the Landscape Evolution Observatory (LEO) at Biosphere 2, which were constructed from a homogeneous material, Gevaert et al. (2014) reported that water content decreased with depth during a constant infiltration experiment and considered soil heterogeneity to be one possible cause. Subsequent studies represented the heterogeneity vertically by introducing soil layers in the model (Scudeler et al., 2016). Nevertheless, even this research infrastructure does not allow to resolve three-dimensional heterogeneity on a submeter scale.
To the best of our knowledge, it has not been demonstrated to what extent an effective representation of small-scale heterogeneity can describe soil water movement, when estimated on the basis of few sparse local measurements without additional information about the structure of the heterogeneity. In this study, we quantitatively explore this question with synthetic experiments by estimating simplified one-dimensional representations of heterogeneous soils based on only four local water content measurements in a vertical profile. The simplified representation of the heterogeneity is based on an effective description of the hydraulic material properties but allows local deviations at the measurement locations to consider the local deviations of the measurements there. For a set of scenarios, we generate the respective two-dimensional heterogeneous synthetic truth using Miller-similar media. We assimilate the local water content measurements with an ensemble Kalman filter (EnKF) to estimate the effective one-dimensional representation of the soil hydraulic material properties. We only estimate the heterogeneity at measurement locations and interpolate in between. To differentiate between the effect of this interpolation and twodimensional flow, we additionally generate heterogeneous one-dimensional truths and apply the EnKF to them as well. Based on these synthetic experiments, we investigate the estimated material properties and predictive capabilities for water content and flux of a one-dimensional representation of two-dimensional heterogeneous water flow.

Soil water movement
Under the assumption of local equilibrium and single phase flow, the Richards equation describes water movement in a porous medium as with volumetric soil water content θ [-], time t [T], unit vector in direction of gravity e z , and soil hydraulic material properties, which relate the isotropic hydraulic conductivity K(θ) [L T −1 ] and the matric head h m (θ) [L] to the water content. These properties are most commonly parameterized with the Mualem-van Genuchten parameterization: with the saturation The parameterization comprises a set of six parameters: saturated water content θ s [-], residual water content θ r [-], saturated hydraulic conductivity K 0 [L T −1 ], and three additional parameters α [L −1 ], n [-], and τ [-], which do not have a direct physical relation but can be associated with the inverse air entry value, pore size distribution, and tortuosity.
To represent small-scale heterogeneity in a soil, we use Miller scaling. It assumes geometric similarity based on the microscopic pore geometry of a reference material. The soil hydraulic material properties at a specific location x then scale with where * denotes the reference material properties and ξ [-] is the Miller scaling parameter. This way a heterogeneous field can be generated based on a scalar field of Miller scaling parameters. Miller-similar media reproduce an essential feature of heterogeneous soils. At small negative matric heads, locations with coarse texture (larger scaling factors) feature a higher hydraulic conductivity than locations with finer texture (smaller scaling factors). This relation becomes reversed for more negative matric heads. Then, locations with finer texture feature a higher hydraulic conductivity than locations with coarse texture. This behavior leads to crossing points between different realizations of the hydraulic conductivity function K(h m ) scaled with different Miller scaling parameters. A limitation of Miller scaling is, that it does not scale the porosity, which is consequently assumed to be the porosity of the reference material at all locations.

Ensemble Kalman filter
The EnKF (Burgers, Jan van Leeuwen, & Evensen, 1998;Evensen, 1994) is a data assimilation method to combine model and measurement information into an improved estimate of the model's state based on the assumption of Gaussian error distributions. It is the Monte Carlo extension of the Kalman filter (Kalman, 1960) for nonlinear models. To not only estimate states but also parameters, an augmented state can be defined, which then contains the model states as well as parameters. For the estimation, the filter sequentially iterates between a forecast and an analysis step.
In the forecast step, an ensemble of augmented states u n is propagated from time k − 1 to time k when a measurement is available: where n denotes the ensemble member, M is the forward model, and η is the process noise, which can be used to describe unrepresented model errors. The subscripts "f" and "a" denote forecast and analysis, respectively. In this paper, the augmented state consists of a vector of water contents θ representing the water content state, soil hydraulic parameters, and Miller scaling factors, and it has the length N u = 208, where the water content state component accounts for 200 dimensions. The forward model for the water content is the Richards equation, whereas for the parameters it is a constant model. Assuming Gaussian distributions, the uncertainty of the ensemble is characterized by the N u × N u forecast error covariance matrix: where the overline denotes the expectation value and f , is the ensemble mean. In the practical application of the EnKF, expectation values are approximated by the ensemble.
In the analysis step, the model information is combined with measurement data d based on their respective uncertainty. In this paper, the length of d is the number of measurements N m , with N m = 4. The measurement uncertainty can be represented through an ensemble of measurements d n : where ε is the measurement error. The N m × N m measurement error covariance matrix is then The measurements are linked to the augmented state through the linear N m × N u measurement operator H, which maps from the state space to the measurement space. A measurement at time step k of the postulated true system state u true,k is given by The forecast ensemble is then updated with the measurements to the analysis ensemble: where the N u × N m Kalman gain K weights the forecast error covariance matrix with the sum of forecast error covariance matrix and measurement error covariance matrix and maps from the measurement space back to the state space: The augmented state is iteratively updated by alternating between forecast and analysis throughout the available measurement data.
We use several extensions of the EnKF to improve its performance in soil hydrological application. Through spurious correlations, as well as non-Gaussian distributions, the uncertainties described by the error covariance matrix of the analysis ensemble can become too small. This leads to filter inbreeding and can ultimately give rise to filter divergence. The issue is intensified if the process noise describing the model error is unknown. To reduce this effect, a damping factor γ ∈ (0,1] can be multiplied to the correction vector in the analysis update in Equation 12 (Hendricks Franssen & Kinzelbach, 2008). This lowers the reduction of the uncertainty in each analysis step. Extending the damping factor to a vector γ (with an entry-wise multiplication) allows to treat each augmented state component differently (Wu & Margulis, 2011), which is also done in the present work.
Multiplicative inflation is an additional possibility to counteract filter inbreeding. To increase the model uncertainty, the forecast uncertainty can be raised by enhancing the distance between each ensemble member and the ensemble mean. Anderson and Anderson (1999) multiplied the distance with a constant inflation factor λ ≥ 1: However, the inflation factor is specific for each problem. By comparing the distance between forecast and measurements with their uncertainties, the factor can be estimated adaptively both in time (Anderson, 2007;Li, Kalnay, & Miyoshi, 2009;Wang & Bishop, 2003) and space for the augmented state (Anderson, 2009). In this paper, we use the spatiotemporally adaptive inflation method TA B L E 1 Summary of synthetic experiments. For each of the eight scenarios, we used five different realizations of the Miller scaling field for a two-dimensional truth and one-dimensional data assimilation (2D) and a one-dimensional truth and one-dimensional data assimilation (1D) case. This leads to a total of 80 experiments proposed by Bauser, Berg, Klein, and Roth (2018), which uses a Kalman filter within the EnKF to estimate the inflation factors. The method requires setting an uncertainty for the inflation factor, which determines how fast the factor can be adjusted. Due to a limited ensemble size, the signal/noise ratio worsens for decreasing correlations in the error covariance matrix. However, correlations typically decrease with increasing physical distance between two state locations (Hamill, Whitaker, & Snyder, 2001). The resulting spurious correlations can be counteracted through localization (Houtekamer & Mitchell, 1998), which reduces covariances with distance through a correlation function (Houtekamer & Mitchell, 2001). The localization alters the calculation of the Kalman gain to where ρ is a correlation function and • denotes the element-wise multiplication with each entry of the covariance matrix based on the distance between the positions in the augmented state. We use the typically employed fifthorder piecewise rational function defined by Gaspari and Cohn (1999) as a correlation function. The function is similar to a Gaussian function but with local support, which introduces a cutoff length.

Ensemble Kalman filter settings
Throughout this paper, we chose an ensemble size of N = 100 and set the process noise to η = 0. The damping factor was chosen analogously to, for example, Erdal et al. (2014): we set a factor of γ = 0.3 for the parameter component of the augmented state and applied no damp-ing (γ = 1.0) to the water content component. For the inflation methods, we followed the settings used by Bauser et al. (2018) for the estimation of the inflation factor with a separate Kalman filter: we set a value of 1.0 for the uncertainty of the inflation factor and applied the same damping factors as in the EnKF. For the localization, we chose a cutoff length scale of the correlation function of 1.5 m.

SYNTHETIC EXPERIMENTS
To investigate the effect of unrepresented heterogeneity when estimating soil hydraulic properties based on local measurements, we performed a total of 80 synthetic experiments, which are specified below and summarized in Table 1. The synthetic experiments are designed to include the unrepresented heterogeneity as the only model structural error. The local water content measurements were placed in the middle of a two-dimensional heterogeneous soil at depths of 9.5, 19.5, 39.5, and 79.5 cm. For the heterogeneous soil, we chose a spatial extension of 2 × 2 m, which we expect to be sufficient to describe the impact of the small-scale heterogeneity on the local measurements.

Material properties
To generate heterogeneous material fields, we used Miller scaling, which requires a scalar field of scaling factors for the entire domain. We followed Roth (1995) to create lognormally distributed Miller scaling fields. In a first step, we generated realizations of scalar Gaussian random fields with expectation 0 and variance 2 ξ using a generator based on circulant embedding (Klein, 2016). We used Gaussian autocovariance functions with axes parallel to F I G U R E 1 Example of Miller scaling fields. The two-dimensional field in Panel a was generated with vertical correlation length (ℓ v ) = horizontal correlation length (ℓ h ) = 10 cm and variance ( 2 ξ ) = 0.1 for the underlying Gaussian random field. The one-dimensional Miller field in Panel b was generated by using a vertical slice of the 2D field at width 100.5 cm. Note that the color map scales exponentially the anisotropy defined by the vertical correlation length ℓ v and horizontal correlation length ℓ h . The spatial resolution of the field was set to 1 cm in all cases. The Miller scaling factors are then calculated at each location from the random number r at that location with This way, the expectation of the hydraulic conductivity function remains the same as for the reference material.
We created five different realizations of the random field through different seeds for each set of ℓ v , ℓ h , and 2 ξ listed in Table 1. We restricted the number of different realizations to five to limit the computational demand. An example for a Miller field with ℓ v = ℓ h = 10 cm and 2 ξ = 0.1 is shown in Figure 1a. As a control for each two-dimensional heterogeneous Miller field, we also created an effectively one-dimensional heterogeneous Miller field by using a vertical slice at horizontal position 100.5 cm of the twodimensional heterogeneous Miller field ( Figure 1b). This slice allows to explore the effect of the reduced representation of heterogeneity alone without the impact of twodimensional flow.

Initial condition
To create the initial condition for the synthetic experiments, we used a spin-up phase of 30 d. The initial condition for this spin-up phase was hydraulic equilibrium with a groundwater table at the bottom of the domain. The lower boundary condition was kept constant with a matric head of 0 m throughout the entire time. The upper boundary condition was an infiltration flux of 1.0 × 10 −7 m s −1 (0.36 mm h −1 ) for the first 10 d, followed by no flux for the remaining 20 d. Figures 2a and 2b show the initial conditions for the two-and the one-dimensional Miller field given in Figure 1 (ℓ v = ℓ h = 10 cm and 2 ξ = 0.1) with sandy loam as the reference material. Figure 2c compares the vertical water content distribution of the one-dimensional case with the two-dimensional water content distribution at 100.5 cm. Although the Miller scaling factors are exactly the same, the water contents differ due to the influence of the two-dimensional flux even after 20 d of no flux as the upper boundary condition.
In this example, this can be seen at a depth of around 100 cm.

F I G U R E 2
Initial conditions created with the Miller scaling field shown in Figure 1. Panel c compares the one-dimensional initial condition with a slice of the two-dimensional initial condition at horizontal position 100.5 cm, marked by the orange dashed line in Panel a

F I G U R E 3
Standard boundary condition at the upper domain boundary. Precipitation is described though a flux, whereas evaporation is realized through a matric head, where only outflow is allowed (positive fluxes are set to 0). The entire time is separated into an assimilation phase and a verification phase

Boundary conditions
The boundary condition sequence applied to the upper boundary is shown in Figure 3. The total simulation time is 52 d. There are five rain events with duration of 4 d, each followed by 4 d of evaporation. The sixth rain event lasts for 2 d, followed by 10 d of evaporation. The first four irrigation events have a flux of 1.0 × 10 −7 m s −1 (0.36 mm h −1 ), the fifth event has a lower flux of 0.25 × 10 −7 m s −1 (0.09 mm h −1 ), and the sixth rain event has an increased flux of 4.0 × 10 −7 m s −1 (1.44 mm h −1 ). The evaporation is realized through a matric potential that starts at 0 m at the end of the previous rain event and is linearly decreased by 1 m d −1 . If this potential would lead to a flux into the domain, a no-flux boundary condition is applied instead. At the lower boundary, we keep a constant matric head of 0 m, corresponding to a fixed water table, and we apply noflux boundary conditions at the lateral boundaries. We separated the entire time into a data assimilation phase during which the EnKF was used to estimate water content states, soil hydraulic parameters, and Miller heterogeneity, and a verification phase used to investigate the predictive capabilities. The assimilation phase comprises the first three rain and evaporation events and ends after 24 d. The rain events are large enough to cause a response at all measurement locations. The following verification phase includes one rain event identical to the one used in the data assimilation phase, and rain events with lower and higher intensity. This allows us to investigate the prediction accuracy within and beyond the calibrated dynamics.
To alter the impact of the Miller scaling on the dynamics, we created a second boundary condition sequence with lower fluxes, for which the material properties differ more strongly for the same scaling field. We created this boundary condition by dividing the flux of each rain event by 4. The rate of change of the matric potential for the evaporation was decreased to only 0.25 m d −1 . To ensure that the dynamics reaches similar depths as before, all times were extended by a factor of 4. This increased the total time to 208 d. Hereafter, we refer to these longer cases as the low-flux boundary condition, whereas the previous boundary condition of 52 d length is referred to as the standard boundary condition.

Data assimilation
Synthetic measurements of the water content were generated every 2 h (8 h for the low-flux boundary condition) at horizontal position 100.5 cm at depths of 9.5, 19.5, 39.5, and 79.5 cm. The true values were disturbed by the measurement uncertainty drawn from an unbiased normal distribution with a standard deviation of 0.007, which can be assumed for time domain reflectometry (TDR) probes (Jaumann & Roth, 2017).
In the data assimilation phase, we used the EnKF to estimate a one-dimensional representation based on the synthetic water content measurements. However, other methods for the inverse estimation could be used as well. The augmented state in the EnKF comprised the water content state, the Mualem-van Genuchten parameters α, n, and K 0 , the Miller scaling factors at the four measurement locations, and additionally the Miller scaling factor at the cell closest to the surface (0.5-cm depth). For K 0 and the Miller scaling factors, we chose to include the logarithm instead of the direct quantity into the augmented state. The parameters θ s , θ r , and τ were set to the respective true values.
As the initial guess of the water content state for the data assimilation, we linearly interpolated the water content measurements at time 0 d and the saturation water content at the lower boundary. The value between the upper boundary and the topmost measurement was set constant to that of the measurement. To represent the uncertainty of measurements and interpolation, we used a normal distribution with a standard deviation of 0.01 and a spatial covariance given by the fifth-order piecewise rational function by Gaspari and Cohn (1999) with a cutoff length scale of 20 cm.
For the initial guess of the Mualem-van Genuchten parameters we used univariate normal distributions, whose mean was set to the respective true value. By choosing the true values for the mean, the parameter estimation becomes easier for the EnKF. This is intended to reduce the impact of the filter performance on the different synthetic experiments, as it is not the goal of this paper to investigate the performance of the EnKF itself. The standard deviations for the Mualem-van Genuchten parameters were set to σ α = 2.0 m −1 , and σ log 10 ( 0 ) = 1.0. For n, we chose different uncertainties for the sandy loam and the silt loam cases to avoid too small of values, which create difficulties for numeric solvers due to dramatically increasing sensi-tivity of the hydraulic conductivity to small changes in the water content close to saturation (Vogel, van Genuchten, & Cislerova, 2000). For the sandy loam, we set σ n = 0.25, whereas for the silt loam we set σ n = 0.1.
The data assimilation was only performed on onedimensional domains with a reduced Miller scaling field representation explained in the subsection below. Synthetic measurements, on the other hand, were conducted on one-and two-dimensional domains with a fully resolved Miller field. These two cases (two-dimensional truth and one-dimensional data assimilation, and onedimensional truth and one-dimensional data assimilation) are in the following only referred to as "2D" and "1D," respectively. Hence, there are two sources of model uncertainty: the reduced representation of heterogeneity in both 2D and 1D cases, and the influence of horizontal fluxes in the 2D cases. The resulting soil hydraulic parameters and Miller scaling field are not expected to match the true values because the data assimilation process estimates an effective representation that predicts the measured dynamics within the limited frame of the hydraulic model used in it.

Miller scaling representation
We chose to not include the entire Miller scaling field into the augmented state, because local water content measurements typically convey this information insufficiently, if they cannot resolve the field. Therefore, we only estimated five Miller scaling factors, one at each measurement position and one at the surface, and aim for an effective description where the measurements cannot resolve the heterogeneity. We constructed this reduced representation for the Miller scaling factor at depth z as where ξ is the vector containing the five Miller scaling factors ξ i . The vector φ(z) consists of Gaussian kernel functions φ i (z) whose means are the positions of the corresponding Miller scaling factors. These functions are not normalized, leading to a maximum value of 1.0 at the location of the respective Miller scaling factor. For the standard deviation of the kernels, we chose 10 cm. This way the difference to no scaling (scaling factor of 1) of each Miller scaling factor is weighted with the kernel function depending on the distance to its respective location. The matrix A accounts for possible correlations of Miller scaling factors close to each other to avoid values exceeding the individual Miller scaling factors. This is achieved by setting the diagonal entries to A ii = 1 and the off-diagonal entries to The logarithm of each of the five scaling factors was included in the augmented state. Their initial guess was drawn from normal distributions with standard deviation σ log 10 (ξ) = 0.2. We chose the respective distribution mean to be the logarithm of the true value of the Miller scaling field at the corresponding location.

Implementation
We solved the Richards equation numerically with the DORiE software package (Riedel, Ospinar De Los Rios, Klein, Häfner, & Riexinger, 2019), which is based on the Distributed and Unified Numerics Environment (DUNE) (Blatt et al., 2016) and DUNE-PDELab (Bastian, Heimann, & Marnach, 2010). For the spatial discretization, we chose a vertical grid resolution of 1 cm in all simulations. In the onedimensional simulations, one grid cell spanned the entire domain width of 2 m. In two-dimensional simulations, we chose a horizontal resolution of 1 cm as well, which resulted in a square grid of 200 × 200 cells. The spinup phase simulations were conducted with the symmetric weighted interior penalty variant of the discontinuous Galerkin method. During the synthetic experiments, the ensemble states were propagated by a finite volume solver with upwinding. The simpler solver and the relatively low resolution were chosen to reduce the computational effort, which enabled a larger ensemble size. For the temporal discretization, the diagonally implicit second order method proposed by Alexander (1977) was used. We used an adaptive time step scheme that varies simulation time steps between 1 s and 1 h depending on the solution's convergence rate.
For computing the state propagation of a single ensemble member, we calculate a matric head state from the water content state used in the EnKF via the hydraulic parameterization. To that end, the Miller scaling field is reconstructed from the Miller scaling factors stored in the augmented state and their respective positions following Equation 17. The resulting field in the solver has one constant scaling value per grid cell. DORiE then propagates the matric head state by using the current boundary conditions. Before returning the new state back to the EnKF, it is transferred into water content again.

Evaluation
After the data assimilation phase, the propagation of the ensemble was continued without any state or parameter update during the verification phase. To evaluate and compare the performance of the different cases dur-ing the verification phase, we calculated summarizing measures.
For the water content we calculated the RMSE as where θ true,t,m and θ µ,t,m are the synthetic true water content and mean water content for measurement location m at time step t, respectively. The total number of measurement locations is N m = 4, and the total number of 2-h time steps (8-h time steps for the low-flux boundary condition) during the verification phase is N t = 336. For the fluxes, we calculated the ensemble mean of the predicted vertical flux at three different locations-(a) at the surface at 0 cm, (b) in between the water content measurements at a depth of 50 cm, and (c) below the depth covered by the water content measurements at 100 cmand compare it with the true value. Note that the synthetic truth for the 1D cases, as well as the predictions for all cases, are one-dimensional with a horizontally homogeneous flux, whereas the synthetic truth for the 2D cases is two-dimensional and the flux direction and magnitude varies horizontally. Therefore, we use the horizontal average of the vertical flux component at the different depths as the true flux.
For the depth of 100 cm, we calculated the relative cumulative flux error as where j true,t and j μ,t are the true and mean predicted vertical flux at the depth of 100 cm at time step t. The total number of time steps is again N t = 336.
Additionally, we calculated the Nash-Sutcliffe efficiency (NSE), where j μ is the temporal average of the true fluxes at 100-cm depth.
To quantify the relevance of the two-dimensional flux, we calculated the average horizontal flux ratio (AHFR) for the synthetic truth at horizontal position 100.5 cm where the measurements are located: where <j h,t > is the spatial average of the absolute horizontal flux in the top 100 cm and <j v,t > is the spatial average

RESULTS AND DISCUSSION
In this section, we first discuss the results for the water content and flux predictions for a single example and summarize the results for all synthetic experiments. We then explore the effect of surface heterogeneity for the particular example with the strongest impact, as well as the effect on the prediction of evaporation fluxes. The last subsection presents the details of the example with the largest AHFR value to highlight the impact of two-dimensional flow. Figure 4 shows the water content predictions during the verification phase for the 1D and 2D case of the Default scenario for the Miller random field shown in Figure 1. The water content predictions at all four measurement loca-tions agree very well with the synthetic measurements and truth. The uncertainty predicted by the ensemble is small. The residuals between synthetic truth and predicted mean water contents are low as well. However, there is an increase from the 1D to the 2D case. In the 2D case, the most significant errors are spikes in the residual between 40 and 44 d, due to an inaccurate prediction of the arrival time of the infiltration front.

Water content and flux prediction
The RMSE over all four measurement locations is small with RMSE θ = 0.0025 for the 1D case and a slight increase to RMSE θ = 0.0035 for the 2D case. This finding is confirmed by the average RMSE over all five seeds for the Default scenario. It increases from an average RMSE θ = 0.0022 for the 1D cases to RMSE θ = 0.0040 in the 2D cases ( Table 2).
The small errors demonstrate that in this scenario, the EnKF was able to estimate a reduced one-dimensional representation of the heterogeneous soil, based on only four local water content measurements. This reduced representation was sufficient to predict the water contents at the measurement locations with high accuracy for (a) a truth generated with a more detailed one-dimensional TA B L E 2 Summary measures of the one-dimensional truth and one-dimensional data assimilation (1D) and two-dimensional truth and one-dimensional data assimilation (2D) cases for all scenarios (see Table 1 heterogeneity as well as (b) a truth generated with a twodimensional heterogeneous Miller-similar medium. However, the results also show that the errors increase in case of unrepresented two-dimensional flux. To see how well the error is predicted by the ensemble spread, we divided the absolute residual by the standard deviation of the ensemble. This relative residual is displayed in the bottom panels in Figure 4. They show that the uncertainty is slightly underestimated, with a stronger underestimation in the 2D case. The ensemble can only predict uncertainties represented in the augmented state. Errors due to the two-dimensional heterogeneity in the 2D case and the reduced representation of the heterogeneity in the 1D and 2D case cannot be fully represented. This can lead to the underestimated uncertainties. Compared with the absolute residuals, the structure of the relative residuals changes as well. The peaks due to the inaccurate prediction of the arrival time become smaller in relation to the other errors. This shows that the ensemble, at least in part, is able to predict the uncertainty in the arrival time.
To investigate the relation of the performance of water content predictions and heterogeneity over all scenarios and cases, Figure 5 shows the RMSE θ over the AHFR. The AHFR is calculated from the synthetic truth and indicates the strength of two-dimensional flux in the profile. Per definition, AHFR values of the 1D cases are 0. We therefore use the AHFR of the 2D cases for the corresponding 1D cases as well, as indicated in Table 2. Figure 5 shows that RMSE θ generally increases with the AHFR. We find that RMSE θ is below 0.012 for all cases, and even below 0.005 for a majority of them. This means that most cases show a smaller RMSE θ than the measure-F I G U R E 5 The RMSE of water content (RMSE θ ) vs. the average horizontal flux ratio (AFHR) of one-dimensional truth and onedimensional data assimilation (1D) and two-dimensional truth and one-dimensional data assimilation (2D) cases for all eight scenarios with five different seeds each. The 2D cases are shown in dark circles, and the 1D cases are shown in pale squares. The AHFR values for the 1D cases are 0. We therefore use the AHFR of the 2D cases for the corresponding 1D cases as well (see Table 2) ment uncertainty of TDR probes of around 0.007 (Jaumann & Roth, 2017). This confirms that the water content predictions have a high accuracy, as indicated by the example case in Figure 4. Most 2D cases have a larger RMSE θ than their 1D counterparts due to the unrepresented twodimensional flux.
The variance of the underlying Miller field rises through scenarios Default, MedVar, and HighVar. This increases both AHFR and RMSE θ , following the overall trend in Figure 5. The same is true when comparing scenarios Default and Silt. Scenario Default uses sandy loam as reference material, whereas scenario Silt uses silt loam. The material properties of silt loam with the same Miller scaling field applied differ less for the occurring fluxes than for sandy loam, since the system is closer to the crossing points of the hydraulic conductivity functions K(h m ). Therefore, the flow field heterogeneity is lower and AHFR and RMSE θ become smaller.
Comparing scenarios Default and LowFlux, AFHR and RMSE θ behave oppositely: the AFHR increases, but the RMSE θ becomes smaller. The reason for this is that Default uses the standard boundary condition, whereas LowFlux uses the low flux boundary condition. The material properties differ stronger for the lower fluxes, since the system is farther away from the crossing points of the hydraulic conductivity functions K(h m ), leading to a larger AHFR. The lower fluxes also cause smaller changes in the water content, which in combination with an increasing slope of the Vadose Zone Journal F I G U R E 6 Flux predictions for the 1D and 2D case of the Default scenario with the Miller field shown in Figure 1. The top panels show the flux at three different depths. The dashed line marks the synthetic truth that is calculated by averaging horizontally over the entire domain. The solid line is the mean of the ensemble predictions and the pale area around shows the standard deviation of the ensemble. The bottom panels show the residuals between synthetic truth and ensemble mean hydraulic conductivity function K(θ) towards lower water contents then lead to smaller errors and a reduced RMSE θ . The same explanation is true for the scenarios with silt loam, Silt and SiltLowFlux, where the RMSE θ is smaller for SiltLowFlux.
Increasing the horizontal or vertical correlation length of the Miller scaling field in scenarios Horizontal and Vertical promotes vertical flow paths, manifesting in a lower AHFR compared with the isotropic setting. In the former, this is due to the scaling field becoming more similar to the respective 1D case. Indeed, the difference of RMSE θ between 1D and 2D case becomes nearly negligible. The impact on the RMSE θ of the Vertical scenario is low, the differences between 1D and 2D cases persist and the RMSE θ values remain similar to the ones of the Default scenario.
Synthetic experiments offer the advantage that we can analyze the flux predictions as well. Figure 6 shows the flux predictions during the verification phase for the 1D and 2D case of the Default scenario for the Miller random field shown in Figure 1. The fluxes are predicted well for both the 1D and 2D case. Again, the residuals are larger for the 2D case than for the 1D case. At the depths of 50 and 100 cm, they are mainly due to the inexact prediction of the arrival time of the infiltration fronts. At the upper boundary, at a depth of 0 cm, the flux during the rain events is given through the boundary condition and the residuals are 0. The evaporation flux is determined through a matric potential leading to larger residuals, especially in the 2D case, due to the inexact onset time of the evaporation and due to the maximum evaporation flux after the onset.
The NSE j at a depth of 100 cm is 0.95 for the 1D case and 0.94 for the 2D case in Figure 6. The relative cumulative flux error ΔJ at 100 cm is 0.01 for the 1D case and 0.07 for the 2D case. In this example, the fluxes are predicted well, but again better for the 1D case than for the 2D case. This is also true for the average over all five seeds for NSE j and ΔJ.
To investigate the fluxes in more detail, Figure 7 shows the NSE j over the AHFR for all scenarios and cases. Most NSE j are close to 1 and larger than 0.8. Even for large values of the AHFR, corresponding to a stronger heterogeneous flux, NSE j values close to 1 are possible. However, the spread of NSE j increases with AHFR and makes lower NSE j more likely. Although a 1D case can have a lower NSE j than the corresponding 2D case for individual realizations, most 2D cases have smaller NSE than the 1D cases for all Scenarios with sandy loam. Only for scenarios Silt and SiltLowFlux do the 1D cases have smaller NSE than the 2D cases. However, these differences are small with NSE values close to 1 (see Table 2). We also note that there is one particular case that has even a negative NSE j = −0.57. This is a realization of scenario HighVar, which features the largest variance of the Miller scaling field.

F I G U R E 7
Nash-Sutcliffe efficiency of the flux at a depth of 100 cm (NSE j ) vs. the average horizontal flux ratio (AFHR) for onedimensional truth and one-dimensional data assimilation (1D) and two-dimensional truth and one-dimensional data assimilation (2D) cases for all eight scenarios. The 2D cases are shown in dark circles, and the 1D cases are shown in pale squares. The AHFR values for the 1D cases are 0. We therefore use the AHFR of the 2D cases for the corresponding 1D cases as well The relative cumulative flux error ΔJ at a depth of 100 cm over the AHFR is shown in Figure 8. Again, no clear tendency with the AHFR can be seen. However, the 2D cases tend to have larger errors than the 1D cases (see also Table 2). In particular, we note a group of five different 2D cases with ΔJ < −0.15. The worst performance is again part of scenario HighVar, which has a relative cumulative flux error of ΔJ < −0.49.

Effects of surface heterogeneity
We investigate the outlier noted in Figure 8 in more detail. This case also corresponds to the single case with the negative NSE j in Figure 7 and the largest RMSE θ in Figure 5. The water content predictions of this case are shown in Figure 9a, and the flux predictions are shown in Figure 10a. The water content predictions show strong deviations from the true values. The RMSE θ over all four measurement locations is 0.012. However, the deviations are stronger closer to the surface. The RMSE at the measurement depth of 9.5 cm is 0.017, where the largest deviations occur during the infiltration events. The flux predictions show large deviations at all depths. The evaporation at a depth of 0 cm is overestimated strongly, whereas the infiltration flux at 50-and 100-cm depth is underestimated. The cumulative flux at 100 cm is underestimated by 49%.

F I G U R E 8
Relative cumulative flux error ΔJ at a depth of 100 cm over the average horizontal flux ratio (AHFR) for onedimensional truth and one-dimensional data assimilation (1D) and two-dimensional truth and one-dimensional data assimilation (2D) cases for all eight scenarios. The 2D cases are shown in dark circles, and the 1D cases are shown in pale squares. The AHFR values for the 1D cases are 0. We therefore use the AHFR of the 2D cases for the corresponding 1D cases as well The reason for this poor performance is in the specific realization of the Miller field. In this particular case, the scaling factor at the surface in the middle of the domain (where the measurements are located) is very small with ξ = 0.09. This corresponds to smaller pore sizes and consequently higher evaporation fluxes at more negative potentials compared with larger scaling factors.
The data assimilation is able to follow the trend of the true Miller field correctly and estimates a small Miller scaling factor of ξ = 0.11 at the surface. Note, however, that the Miller scaling factors cannot be compared entirely because the relevant material properties at the surface are also influenced by the estimated reference material. Although the low scaling factor in the estimated 1D field influences the evaporation flux over the entire width of the domain, it is a localized deviation in the 2D synthetic truth with little influence on the horizontally averaged fluxes. This leads to incorrect estimation of the evaporation, which influences the subsequent infiltration, causing lower fluxes at greater depths.
To investigate the impact of the Miller scaling at the surface in more detail, we modified the data assimilation setup and removed the Miller scaling factor at the surface from the EnKF estimation. The chosen interpolation to create the reduced representation of the Miller field approaches scaling factors of 1 in sufficiently large distances from the locations of the Miller scaling factors represented in the augmented state (see Equation 17). In F I G U R E 1 0 Flux predictions for (a) the two-dimensional truth and one-dimensional data assimilation (2D) case with the worst performance with respect to RMSE of water content (RMSE θ ), Nash-Sutcliffe efficiency (NSE j ), and relative cumulative flux error (ΔJ) and (b) the same case, but with a modified data assimilation setup, where the Miller scaling factor at the surface is removed from the augmented state, which relaxes the scaling field towards values of 1 there. The top panels show the flux at three different depths. The dashed line marks the synthetic truth, which is calculated by averaging horizontally over the entire domain. The solid line is the mean of the ensemble predictions, and the pale area around it shows the standard deviation of the ensemble. The bottom panels show the residuals between synthetic truth and ensemble mean this modified example, this resulted in a scaling factor of ξ = 0.58 at the surface. Figure 9b shows that the water content predictions of this modified estimation changed structurally. Now the largest errors occur during the evaporation phases and extend over longer time periods. The RMSE θ increases to 0.014 because we reduced the degrees of freedom in the augmented state of the EnKF. The now larger Miller scaling factor at the surface deviates stronger from the true scaling, which worsens the water content predictions directly below. However, the larger errors also lead to an increased uncertainty in the predictions. Figure 10b shows that the flux predictions of the modified estimation improved strongly. The residuals for the flux at the surface are very small. The larger Miller scaling factor at the surface keeps the material properties closer to the reference material properties, which are more representative for the average evaporation of the heterogeneous field. As a consequence, the fluxes at the depths of 50 and 100 cm improve strongly as well. Nevertheless, relevant errors remain. The NSE j for the flux at 100 cm only increases to 0.64, and the cumulative flux is still underestimated by 14%. The errors mainly stem from the inexact predictions of arrival time and maximum flux of the infiltration fronts.
We also achieve a similar predictive performance, if we use the true flux across the upper boundary condition of the synthetic truth and use this flux as the boundary condition in the data assimilation phase, as well as the verification phase. In this case, the description of the water content also worsens to an RMSE θ of 0.014, whereas the flux predictions improve to an NSE j of 0.59 and a cumulative flux error of 12% for the depth of 100 cm.
These examples demonstrate that it is imperative to describe the evaporation accurately. Then, the onedimensional representation of the two-dimensional synthetic truth can predict cumulative fluxes adequately. However, this may lead to a worse description of local water content measurements and therefore requires additional information. Contrary to a real-world experiment, we could safely omit the estimation of the Miller scaling factor at the surface because we knew that the reference material properties do not change. Alternatively, we were able to use the true evaporative flux of the synthetic truth, which is also difficult in real-world cases. More importantly, this causes unphysical situations, when the soil is locally unable to provide the average evaporative flux.
In Figure 11, we take a closer look at the effect of the correct prediction of the evaporation. It shows the cumulative flux error at a depth of 100 cm (not the relative cumulative flux error ΔJ) over the cumulative flux error at the surface for all scenarios and cases. Note that the error at the surface only stems from the evaporation, F I G U R E 1 1 Comparison of cumulative flux error at the surface and at 100-cm depth for one-dimensional truth and onedimensional data assimilation (1D) and two-dimensional truth and one-dimensional data assimilation (2D) cases for all eight scenarios. The 2D cases are shown in dark circles, and the 1D cases are shown in pale squares. The black line shows the identity function since the true flux is used during the infiltration. The cumulative flux errors show an almost perfect identity relation. Due to the mass balance a relation between the cumulative flux errors at different depths is not surprising. However, it holds in these examples even on the rather short timescales and with a more negative cumulative flux at 100 cm than at the surface (on average over all 80 experiments, ∼19 mm). This emphasizes the relevance of a correct estimation of the evaporation. It also shows that the average change of water content in the two-dimensional reality could be predicted well by the one-dimensional representations. This indicates that the EnKF was able to find an effective description of the soil water characteristics of the two-dimensional reality based on information from sparse local measurements only.

Effects of two-dimensional flow
We further investigate the effect of two-dimensional flow on the predictions with another example. We choose the case with the largest occurring AHFR value of 0.33 because we expect to see the strongest effects there. The 1D case yields RMSE θ = 0.0058 over all measurement locations, a relative cumulative flux error ΔJ = 0.03, and NSE j = 0.93 at the depth of 100 cm, whereas the 2D case has RMSE θ = 0.0078, ΔJ = 0.02, and NSE j = 0.27. The corresponding predictions of water content and flux are shown in Figures 12 and 13. Errors in both water content and flux predictions are larger than in the first example case but show the same overall behavior. As indicated by the summary measures, the errors in the 2D case are larger than in the 1D case. However, we particularly note the large error at the lowest measurement location in the 1D case, which stems from the reduced representation of heterogeneity. In the 2D case, the largest errors occur because the water contents during the first infiltration event are overestimated and the arrival time of the last event is predicted too early. It demonstrates that prediction errors depend on the specific configuration and that we cannot conclude general statements about expected errors. The flux errors in the 2D case are larger than in the 1D case, but the cumulative flux as well as the evaporation are predicted well in both cases.
We conclude again that the reduced one-dimensional representation predicts the soil water dynamics sufficiently well if exact arrival times and magnitudes of infiltration fronts are not relevant and if the evaporation is predicted accurately. However, it is not clear if this would hold for further processes such as solute transport, which are based on the description of the water dynamics. There, the actual flow paths may have a larger impact and the reduced description may become insufficient for predictions.
For the same example, we investigate the effect of the reduced one-dimensional representation on the estimated material properties. Figures 14a and 14b show the true and estimated soil water characteristics and hydraulic conductivity functions at the four measurement locations. To compensate the nonrepresented heterogeneity, these estimated effective material properties deviate considerably from the true material properties in both the 1D and 2D case. However, the deviations still remain smaller than the variations between all measurement locations. Although the deviations do not show a clear direction in the 1D case, the deviations in the 2D case show a consistent bias. For the same water content, matric potentials are more negative and the hydraulic conductivities are larger. Figure 14c shows the true scaling field in the middle of the domain, where the measurements are located, together with the reduced representations based on the estimation of Miller scaling factors at the surface and the measurement locations only. As expected, the reduced representations are unable to reproduce the true field. Only close to the surface, where measurements are denser, is the structure better resolved. For lower measurement densities, the chosen interpolation relaxes towards scaling factors of 1. Note that the scaling factors cannot be compared directly, since the material properties are also altered by the reference material properties, which are estimated as well. Therefore, we compare these resulting material properties at the measurement locations only.
The differences between true material properties and estimated effective material properties, as well as the differences between the 1D and 2D cases, highlight that estimated material properties based on local information alone (e.g., through taking soil cores) cannot be used to predict the soil water movement if the soil is heterogeneous. This is even true at the measurement locations, where we did allow local adjustments of the material properties. Therefore, material properties must be estimated inversely. The true local material properties would yield a suboptimal prediction, unless the heterogeneity can be resolved entirely.

CONCLUSIONS
In this study, we quantitatively explored the effect of unrepresented small-scale heterogeneity on prediction uncertainties for soil water movement, when estimating effective soil hydraulic material properties based on sparse local water content measurements. For this, we generated a set of synthetic experiments, where the synthetic truths feature one-and two-dimensional small-scale heterogeneity described through Miller-similar media. The estimations of the effective hydraulic material properties are based on four local water content measurements in a single vertical profile only. The effective description uses a reduced representation of the vertical heterogeneity through a reference material but allows local deviations at the measurement locations through Miller scaling to consider the local deviations of the measurements. We used a data assimilation method, the EnKF, for the estimation and investigated the predictive capabilities with respect to soil water content, as well as fluxes of the reduced effective representation.
For the synthetic experiments in this study, the estimated reduced representation proved to be sufficient for water content and flux predictions in most cases. In detail, the uncertainty in the water content predictions rises with increasing heterogeneity but remains sufficiently low. Still, these uncertainties are underestimated with the EnKF. Flux errors are larger and more relevant. They mainly stem from inexact prediction of arrival times and magnitudes of infiltration events.
The limitation of the reduced description lies in the capability to predict evaporation accurately, which was not possible in few cases. This capability of the reduced representation depends strongly on the location of the measurements within the two-dimensional heterogeneous field. In the case of strong local heterogeneity at the surface above the measurement profile, the estimation based on local water content measurements becomes more difficult due to the highly nonlinear material properties. Resulting cumulative errors of the evaporation translated directly into errors of the cumulative flux below the soil profile, even on the short time scales of the synthetic examples in this study. We demonstrated on one example that in such a particular case, additional information about the evaporation or the structure of the heterogeneity is necessary to estimate effective material properties, which allow the description of the cumulative fluxes as well. Therefore, if the evaporation is represented accurately and if the exact timing of infiltration fronts and flux magnitudes are not important, the estimation and description of the full heterogeneity are not necessary and can be achieved based on few local measurements.
Even though the estimated effective soil hydraulic material properties are allowed to adjust locally to describe the water content measurements, they differ strongly from the true material properties to compensate the reduced representation of the heterogeneity. This highlights the relevance of in situ measurements for inverse estimation of material properties and demonstrates the limitations of direct determination of "accurate" material properties (e.g., based on soil samples).
As concluding remarks, we would like to point out that we limited the description of the synthetic truth to up to two dimensions and did not include a three-dimensional truth. We expect that this would only lead to a quantitative modification of our findings, not to a qualitative one. Finally, we would like to emphasize that our results only apply to the soil water dynamics. Driving other processes like solute transport or chemical and microbial transformations with the reduced soil water dynamics cannot be expected to yield a faithful representation.

C O N F L I C T O F I N T E R E S T
The authors declare no conflict of interest.

A U T H O R C O N T R I B U T I O N S
HH Bauser designed, implemented, performed and analyzed the presented study with contributions from L Riedel. L Riedel and D Berg provided computational software. All authors participated in continuous discussions. HH Bauser and L Riedel prepared the manuscript with contributions from all authors.

A C K N O W L E D G M E N T S
We thank editor Sjoerd Van Der Zee and two anonymous reviewers for their comments, which helped to improve this paper. H.H. Bauser was funded by the Deutsche Forschungsgemeinschaft (DFG) through Project BA 6635/1-1. L. Riedel was funded by the Ministerium für Wissenschaft, Forschung und Kunst Baden-Württemberg (Az 33-7533.-30-20/6/2). The Biosphere 2 facility and the capital required to conceive and construct LEO were provided through a charitable donation from the Philecology Foundation, and its founder, Mr. Edward Bass. We gratefully acknowledge that charitable donation. Additional funding support was provided by the Water, Environmental, and Energy Solutions (WEES) initiative at the University of Arizona and by the Office of the Vice President of Research at the University of Arizona. These funding sources are gratefully acknowledged.