3D mapping of soil organic carbon content and soil moisture with multiple geophysical sensors and machine learning

Soil organic C (SOC) and soil moisture (SM) affect the agricultural productivity of soils. For sustainable food production, knowledge of the horizontal as well as vertical variability of SOC and SM at field scale is crucial. Machine learning models using depth‐related data from multiple electromagnetic induction (EMI) sensors and a gamma‐ray spectrometer can provide insights into this variability of SOC and SM. In this work, we applied weighted conditioned Latin hypercube sampling to calculate 25 representative soil profile locations based on geophysical measurements on the surveyed agricultural field, for sampling and modeling. Ten additional random profiles were used for independent model validation. Soil samples were taken from four equal depth increments of 15 cm each. These were used to approximate polynomial and exponential functions to reproduce the vertical trends of SOC and SM as soil depth functions. We modeled the function coefficients of the soil depth functions spatially with Cubist and random forests with the geophysical measurements as environmental covariates. The spatial prediction of the depth functions provides three‐dimensional (3D) maps of the field scale. The main findings are (a) the 3D models of SOC and SM had low errors; (b) the polynomial function provided better results than the exponential function, as the vertical trends of SOC and SM did not decrease uniformly; and (c) the spatial prediction of SOC and SM with Cubist provided slightly lower error than with random forests. Hence, we recommend modeling the second‐degree polynomial with Cubist for 3D prediction of SOC and SM at field scale.

knowledge about soil properties and conditions that are relevant indicators for more efficient and effective agriculture. As one of the key soil properties, soil organic C content (SOC) is relevant for soil quality and fertility, as it influences the soil's nutrient availability and structural stability (Dexter et al., 2008). In combination with soil texture, SOC affects the soil water-holding capacity, plant available water, and soil moisture (SM; Rawls, Pachepsky, Ritchie, Sobecki, & Bloodworth, 2003). Understanding the spatial as well as vertical variability of SOC and SM is essential for plant cultivation, which requires fertile soils and sufficient water. Digital soil mapping (DSM) can provide high-resolution information for sustainable agricultural management to facilitate food production on the field and farm scale through the spatial prediction of physical and chemical soil properties (Govers, Merckx, van Wesemael, & van Oost, 2017), such as SOC and SM. This facilitates an improvement in the decision-making processes for fertilization, irrigation, and liming, among others, and subsequently higher productivity of food and biofuels (McBratney, Whelan, Ancev, & Bouma, 2005).
However, soil properties vary in the horizontal as well as in the vertical domain. Hengl et al. (2014) and Viscarra Rossel et al. (2015) mapped soil properties in multiple depths, which can be interpreted three-dimensionally, but do not actually provide continuous three-dimensional (3D) information (Liu et al., 2016). To fully grasp the continuous character of soil, the third dimension should be included in the analysis as continuous entity (e.g., by incorporating mathematical functions that represent the vertical distribution of soil properties, so-called soil depth functions; Aldana Jague et al., 2016;Minasny, McBratney, Mendonça-Santos, Odeh, & Guyon, 2006;Rentschler et al., 2019;Veronesi, Corstanje, & Mayr, 2014). The spatial distribution of soil depth functions is related to the spatial distribution of environmental covariates given by the soil forming equation (Jenny, 1941;McBratney, Mendonça Santos, & Minasny, 2003): = ( , , , , , , ) where S is the soil or any soil information we would like to explain, s stands for other available soil properties at a location, c is climate, o is organisms, the factor r is the terrain, p is the parent material, a is age, and n is the spatial position. The function coefficients of the soil depth functions can be treated as an abstract soil property and therefore modeled and predicted spatially based on comprehensive sets of environmental covariates delineated from digital elevation models representing terrain (Aldana Jague et al., 2016;Rentschler et al., 2019;Veronesi et al., 2014), provided by land cover maps representing

Core Ideas
• Multi-depth EC a and gamma-ray spectrometry describe vertical trends of SOC and soil moisture. • Machine learning models can predict vertical trends of SOC and soil moisture spatially. • Cubist models of polynomial depth functions provide accurate 3D maps at field scale.
organisms Veronesi et al., 2014) and gamma-ray sensing data as indicators for soil-forming minerals (Aldana Jague et al., 2016;Cook, Corner, Groves, & Grealish, 1996;. Besides gammaray spectrometry, hydrogeophysical methods, such as electromagnetic induction (EMI) sensors, provide a widely used base in DSM in general (Binley et al., 2015;Cassiani et al., 2012;Martini et al., 2017), as well as in 3D soil mapping specifically (Moghadas, Taghizadeh-Mehrjardi, & Triantafilis, 2016), complementing field sampling by generating high-resolution spatial geophysical covariates. In particular, geophysical sensing technologies and measurements are urgently needed at the field scale, as the distribution of soil properties (S) in the vertical domain are decreasingly linked to terrain (r) and climate (c) variability but more prone to variations in weathering, mineralogy (as parts of s and p), biological activities, as well as past anthropogenic influences (o and a; Jobbágy & Jackson, 2000;Rentschler et al., 2019). Apparent electrical conductivity (EC a ) from EMI sensors and dose rate, 40 K, 238 U, and 232 Th contents from gamma-ray spectrometers are covariates that are closely linked to numerous soil properties, such as texture, horizonation, bulk density, SOC, and SM in the case of EMI (Cho, Sudduth, & Chung, 2016;Doolittle & Brevik, 2014;Martini et al., 2017), and in the case of gamma-ray spectrometry texture and SOC. Thus, gamma-ray spectrometry and EMI are used either individually or combined (Castrignanò, Wong, Stelluti, Benedetto, & Sollitto, 2012) as a proxy to the mineralogy of the parent material and other soil properties developed or inherited from the parent material (Cook et al., 1996;Jenny, 1941;McBratney et al., 2003). The geophysical measurements are interpolated with geostatistical methods like kriging (Krige, 1951) to obtain spatial information of EC a , 40 K, 238 U, and 232 Th covering the whole field with high spatial resolution (Abdu, Robinson, Seyfried, & Jones, 2008;Schmidt et al., 2014). The interpolations of the geophysical measurements constitute the covariate space of the agricultural field, which is utilized in crucial modules of DSM: 1. The spatial data of 40 K, 238 U, and 232 Th and multidepth EC a are used to calculate the locations of a representative sampling scheme for soil sampling. The aim is to fully cover potential soil variability that influences the modeled soil property, and that is found on the field at the time of measurement. For that, many approaches use conditioned Latin hypercube sampling (cLHS) or extensions like weighted extreme cLHS Schmidt et al., 2014). The cLHS is a stratified random sampling design that provides an optimal stratification of a covariate space with a reduced number of spatially distinct sample sites ). 2. The sampled point-wise soil data (i.e., SOC and SM in this work) is linked to the geophysical measurements at the soil profile locations with linear or nonlinear machine learning models (Aldana Jague et al., 2016;Rentschler et al., 2019;Schmidt et al., 2014). The dependent variable of the models are the samples measured at the locations introduced above, and the independent variables are the interpolated geophysical measurements at these locations. 3. The model trained with sampled soil data and geophysical measurements is used to make predictions to the locations of the covariate space where no soil samples were taken. 4. The model predictions are validated with additional soil samples. Ideally, the sampling scheme used for validation is independent from the scheme used for model training, which can be achieved with randomly distributed samples (Brus, Kempen, & Heuvelink, 2011;Steyerberg & Harrell, 2016).
Based on the modules of DSM described above and the potential to measure 3D soil properties provided by geophysical measurements (other available soil properties s in the soil forming equation), we assume that EMI and gamma-ray spectroscopy are highly suitable for the spatial prediction of soil depth functions, as the combination of multi-coil EMI and gamma-ray spectrometry provides multiple penetration depths and a different sensitivity to soil parameters (Dierke & Werban, 2013;McNeill, 1980aMcNeill, , 1980b. To our knowledge, there are no studies on spatial prediction of soil depth functions with EMI and gammaray sensing data as environmental covariates.
The major objective of this study was the prediction of SOC and SM in 3D using soil depth functions based on EMI data from sensors with 12 different penetration depths and gamma-ray spectrometry by capturing the response of the parent material and overlaying soil. For model training, we used Cubist and random forests, two machine learning methods often used in DSM. The hypothesis is that for 3D modeling of SOC and SM, data from EMI and gamma-ray spectrometry will achieve low errors throughout the sampled depth increments, due to the different depth penetration of the sensors.

Study site
The study site is an agricultural field of 58 ha ≈70 km north of Leipzig, Saxony, Germany ( Figure 1). The field is located on the Elbe flood plain and bordered by the creeks Altes Flieth and Fließgraben. There is no visible terrain variation in the field. Present soil types are Gleysols and Gleyic Cambisols consisting of alluvial loam (loam and clay) over Holocene sediments of fluvial sand (LAGB, 2014). At the time of sampling in August 2017, the cultivated wheat (Triticum aestivum L.) had been harvested, and the field was bare.

Methodological overview
The workflow consisted of seven individual steps ( Figure 2). First, the geophysical measurements were taken with EMI and gamma-ray spectrometry and, subsequently, interpolated geostatistically with ordinary kriging (Krige, 1951) to receive spatial predictions of the environmental covariates. These covariates were used to calculate representative sampling locations with an extension of cLHS  and served as independent variables for the soil depth function modeling. In the next step, these models were applied for spatial prediction of the soil depth functions with the independent variables and validated independently in the final step. The subsections below describe this workflow in detail.

Geophysical measurements and interpolation
The geophysical measurements were recorded with two EMI sensors (CMD-Explorer and CMD-Mini-Explorer, both GF Instruments) and a gamma-ray spectrometer (GS CAR, GF Instruments) in August 2016. The EMI sensors measure the apparent electric conductivity (EC a in mS m −1 ). The penetration depth of the magnetic field is mainly controlled by the intercoil spacing and the orientation of the dipoles, as well as the applied frequency. The penetration depth and footprint of the sensor data increase with increasing intercoil spacing. Vertically oriented magnetic dipoles (VDP/coil axis horizontal coplanar [HCP]) provide a higher depth penetration than horizontally oriented magnetic dipoles (HDP/coil axis vertical coplanar F I G U R E 1 Location of the field Wesen near Selbitz, Saxony-Anhalt, Germany, sampling scheme of the geophysical measurements with electromagnetic induction (EMI) and gamma-ray spectrometry, and sampled soil profiles (circle, weighted conditioned Latin hypercube sampling with extremes [wecLHS] samples for calibration, cross, random samples for validation). The signal of the CMD-Explorer with 4.49-m intercoil spacing and vertical dipole orientation (VDP) was noisy due to a grid gas pipe, and therefore measurements of all sensors were omitted in the marked area [VCP]) while taking into account the different cumulative sensitivity functions of both orientations (Callegary, Ferre, & Groom, 2012;Martini et al., 2017;McNeill, 1980b;von Hebel et al., 2019). The CMD-Explorer and the CMD-Mini-Explorer enable simultaneous multi-depth exploration of EC a with either VDP or HDP. The instruments have one transmitter and three receiver coils with different intercoil spacings covering six effective penetration depths, which is defined by the manufacturer (GF Instruments) as the depth above which 70% of the signal comes from (Table 1). This multi-sensor setup measuring EC a and penetration depths of up to 6.7 m enables the detection of textural patterns of the spatially variable subsurface sediments of the Elbe floodplain indirectly (Doolittle & Brevik, 2014). This is important for SM modeling to account for subsurface sediment structures, as gravel lenses with high TA B L E 1 Intercoil spacings and effective penetration depth for vertical and horizontal coil orientation for the used electromagnetic induction (EMI) sensors CMD-Mini-Explorer and CMD-Explorer  (Abdu et al., 2008). The sledge-mounted devices (height of CMD-Explorer 80 cm, height of CMD-Mini-Explorer 10 cm) were towed by a four-wheel vehicle at <10 km h −1 , crossing the field in multiple parallel (track distance 27 m) and a few crossed transects. By using overlapping measurements collected at different time from crossing the field in the end, drifts in the data were assessed and a linear drift function was applied to correct the data (Martini et al., 2017). Before interpolation these quality control lines, as well as outliers related to a gas pipe line, were removed ( Figure 1). Within the dataset, negative values of EC a occurred due to the custom calibration of the instrument (von Hebel et al., 2019). We corrected the measurements with an offset of 3.44 mS m −1 (CMD-Mini-Explorer VDP 0.32 m), 4.86 mS m −1 (CMD-Mini-Explorer HDP 0.32 m), and 0.21 mS m −1 (CMD-Mini-Explorer HDP 0.71 cm) to avoid confusion with these values and to make use of the containing information on spatial variability. Smoothing of the data was not necessary due to low noise conditions. All EMI sensors captured five records per second in any dipole orientation. We refrained from inverting EMI data, since the reliability of the required calibration procedure is limited due to a number of fundamental issues that are not solved yet (Martini et al., 2017).
The bulk (≈90%) of aboveground measured gamma radiation is emitted in the top 30-50 cm of soil (Cook et al., 1996). We used a gamma-ray spectrometer with a 4 l NaI(Tl)-crystal and automatic peak-stabilization to measure the concentration of 40 K, 238 U, and 232 Th. The device has 512 channels with an energy range from 100 keV to 3 MeV. Measurements were captured every 5 s. The 40 K, 238 U, and 232 Th were measured as counts per second. The concentration of 40 K (in %) and 238 U and 232 Th (both in μg g −1 , where μg g −1 = 1 ppm) was calculated corresponding to the decay rate at specific energy levels. The concentration of 40 K, 238 U, and 232 Th was used to calculate the dose rate (Gy h −1 ; IAEA, 2003).
The geophysical measurements served as a basis for the sampling design ( Figure 1) and were interpolated to a grid cell size of 5 m with ordinary kriging (Krige, 1951) using individual exponential semivariogram functions for each dataset in the gstat package version 1.1-5 (Pebesma, 2004) in R version 3.4.3 (R Development Core Team, 2017). Beforehand, measurements within 1-m range were averaged. Noisy measurements along a straight line were detected for the CMD-Explorer with the higher depth penetrations, caused by an underground grid gas pipe ( Figure 1). For reasons of continuity, all measurements from the EMI sensors and the gamma-ray spectrometer in this area were excluded from further processing. This crucial step is to be evaluated carefully, since all consecutive steps strongly depend on accurate environmental covariates. Therefore, error of the kriging predictions was assessed with a 10-fold cross-validation, which is an outof-sample testing method to assess the ability of the model to generalize to independent data subsets. For that, the dataset is partitioned in 10 folds of nearly equal size, where nine folds are used to train a model and tested with the 10th fold. This is done 10 times to test all folds, and the quality measure is the average of all models. For the final model, all folds are used.

Soil sampling
For the estimation of the number of soil profiles to sample, the areas under curve for the empirical cumulative distribution functions were calculated for the proposed sampling set sizes of n = 10, 20, 25, 30, 35, and 40 and the geophysical measurements with the MESS package version 0.5.0 (Ekstrøm, 2018) in R. The mean of the differences between the areas under curve indicated the error for each sample set size and the sample set size with the lowest error is chosen Schmidt et al., 2014). In this case, the optimal sample set size with the lowest error was 35. However, due to costs and feasibility constraints, we agreed on a sample set size of 25 for the representative sampling design as tradeoff between feasibility and a slight increase in model error. Spatial soil modeling requires specific sampling schemes or designs for sampling and validation (Brus et al., 2011;Schmidt et al., 2014). The aim of a sampling design is to cover the full range of potential driving factors that influence the modeled soil property and that are found on the field at the time of measurement while reducing soil sampling effort and analytical costs. Therefore, we calculated the locations of the soil profiles to sample for model training with a weighted cLHS with extreme values (wecLHS; Schmidt et al., 2014) based on the geophysical covariates with the lowest error in cross-validation of each sensor, to obtain representative sampling locations. The wecLHS extends the cLHS  by including samples from the extrema of the used covariate space to cover the full range of data. Further, a weighting scheme according to the explained variation (R 2 ) of the kriging predictions is implemented to account for noise in the interpolation . The wecLHS design was calculated with 150,000 iterations. The settings were a temperature decrease of 0.95, an initial temperature of 1, optimization every 10 steps, and an initial Metropolis value of 1.
Additionally, we sampled 10 fully randomly distributed profile locations for independent model validation. We chose a fully random sampling design for validation, since wecLHS is a stratified random sampling design and for independent validation a nonstratified sampling strategy is recommended. Further, no assumptions regarding the standard error of the estimated quality measures are required (Brus et al., 2011;Steyerberg & Harrell, 2016). The locations of the sampled profiles are displayed in Figure 1.
The soil profiles were sampled from four equal depth increments to 60-cm depth (0-15, 15-30, 30-45, 45-60 cm) with a hand auger on 2 d with the same weather conditions in August 2017 again after harvest under similar field conditions as during sensing. Sixty centimeters is the depth above which ≈80% of the roots of many agricultural crops are found (Fan, McConkey, Wang, & Janzen, 2016). Samples were taken for each depth increment as mixed subsamples from two corners and the center of 1 m 2 , resulting in 100 samples for the training set and 40 samples for the validation set. The positions of the profiles were located with a differential GPS (Leica TPS1200+, Leica Geosystems).

Laboratory analysis
For SOC determination, the samples were dried at 40 • C for 24 h, sieved (<2 mm), and ground and root fragments were removed. Total C was determined with dry combustion using an ELTRA CHS-580A Helios analyzer (ELTRA).
Although LAGB (2014) states that soils in the flood plains of the Elbe river are mostly free of carbonates, pH of the samples ranged from 5.2 to 7.2. Consequently, inorganic C was determined gravimetrically with 10 % HCl solution.
Then, SOC was determined as the difference between total C and inorganic C. Soil moisture was measured gravimetrically with drying at 90 • C for 24 h. A summary of the training and validation sets for SOC and SM is shown in Figure 3. Training and validation sets were similar. The rather small differences are due to the small sampling set size of the validation set and its sensitivity to extreme values because of its random and nonstratifying nature.

Soil depth functions and 3D predictions
where c 0 is the intercept with the y axis, thus the SOC and SM at the surface, and c 1 and c 2 are dimensionless coefficients.
The functions coefficients c 0 , c 1 , and c 2 were modeled and predicted for the whole study site based on the F I G U R E 3 Summary of the training and validation sets for soil organic C (SOC) and soil moisture (SM) as boxplots and the respective polynomial soil depth functions. The boxplots show the variation of the samples within the sampled depth increments. Training and validation sets have a similar range at each sampled depth increment and the validation set is suitable for model evaluation. The polynomial soil depth functions show the vertical distribution at the 25 profiles of the training set geophysical data of the EMI sensor and gamma-ray spectrometer with Cubist and random forests. After modeling and spatial prediction of the coefficients of the soil depth functions, the respective function can be solved at every grid location of the study site, and SOC and SM can be calculated with any vertical resolution (Aldana Jague et al., 2016;Liu et al., 2016;Veronesi et al., 2014). However, vertical resolution is limited by the vertical sampling of each profile that reflects the vertical variation within each profile. In this work, the soil depth functions were solved from 0 to 60 cm with a vertical resolution of 5 cm. The main advantage of this approach is that solving the soil depth functions provides data points that represent a three-dimensional entity (voxels) of the response variables instead of two-dimensional pixels. The voxels were stored in an array with the dimensions of the study area in the horizontal domain (Rentschler et al., 2019). A workflow diagram is given in Figure 2.

Supervised machine learning
In many applications in DSM, supervised machine learning is used to train regression models with numeric values, such as the coefficients of soil depth functions. In such models, the function coefficients at each sampling location are the dependent variable (soil property S in the soil forming equation) and the geophysical measurements compose the covariate space of independent variables (i.e., EC a in the measured depth intervals, 40 K, 238 U, and 232 Th) at the location of each sample for regression: = ( EC a , 40 K, 238 U, 232 Th ) Subsequently, the model can predict the dependent variable for each grid location at the field, since the independent variables were measured and interpolated onto the whole study area. Common supervised machine learning methods in DSM are Cubist and random forests.
Cubist uses a robust system called M5 model tree, which was established by Quinlan (1992). It applies a recursive partitioning of data to build a piecewise linear model as a decision tree, where the terminal nodes are linear models. When growing the tree, intra-subset variation is minimized at each split. A leaf of such a tree applied on continuous data contains a linear model connecting the values of the training cases to their attribute values. The procedure is based on building and applying rules. The rules generate subsets of the data according to similar characteristics of predictor and response variables. The rules are structured as if (condition is true), then (regress), else (apply next rule), comprising single or multiple predictor variables. With the rules that fulfil the conditions, soil properties are predicted by ordinary least-squares regression. If the rule does not apply, a new rule is processed for the next node of the tree within an iterative process. These rule sets are appropriate for model interpretation (Quinlan, 1992(Quinlan, , 1993. Random forests were developed by Breiman (2001) as an ensemble of classification and regression trees (Breiman, Friedman, Stone, & Olshen, 1984). Binary splits are used for a single decision tree to homogenize the predictor variables according to the dependent variable, thus minimizing the node impurity. Random forests use a bootstrap approach, where random predictor variables are chosen at each split of a tree. The final regression model results from averaging all decision tree outputs (Breiman, 2001). Random forests are robust against overfitting and interpretable with the feature importance calculated from its randomly permutated trees (Breiman, 2001). However, this is beyond the scope of this study as it requires more detailed analysis of the depth functions as dependent variables.
The tuning parameters for the machine learning methods used were the number of subsequently adjusted trees committing to the final decision tree (committees) and the number of neighboring samples from the training set to adjust the model prediction (neighbors) for Cubist. The number of randomly selected covariates at each split (mtry) was used for tuning of random forests. The number of trees (ntree) and the node size of random forests were set to default as this is not necessary when a large number of trees is computational manageable (Probst & Boulesteix, 2018). To find the best tuning parameters for the models, a grid search (Schmidt, Behrens, & Scholten, 2008) with a 10 times repeated 10-fold crossvalidation was applied. The final models were calibrated with the tuning parameters of the models with the lowest RMSE.

Model validation
The 3D predictions are validated independently with the coefficient of determination (R 2 , Equation 3) as measure of correlation between the observed and predicted values, Lin's concordance correlation coefficient (CCC, Equation 4), the RMSE (Equation 5), and the normalized RMSE (nRMSE, Equation 6) as measures of error with the 10 random profiles of four samples each taken at 0-15, 15-30, 30-45, and 45-60 cm. The CCC is a measure of concordance of the model predictions and the measured values on the 1:1 line from the origin. The RMSE is a measure of error, which allows to compare models with observed values of the same magnitude. Since the response of SOC and SM has observed values of different range, the nRMSE is required to compare the models for each depth increment.
The equations for R 2 , the CCC, the RMSE, and nRMSE are where y andŷ are the observed and predicted values, μ y and μˆy denote the means for the observed and predicted values, respectively, ρ is the correlation coefficient (Pearson's r), σ y and σˆy are the corresponding variances, y max is the maximum of the observed values, and y min is the minimum of the observed values.

Geophysical measurements and interpolation
The results of the 10-fold cross-validation for the geostatistical interpolation showed a high coefficient of determination (R 2 ) between the observed and the predicted values for all EMI sensors (R 2 > .96) and low errors (nRMSE ≤ 0.08). The predictions for the CMD-Mini-Explorer with intercoil spacings of 0.71 and 1.18 m and VDPs had the highest coefficient of determination (R 2 = .99 and nRMSE = 0.01), and the predictions for the CMD-Explorer with an intercoil spacing of 4.49 m and horizontally oriented dipole had the lowest coefficient of determination (R 2 = .96 and nRMSE = 0.04).
The cross-validation results of the gamma-ray spectrometer showed coefficients of determination ranging from .75 ( 238 U) up to .92 (dose rate). The lower R 2 compared with EMI sensor interpolation is to be expected due to the noise prone passive nature of statistical counting gamma decays. The errors of the interpolation range from 0.05 (dose rate) to 0.07 ( 238 U). All results of the 10-fold cross-validation are shown in Table 2. TA B L E 2 Results of the 10-fold cross-validation of the interpolation with ordinary kriging. The sensors and sensor setups we used for the weighted conditioned Latin hypercube sampling with extremes (wecLHS) sampling design are highlighted in bold The spatial variation of the measured EC a varies between the sensors and sensor orientation.  Figure 4d). The measures with the CMD-Mini-Explorer in the same orientation (VDP and HDP) showed considerable changes in EC a (Figure 4a-4c, 4d-4f) with increasing intercoil spacing, whereas the measurements with CMD-Explorer were more alike (Figure 4g-4l). However, we used all EMI measures as independent variables, since the similar depths of investigation may contain varying information while using different coil orientation resulting in different shapes of their sensitivity functions. The interpolations of 40 K, 232 Th, 238 U, and dose rate ( Figure 5) showed different spatial trends with some shared local minima in the center and the southeast and a shared local maximum in the west of the field. All results of the interpolations are visualized in Figures 4 and 5.
Given the low error (nRMSE ≤ 8%) of the interpolation for the EMI data and for gamma-ray measurements, the environmental covariates interpolated with ordinary kriging represented the spatial distribution of EC a at multiple depths, 232 Th, 238 U, and dose rate adequately. Thus, they were suitable to calculate a wecLHS sampling design, as well as for modeling and mapping of SOC and SM in the horizontal and vertical domain. The covariates with low cross-validation error and high coverage of the covariate space used for wecLHS were dose rate for the gamma-ray spectrometer and CMD-Mini-Explorer with VDP (0.32 m intercoil spacing), CMD-Mini-Explorer with HDP (0.32 m) and CMD-Explorer with VDP (4.49 m) for the EMI sensors.

Soil depth functions
The fitted soil depth functions showed the vertical trend of SOC and SM at the profile locations ( Figure 3). For both polynomial and exponential functions at each profile, the R 2 and RMSE values of the soil depth functions were calculated. The soil depth function with the highest R 2 for SOC and SM was the polynomial function (Equation 1) with a mean R 2 of .98 for SOC and 0.92 for SM (RMSE = 0.14 and 0.00). The exponential soil depth function had lower mean R 2 and a higher error for both SOC and SM. A summary of the evaluation of the soil depth functions is shown in Table 3.
The minimum values in R 2 and the high standard deviation of the soil depth functions for SM showed that the soil depth functions could not depict the vertical trend in some soil profiles (Table 3). This is the case for Profiles 2 and 10, where the SM was 0.22, 0.20, 0.23, and

3D predictions
The independent validation with the validation set of 10 randomly located profiles with the same sampled depth  increments showed a high overall explained variation and low errors for both polynomial and exponential functions and for both Cubist and random forests. The R 2 of Cubist models for SOC and for SM was .86 and .88 with the polynomial function and .86 and .87 with the exponential function (Table 4). Random forest models showed a similar R 2 except for SM with the polynomial function (R 2 = .84). The CCC of all Cubist models was slightly higher than the CCC of the random forests models. Cubist models had a CCC of .91 for SOC and .91 for SM with polynomial function. The CCC for SM with exponential function was slightly higher (CCC = .93). The error of all four models for SM was identical (RMSE = 0.02). For SOC, the error of the Cubist and random forests models with polynomial function was about 0.02-0.03% lower than the error of models with exponential function. The difference in RMSE between Cubist and random forests was <4% (Table 4). Because of the higher R 2 and CCC and the lower error of the random forest model with the polynomial function, this was chosen for SOC modeling. Cubist in combination with the polynomial function was chosen for the final SM predictions. However, it is worth noting that the differences in error, as well as R 2 and CCC of all models, were rather small.

TA B L E 4
Results of the independent model validation with the explained variation as coefficient of determination (R 2 ), Lin's concordance correlation coefficient (CCC), and RMSE (in %) of the polynomial and exponential soil depth functions for soil organic C (SOC) and soil moisture (SM Comparing the model coefficient of determination, CCC and RMSE by individual depth increment showed a similar pattern. Cubist models with polynomial function for SOC had a higher R 2 and CCC and a lower RMSE than Cubist with exponential function and random forests with both polynomial and exponential functions. For SM, the differentiation by R 2 was not clear as the R 2 varied between the used machine learning methods. However, the Cubist models with polynomial functions also showed the lowest RMSE for soil depth functions and modeled depth of the soil (Table 5).
More important is that the nRMSE of Cubist and random forest models with the polynomial function did vary least with depth in absolute values. This showed that the model could predict SOC and SM with low error throughout the sampled interval of the soil profile (Table 5). Both Cubist and random forests with the exponential function for SOC and SM had high errors (nRMSE) in the depth increment ranging from 15 to 30 cm. Therefore, we conclude that the exponential function could not depict the vertical trend of SOC and, to a lesser extent, of SM within the sampled profile. We ascribe this to the 30-cm-deep plough horizon, which needs to be accounted for with a less uniformly decreasing soil depth function. The flexibility of polynomial functions of third degree or higher is potentially capable of depicting local variations in the soil better than exponential functions. We recommend that this be investigated in more detail.

Vadose Zone Journal
F I G U R E 6 Three-dimensional (3D) predictions of soil organic C (SOC) and soil moisture (SM) with polynomial depth function and Cubist (see Supplements 1-4 for detailed animated cross sections) Further, the R 2 of the SOC models decreased from around .80 to .50 on average with increasing depth and increased from .76 to .88 with increasing depth for SM. Lin's CCC showed a similar pattern. These differences in explained variability indicate differences in explanatory power of the geophysical measurements for SOC and SM modeling for the depth intervals used in this study. On the one hand, geophysical measurements and especially EMI measurements are influenced by SM, whereas SOC content is related indirectly through SM content and influenced by other soil and environmental processes. This may not be covered by EMI and gamma-ray sensors, where covariates of the latter have little influence on depths >30 cm. On the other hand, this may refer to the decreasing range of SOC content and the increasing range of SM (Figure 3). The lower SOC content variability in depth may not be represented by the covariates. Therefore, these complex interactions need be investigated in more detail to make more precise conclusions about the use of geophysical measurements as covariates for 3D modeling with soil depth functions. The final predictions are sketched in Figure 6 and shown in detail in the Supplements 1-4.
Analyzing the interaction of spatial prediction of SOC and SM and the geophysical covariates, one can see that the highest SOC values were located in areas with high EC a values (compare Figure 1). Figure 5 shows topsoil SOC contents of up to 6% in the western and central part of the site (lower left side) and ≈3% in the south (lower right side). This range of SOC content at this particular agricultural field is similar to the range of SOC in most agricultural fields in central Europe (Tóth, Jones, & Montanarella, 2013). A similar pattern can be seen in the SM prediction. In the western part, SM is distributed uniformly, with 20-25% in the whole profile, and in the south, there is much less SM in the deeper subsoil (5%) than in the topsoil (15-20%). These patterns can also be found in the sampled soil profiles (Figure 3).
In the central part of the field, pillar-like patterns of higher SOC content values were visible. These pillars are well described and linked to old meanders of the rivulet Fließgraben or the river Elbe. In these areas, a farmer can expect better growing conditions for field crops, as SOC affects nutrient availability and SM retention. Thus, these areas need less additional fertilizer than the areas with less SOC and lower EC a . Using the proposed framework can therefore contribute to a sustainable agricultural approach (e.g., precision agriculture that applies fertilizer according to soil requirements). Threedimensional mapping is highly suitable for informing farmers, as sampling based on wecLHS is fast (only few profiles are required) and the model development based on the geophysical measurements is computationally efficient, relatively fast compared with conventional soil mapping, as well as potentially extendable to other soil properties such as pH, cation exchange capacity, and texture (Cassiani et al., 2012;Doolittle & Brevik, 2014).

CONCLUSION
In this case study, we predicted SOC and SM in the vertical as well as horizontal domain (i.e., in 3D using geophysical covariates derived from EMI and gamma-ray sensors with different intercoil spacings and thus different penetration depths and footprints of the signal). A weighted cLHS design was applied for calculation of the locations of the calibration samples. We hypothesized that the used sensor setup will lead to predictions of SOC and SM with high explained variation throughout the soil profile as well as in the spatial domain.
We showed that coefficient of determination and model error of the polynomial and exponential functions modeled and predicted with Cubist and random forests were stable over all depth increments. Thus, the data from two EMI sensors with depth-dependent sensitivity and gamma-ray spectrometry are well suited for the 3D prediction of SOC and SM despite the reduced number of samples. In general, the differences between the models' error were rather small. The differences between the used machine learning methods are smaller compared with the differences between the used soil depth functions. This demonstrates the suitability of the sampling design approach for modeling with Cubist and random forests. Therefore, we conclude that the choice between soil depth functions is more important than the choice of the machine learning method for spatial prediction, if both methods are well established in DSM. The flexible polynomial function is capable of depicting local variation, which is not limited to plow horizons but also comprises clay-enriched horizons, pH drops with decreases in CaCO 3 content, and others. We recommend the combination of second-degree polynomial soil depth function with Cubist for 3D mapping of SOC and SM with two EMI sensors and gamma-ray spectrometry covering a wide range of environmental covariates representing the horizontal and vertical domain of SOC and SM variation on the field scale. Within the scope of precision agriculture, this approach is suitable for SOC and SM estimation in similar environmental conditions, as it offers a spatial evaluation that incorporates the whole soil continuum. Thus, the 3D mapping of SOC and SM with high spatial and vertical resolution can help to optimize sustainable management strategies on the field scale with respect to fertilization, irrigation, and liming and subsequently to increase food and biofuel productivity.
For future investigation and to simplify the approach for field application, the contribution and importance of the individual sensors and sensor settings are of great interest. Since both orientations of the CMD-Explorer showed similar interpolation results, these measurements may be strongly cross-correlated and redundant. This can be evaluated and solved with the feature importance calculated within random forests, but it requires comprehensive and complex analysis of the interaction of the modeled depth function coefficients and the geophysical measurements (e.g., by using different slices of the target variable related to a specific depth increment compared to the depth sensitivity of the sensors). Further, the extension of the modeled depth may be of interest, since plants can uptake water from greater depth. This depends on the crop and also extends to forestry. For that purpose, other soil properties such as texture, water-holding capacity, permeability, horizontation, or the extension of the mod-eled depth to the depth of bedrock may be of interest. More complex soil depth functions such as polynomials of higher degree may be beneficial, when a tradeoff between soil sampling costs and model benefits is found. In our study, we successfully integrated depth-dependent EC a data; however, we refrained from inversion of the data because we did not want to introduce additional uncertainties and ambiguities into the data analysis. As recently shown by von Hebel et al. (2019), an enhanced processing chain can provide accurate and quantitative EMI data. This offers interesting possibilities to extend the presented approach by depth-true electrical conductivity values.

A C K N O W L E D G M E N T S
We thank the German Research Foundation (DFG) for supporting this research through the Collaborative Research Centre (SFB 1070) resourcecultures (Subprojects Z and S). We are particularly grateful to the Landwirtschaftsbetrieb e.G. Selbitz for the access to the field site. We further thank Marco Pohle for assistance with the field measurements.
Open access funding enabled and organized by Projekt DEAL.

C O N F L I C T O F I N T E R E S T
The authors have no conflicts of interest to declare.