Zhao Et Al.2025
Zhao Et Al.2025
Keywords:                                                  Soil Organic Carbon (SOC) is a key property for soil health. Spectral reflectance such as multispectral and
Digital soil mapping                                       hyperspectral data could provide efficient and cost-effective retrieval of SOC content. However, constrained
Soil organic carbon                                        by the availability of hyperspectral satellite data, current works mostly use a small number of spaceborne
Hyperspectral
                                                           hyperspectral imagery for SOC retrieval on a small scale. In this work, the first large-scale hyperspectral
Temporal composite
                                                           imaging reflectance composites were built, and they were used for SOC estimation. Specifically, DESIS satellite
Deep learning
DEM
                                                           images were used to predict SOC over the whole state of Bavaria in Germany (∼ 70,000 km2 ). We prepare
DESIS                                                      850 hyperspectral images from the DESIS satellite and build temporal composites from them. For the soil
                                                           data, data was gathered from LfU(Bavarian State Office for the Environment), LfL(Bavarian State Research
                                                           Center for Agriculture) and LUCAS 2018 (Land Use and Coverage Area Frame Survey). 828 soil samples were
                                                           selected after data filtering. For this regression task, different machine learning and deep learning methods
                                                           were implemented and explored. Moreover, a spectral attention mechanism was added to the model. Besides
                                                           hyperspectral input, the digital elevation model (DEM) was also included as an auxiliary input as the measured
                                                           spectrum has inter-variability dependent on the elevation and the generated topographical features are also
                                                           relevant with SOC distribution. Based on the regression results evaluated by 𝑅𝑀𝑆𝐸, 𝑅2 , and 𝑅𝑃 𝐼𝑄, the
                                                           deep learning models showed much better performance than machine learning methods. Especially when only
                                                           using hyperspectral data as input, the best result was achieved with 𝑅𝑀𝑆𝐸 1.947%, 𝑅2 0.626, and 𝑅𝑃 𝐼𝑄
                                                           1.710 on the test set. After incorporating topographical features, the fused model achieved further improved
                                                           performance with 𝑅𝑀𝑆𝐸 1.752% and 𝑅2 0.695 and 𝑅𝑃 𝐼𝑄 1.919. From the interpretability analysis for model
                                                           performance, it was found out that the bands in the range of 530 nm–570 nm, 770 nm–790 nm, and 840 nm
                                                           - 870 nm are the most relevant bands for SOC estimation. In the end, several SOC maps were generated and
                                                           analyzed together with soil types. The SOC maps indicate that water-associated areas, such as coastal soils
                                                           and bogs, tend to have higher SOC, while mountain areas tend to contain lower SOC. Such findings align with
                                                           SOC distribution across soil types and show the effectiveness of the model.
1. Introduction                                                                                 the organic carbon content, gathering soil samples through fieldwork
                                                                                                and analyzing in the lab is commonly used. However, such fieldwork
    Soil organic carbon (SOC) belongs to one of the most important                              is time-consuming and could not generate soil mapping efficiently,
soil parameters. It is of huge relevance to soil fertility, sustainable                         especially for a large area scale. With the increasing number of space-
agriculture, and crop production (Johnston et al., 2009). To derive
 ∗ Corresponding author at: Technical University of Munich (TUM), Chair of Data Science in Earth Observation, Arcisstraße 21, 80333 Munich, Germany.
    E-mail addresses: xiangyu.zhao@tum.de (X. Zhao), zhitong.xiong@tum.de (Z. Xiong), paul.karlshoefer@dlr.de (P. Karlshöfer), ntziolas@ufl.edu (N. Tziolas),
Martin.Wiesmeier@lfl.bayern.de (M. Wiesmeier), uta.heiden@dlr.de (U. Heiden), xiaoxiang.zhu@tum.de (X.X. Zhu).
https://doi.org/10.1016/j.jag.2025.104504
Received 9 January 2025; Received in revised form 6 March 2025; Accepted 24 March 2025
1569-8432/© 2025 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
X. Zhao et al.                                                                     International Journal of Applied Earth Observation and Geoinformation 140 (2025) 104504
borne satellites such as Sentinel-2, DESIS, PRISMA and EnMAP, imaging             Among the terrain attributes that are useful for SOC retrieval (Zeraat-
spectroscopy has become an efficient method for retrieving informa-               pisheh et al., 2022), there are different features such as local ones like
tion on the ground. By analyzing the spectral reflectance from the                slope, aspect, curvature, etc, regional ones like Topographic Wetness
remote sensing images, properties of soil, water, or vegetation could             Index (TWI), Relative Slope Position (RSP), and also the original eleva-
be estimated accurately (Rast and Painter, 2019).                                 tion level. Therefore, topographical attributes was also included in this
    Many studies used remote sensing data for SOC estimation. As                  study to provide more information into the models.
summarized in the paper of Angelopoulou et al. (2019), Odebiri et al.                 The overall objective of this study is to estimate SOC accurately on
(2021), different input data modalities such as multispectral (Castaldi
                                                                                  a large area scale by building hyperspectral composites and using deep
et al., 2019; Gholizadeh et al., 2018; Vaudour et al., 2019) or hyper-
                                                                                  learning methods. The main contributions of this paper are as follows:
spectral (Anne et al., 2014; Bartholomeus et al., 2011; Gomez et al.,
2008) data from different platforms including spaceborne, airborne,
                                                                                      1. The first large-scale hyperspectral reflectance composites were
and unmanned aerial systems were used. Simple machine-learning
                                                                                         made for SOC regression by exploring multiple preprocessing
regressors such as Random Forest (RF) and Partial Least Square Re-
                                                                                         techniques. The composites were generated from DESIS images
gression (PLSR) were often selected to estimate SOC from such spectral
                                                                                         and cover the whole Bavaria state in Germany(∼ 70,000 km2 ).
images. Partially because of the small data size of soil data, deep
                                                                                      2. Deep learning frameworks for SOC regression tasks from hyper-
learning methods were not largely explored for SOC estimation. When
                                                                                         spectral images were investigated, and good regression results
using spaceborne hyperspectral data, most of the studies only focused
on areas of small size with hand-picked single imageries (Mzid et al.,                   were achieved.
2022; Wang et al., 2010). However, these studies on airborne and                      3. Topographical features were fused with hyperspectral features
spaceborne hyperspectral data have shown great potential for SOC                         and the performance was improved by modalities fusion.
prediction due to the high spectral information content. There is an                  4. Soil maps were generated. Both spectral and spatial
urgent need to use spaceborne hyperspectral images to detect SOC                         interpretability analysis were conducted for the model’s perfor-
for large areas. In the paper of Wang et al. (2022), the potential                       mance.
of airborne and spaceborne optical soil sensing by using soil library
hyperspectral reflectance was analyzed. The study showed the potential               This article is organized as follows: In Section 2, the datasets and
of soil mapping on the continental scale by using machine learning                methodology in this study are described. Then, Section 3 shows the
models.                                                                           experimental results using different input data and methods. Section 4
    When dealing with soil mapping problems, temporal composites                  provides some in-depth analysis and discussion. In the end, conclusions
have been proven to be useful by having better soil exposure (Demattê             are drawn up in Section 5.
et al., 2018; Diek et al., 2017; Rogge et al., 2018). For building temporal
composites automatically from multispectral imagery, SCMaP is already
                                                                                  2. Material and methods
developed and applied to generate temporal composite products for
soil-related applications (Heiden et al., 2022; Rogge et al., 2018; Xue
et al., 2024). Due to the broad coverage both in the spatial and                     In this work, data modalities including hyperspectral images and
temporal domains of multispectral satellites, it becomes feasible to set          DEM data were involved in retrieving SOC contents from the soil
up a unified pipeline for building composites from multispectral im-              dataset.
ages. However, hyperspectral data, including DESIS and EnMAP (Storch
et al., 2023), have more limited data availability than multispectral
                                                                                  2.1. Hyperspectral data
ones like Sentinel-2 (Drusch et al., 2012), and Landsat Woodcock
et al. (2008). Under such limited data conditions, building hyperspec-
tral temporal composite becomes significantly more challenging and                    For hyperspectral data in this work, hyperspectral images from
requires more preprocessing steps.                                                DESIS satellites were used. DESIS is the DLR Earth Sensing Imaging
    Deep learning methods have become powerful tools for analyzing                Spectrometer (German Aerospace Center (DLR), 2019). It is sensitive
complex data structures, especially for images and texts (LeCun et al.,           between bands of 400 nm to 1000 nm with a spectral resolution of
2015). In the field of remote sensing, deep learning also achieves                2.5 nm. The spatial resolution is about 30 meters on the ground.
great success for data of different modalities including multi- and               The data are delivered in images of about 30 * 30 km. From the
hyperspectral, Lidar, and synthetic aperture radar (SAR) (Zhu et al.,             preprocessed products, L2 A data was chosen, which is atmospher-
2017). For hyperspectral imagery, many deep learning frameworks                   ically corrected and ortho-rectified. From the DESIS data portal, all
were developed regarding the detailed spectral characteristics of hy-             850 hyperspectral images were gathered within the whole Bavaria
perspectral data. Different from the simple RGB images of three bands,            state, Germany (47.26◦ N–50.56◦ N, 9.53◦ E – 13.84◦ E) collected be-
hyperspectral data contains much more bands of several hundreds.
                                                                                  tween 2018–2023. The distribution of the cloud-free hyperspectral
Specially adapted RNN-based models and CNN models have shown
                                                                                  pixels over the whole state of Bavaria was shown in Fig. 1. The
great performance in hyperspectral classification tasks (Li et al., 2017;
                                                                                  pixel numbers were counted based on the entire dataset of the 850
Mou et al., 2017; Mou and Zhu, 2019). For tasks using reflectance
                                                                                  hyperspectral images. It can be seen that the valid hyperspectral cov-
spectra for prediction, deep-learning frameworks including MLP, CNN
                                                                                  erage is sparse and uneven, where some areas do not even have any
have been widely applied (Padarian et al., 2019; Xu et al., 2019; Zhao
                                                                                  existing hyperspectral data, but some other areas, like the Munich
et al., 2022). However, limited by the data availability and sparse
coverage of hyperspectral data, there are quite limited studies using             area, have multiple images up to 23. This unbalanced data availability
large-scale hyperspectral imagery. Moreover, compared with the rich               is challenging when generating temporal composites and preparing
number of studies on hyperspectral classification, the research area of           useful input spectra. Areas with multiple hyperspectral images result in
hyperspectral regression is highly unexplored.                                    reasonable composites by averaging out strong seasonal effects (e.g., on
    Besides the possibility of using spectral information for SOC esti-           soil moisture) and thus, produce stable soil reflectance composite spec-
mation, topographical features are also relevant. In many studies (Guo            tra (Dvorakova et al., 2021; Heiden et al., 2022). However, those areas
et al., 2019; McBratney et al., 2003; Möller et al., 2022; Rasel et al.,          with only limited images could be affected by the seasonal extremes,
2017), the digital elevation model (DEM) and generated terrain at-                thus resulting in biased reflectance, which could not correspond to the
tributes were applied for SOC estimation (Wiesmeier et al., 2014).                stable topsoil properties.
                                                                              2
X. Zhao et al.                                                                               International Journal of Applied Earth Observation and Geoinformation 140 (2025) 104504
Fig. 1. Dataset: (a) Research area: the Bavaria state; (b) Soil data: 828 sparsely distributed soil samples from LfU, LfL and LUCAS; (c) DESIS data: cloud-free observations from
850 scenes; (d) DEM 30: the elevation values above the sea level in the units of meters.
2.2. Digital elevation model (DEM)                                                          from each site, 30 samples from 5 sub-areas. All the data was collected
                                                                                            from the years 2001 to 2008. For this study, the averaged SOC value
    Regarding the hyperspectral data, the measured spectrum also has                        from the 30 samples was used as the ground truth value. For simplicity
inter-variability dependent on the topographical features. Therefore,                       and readability, the commonly used unit of (%) is used to describe
DEM data was included, which was derived from Google Earth En-                              SOC values in this work, where 1% equals to 10 g/kg. The LUCAS
gine (Gorelick et al., 2017). Specifically, GLO30 was used from Coper-                      2018 dataset consists of 18,984 soil samples for Europe. These samples
nicus by ESA (European Space Agency (ESA), 2019) with a spatial                             were collected in 2018. They contain information like pH value, SOC
resolution of 30 m. For this research, DEM was selected within the                          content, CaCO3, P, N, K, land class, land usage, etc. Based on these
Bavaria area (47.26◦ N–50.56◦ N, 9.53◦ E–13.84◦ E). The selected DEM                        three datasets, data points were filtered and selected by criteria like
data was visualized in sub-figure (a) in the Fig. 1.                                        agricultural land use, available DESIS images, and SOC value less than
                                                                                            18%. Specifically, the attached land use information of each soil sample
2.3. Soil data                                                                              in LUCAS data was used to select agricultural land use. The high SOC
                                                                                            value threshold was set because SOC values are highly imbalanced
    In this study, soil data was from three data sources LfU (Bavarian                      with high positive skewness. Following the works of Tziolas et al.
State Environment Agency), LfL(Bavarian State Research Center for                           (2024), van Wesemael et al. (2024), the threshold of 18% was used
Agriculture), and LUCAS 2018 (Land Use and Coverage Area Frame                              to exclude rare, extreme SOC values. In the end, 828 soil samples were
Survey) (Orgiazzi et al., 2018). LfU data contains 2548 soil samples’                       used for all the following experiments. It is worth noting that from the
information over the whole of Bavaria and samples were collected from                       year 1986 to 2015, there was no significant SOC change for Bavarian
the year 1953 to 2021. It includes soil information like soil type, SOC                     croplands (Kühnel et al., 2020). Such stability makes it possible to align
content, pH value, etc. LfL data includes 351 cropland sites, where                         soil samples with hyperspectral imagery of different collection times
                                                                                        3
X. Zhao et al.                                                                              International Journal of Applied Earth Observation and Geoinformation 140 (2025) 104504
                          Table 1
                          Statistics of the selected SOC data (%). STD: standard deviation, Min: minimum, Max: Maximum.
                                                                                       4
X. Zhao et al.                                                                             International Journal of Applied Earth Observation and Geoinformation 140 (2025) 104504
Fig. 3. (a) Preprocess and Features: This figure describes the preprocessing and feature engineering pipeline for both DESIS data and DEM data. Surrounding each soil sample
which is marked in red, the DESIS patch and the DEM patch were selected. Based on the extracted hyperspectral patches, the temporal composite was generated and then the
first-order derivative was computed from the averaged spectrum. From the DEM patches, TWI, and slope features were computed, while keeping the original elevation values. (b)
Deep Learning Models: the 1D CNN framework was used for the hyperspectral input. The detailed structure in the figure is for illustration only. Besides the basic CNN model, a
spectral attention mechanism was incorporated. For modalities fusion using both hyperspectral and topographical input, features from both modalities were concatenated.
hyperspectral composite for the whole of Bavaria was visualized in Fig.                   2.5. Feature engineering
5. From the composite, it could be seen that there are many white areas,
indicating pixels permanently covered by water, vegetation, or build-                         From the extracted hyperspectral patches, feature engineering was
ings. Moreover, there are densely or sparsely distributed hyperspectral                   conducted to prepare for machine learning and deep learning frame-
pixels for different areas, as shown in the two example subplots.                         works. According to (Nocita et al., 2013; Stevens et al., 2010), prepro-
                                                                                          cessing steps such as absorbance conversion, smoothing, and getting
    Based on the hyperspectral composites, hyperspectral patches were
                                                                                          first-order derivate were proven to be useful for SOC estimation. There-
extracted surrounding each soil sampling site. Regarding the patch size
                                                                                          fore, in the experiments, the same settings were followed for the
here, the size of 3 pixels times 3 pixels was chosen, which corresponds                   feature extraction procedures. The hyperspectral reflectances of the
to a square area of 90 meters times 90 m. The patch size was expanded                     whole patch were first averaged into a one-dimensional averaged spec-
from one pixel to three pixels by assuming that the SOC content is                        trum. Then the averaged spectrum was converted into absorbance by
the same within a small nearby area. Such aggregation of nearby                           logarithm computation. The absorbance was smoothed by the Savitzky–
reflectance could help generate better input spectra to train the model,                  Golay filter according to Steinier et al. (1972), and the first-order
especially under the sparse distribution of valid pixels.                                 derivative was computed from the smoothed absorbance. In the end,
                                                                                      5
X. Zhao et al.                                                                       International Journal of Applied Earth Observation and Geoinformation 140 (2025) 104504
Fig. 4. Basic scheme of generating the temporal composite from DESIS hyperspectral images.
a one-dimensional vector of the first-order derivative was generated,               computed. In the end, the logarithm conversion was conducted as
which was the model’s input. The preprocess and feature generation                  follows:
at each step were visualized in Fig. 6. In this figure, not only were                        (                    )
                                                                                                 sin(slope_rad)
the results of each step shown, but the difference between different                twi = ln                        .                           (4)
                                                                                               1 + cos(slope_rad)
SOC value ranges was also illustrated. These different patterns among
various SOC contents provide the essential information to feed into the             Here, the slope_rad represents the slope in the radians unit. After
regression models.                                                                  stacking with the original elevation values, these three features were
   For the integration of auxiliary variables generated by topographical            treated as images with three channels. Within each channel, the values
data, the approach of Tziolas et al. (2024) was followed. Hence, slope              were normalized by min–max normalization. In the end, patches were
and TWI (Topographic Wetness Index) were computed. To compute the                   extracted surrounding every soil sample. For the patch size of topo-
slope, the gradients in the 𝑥 and 𝑦 directions were first calculated using          graphical features, an area of 10 times 10 pixels was used, which is
the central difference. Then the arctan function was used to derive the             larger than the hyperspectral patches. The reason is that, on the one
slope by the following equation:                                                    hand, for hyperspectral input, hyperspectral reflectance with low vari-
                √
               ⎛ ( )2 ( )2 ⎞                                                        ance is required by reducing the patch size. On the other hand, variant
               ⎜    𝜕𝑧         𝜕𝑧 ⎟
slope = arctan            +           .                                (3)          elevations are needed for topographical input so that the difference can
               ⎜    𝜕𝑥         𝜕𝑦 ⎟
               ⎝                    ⎠                                               be captured and features can be extracted and normalized. Moreover,
In the end, the slope was converted into the unit of degrees. For                   a larger topographical patch size could provide a better topographical
the TWI feature, based on the generated slope values, the slope unit                context, and the topographical influences on soil properties often act at
was transformed into radians and the specific catchment area was                    a broader scale.
                                                                                6
X. Zhao et al.                                                                               International Journal of Applied Earth Observation and Geoinformation 140 (2025) 104504
Fig. 5. Temporal composite for the Bavaria state. For visualization, the bands 180 (860 nm), 122 (711 nm), and 42 (507 nm) were chosen as RGB channels. White areas refer
to permanent coverage by water, vegetation, or buildings.
Fig. 6. Preprocessing and Feature engineering steps: reflectance, absorbance, smoothed absorbance, and 1st order derivative. Different colors indicate features from different SOC
ranges.
2.6. ML and DL methods for hyperspectral data                                               of components is tuned to maximize the covariance between input and
                                                                                            output. Both RF and PLSR possess excellent regression capabilities and
   In the field of digital soil mapping, machine learning regression                        are suited for small-size datasets. Following the settings in Wang et al.
models such as Random Forest (RF Breiman, 2001) and Partial Least                           (2022), the computed first-order derivative was fed into these models
Square Regression (PLSR, Geladi and Kowalski, 1986) have been most                          and their performance was used as the comparison baseline.
commonly used and achieved great performance in many different                                  Regarding the deep learning frameworks, 1D CNN was chosen and
tasks. Random forest is an ensemble method that comprises multiple                          an attention mechanism was incorporated in this work (subfigure (B) of
decision trees. It combines the predictions of multiple models to im-                       Fig. 3). For 1D CNN, the input is the average spectrum derivative in the
prove the overall performance. PLSR is a method used to find the latent                     form of a one-dimensional vector. The 1D CNN model would extract
variables that simultaneously explain the maximum variance between                          spectral features from the input data and produce regression output.
input features and output labels. During model training, the number                         Each convolutional layer is followed by an activation function, pooling
                                                                                        7
X. Zhao et al.                                                                    International Journal of Applied Earth Observation and Geoinformation 140 (2025) 104504
layer, and dropout layer. Afterward, the intermediate feature is fed into        were reliable. During training and validation, five cross-validation was
fully connected layers and the model outputs regression results.                 used. Since the target SOC contents follow an imbalanced distribution
    Based on the 1D CNN, an attention mechanism was adopted to                   with the majority of contents lying between 0 and 4%, stratified cross-
improve the performance further. Specifically, spectral attention was            validation (Hastie et al., 2009) was used by converting regression
applied here. Following the setting from (Hang et al., 2020; Mou and             values into five classes with discretization boundaries [0, 1, 2, 3, 4,
Zhu, 2019), a spectral gate was generated and multiplied elementwise             18]. During cross-validation, the folds were made by preserving the
with the original spectrum. As shown in subfigure (b) of Fig. 3, the             percentage of samples for each class. Following (Nalepa et al., 2022),
spectral attention consists of a simple 1-dimensional convolutional              the loss function based on the simple RMSE (Root-Mean-Squared Error)
layer, where the input size and filter number are equal to the band              was adopted. The loss was computed by the RMSE score divided by
number of hyperspectral data, and both the stride size and kernel size           a baseline RMSE score. The baseline RMSE was generated by simply
are one. Under such a parameter setting, the 1D CNN layer would learn            using the training average SOC content as the prediction for all points.
the unique attention weights for each hyperspectral band. The goal is            The goal of customizing such a loss function is to prevent the model
to focus on more relevant bands for the regression task since different          from converging to the training average. Moreover, a higher weight of 2
bands have different sensitivities to SOC content.                               was assigned for higher SOC contents (larger than 4%) when computing
                                                                                 the loss function. It could help the model suffer less from the limited
2.7. ML and DL methods for DEM                                                   samples within the high SOC range. For evaluation, three metrics were
                                                                                 selected: 𝑅𝑀𝑆𝐸, 𝑅2 (coefficient of determination), and 𝑅𝑃 𝐼𝑄 (Ratio
    Based on the extracted topographical features - elevation, slope,
                                                                                 of Performance to Interquartile Range):
and TWI, a simple 2D CNN was implemented to estimate SOC. The                              √
                                                                                           √ 𝑛
reason for choosing 2D CNN frameworks is their strong capabilities to                      √1 ∑
extract spatial patterns from images. 2D CNN-based frameworks have               RMSE = √          (𝑦 − 𝑦̂𝑖 )2 ,                                      (5)
                                                                                             𝑛 𝑖=1 𝑖
been widely used and achieved excellent performance in many tasks                           ∑𝑛
for RGB images, including classification, regression, segmentation, etc.                        (𝑦𝑖 − 𝑦̂𝑖 )2
                                                                                 𝑅2 = 1 − ∑𝑖=1
                                                                                             𝑛               ,                                        (6)
Since three features were included from the DEM data and patches                                       ̄2
                                                                                             𝑖=1 (𝑦𝑖 − 𝑦)
were extracted from them, the topographical features could also be
                                                                                            𝐼𝑄
regarded as images with three channels, which are in a similar format            RPIQ =         .                                                                    (7)
                                                                                           𝑅𝑀𝑆𝐸
to RGB images. For comparison, random forest was used for regression
by feeding the averaged topographical features from every patch into             In these equations, 𝑦𝑖 is the ground truth value, and 𝑦̂𝑖 is the estimated
the machine learning model.                                                      regression value. 𝑛 is the number of samples. 𝐼𝑄 is the interquartile
                                                                                 range of ground truth values 𝐼𝑄 = 𝑄3 − 𝑄1 .
2.8. Modality fusion                                                                 For machine learning models like RF and PLSR, the scikit-learn
                                                                                 library (Pedregosa et al., 2011) was used and they are trained on
    When dealing with input data from different modalities, the modal-           Intel(R) Core(TM) i7-1355U processor. Optuna (Akiba et al., 2019) was
ity fusion strategy is particularly useful in deep learning. In (Schmitt         used for parameter tuning among 100 trials. For RF, the number of
and Zhu, 2016), the fusion strategies in remote sensing are classified           estimators (10–100), maximum depth (1–20), minimum samples per
into three types: observational level fusion, feature level fusion, and          split (2–20), and minimum samples per leaf (1–20) were tuned. For
decision level fusion. In this study, since the DESIS data and DEM data          PLSR, the number of components (1–50) was tuned. The loss value
reflect information from two totally different perspectives - spectral and       computed on the validation set was used to choose the best parameter
topographical, feature-level fusion was chosen by simply concatenating           combination.
representations after CNN layers. Because these two input modalities                 For deep learning models, the best structure was also investigated
have different characteristics, two separate CNN branches were used              using Optuna by tuning the following parameters: learning rate (1 ×
to generate intermediate spectral and topographical representations              10−4 , 2 × 10−4 , 5 × 10−4 , 1 × 10−3 ), batch size (8, 16, 32), number of
respectively, as shown in Subfigure (b) in Fig. 3. The same 1D CNN with          CNN layers (2, 3, 4), number of FC layers (2, 3, 4), CNN filters(16, 32,
and without spectral attention was used for the spectral branch. For             64, 128, 256), FC layers’ units (16, 32, 64, 128), activation functions
the topographical branch, the same 2D CNN structure was used. After              (ReLU, Tanh, Leaky ReLU), and kernel size (2, 3). For each framework,
that, the late fusion strategy was applied by concatenating weighted             100 trials were conducted. PyTorch (Paszke et al., 2019) was used to
representations from these two modalities. It is worth noting that               build and train all the deep learning models. To stabilize the training
within the fusion strategy, a gate mechanism was incorporated (Arevalo           process, a grid clip of max norm = 1.0 was used. All the models were
et al., 2017), where the fusion weights for two modalities are learnable         trained with 1000 epochs and early stopping. To optimize the model,
parameters and could be optimized automatically. Specifically, both in-          Adam optimizer was used. All the experiments were implemented on
termediate features would be fed into a fully connected layer so that the        the NVIDIA A40 GPU.
intermediate vector could be transformed into the same size. Afterward,              For interpretability analysis, the SHapley Additive exPlanations
these transformed vectors are multiplied with fusion weights, where the          (SHAP) values were computed based on the best-performed 1D CNN
summation from spectral and topographical weights is equal to one.               model. Specifically, the average SHAP values were calculated across
In the end, the combined representation is fed into fully connected
                                                                                 different wavelengths on the test set. The aim is to see which bands
layers, and the model outputs the estimated SOC value. For comparison,
                                                                                 provided relevant information for SOC and were extracted by the 1D
random forest was also applied by simply using concatenated averaged
                                                                                 CNN model.
features from both data.
   For implementation, the whole dataset was split into calibration                 The experimental results of SOC estimation were summarized in
and test sets, where the calibration set was used to train and validate          the following Table 2 according to different input data modalities and
the model, and the test set was used to evaluate the trained model               models. All the results here were computed from the final test set. The
externally. Calibration and test sets were split geographically to guar-         scores was the averaged scores from cross-validation. For each model,
antee that there was no data leakage such that the experimental results          the best performance from parameter tuning was reported here.
                                                                             8
X. Zhao et al.                                                                           International Journal of Applied Earth Observation and Geoinformation 140 (2025) 104504
                       Table 2
                       Test results of SOC regression: This table shows the performance on the test set when using different data modalities as the
                       input (HS: Hyperspectral, DEM: Digital Elevation Model). Specifically, the averaged 𝑅𝑀𝑆𝐸, 𝑅2 , and 𝑅𝑃 𝐼𝑄 during cross-
                       validation on different models are showed. The best performance for each data modality is highlighted.
                        Data modalities           Model                                                 𝑅𝑀𝑆𝐸              𝑅2                𝑅𝑃 𝐼𝑄
                        HS                        RF                                                    3.137             −0.023            0.892
                        HS                        PLSR                                                  3.045             0.037             0.920
                        HS                        1D CNN                                                1.947             0.626             1.710
                        HS                        1D CNN with spectral attention                        2.333             0.491             1.299
                        DEM                       RF                                                    3.132             −0.019            0.894
                        DEM                       2D CNN                                                2.591             0.297             1.090
                        HS +   DEM                RF                                                    3.119             −0.011            0.898
                        HS +   DEM                PLSR                                                  3.033             0.044             0.923
                        HS +   DEM                1D CNN + 2D CNN                                       2.075             0.572             1.626
                        HS +   DEM                1D CNN with spectral attention + 2D CNN               1.752             0.695             1.919
3.1. Results using hyperspectral data                                                   additional redundancy and noise brought by the topographical input.
                                                                                        To gain more insights into the regression results, the regression value
    As shown in Table 2, deep learning-based models including 1D CNN                    generated by the best modalities fusion model and ground truth values
and 1D CNN with spectral attention achieved much better performance                     were visualized in Fig. 7. In this figure, the five plots were gener-
than machine learning regressors such as RF and PLSR. From the                          ated during cross-validation on the test set. It could be seen that the
results, both RF and PLSR models were underfitted for several reasons.                  regression results are good, especially for the lower SOC values (less
First, the input hyperspectral derivative is noisy due to the limited                   than 4%). However, as shown in Fig. 2, samples with higher SOC
number of hyperspectral images. Such noisy data could lead to poor                      values (larger than 4%) are scarce. Therefore, the model has limited
regressions from simple ML models. Second, the input hyperspectral                      exposure to the corresponding spectral characteristics, which results in
derivative is of high dimension, and models could suffer from the                       poor prediction in this range. Considering the dominance of low SOC
curse of dimensionality because of redundant and irrelevant input data.                 values in the calibration set, the regression model would be biased,
Third, the SOC labels are imbalanced, and the simple ML models may                      leading to an underestimation of high SOC samples.
ignore extreme cases. Based on the test results, 1D CNN showed good
performance with an 𝑅2 score of 0.626. The reason for such good                         3.4. Bands importance analysis
performance could be the capability of capturing local spectral features
and correlations between adjacent spectral bands. After incorporating                       Besides regression experiments, explainability analysis was also
the spectral attention mechanism into 1D CNN, the model achieved                        conducted by using the SHAP (SHapley Additive exPlanations) method
slightly worse performance with 𝑅𝑀𝑆𝐸 2.333 and 𝑅2 0.491. Com-                           (Lundberg and Lee, 2017). SHAP is used for analyzing feature impor-
pared to the 1D CNN without attention, although the spectral attention                  tance from the model’s input. It can be used to analyze why the model
mechanism makes the model focus more on informative bands and less                      generates certain outputs. Here, the absolute values of SHAP were used
on the other bands, it also increases the model’s complexity. Regarding                 to investigate important hyperspectral bands for SOC from 1D CNN’s
the small data size in this work, such increased complexity makes it                    test results. The results were visualized in Fig. 8, together with the first-
easier for the model to get overfitted on the dataset.                                  order derivative of the hyperspectral spectrum. It can be observed that
                                                                                        there are a few wavelengths of high SHAP values, indicating they have
3.2. Results using DEM data                                                             a great influence on the regression results. In detail, the bands between
                                                                                        the range of 530 nm–570 nm, 770 nm–790 nm, and 840 nm–870 nm
    The 2D CNN model achieved better regression results than RF with                    are informative for SOC content. By comparing the input derivative
an 𝑅𝑀𝑆𝐸 of 2.591 and an 𝑅2 score of 0.297 (Table 2). Here, 2D CNN                       features in the bottom plot 8, it is evident that there are relatively more
can extract spatial features such that useful topographical information                 distinguishable features within these bands. Such alignment between
can be used to estimate SOC contents. Although such regression results                  the feature importance and feature differences provides interpretable
on DEM input were worse than the results on hyperspectral input, they                   analysis for the successful performance of the 1D CNN model.
showed that topographical features could be treated as useful auxiliary
information and have the potential to be combined with hyperspectral                    3.5. Soil map products
input.
                                                                                            Besides using metrics like 𝑅𝑀𝑆𝐸, 𝑅2 , and 𝑅𝑃 𝐼𝑄 to evaluate mod-
3.3. Results using modalities fusion                                                    els, SOC maps were also generated on the test area to gain insights
                                                                                        into the spatial representation of SOC (Fig. 9). The following trained
    Modalities fusion with a spectral branch of 1D CNN, spectral at-                    models and input data were used to generate SOC maps on the test
tention, and topographical branch of 2D CNN achieved good perfor-                       area: 1D CNN and hyperspectral data in plot (a), 2D CNN model with
mance of 𝑅𝑀𝑆𝐸 1.752 and 𝑅2 of 0.695 (Table 2). The success of                           DEM data in plot (b), modalities fusion model with both hyperspectral
this modalities fusion setting proves that the spectral and topograph-                  and DEM input in the plot (c). Then, in plot (d), a soil type map
ical information could compensate for each other and provide more                       was included from BGR (The Federal Institute for Geosciences and
relevant input for SOC regression. Importantly, the additional DEM                      Natural Resources)1 . In the soil maps, the commonly used SOC classes
features act as a regularizer, counterbalancing the increased complexity                in Germany were used to colorize predicted SOC contents by the unit of
introduced by the attention mechanism. They enhance model stability                     mass percentage. In addition to these maps, histograms of the predicted
and improve generalization. Also, spectral attention is effective here in               SOC were made to have better insights into the value distribution
emphasizing useful bands for SOC estimation. Given the fact that the                    (Fig. 10). From this figure, it can be seen that the modalities fusion
filtered hyperspectral input here contains 157 bands, many of which                     model performs best. The distribution of the predicted SOC value
contain redundant information for SOC because there are no small-scale                  in plot(c) follows a pretty similar distribution with the ground truth
spectral features expected in this wavelength range. It is interesting that
the regression results got worse than when using only hyperspectral
                                                                                          1
input, using 1D CNN + 2D CNN. This could be partially due to the                              https://geoportal.bgr.de/
                                                                                    9
X. Zhao et al.                                                                                 International Journal of Applied Earth Observation and Geoinformation 140 (2025) 104504
Fig. 7. Regression Result of the best-performed model by five-fold cross-validation (subfigures (a), (b), (c), (d), (e) represent results of fold 1, 2, 3, 4, 5): This figure shows the
test result from the modalities fusion model which uses 1D CNN + spectral attention for HS + DEM. For each fold during the cross-validation, the regression result and ground
truth values are plotted. Within each figure, the black line is the identity line, which is the perfect agreement between prediction and ground truth. The red line is the best linear
approximation of the relationship between ground truth and predicted values.
Fig. 8. SHAP values analysis: (a): Absolute SHAP values; (b): The input hyperspectral derivatives. High SHAP values and corresponding highly distinguishable derivative features
are highlighted.
                                                                                          10
X. Zhao et al.                                                                             International Journal of Applied Earth Observation and Geoinformation 140 (2025) 104504
Fig. 9. SOC Map for the test area generated by different models: (a): SOC Map generated by 1D CNN with only hyperspectral input; (b): SOC Map generated by 2D CNN with
DEM input; (c): SOC Map generated by fusion model with spectral attention from both hyperspectral and topographical input. The unit of SOC value here is mass percent (%) (d):
Soil Type Map from BÜK1000 of BGR (production date: 2013).
SOC contents, as described in Fig. 2. However, for the predicted SOC                      predictions of each soil type, as described in Table 3. From the table,
contents from only DEM input (b), the prediction concentrates in the                      it can be observed that SOC contents are related to soil types. For
range between 1.0 and 5.0 without having higher SOC predictions.                          example, for all three prediction results, soil type 1 has the highest
Moreover, from all three SOC maps, the higher SOC contents along                          average prediction among 5 types. Thus, the soil contains higher SOC
rivers and floodplains could be observed.                                                 contents in water-associated areas, such as coastal soils and bogs. For
    Regarding the soil types within the test area, there are coastal soil                 mountain areas like soil type 5, the predicted SOC contents are always
and bogs (6), floor of the wide river valleys (11, 13, 14), soil of the                   the lowest compared to the other types. These findings align well with
undulating lowlands and hilly areas (18, 30), soils of the loess areas                    the prior knowledge of SOC distribution across various soil types and
(42), soil of the mountain and hill regions (50, 61, 63). Different                       demonstrate the effectiveness of the model.
soil types have various SOC distributions. For example, for coastal
soil and bogs, the SOC value is higher because of water logging and                       4. Discussion
slow decomposition. For soils in the mountain area, the soil tends to
have lower SOC due to the erosion caused by the steep slopes. For                         4.1. Temporal composites from hyperspectral images: challenges and signif-
the other types here, the SOC value lies in a moderate range, which                       icance
could be affected by topography, hydrology, vegetation and climate,
etc. To have better insights into the correlation between soil type and                      In this study, the first large-scale temporal composites were built
SOC contents, the mean and standard deviation were computed within                        from hyperspectral DESIS images. Compared to many previous works of
                                                                                     11
X. Zhao et al.                                                                                    International Journal of Applied Earth Observation and Geoinformation 140 (2025) 104504
Fig. 10. Estimated SOC value distribution for different input data modalities: (a): SOC distribution by 1D CNN with only hyperspectral input; (b): SOC distribution by 2D CNN
with DEM input; (c): SOC distribution by fusion model with spectral attention from both hyperspectral and topographical input.
Table 3                                                                                          period can thus provide the mean or median spectra by reducing strong
Mean and Standard deviation of SOC contents of different soil types (%). The                     seasonal fluctuations. The fused images could then support a more
predictions were from the best model using HS data, DEM data, and HS + DEM data
input, respectively. In this table, the considered soil types are type 1(coastal soil and
                                                                                                 robust estimation of SOC contents of the topsoil surface. Secondly, since
bogs), type 2(floor of the wide river valleys), type 3(undulating lowlands and hilly             only bare soil pixels could be selected as input data, making composites
areas), type 4(loess areas), type 5(mountain and hill regions).                                  produce a larger number of valid pixels than using only one single
                           Statistics of Hyperspectral Predictions                               image layer. This is meaningful in both dealing with the data sparsity
 Soil type         Type1            Type2          Type3             Type4       Type5           and providing larger input data for deep learning models. Thirdly, the
 Mean              4.116            2.388          2.750             2.392       2.075
                                                                                                 composites are of higher data quality than only one captured image.
 Std               2.422            1.685          1.860             1.563       1.050           Different from the spectra measured in the lab condition, spectra from
                                                                                                 spaceborne satellites are affected by issues including cloud coverage,
                               Statistics of DEM Predictions
                                                                                                 vegetation coverage, geo-reference offset, etc. When aggregating im-
 Soil type         Type1            Type2          Type3             Type4       Type5           ages by calculating the mean or median, such issues could be mitigated
 Mean              3.665            2.897          2.644             2.453       2.409           by filtering out outlier data. Such data cleaning and the resulting
 Std               1.067            1.132          1.080             1.009       1.003
                                                                                                 improvement of data quality are essential, especially under the settings
                              Statistics of Fusion Predictions                                   of small data sizes and large deep learning frameworks.
 Soil type         Type1            Type2          Type3             Type4       Type5
                                                                                                 4.2. Features and deep learning mechanisms
 Mean              3.929            2.228          2.637             2.217       1.879
 Std               2.474            1.663          2.030             1.609       1.413
                                                                                                     From the generated temporal composite in this study, each valid
                                                                                                 pixel would contain a one-dimensional vector, corresponding to the
                                                                                                 spectrum. Before finalizing the complete experimental pipeline, the
multispectral images, building temporal composites from hyperspectral                            spectrum data was tried to be fed directly into the regression model
images is much more challenging. One main reason behind such diffi-                              without further feature engineering. The results turned out to be un-
culties is the hyperspectral data sparsity from both temporal and spatial                        derfitting, which shows the necessity of conducting feature engineering
domains. The currently available hyperspectral data does not have time                           on the raw hyperspectral reflectance for SOC estimation. In papers
series images or large spatial coverage. For example, the multispectral                          using multispectral images for SOC regression like (Dvorakova et al.,
satellite like Sentinel-2 has global coverage and possesses a constant re-                       2021; Gholizadeh et al., 2018), the authors calculated the following
visit time. Such spatial and temporal conditions make it relatively easy                         features from multispectral bands: NDVI, TVI, NBR2, etc. In papers
and possible to preprocess the multispectral data and build high-quality                         using hyperspectral data for SOC regression like (Stevens et al., 2010;
temporal composites. However, hyperspectral satellites such as DESIS,                            Wang et al., 2022), the authors used the spectrum’s derivative up
EnMAP, PRISMA, and Hyperion do not have such ideal data conditions.                              to second-order. The similarities among these works are that they
Therefore, there are just a few studies using temporal composites from                           considered the difference and variation among spectral bands and then
hyperspectral data but only for a very small area (Tagliabue et al.,                             used them to feed into regression models. Also, the experimental results
2022). Regarding DESIS data in this work, there are still many areas                             show that the extracted first-order derivative proves to be a good input
without any data within Bavaria. Such heavily sparse data conditions                             feature for SOC estimation.
emphasize the importance of the data cleaning and preprocessing steps.                               Regarding the regression models, all the CNN-based models
Especially, to generate soil reflectance composite for bare soil pixels,                         achieved better performance than machine learning regressors such
the effects brought by cloud coverage, vegetation coverage, and soil                             as RF and PLSR. The success of the 1D CNN model using hyperspec-
moisture fluctuation have to be considered.                                                      tral input can be attributed to CNN’s capability of capturing local
    Although it is tricky to make such hyperspectral temporal compos-                            dependencies and correlations in the spectral dimension. Based on the
ites, soil reflectance composites are useful in many aspects. Firstly, SOC                       basic CNN framework, the spectral attention mechanism could further
content is a relatively stable property over time in Bavaria (Wiesmeier                          make the model focus more on relevant bands and less on the others.
and Burmeister, 2022). However, the hyperspectral reflectance could                              However, such an additional attention module increases the complexity
fluctuate with the seasonal change, primarily due to factors such as                             of the framework, making the model harder to fit on the small dataset.
soil moisture variability. Fusing hyperspectral images covering a large                          Therefore, the 1D CNN model without spectral attention achieved
                                                                                            12
X. Zhao et al.                                                                     International Journal of Applied Earth Observation and Geoinformation 140 (2025) 104504
better results than the one with the attention mechanism. For the 2D              about soil mapping, the size of collected soil samples was always lim-
CNN model using topographical input, it is capable of extracting spatial          ited. Most works used less than one thousand soil samples to calibrate
features. The hierarchical CNN layers can help the model understand               the models. However, deep learning models require a large amount of
terrain attributes in different scales. The modalities fusion framework           data to be trained well. In this case, self-supervised learning algorithms
includes both spectral and topographical information (McBratney et al.,           could be a solution, where we use some other downstream tasks to pre-
2003). By using the gate mechanism to fuse intermediate features                  train the model and fine-tune it on soil data. Also, some areas might
dynamically, features are not only scaled into the same scale but also            only have a few sparsely distributed soil samples. Few-shot learning
assigned by different fusing weights. The fusion model thus combines              would then become a suitable approach by incorporating these new
the advantages of the 1D CNN and 2D CNN. It then achieved the best                samples into the already-developed models.
result among all the experiments in this work.
                                                                                  5. Conclusion
4.3. Limitation of DESIS spectral coverage and NDVI
                                                                                      In this study, the first attempt was made to use large-scale hy-
    DESIS data provides spectral coverage in the visible and near-                perspectral temporal composites for SOC estimation. The novelty of
infrared (VNIR) range from 400 nm to 1000 nm, lacking shortwave                   this research lies in the generation of hyperspectral composites with
infrared (SWIR) bands from 1000 nm to 2500 nm. This property                      a systematic approach for extracting bare soil pixels, followed by the
would constrain the accurate selection of bare soil pixels, as spectral           application of deep learning techniques for SOC regression. By fusing
characteristics in SWIR are necessary for distinguishing bare soils from          DEM data and exploring different deep learning methods, good regres-
dry vegetation.                                                                   sion results were achieved with 𝑅𝑀𝑆𝐸 1.752% and 𝑅2 0.695 and
    In this study, NDVI was computed to identify bare soils based on              𝑅𝑃 𝐼𝑄 1.919 in agricultural soils of Bavaria, Germany. The following
VNIR bands, leveraging the chlorophyll absorption in the red band                 conclusions are drawn: (1) the spaceborne hyperspectral imagery could
and reflectance in the Near-Infrared (NIR) band to differentiate veg-             be used to estimate SOC accurately on a large area scale with enough
                                                                                  data filtering and preprocessing. Especially when creating temporal
etation from non-vegetation. However, NDVI has certain limits when
                                                                                  composites, it is essential to pay attention to several technical details,
filtering out dry vegetation. Compared with healthy green vegetation,
                                                                                  according to data availability of the hyperspectral images; (2) the first-
dry vegetation has lower chlorophyll content and higher concentrations
                                                                                  order derivative of the spectrum is informative for SOC content. 1D
of lignin and cellulose (Elvidge, 1990). In contrast, bare soils also
                                                                                  CNN model could extract relevant spectral information from it and
lack strong chlorophyll absorption but contain very limited lignin and
                                                                                  achieve good regression results. On the basis of the 1D CNN model,
cellulose content. As a result, dry vegetation and bare soils exhibit
                                                                                  the spectral attention mechanism could further focus on useful bands;
similar spectral signatures in the VNIR range but differ in the SWIR
                                                                                  (3) topographical features could provide compensation information
range, making NDVI insufficient for their differentiation.
                                                                                  with spectral input for the SOC regression task. The modalities fusion
    To fully address this issue of dry vegetation, the SWIR bands would
                                                                                  method could fuse both spectral and topographical input and then
be essential for detecting lignin and cellulose (Jacques et al., 2014).
                                                                                  achieve the best result. Overall, this work successfully built the first
Based on the SWIR bands, other indices, such as Cellulose Absorp-
                                                                                  hyperspectral temporal composites from DESIS and develops deep-
tion Index (CAI) (Daughtry, 2001), could be useful to distinguish dry
                                                                                  learning methods for SOC estimation. In the future, other soil property
vegetation from bare soils.
                                                                                  data and hyperspectral images could be further explored. Moreover,
                                                                                  deep learning algorithms such as self-supervised learning and few-shot
4.4. Future works                                                                 learning can be promising research directions.
    In this study, the temporal composites from hyperspectral images              CRediT authorship contribution statement
for a large area were generated for the first time and used for SOC
estimation. The experimental pipeline, including composite generation,               Xiangyu Zhao: Writing – original draft, Visualization, Software,
feature engineering, and deep learning, shows good performance in                 Methodology, Investigation, Formal analysis, Data curation, Concep-
estimating SOC for agricultural soils of Bavaria. In the next steps, there        tualization. Zhitong Xiong: Writing – review & editing, Supervision.
will be future work on both data and method aspects.                              Paul Karlshöfer: Writing – review & editing, Data curation. Nikolaos
    From the data aspect, other soil properties such as texture, soil             Tziolas: Writing – review & editing, Methodology. Martin Wiesmeier:
nitrogen, and potassium content in different regions will be interesting          Writing – review & editing, Data curation. Uta Heiden: Writing – re-
to explore. Further studies could be a good way to investigate the                view & editing, Validation, Project administration, Methodology, Data
generability of the developed method. Besides the DESIS satellite,                curation, Conceptualization. Xiao Xiang Zhu: Writing – review &
there are some other satellite images available for this task, such               editing, Supervision, Resources, Project administration, Methodology,
as EnMAP hyperspectral images and Sentinel-2 multispectral images.                Funding acquisition, Conceptualization.
EnMAP has a larger spectral range, covering the related reflectance
for more soil properties. But it also suffers from data limitation with           Declaration of competing interest
less spatial coverage. Sentinel-2 only has 10 bands, but it possesses
full spatial coverage with time series collection. The properties from                The authors declare that they have no known competing finan-
EnMAP and Sentinel-2 could compensate for the DESIS data. In the                  cial interests or personal relationships that could have appeared to
future, modalities fusion from these satellites could also become a               influence the work reported in this paper.
promising approach (Li et al., 2022). Moreover, since hyperspectral
images suffer a lot from data scarcity, different hyperspectral images            Acknowledgments
such as DESIS, EnMAP, and PRISMA might be able to be used together
to create more densely distributed reflectance composites. Additionally,              The work is jointly supported by Munich Center for Machine Learn-
by incorporating other hyperspectral data that covers SWIR range like             ing, by the German Federal Ministry of Education and Research (BMBF)
EnMAP/PRISMA, the bare soil pixels could be differentiated from dry               in the framework of the international future AI lab ‘‘AI4EO – Artificial
vegetation and better identified.                                                 Intelligence for Earth Observation: Reasoning, Uncertainties, Ethics and
    From the methodological perspective, more sophisticated deep                  Beyond’’ (grant number: 01DD20001), by German Federal Ministry for
learning algorithms could be useful for SOC regression. In many studies           Economic Affairs and Climate Action in the framework of the ‘‘national
                                                                             13
X. Zhao et al.                                                                                      International Journal of Applied Earth Observation and Geoinformation 140 (2025) 104504
center of excellence ML4Earth’’ (grant number: 50EE2201C) and by the                               Heiden, U., d’Angelo, P., Schwind, P., Karlshöfer, P., Müller, R., Zepp, S., Wies-
German Research Foundation (DFG GZ: ZH 498/18-1; Project number:                                        meier, M., Reinartz, P., 2022. Soil reflectance composites—improved thresholding
                                                                                                        and performance evaluation. Remote. Sens. 14 (18), 4526.
519016653), and by the German Federal Ministry for the Environ-
                                                                                                   Jacques, D.C., Kergoat, L., Hiernaux, P., Mougin, E., Defourny, P., 2014. Monitoring
ment, Nature Conservation, Nuclear Safety and Consumer Protection                                       dry vegetation masses in semi-arid areas with MODIS SWIR bands. Remote Sens.
(BMUV) based on a resolution of the German Bundestag (grant number:                                     Environ. 153, 40–49.
67KI32002B; Acronym: EKAPEx). The authors would like to thank the                                  Johnston, A.E., Poulton, P.R., Coleman, K., 2009. Soil organic matter: its importance
                                                                                                        in sustainable agriculture and carbon dioxide fluxes. Adv. Agron. 101, 1–57.
Bavarian State Research Center for Agriculture (LfL) and the Bavarian
                                                                                                   Kühnel, A., Wiesmeier, M., Kögel-Knabner, I., Spörlein, P., 2020. Veränderungen der
State Office for the Environment (LfU) for providing the soil datasets.                                 humusqualität und -quantität bayerischer böden im klimawandel.
                                                                                                   LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521 (7553), 436–444.
Data availability                                                                                  Li, J., Hong, D., Gao, L., Yao, J., Zheng, K., Zhang, B., Chanussot, J., 2022. Deep
                                                                                                        learning in multimodal remote sensing data fusion: A comprehensive review. Int.
                                                                                                        J. Appl. Earth Obs. Geoinf. 112, 102926.
    The authors do not have permission to share data.                                              Li, Y., Zhang, H., Shen, Q., 2017. Spectral–spatial classification of hyperspectral imagery
                                                                                                        with 3D convolutional neural network. Remote. Sens. 9 (1), 67.
                                                                                                   Lundberg, S.M., Lee, S.-I., 2017. A unified approach to interpreting model predictions.
References                                                                                              Adv. Neural Inf. Process. Syst. 30.
                                                                                                   McBratney, A.B., Santos, M.M., Minasny, B., 2003. On digital soil mapping. Geoderma
Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M., 2019. Optuna: A next-generation                  117 (1–2), 3–52.
    hyperparameter optimization framework. In: Proc. 25th ACM SIGKDD Int. Conf.                    Möller, M., Zepp, S., Wiesmeier, M., Gerighausen, H., Heiden, U., 2022. Scale-specific
    Knowl. Discov. Data Min.. pp. 2623–2631.                                                            prediction of topsoil organic carbon contents using terrain attributes and SCMaP
Angelopoulou, T., Tziolas, N., Balafoutis, A., Zalidis, G., Bochtis, D., 2019. Remote                   soil reflectance composites. Remote. Sens. 14 (10), 2295.
    sensing techniques for soil organic carbon estimation: A review. Remote. Sens. 11              Mou, L., Ghamisi, P., Zhu, X.X., 2017. Deep recurrent neural networks for hyperspectral
    (6), 676.                                                                                           image classification. IEEE Trans. Geosci. Remote Sens. 55 (7), 3639–3655.
Anne, N.J., Abd-Elrahman, A.H., Lewis, D.B., Hewitt, N.A., 2014. Modeling soil                     Mou, L., Zhu, X.X., 2019. Learning to pay attention on spectral domain: A spectral at-
    parameters using hyperspectral image reflectance in subtropical coastal wetlands.                   tention module-based convolutional network for hyperspectral image classification.
    Int. J. Appl. Earth Obs. Geoinf. 33, 47–56.                                                         IEEE Trans. Geosci. Remote Sens. 58 (1), 110–122.
                                                                                                   Mzid, N., Castaldi, F., Tolomio, M., Pascucci, S., Casa, R., Pignatti, S., 2022. Evaluation
Arevalo, J., Solorio, T., Montes-y Gómez, M., González, F.A., 2017. Gated multimodal
                                                                                                        of agricultural bare soil properties retrieval from Landsat 8, Sentinel-2 and PRISMA
    units for information fusion. arXiv preprint arXiv:1702.01992.
                                                                                                        satellite data. Remote. Sens. 14 (3), 714.
Bartholomeus, H., Kooistra, L., Stevens, A., van Leeuwen, M., van Wesemael, B., Ben-
                                                                                                   Nalepa, J., Le Saux, B., Longépé, N., Tulczyjew, L., Myller, M., Kawulok, M.,
    Dor, E., Tychon, B., 2011. Soil organic carbon mapping of partially vegetated
                                                                                                        Smykala, K., Gumiela, M., 2022. The hyperview challenge: Estimating soil parame-
    agricultural fields with imaging spectroscopy. Int. J. Appl. Earth Obs. Geoinf. 13
                                                                                                        ters from hyperspectral images. In: Proc. 2022 IEEE Int. Conf. Image Process.. ICIP,
    (1), 81–88.
                                                                                                        IEEE, pp. 4268–4272.
Breiman, L., 2001. Random forests. Mach. Learn. 45, 5–32.
                                                                                                   Nocita, M., Stevens, A., Noon, C., van Wesemael, B., 2013. Prediction of soil organic
Castaldi, F., Hueni, A., Chabrillat, S., Ward, K., Buttafuoco, G., Bomans, B., Vreys, K.,
                                                                                                        carbon for different levels of soil moisture using vis-NIR spectroscopy. Geoderma
    Brell, M., van Wesemael, B., 2019. Evaluating the capability of the Sentinel 2 data
                                                                                                        199, 37–42.
    for soil organic carbon prediction in croplands. ISPRS J. Photogramm. Remote Sens.
                                                                                                   Odebiri, O., Odindi, J., Mutanga, O., 2021. Basic and deep learning models in remote
    147, 267–282.
                                                                                                        sensing of soil organic carbon estimation: A brief review. Int. J. Appl. Earth Obs.
Daughtry, C.S., 2001. Discriminating crop residues from soil by shortwave infrared
                                                                                                        Geoinf. 102, 102389.
    reflectance. Agron. J. 93 (1), 125–131.
                                                                                                   Orgiazzi, A., Ballabio, C., Panagos, P., Jones, A., Fernández-Ugalde, O., 2018. LUCAS
Demattê, J.A.M., Fongaro, C.T., Rizzo, R., Safanelli, J.L., 2018. Geospatial soil sensing
                                                                                                        soil, the largest expandable soil dataset for Europe: a review. Eur. J. Soil Sci. 69
    system (GEOS3): A powerful data mining procedure to retrieve soil spectral
                                                                                                        (1), 140–153.
    reflectance from satellite images. Remote Sens. Environ. 212, 161–175.
                                                                                                   Padarian, J., Minasny, B., McBratney, A., 2019. Using deep learning to predict soil
Diek, S., Fornallaz, F., Schaepman, M.E., De Jong, R., 2017. Barest pixel composite for
                                                                                                        properties from regional spectral data. Geoderma Reg. 16, e00198.
    agricultural areas using landsat time series. Remote. Sens. 9 (12), 1245.
                                                                                                   Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T.,
Drusch, M., Del Bello, U., Carlier, S., Colin, O., Fernandez, V., Gascon, F., Hoersch, B.,
                                                                                                        Lin, Z., Gimelshein, N., Antiga, L., et al., 2019. Pytorch: An imperative style,
    Isola, C., Laberinti, P., Martimort, P., et al., 2012. Sentinel-2: ESA’s optical high-
                                                                                                        high-performance deep learning library. Adv. Neural Inf. Process. Syst. 32.
    resolution mission for GMES operational services. Remote Sens. Environ. 120,                   Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O.,
    25–36.                                                                                              Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A.,
Dvorakova, K., Heiden, U., van Wesemael, B., 2021. Sentinel-2 exposed soil composite                    Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E., 2011. Scikit-learn: Machine
    for soil organic carbon prediction. Remote. Sens. 13 (9), 1791.                                     learning in python. J. Mach. Learn. Res. 12, 2825–2830.
Einzmann, K., Atzberger, C., Pinnel, N., Glas, C., Böck, S., Seitz, R., Immitzer, M.,              Rasel, S.M., Groen, T.A., Hussin, Y.A., Diti, I.J., 2017. Proxies for soil organic carbon
    2021. Early detection of spruce vitality loss with hyperspectral data: Results of an                derived from remote sensing. Int. J. Appl. Earth Obs. Geoinf. 59, 157–166.
    experimental study in Bavaria, Germany. Remote Sens. Environ. 266, 112676.                     Rast, M., Painter, T.H., 2019. Earth observation imaging spectroscopy for terrestrial
Elvidge, C.D., 1990. Visible and near infrared reflectance characteristics of dry plant                 systems: An overview of its history, techniques, and applications of its missions.
    materials. Remote. Sens. 11 (10), 1775–1795.                                                        Surv. Geophys. 40 (3), 303–331.
European Space Agency (ESA), 2019. ESA copernicus digital elevation model (DEM).                   Rogge, D., Bauer, A., Zeidler, J., Mueller, A., Esch, T., Heiden, U., 2018. Building
    http://dx.doi.org/10.5270/ESA-c5d3d65.                                                              an exposed soil composite processor (SCMaP) for mapping spatial and temporal
Geladi, P., Kowalski, B.R., 1986. Partial least-squares regression: a tutorial. Anal. Chim.             characteristics of soils with landsat imagery (1984–2014). Remote Sens. Environ.
    Acta 185, 1–17.                                                                                     205, 1–17.
German Aerospace Center (DLR), 2019. DESIS - Hyperspectral images - global. http:                  Rouse, J.W., Haas, R.H., Schell, J.A., Deering, D.W., et al., 1974. Monitoring vegetation
    //dx.doi.org/10.15489/hxom21uqeo90.                                                                 systems in the great plains with ERTS. NASA Spec. Publ. 351 (1), 309.
Gholizadeh, A., Žižala, D., Saberioon, M., Borŭvka, L., 2018. Soil organic carbon and             Schmitt, M., Zhu, X.X., 2016. Data fusion and remote sensing: An ever-growing
    texture retrieving and mapping using proximal, airborne and Sentinel-2 spectral                     relationship. IEEE Geosci. Remote. Sens. Mag. 4 (4), 6–23.
    imaging. Remote Sens. Environ. 218, 89–103.                                                    Steinier, J., Termonia, Y., Deltour, J., 1972. Smoothing and differentiation of data by
Gomez, C., Rossel, R.A.V., McBratney, A.B., 2008. Soil organic carbon prediction by                     simplified least square procedure. Anal. Chem. 44 (11), 1906–1909.
    hyperspectral remote sensing and field vis-NIR spectroscopy: An Australian case                Stevens, A., Udelhoven, T., Denis, A., Tychon, B., Lioy, R., Hoffmann, L., Van Wese-
    study. Geoderma 146 (3–4), 403–411.                                                                 mael, B., 2010. Measuring soil organic carbon in croplands at regional scale using
Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., Moore, R., 2017.                      airborne imaging spectroscopy. Geoderma 158 (1–2), 32–45.
    Google earth engine: Planetary-scale geospatial analysis for everyone. Remote Sens.            Storch, T., Honold, H.-P., Chabrillat, S., Habermeyer, M., Tucker, P., Brell, M.,
    Environ. http://dx.doi.org/10.1016/j.rse.2017.06.031.                                               Ohndorf, A., Wirth, K., Betz, M., Kuchler, M., et al., 2023. The EnMAP imaging
Guo, Z., Adhikari, K., Chellasamy, M., Greve, M.B., Owens, P.R., Greve, M.H., 2019.                     spectroscopy mission towards operations. Remote Sens. Environ. 294, 113632.
    Selection of terrain attributes and its scale dependency on soil organic carbon                Tagliabue, G., Boschetti, M., Bramati, G., Candiani, G., Colombo, R., Nutini, F.,
    prediction. Geoderma 340, 303–312.                                                                  Pompilio, L., Rivera-Caicedo, J.P., Rossi, M., Rossini, M., et al., 2022. Hybrid
Hang, R., Li, Z., Liu, Q., Ghamisi, P., Bhattacharyya, S.S., 2020. Hyperspectral image                  retrieval of crop traits from multi-temporal PRISMA hyperspectral imagery. ISPRS
    classification with attention-aided CNNs. IEEE Trans. Geosci. Remote Sens. 59 (3),                  J. Photogramm. Remote Sens. 187, 362–377.
    2281–2293.                                                                                     Tziolas, N., Tsakiridis, N., Heiden, U., van Wesemael, B., 2024. Soil organic carbon
Hastie, T., Tibshirani, R., Friedman, J.H., Friedman, J.H., 2009. The Elements of                       mapping utilizing convolutional neural networks and earth observation data, a case
    Statistical Learning: Data Mining, Inference, and Prediction, vol. 2, Springer.                     study in Bavaria state Germany. Geoderma 444, 116867.
                                                                                              14
X. Zhao et al.                                                                                       International Journal of Applied Earth Observation and Geoinformation 140 (2025) 104504
van Wesemael, B., Abdelbaki, A., Ben-Dor, E., Chabrillat, S., d’Angelo, P., Demattê, J.A.,          Xu, Z., Zhao, X., Guo, X., Guo, J., 2019. Deep learning application for predicting soil
    Genova, G., Gholizadeh, A., Heiden, U., Karlshoefer, P., et al., 2024. A European                   organic matter content by VIS-NIR spectroscopy. Comput. Intell. Neurosci. 2019
    soil organic carbon monitoring system leveraging Sentinel 2 imagery and the LUCAS                   (1), 3563761.
    soil data base. Geoderma 452, 117113.                                                           Xue, J., Zhang, X., Huang, Y., Chen, S., Dai, L., Chen, X., Yu, Q., Ye, S., Shi, Z., 2024.
Vaudour, E., Gomez, C., Fouad, Y., Lagacherie, P., 2019. Sentinel-2 image capacities to                 A two-dimensional bare soil separation framework using multi-temporal Sentinel-2
    predict common topsoil properties of temperate and Mediterranean agroecosystems.                    images across China. Int. J. Appl. Earth Obs. Geoinf. 134, 104181.
    Remote Sens. Environ. 223, 21–33.                                                               Zanaga, D., Van De Kerchove, R., Daems, D., De Keersmaecker, W., Brockmann, C.,
Wang, S., Guan, K., Zhang, C., Lee, D., Margenot, A.J., Ge, Y., Peng, J., Zhou, W.,                     Kirches, G., Wevers, J., Cartus, O., Santoro, M., Fritz, S., et al., 2022. ESA
    Zhou, Q., Huang, Y., 2022. Using soil library hyperspectral reflectance and                         WorldCover 10 m 2021 v200. Zenodo.
    machine learning to predict soil organic carbon: Assessing potential of airborne                Zepp, S., Heiden, U., Bachmann, M., Wiesmeier, M., Steininger, M., Van Wesemael, B.,
    and spaceborne optical soil sensing. Remote Sens. Environ. 271, 112914.                             2021. Estimation of soil organic carbon contents in croplands of Bavaria from
Wang, J., He, T., Lv, C., Chen, Y., Jian, W., 2010. Mapping soil organic matter based                   SCMaP soil reflectance composites. Remote. Sens. 13 (16), 3141.
    on land degradation spectral response units using Hyperion images. Int. J. Appl.                Zeraatpisheh, M., Garosi, Y., Owliaie, H.R., Ayoubi, S., Taghizadeh-Mehrjardi, R.,
    Earth Obs. Geoinf. 12, S171–S180.                                                                   Scholten, T., Xu, M., 2022. Improving the spatial prediction of soil organic carbon
Wiesmeier, M., Barthold, F., Spörlein, P., Geuß, U., Hangen, E., Reischl, A., Schilling, B.,            using environmental covariates selection: A comparison of a group of environmental
    Angst, G., von Lützow, M., Kögel-Knabner, I., 2014. Estimation of total organic                     covariates. Catena 208, 105723.
    carbon storage and its driving factors in soils of Bavaria (southeast Germany).                 Zhao, W., Wu, Z., Yin, Z., Li, D., 2022. Attention-based CNN ensemble for soil organic
    Geoderma Reg. 1, 67–78.                                                                             carbon content estimation with spectral data. IEEE Geosci. Remote. Sens. Lett. 19,
Wiesmeier, M., Burmeister, J., 2022. 35 Jahre boden-dauerbeobachtung in der                             1–5.
    landwirtschaft - band 4: Humus.                                                                 Zhu, X.X., Tuia, D., Mou, L., Xia, G.-S., Zhang, L., Xu, F., Fraundorfer, F., 2017. Deep
Woodcock, C.E., Allen, R., Anderson, M., Belward, A., Bindschadler, R., Cohen, W.,                      learning in remote sensing: A comprehensive review and list of resources. IEEE
    Gao, F., Goward, S.N., Helder, D., Helmer, E., et al., 2008. Free access to Landsat                 Geosci. Remote. Sens. Mag. 5 (4), 8–36.
    imagery. Sci. 320, 1011.
15