Application of the Random Forest Model to Predict the Plasticity State of Vertisols

Vertisol plasticity is related to moisture content, and understanding it requires an in-depth physicochemical characterization. This information makes it possible to use the land under the most suitable conditions and to avoid physical degradation of the soil, especially compaction. The objective of this study was to characterize the Vertisols of the Doukkala-Abda region of Morocco and to predict soil plasticity from physicochemical parameters such as texture, electrical conductivity, Soil Organic Matter (SOM) and other chemical parameters, measured on 120 samples. Determining soil plasticity from Atterberg limits is challenging and time-consuming. This study therefore aimed to develop a new model that predicts soil plasticity using the Random Forest algorithm. The soils were homogeneous in most physicochemical parameters, except for significant differences in SOM and electrical conductivity, which in turn influenced the plasticity state. The results showed significant positive correlations between SOM, Soil Clay Content (SCC), Electrical Conductivity (EC) and plasticity in the Vertisol fields of the region. In the training phase, the model gave excellent results, with a coefficient of determination of 0.995 and an RMSE of 0.164. Similar results were obtained in the validation phase, with a coefficient of determination of 0.974 and an RMSE of 0.361, which shows that the model predicted plasticity well in both phases. On the basis of these results, the Random Forest model can be used to predict plasticity from physicochemical parameters. Predicting soil plasticity is important for timing the entry of machines and tools into the fields and thus avoiding Vertisol degradation.


INTRODUCTION
Soil, as a particular resource, plays a vital role in the sustainability and survival of civilizations. It can secure the provision of food and other essential goods (Hillel, 2009). This resource has different physicochemical and biological properties that influence its diverse potential uses, such as agronomic productivity (Anderson, 2005; Resende et al., 2014). Characterization of those properties is considered a key element of adequate soil management and conservation (Severiano et al., 2009).
Generally, soil is characterized by rapid degradation rates and slow regeneration processes (Van-Camp et al., 2004). This degradation can affect both productivity and the environment, and it is more damaging when it occurs while the soil is in its plastic state (Zuazo & Pleguezuelo, 2009). Researchers have defined the plastic state as the range of soil moisture content in which soil has a plastic consistency.
The limits of this state (plastic limits) are also known as Atterberg limits in tribute to the work of Atterberg (Atterberg, 1911). In this state, soil can be sheared without forming cracks (Campbell & O'Sullivan, 1991). The moisture content range over which the soil shows plastic properties can be determined by the plasticity index (Pi) (Gobinath, 2016; McBride et al., 2008).
In this context, soil plasticity and the plastic limits have been interpreted through the water film theory (Campbell & O'Sullivan, 1991). In this approach, a soil has a plastic consistency when the clay particles are surrounded by water and can slide past each other. Moreover, if a vertical force is applied, the soil water tension can increase and lead to plastic deformation of the soil without any recovery. The soil water retention capacity controls the duration of this plastic state (the plasticity delay), and it depends on physical properties, especially the clay content (% C) and SOM (Bahmani & Palangi, 2016; Merdun et al., 2008).
In this regard, the total water storage (maximum water quantity) that the soil can receive is determined by measuring several soil properties (Nyvall, 2002). SOM has a high retention capacity and is known as an essential factor in soil water conservation, especially in the agricultural sector (Stone & Garrison, 1940). Several studies have specified the amount of water that SOM can retain (1% SOM holds around 233,750 L/ha) (Stevenson, 1994; Mengel, 2012). In another study, Emerson (1995) indicates that SOM can contribute between 2.2% and 12.5% of the available water. This contribution is determined by soil texture, treatment, and parameters such as organic materials and bulk density. The water content is influenced not only by the presence of SOM, but also by the clay percentage.
Vertisols, in general, are very fertile soils in many developed and developing countries (Ahmad & Mermut, 1996). These soils are often used for agricultural production (Coulombe et al., 1996). In the American classification (US-SEA, 1975), Vertisols are characterized by shrinking and swelling caused by variation in the moisture content (Booltink et al., 1993). Direct measurement of soil hydraulic properties is very important, but it is costly and time-consuming (Minasny & Hartemink, 2011; Rustanto et al., 2017). Thus, the development of alternative, rapid, and inexpensive methods to estimate those properties is an active area of investigation (Patil & Singh, 2016; Tomasella & Hodnett, 2004).
Most environmental modeling work requires quantitative soil information, primarily at the regional scale (Gessler et al., 1996; Minasny et al., 2008; Hartemink & McBratney, 2008). Prediction techniques have become more popular, and they offer a faster way to estimate different soil attributes (Amanabadi et al., 2019). In this context, several studies have used machine learning algorithms for modeling soil classes. The main objective of this study was to characterize the Vertisols in the Doukkala-Abda region of Morocco and to evaluate the effect of soil physical parameters on moisture content in different fields of the region. Additionally, it aimed to examine the possibility of using a machine learning algorithm, the Random Forest method, to predict the plasticity state of Vertisols in the region.

Localization of study zone
The study zone is part of the coastal Meseta in southern Morocco. It is bounded in the north and northwest by the Atlantic Ocean, in the east by Oued Oum Er-rbia and the plain of Chaouia, in the south by the Mouissate hills, and in the southwest by the Abda and Rhamna massif (Fig. 1). Geologically, the region belongs to the domain of coastal Morocco (Meseta), which is framed by the Atlas and Rif chains.

The importance of Vertisol in the Doukkala region of Morocco
Several studies have attempted to map the geographical distribution of Vertisols in Morocco (Villar, 1953; Watteau, 1967; Bryssine, 1971). Generally, this soil type is found in the plains of Doukkala, Chaouia, Zaer, Gharb, Loukkous, Tangier, Pre-Rif, Sais, Tadla and Haouz. In 1965, Wilbert estimated the area of Vertisols in Morocco at about 0.2 Mha, which later proved to be an underestimate. Soil studies carried out over the last decades at different scales, covering almost 6.5 Mha, allow an estimate of 0.9 Mha of Vertisols distributed over several regions of Morocco. The areas occupied by Vertisols across the country have only been roughly estimated, as few studies have tried to produce an accurate estimate at the regional scale. As stated previously, the objective of this study was to determine the Vertisols of the Doukkala-Abda region and their plasticity in order to manage agricultural practicability. The different samples are shown in Figure 2. The analysis shows that Vertisols represent around 29% (187,879.4 ha) of the total area of the region, which is a high percentage (Table 1).
Adequate management of this vast area can improve agricultural productivity and the economy. Moreover, it can reduce the physical degradation of the area, especially compaction. Vertisols have high yield potential and are dedicated to intensive crop production. They are considered the soils most sensitive to physical degradation (compaction) during agricultural operations because of their high clay and water contents.

Soil sampling and analyses
In the course of the research, 120 soil samples were collected from agricultural fields in different locations in the study area. The sampling period was between January and June 2019. These samples represent soils with different geological histories, under different climatic conditions and different soil management. Still, they have almost the same range of soil texture and OM content.
All the soils collected are Vertisols of the region. Vertisol plasticity was evaluated using the ATL method (Atterberg, 1911). A hydraulic press was used to evaluate the plasticity limits for the 120 soil samples (7 treatments and 4 replicates). Each sample (< 2 mm) was poured into a press cylinder (269.26 cm³) and moistened with tap water (150 mL). Each replicate sample was pressed up to 6 bar on different days (1st, 2nd, 3rd, 4th, 5th, 10th and 15th days) to show the variation of volume with moisture content. This method was developed in a previous study to measure soil plasticity (Al Masmoudi et al., 2019).
The physicochemical evaluation of the Doukkala-Abda Vertisols was performed to characterize the quality of the samples. The parameters analyzed were texture (the percentages of clay, silt and sand) using the Robinson pipette, electrical conductivity (EC) using an EM 38 conductivity meter, SOM using the Walkley and Black method, sodium oxide Na₂O (extracted with ammonium acetate solution), N-NO₃, N-NH₄, boron (B), iron (Fe), manganese (Mn), zinc (Zn), copper (Cu) and the plasticity for the 120 samples.

Statistical analyses of data
In order to analyze the relationship between the parameters and plasticity, a statistical analysis was carried out using the Statistical Package for Social Sciences (SPSS). It provided descriptive statistics such as the minimum, maximum and mean of each parameter of the soil samples, and skewness and kurtosis were used to evaluate how the distributions differ.
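As an illustration of the descriptive statistics named above, the following minimal sketch computes the minimum, maximum, mean, skewness and excess kurtosis of one variable from its central moments. The function name `describe` and the sample values are hypothetical, and the moment-based (population) formulas are an assumption: SPSS applies small-sample corrections, so its output may differ slightly.

```python
def describe(values):
    """Return min, max, mean, skewness and excess kurtosis of a sample.

    Uses population central moments (divided by n); this is an assumption,
    not necessarily the exact estimator reported by SPSS.
    """
    n = len(values)
    mean = sum(values) / n
    # Second, third and fourth central moments
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "skewness": m3 / m2 ** 1.5,          # 0 for a symmetric sample
        "kurtosis": m4 / m2 ** 2 - 3.0,      # excess kurtosis, 0 for a normal law
    }


# Hypothetical SOM percentages, for illustration only
som_percent = [0.8, 1.2, 1.6, 1.9, 2.1, 2.4, 3.4]
for key, value in describe(som_percent).items():
    print(f"{key}: {value:.3f}")
```

A strongly positive excess kurtosis points to heavy tails (possible outliers), while a value near or below zero points to light tails.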

Random Forest Model
Random forest (RF) is an algorithm for data exploration, analysis and predictive modeling (Breiman, 2001). It is relatively robust to errors and outliers. When the number of trees in the forest is large, the generalization error converges, so overfitting of the training dataset is not a major problem (Han et al., 2012). Several factors influence RF accuracy, such as the strength of the individual classifiers and the level of dependence between them. The ideal is to maintain the strength of the individual classifiers without increasing their correlation.

Fig. 2. Localization of the soil sampling sites

The RF model can improve prediction using classification and regression trees (Breiman, 2001). Each tree is grown on a bootstrap sample of the dataset, and the split at each node is chosen as the best among a randomly selected subset of predictors from the entire suite of input variables, which helps prevent over-fitting (Liaw & Wiener, 2002). The most important parameters that must be defined are the number of trees (ntree) and the number of variables randomly sampled to be tested at each node (mtry).
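To make the roles of ntree and mtry concrete, the sketch below fits a random forest regressor on synthetic data. It uses scikit-learn as an assumption (the ntree/mtry notation follows the R randomForest package, and the text does not state which software was used); there, `n_estimators` plays the role of ntree and `max_features` the role of mtry. The data, the values 500 and 2, and all variable names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for a soil table: 120 samples, 5 hypothetical predictors
X = rng.uniform(0.0, 1.0, size=(120, 5))
# Hypothetical target driven mainly by the first two predictors, plus noise
y = 3.0 + 6.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0.0, 0.2, size=120)

model = RandomForestRegressor(
    n_estimators=500,   # ntree: number of trees in the forest
    max_features=2,     # mtry: predictors tried at each split
    bootstrap=True,     # each tree sees a bootstrap sample of the rows
    oob_score=True,     # score each sample with trees that did not see it
    random_state=0,
)
model.fit(X, y)

oob_r2 = model.oob_score_  # out-of-bag R^2, an internal accuracy estimate
print(f"trees: {len(model.estimators_)}, out-of-bag R^2: {oob_r2:.3f}")
```

The out-of-bag score gives a quick check of how ntree and mtry affect accuracy without setting aside extra data.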

Development of RF model and accuracy assessment
To construct the predictive plasticity model, the RF model was trained using 75% of the data (90 soil samples), and the remaining 25% (30 samples) were used for model validation. The parameters used to develop the model do not require any preprocessing (unlike, e.g., multiple linear regression), which is one of the advantages of this machine learning model.
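The 75%/25% partition can be sketched as follows, assuming a simple random split (the text does not state how samples were assigned to each set). The function name `split_indices` and the seed are hypothetical.

```python
import random


def split_indices(n_samples, train_fraction=0.75, seed=42):
    """Shuffle sample indices and split them into training and validation sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # reproducible shuffle
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, valid_idx = split_indices(120)
print(len(train_idx), len(valid_idx))
```

With 120 samples this yields 90 training and 30 validation indices, with no sample in both sets.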
The performance of the Random Forest model was examined by comparing the observed and predicted plasticity values using two indicators, the coefficient of determination (R²) and the root mean square error (RMSE), given here in their usual form:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (O_i - P_i)^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2} \tag{1}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (O_i - P_i)^2} \tag{2}$$
where O_i and P_i are the observed and predicted values at site i, respectively; Ō is the mean of the observed values; and n is the number of samples.
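The two indicators can be computed as in the sketch below. It assumes the common definitions R² = 1 − SS_res/SS_tot and RMSE = sqrt(mean squared error), expressed with the observed values O, the predicted values P and the mean of the observed values; the function names and sample values are hypothetical.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Hypothetical plasticity delays (days), for illustration only
observed = [3.0, 5.0, 8.0, 12.0]
predicted = [3.4, 4.8, 8.3, 11.5]
print(f"R2 = {r_squared(observed, predicted):.3f}, RMSE = {rmse(observed, predicted):.3f}")
```

RMSE is expressed in the units of the predicted variable, so it should be read against the range of observed values.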

Plasticity results
The plasticity state of the different Vertisols in the region (120 samples) shows different plasticity delays (in days). The plasticity durations vary from 3 days to 8 or 12 days. On the basis of these results, farmers can manage the agricultural practicability of their Vertisols according to the plasticity delay of their soils, considering the physicochemical parameters and the texture. Indeed, the soil plasticity state can be evaluated from the soil moisture content in order to time the introduction of machines and avoid Vertisol degradation (soil compaction). The Atterberg limits conventionally express these differences in Vertisol behavior as a function of the water content. In other words, while the liquid limit characterizes the transition between the plastic state and the liquid state, the plastic limit marks the transition between the solid state and the plastic state. Soil compaction is influenced by the moisture content. Dry soil is more difficult to compact than wet soil. If the water content increases until it replaces all the air occupying the pores, the soil cannot be compacted because water is almost incompressible. The Proctor compaction test is used to estimate the plasticity limits, which delimit the phase in which the soil is vulnerable to physical degradation (soil compaction). As the water content of a soil sample decreases, the soil changes successively from a liquid state to a plastic state, which is the state most vulnerable to compaction because the soil is subject to irreversible deformation despite its stability, and then to a solid state.

Statistical analysis
The analytical data of the samples used for model building are presented in Table 2. The values of several soil parameters differ between samples, but, as Table 2 shows, the texture of the soils is almost the same: the clay percentage varies between 37 and 48% for the 120 soil samples. Sand and silt vary between 26.80-55.20% and 6.20-32.90%, respectively. SOM varies between 0.80 and 3.40%, with a mean of 1.87%. Moreover, the electrical conductivity of the soil samples was notable, ranging from 0.04 to 1.40 mS with a mean of 0.80 mS. Elevated soil salinity in the coastal region is generally due to marine intrusion. Generally, the data are highly skewed. Furthermore, CaCO₃, Na₂O, N-NO₃, B, Mn and Zn show high kurtosis values, which indicates the presence of several outliers. Low kurtosis indicates light tails or a lack of outliers, as observed for parameters such as clay, sand, EC, SOM, pH, and soil plasticity.
Across the 120 samples, the studied parameters covered a wide range of values. This variability can contribute to more reliable models that may be usable for soils under similar conditions, since the values span a wide range of the analyzed properties. Figure 3 presents the correlations between all parameters. The clay content shows a significant correlation with the plasticity delay, in accordance with its high water retention capacity. A similar correlation was observed between SOM and plasticity, which is generally attributed to the capacity of OM to conserve water in the soil.
In the same figure, a significant correlation was found between EC and plasticity, which suggests that the soil can hold a greater quantity of water in the presence of some salinity. As expected, different land uses, soil management practices and previous crops lead to different values of several parameters that can, in turn, influence the plasticity timing and thus the soil plasticity.
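A pairwise correlation of the kind summarized in a correlation figure can be computed as sketched below, assuming Pearson's coefficient (the text does not name the coefficient used). The function name `pearson_r` and the clay and delay values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squares about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical clay contents (%) and plasticity delays (days)
clay_percent = [37.0, 40.0, 43.0, 45.0, 48.0]
delay_days = [3.0, 5.0, 6.0, 8.0, 12.0]
r = pearson_r(clay_percent, delay_days)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

Squaring r gives the share of variance in one variable explained by a linear relation with the other, which is how an R² is usually read for a pair of variables.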

Random Forest results and the importance of variables
The performance of the Random Forest model was evaluated with two accuracy indicators, the coefficient of determination (R²) and the root mean square error (RMSE). In the training phase, the model gave excellent results, with a coefficient of determination of 0.995 and an RMSE of 0.164 (Fig. 4). Similar results were obtained in the validation phase, with a coefficient of determination of 0.974 and an RMSE of 0.361, which shows that the model predicted plasticity well in both phases. According to these results, the model can be used to predict this parameter when the input data are available, especially the parameters that are most important for the prediction.
The main aims of developing this model were to predict the plasticity delay of the Vertisols and to identify the parameters that influence the model. Figure 5 shows the parameters that play an important role in the soil's capacity for water retention and, consequently, in the duration of its plastic phase. According to the results, the most relevant parameters are electrical conductivity, clay content and SOM. Meanwhile, the same figure shows that chemical parameters such as CaCO₃, Zn, soil pH and Cu do not influence the capacity of the soil to hold water. In other words, these parameters contributed little to the model, which may reflect the high variability of these soil properties and the low terrain variation in the study area.
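A variable-importance ranking of this kind can be sketched with scikit-learn, used here as an assumption since the text does not name the software. The `feature_importances_` attribute is impurity-based (mean decrease in impurity), which may differ from the measure behind the paper's figure. The predictor names, the synthetic data and the target formula are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Hypothetical predictor subset; values are synthetic, not measured data
names = ["clay", "SOM", "EC", "pH", "Zn"]
X = rng.uniform(0.0, 1.0, size=(120, len(names)))
# Synthetic target that depends only on the first three predictors
y = 5.0 * X[:, 0] + 3.0 * X[:, 1] + 2.0 * X[:, 2] + rng.normal(0.0, 0.1, size=120)

model = RandomForestRegressor(n_estimators=300, random_state=1).fit(X, y)

# Impurity-based importances sum to 1; larger means more influence on splits
ranking = sorted(zip(names, model.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Predictors unrelated to the target end up near the bottom of such a ranking, which is the pattern described for the chemical parameters.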

DISCUSSION
Unfortunately, studies that predict soil plasticity using RF are scarce to non-existent, making it impossible to compare our prediction results directly with other findings. Nevertheless, a comparison can be made with research that used this type of model, especially RF, to predict other soil parameters.
Rawls et al. (2003), who based their study on the USDA and NRCS soil survey characterization database, revealed that soil water retention depends on the SOM and clay contents of the soil. Moreover, an increase in SOM leads to increased water retention in the different soil textures. This is due to the decrease in bulk density, which affects the structure and aggregation of the soil. In another system, the presence of SOM in the O horizon can increase the soil water-holding capacity, which can, in turn, intercept rainfall. In this regard, Hudson (1994) indicates that an increase in SOM from 0.5% to 3% roughly doubles the water retention capacity in the three texture groups (sands, silt loams, and silty clay loams).
In our results, there was a significant correlation between OM and plasticity delay (R² = 0.949). SOM has a strong effect on the plastic limits. This effect was especially evident when analyzing soils of similar texture with a range of SOM contents, and it should be clearer still when soils from different treatments of long-term field experiments are analyzed. Blanco-Canqui et al. (2006) reported that soil management significantly affects soil consistency and found significant positive correlations between SOM and the plasticity period in agricultural soils, which agrees with our results presented previously (Fig. 3).
The plastic limits are affected not only by SOM but also by the soil texture, especially the clay content, as shown in the previous part (Fig. 3 and Fig. 5). According to the results of this study, there was a positive correlation between the clay content and the plasticity delay (R² = 0.9050).
As mentioned by Keller et al. (2012), soil plasticity is related to the specific surface area (SSA) of the soil particles, because small particles have a higher SSA than large ones. In this context, we found that the plastic limits were much more strongly correlated with the clay content than with either the silt or the sand content.
Furthermore, some previous studies (Butler, 1955; McIntyre, 1976) found that the correlation between plastic limits and clay content may sometimes be very low. They revealed that the soil has no visible structure. The reasons for this may be the low SOM of these soils or the existence of micro-aggregated clay in sub-plastic soils (Butler, 1955; Keller & Dexter, 2012; McIntyre, 1976). An equation relating the plastic limit (PL) to the clay content has also been proposed:
$$\mathrm{PL} = 21.28\,(0.812) + 0.004\,(0.0004)\,\mathrm{Clay}^2; \quad R^2 = 0.578 \tag{3}$$
Managing the soil moisture content, especially the plastic state, can control the soil's practicability and prevent its physical degradation, particularly compaction. Compaction can occur during the cropping cycle and affect crop yield and performance. This physical degradation can be more severe and intense when it affects a Vertisol in its plastic state. In a degradation phase, the behavior of the Vertisol changes depending on its structure and the type of dominant clay. Field crop losses were due to the intensity of agricultural traffic and tillage practices.

CONCLUSIONS
The Vertisols of the Doukkala region of Morocco are an agronomic asset because of their importance for agronomic productivity. Their physicochemical characteristics distinguish them from other soil types. The Vertisols of this region cover a significant share of the total area of the region. Good management of this soil type and of its moisture state is therefore essential for its preservation and for the sustainability of its productivity. This study evaluated the link between soil plasticity and other physicochemical parameters of Vertisols. SOM, clay content and electrical conductivity are the most relevant parameters that can contribute to soil water storage, and moisture content in turn controls the plasticity limits. A Random Forest model can give a precise and accurate prediction of the Vertisol plasticity delay under semi-arid conditions and identify the Vertisol parameters that influence the model.