Clustering Analysis of Soil Environmental Quality for Perennial Crop Recommendations in Vinh Long Province, Vietnam

The study aimed to evaluate the soil environmental characteristics of Vinh Long Province’s perennial crop-growing area using principal component analysis (PCA) and cluster analysis (CA). Soil environmental quality data covering 27 physical and chemical parameters were collected in eight districts of Vinh Long province. CA and PCA were used to group the sampling sites and to identify the critical parameters affecting the soil environment of perennial crops. The findings demonstrate soil compaction and low to moderate porosity, buffering capacity, and structure for perennial crops. In addition, the soil has low pH, electrical conductivity, total soluble salts, aluminum, and cation exchange capacity. Although the soil is rich in nutrients, the content of organic matter, available phosphorus, cations, and trace elements is only low to moderate. CA results showed three districts suitable for strongly developing perennial crops: Tra On, Mang Thit, and Vung Liem. The PCA results showed that the upcoming monitoring program must incorporate all criteria except density, soil buffer capacity, and dissolved Al³⁺. The study’s findings offer crucial information to help the management organization devise strategies for enhancing and sustainably expanding perennial crops in the province. It is necessary to further evaluate the soil’s environmental quality over time and soil depth and to determine the frequency of monitoring in the study area.


INTRODUCTION
Because of its many applications, data clustering is of interest to the data mining and artificial intelligence communities. Without class information, it is challenging to identify the clustering scheme that best matches the underlying partitions of the data. Clustering automatically separates data points into groups so that samples in the same group are similar to one another. A variety of clustering approaches have been proposed to divide a data set into groups based on commonalities. These algorithms can generally be divided into possibilistic, soft (fuzzy), and hard (crisp) methods [Höppner et al., 1999; Krishnapuram & Keller, 1993; Thompson et al., 1974].
Principal component analysis (PCA) is one of several methods for choosing the essential variables (as components). It extracts a low-dimensional set of features from a high-dimensional dataset while retaining as much information as possible. Visualization also becomes much more meaningful once the number of variables has been reduced. PCA is helpful when working with data of three or more dimensions, and it is performed on a symmetric covariance or correlation matrix. The underlying problem in multivariate statistics is that data with many variables are difficult to visualize. In datasets with numerous variables, groups of variables frequently move together, because more than one variable may be measuring the same principle governing the system. The abundance of available measurements makes it possible to record a wide range of characteristics, and the resulting redundancy of information can be exploited: a group of existing variables can be replaced with a single new variable to simplify the problem.
Agricultural land is one of the essential components of the ecosystem and the place where food for humans is produced. According to Mulat et al. (2021), sustainable agriculture and productivity are closely related to soil environmental quality. Therefore, soil environmental quality assessment has always been considered essential for managing and assessing changes in soil properties over space and time [Sulyman, 2018]. In addition, assessing soil environmental quality according to different land use purposes in areas with a tropical monsoon climate is very useful when making land use decisions and implementing sustainable land environmental management [Orobator, 2019]. However, because both natural factors (such as temperature, humidity, and rainfall) and human activities (such as excessive use of fertilizers and pesticides) frequently change the qualities of the soil environment, it can be difficult to determine soil environmental quality accurately [Du et al., 2018; Saleh et al., 2021].
Vinh Long is a province located in the Mekong Delta, close to the lowest reaches of the Mekong River and positioned between the Tien and Hau Rivers. Compared to other regions of the Mekong Delta, this province had early and diverse agricultural growth. The province's total agricultural land area is 120,490.1 ha, accounting for 78.97% of the province's total natural land area. According to land use type, the province's agricultural land is divided into rice, perennial, annual, aquaculture, and other agricultural land. With nearly 50,000 hectares of land for perennial crops, Vinh Long has great potential to develop fruit trees such as oranges, pomelos, longans, and rambutans and to bring high incomes to farmers in the province. However, in recent years, saline intrusion and excessive pesticide use have changed the soil properties of perennial crop land [Department of Natural Resources and Environment of Vinh Long Province, 2020]. Therefore, to boost agricultural productivity in Vinh Long province and to provide a foundation for sound land use planning, it is necessary to monitor and analyze the soil characteristics of the perennial crop-growing region continuously. Principal component analysis and cluster analysis are the most common multivariate statistical techniques for finding, categorizing, and minimizing the number of variables in soil environmental quality evaluation [Jiang et al., 2020; Tian et al., 2022]. This study therefore evaluated the soil environmental quality factors affecting perennial crops in Vinh Long province using clustering and principal component analysis. The study's findings offer valuable scientific data that can help manage agriculture sustainably and increase agricultural output.

Study area
The natural area of Vinh Long is 1,525.73 km², or 3.74% of the Mekong Delta's total area. The province is located on a plain with a relatively flat landscape and slopes of less than 5°. The Tien, Co Chien, and Hau Rivers run through the province. The terrain is shaped like a basin and rises gradually from the province's center toward the banks of the Tien River, Hau River, Mang Thit River, and other significant rivers and canals. Vinh Long is a tropical monsoon region with an annual average temperature of 27.4°C, an average humidity of 83%, and an average precipitation of 1,409 mm. These conditions are favorable for agricultural growth and development. The land in Vinh Long province is almost fully in use and has direct access to the sea. The province's four main types of soil resources include alluvial, sandy, and acid sulfate soils. Of the 120,490.1 ha of agricultural land, rice farming is the primary use (71,642.0 ha). Perennial crops, which cover 46,676.5 ha, or 38.74% of agricultural land, consist primarily of fruit trees and are concentrated on islets between the Tien and Hau rivers and along canals and traffic axes in the form of home gardens. Most are in the Long Ho, Tra On, and Vung Liem districts. The administrative map of the study area is shown in Figure 1.

Materials
Soil samples were collected in the city, town, and districts of Vinh Long province: Vinh Long City, Binh Minh town, and the Long Ho, Mang Thit, Vung Liem, Tam Binh, Binh Tan, and Tra On districts (Figure 1). The soil was analyzed for 27 parameters relevant to agricultural production. The physical properties include soil texture composition (sand, loamy sand, silt, clay), density, and buffering capacity. The chemical properties include acidity, organic matter (OM), total nitrogen (N), total phosphorus (P), available phosphorus (P₂O₅), soluble salts, pH(H₂O), electrical conductivity (EC), and exchangeable cations (K⁺, Na⁺, Mg²⁺, Ca²⁺, Al³⁺, Mn²⁺, and Zn).

Clustering analysis
Clustering is the process of grouping data objects into several groups so that objects within a group are highly similar and objects in different groups are highly dissimilar. In other words, objects inside a cluster have high intraclass similarity, while objects in different clusters are rarely comparable (low interclass similarity). An effective clustering algorithm should yield results with these characteristics.
This study applies clustering analysis (CA) to group the soil sampling locations according to their physicochemical parameters. The CA results are displayed as a dendrogram, which groups the locations with comparable soil environmental quality [Ibrahim, 2015].
The purpose of the clustering algorithm is to divide the given data set $X = \{x_1, x_2, \dots, x_n\}$ into $c$ groups $X_1, X_2, \dots, X_c$. The partition matrix $U$ of size $c \times n$ may be represented as $U = [u_{ij}]$, $i = 1, 2, \dots, c$ and $j = 1, 2, \dots, n$, where $u_{ij}$ is the membership of the sample $x_j$ in cluster $X_i$. In the case of crisp partitioning, the following condition should be satisfied: $u_{ij} = 1$ if $x_j \in X_i$; otherwise, $u_{ij} = 0$ [Mulat et al., 2021].
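As a minimal sketch of the crisp partition defined above, the following Python builds the partition matrix U from a list of cluster labels and checks that every sample belongs to exactly one cluster. The function name `crisp_partition_matrix` and the example labels are hypothetical and are not taken from the study.

```python
def crisp_partition_matrix(labels, c):
    """Return the c x n crisp partition matrix U for labels in 0..c-1."""
    n = len(labels)
    U = [[0] * n for _ in range(c)]
    for j, i in enumerate(labels):
        if not 0 <= i < c:
            raise ValueError(f"label {i} outside 0..{c - 1}")
        U[i][j] = 1  # membership is 1 exactly when sample j is in cluster i
    return U


# Hypothetical labels for n = 5 samples and c = 3 clusters (illustration only).
labels = [0, 2, 1, 0, 2]
U = crisp_partition_matrix(labels, 3)

# In a crisp partition every column sums to 1: one cluster per sample.
column_sums = [sum(U[i][j] for i in range(3)) for j in range(len(labels))]
```

Each row of U corresponds to one cluster and each column to one sample, so a soft (fuzzy) partition would differ only in allowing fractional entries whose columns still sum to 1.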

Principal component analysis
Principal component analysis (PCA) is used to identify the factors that significantly affect soil environmental quality in the research area. PCA condenses the representation space and retains only the parameters that offer discriminating information. The larger the eigenvalue, the more of the variability in the data the corresponding component explains; ideally, the eigenvalue should be greater than 1 [Mohab & Abdulaziz, 2022]. The factor loadings describe the correlation between the data variables and the principal components. Loadings are categorized into three levels, "strong," "moderate," and "weak," for absolute values larger than 0.75, between 0.5 and 0.75, and less than 0.5, respectively [Islam et al., 2017].
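The two screening rules described above (retain components whose eigenvalue exceeds 1, and grade loadings by absolute value) can be sketched in a few lines of Python. This is only an illustration: the function names, the example loadings, and the example eigenvalues are hypothetical, and the moderate band is assumed to run from 0.5 up to and including 0.75.

```python
def classify_loading(loading):
    """Grade a factor loading by its absolute value."""
    a = abs(loading)
    if a > 0.75:
        return "strong"
    if a >= 0.5:  # ASSUMPTION: moderate band is 0.5 to 0.75 inclusive
        return "moderate"
    return "weak"


def retained_components(eigenvalues):
    """Return 1-based indices of components whose eigenvalue exceeds 1."""
    return [k for k, ev in enumerate(eigenvalues, start=1) if ev > 1.0]


# Hypothetical loadings and eigenvalues, for illustration only.
example_loadings = {"pH": -0.82, "EC": 0.61, "Zn": 0.30}
grades = {name: classify_loading(v) for name, v in example_loadings.items()}
kept = retained_components([3.2, 1.4, 0.9, 0.5])
```

The sign of a loading only indicates the direction of the relationship, which is why the grading uses the absolute value.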
Although different clustering criteria lead to a range of clustering results, all clustering algorithms typically follow these stages: a) Data preparation: preprocess the data; b) Feature selection and extraction: choose attributes that preserve as much information in the processed data as feasible; c) Clustering: choose a suitable clustering algorithm based on the data structure and properties; d) Clustering effectiveness evaluation: choose a suitable cluster effectiveness index to confirm the outcomes of the clustering; and e) Results analysis: examine the clustering results against experimental data and draw the appropriate conclusion.
The choice of clustering technique and effectiveness index directly affects the reliability and accuracy of the clustering results [Mahi et al., 2015].
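To make stage d) concrete, the sketch below computes the mean silhouette coefficient, one commonly used cluster effectiveness index. The study does not state which index, if any, was applied, so the choice of the silhouette, the function names, and the four example points are assumptions made purely for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_silhouette(points, labels):
    """Mean silhouette coefficient; members of singleton clusters score 0."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical 2-D points: two tight pairs that lie far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
compact = mean_silhouette(points, [0, 0, 1, 1])  # pairs kept together
mixed = mean_silhouette(points, [0, 1, 0, 1])    # pairs split apart
```

Values close to 1 indicate compact, well-separated clusters, while negative values indicate that samples sit closer to another cluster than to their own, so `compact` scores higher than `mixed` here.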
Primer V5.2 for Windows (PRIMER-E Ltd., Plymouth, UK) was used to perform the PCA and CA.

Soil physical properties characterization
Table 1 shows the investigation findings for the soil's physical properties. The texture of the soil samples in the study area is dominated by silt (0.002-0.02 mm), with an average value of 52.83%, followed by clay (<0.002 mm), which accounts for 41.72%. The sand content (0.02-2 mm) is relatively low at 5.46%. The soil texture in the study area is therefore mainly silty to clayey, which is suitable for growing perennial crops. The bulk density of the surface soil ranges from 1.1 to 1.3 g/cm³ across locations, with an average value of 1.2 g/cm³, while the average particle density in the study area reached 2.52 g/cm³. According to Nam et al. (2021), soils with a particle density of less than 2.5 g/cm³ are assessed to have high humus content. However, a soil bulk density greater than 1.2 g/cm³ leads to difficulties in cultivation. Compacted soil limits water uptake and increases surface runoff and soil erosion, reducing soil nutrient content [Odey, 2018]. This suggests that Long Ho, Mang Thit, Binh Tan, and Tra On districts may face more difficulties growing perennial crops than other districts.
The surface soil porosity in the study area had an average value of 48.74% and was highest in the Long Ho district (55.75%). According to Afriawan et al. (2021), soil porosity should be above 50% because it improves plant growth conditions and boosts soil respiration and microbial activity. Only the soil samples in Long Ho and Mang Thit districts have medium porosity, which is quite good for the growth of perennial plants. The remaining sites all have soil porosities of less than 50%, indicating that the soil in these areas shows evidence of compaction and poor aeration, which considerably impairs absorption by plant roots. Therefore, the surface soil in the research area should be tilled regularly to improve porosity and aeration for root development and nutrient absorption.

Soil chemical properties characterization
Table 2 shows the findings for the chemical parameters of the soil. The soil pH of the study area ranges from 4.40 to 5.90 for pH(H₂O) and from 3.70 to 4.40 for pH(KCl). The low soil pH under perennial crops may be due to the depletion of cations during long cultivation and to the heavy use of inorganic fertilizers, which has increased H⁺ ions in the soil [Mulat et al., 2021]. In addition, the growth of plant roots and the nutrient absorption mechanisms of plants also affect the release of H⁺ ions into the soil [Dung et al., 2020]. According to Msimbira & Smith (2020), acidic soil significantly impacts the growth and development of plants and soil microorganisms. It also makes it difficult for plants to absorb the necessary minerals and nutrients, reducing crop yield. The EC value in the soil for perennial crops in the study area is relatively low, ranging from […].
The findings indicate that the soil samples had low to moderate organic matter content, with an average value of 3.08% and a range of 2.50% to 3.60%. The highest value was in the Vung Liem district (3.60%), and the lowest was in the Binh Tan district (2.50%). The relatively low organic matter content can be linked to the high rate of soil organic matter oxidation and the comparatively long duration of perennial crops [Mulat et al., 2021]. Low soil organic matter content reduces soil fertility and significantly affects crop yields [Gerke, 2022].
The concentration of soluble salts differs significantly between sampling locations in the study area, ranging from 0.01% to 0.12%. The highest concentration was in Binh Minh Town (0.12%), while the lowest was in Vinh Long City (0.01%). According to Chau et al. (2021), the soil environment in the study area is assessed to be free of salinity (dissolved salt concentration < 0.128%), with a negligible effect on crop yield. In addition, according to the Department of Natural Resources and Environment of Vinh Long province (2020), farmers combine land preparation, shoveling, and pumped irrigation for crops in the dry season and during periods of salinization. Therefore, the accumulated salinity in the soil is not high, which benefits crop yield.
According to Table 2, the CEC buffered at pH 7, which ranged from 12 to 15 meq/100g, did not differ substantially between the sampling sites. Meanwhile, the mean CEC buffered at pH 8.1 was higher, averaging 16.16 meq/100g, and the mean unbuffered CEC was 12.95 meq/100g. Soil development, organic matter concentration, and clay mineral composition significantly affect the soil CEC [Dung et al., 2020]. The buffer capacity of the soil in the perennial crop area of Vinh Long province is low to moderate, with an average value of 229.04 meq OH. High soil buffer capacity is associated with better soil ecosystem resilience [Dvořáková et al., 2022].
There were no significant differences in total N content between the sampling sites in the study area, with values ranging from 0.13% to 0.22%. According to research by Chau et al. (2021), soil samples in the perennial crop area of Vinh Long province had moderate to rich total N content. The soil environmental quality in the study area therefore ensures an adequate supply of nitrogen for plants. However, total N that is too high leads to soil hardening and pollution, while total N that is too low reduces soil fertility [Ma et al., 2022].
All soil samples were rich in total P (0.11-0.22%), but available P content was low to moderate. The results also showed that soil samples with high total P content also had high available P content. This suggests that part of the P in the soil was fixed, so it is necessary to supply available P to perennial crops to achieve high yields. Compared with the study of Dung et al. (2020), the amount of available P in the soil was lower (10-50 mg/100g) in the perennial crop region of Vinh Long province than in the pomelo-growing region of Hau Giang province.
The total K content in the research area ranged from 1.50% to 1.90% and did not differ significantly between sites. According to Nam et al. (2021), this corresponds to a moderate overall K content, which can meet the current needs of perennial crops in the study area. However, it is necessary to add potassium reserves to the soil for perennial crops, especially when the crop has a high demand for potassium. The total K content was also higher than that of the pomelo-growing soil (0.23%) in the study by Chau et al. (2021).
Sulfur is also necessary for plant growth and development. Sulfur is supplied to plants as an anion (SO₄²⁻) and is readily leached out of the soil [Narayan et al., 2022]. Most of the sulfur in soil is present in organic matter and is hence not accessible to plants. The results show that the total S content in the soil samples is medium to high, ranging from 0.11% to 0.37% (Table 2), which indicates that the total S content in the study area meets the sulfur demand of crops. However, a soil sulfur concentration that is too high will also adversely affect plants through root damage, leaf burn, deformation, and reduced growth [Likus et al., 2018]. Therefore, an appropriate total S concentration must be maintained in the soil for each crop in the study region through proper management and fertilization regimes.
The exchangeable Al³⁺ content in the perennial crop area was low, with an average value of 1.59 meq/100g, and the concentration of dissolved Al³⁺ was also low, with an average value of 1.29 meq/100g. These results show that the Al³⁺ content in the soil is unlikely to be toxic to perennial crops in the study area. Hung et al. (2017) suggested that Al³⁺ is more soluble when soil pH is less than 4.5. In addition, organic fertilizers help reduce the amount of exchangeable and dissolved Al³⁺ in the soil through chelation [Dung et al., 2018].
The findings demonstrated that exchangeable K⁺ and Na⁺ concentrations were low to moderate in soil samples from perennial crops, ranging from 0.23 to 0.45 meq/100g and 0.27 to 0.79 meq/100g, respectively. At these levels, the amounts of K⁺ and Na⁺ in the soil have little to no impact on how well plants grow and develop. This outcome is comparable to the study of Dung et al. (2020), in which the amounts of K⁺ and Na⁺ in the soil used for cultivating pomelo in the Chau Thanh district of Hau Giang province were similarly found to be moderate (0.36-1.60 meq/100g and 0.2-0.7 meq/100g, respectively). Meanwhile, the study area's exchangeable Ca²⁺ and Mg²⁺ concentrations were moderately high, ranging from 3.77 to 9.17 meq/100g and 1.20 to 4.53 meq/100g, respectively.
On the other hand, the exchangeable K⁺, Na⁺, Ca²⁺, and Mg²⁺ cations in the soil had average values of 0.05 meq/100g, 0.51 meq/100g, 0.25 meq/100g, and 0.16 meq/100g, respectively. Therefore, the content of exchangeable K⁺, Ca²⁺, and Mg²⁺ cations was low, while the Na⁺ content in the soil was assessed as medium. This result is consistent with the study by Hieu et al. (2015). The exchangeable Na⁺ content is always the highest and the most variable among the four essential cations in the soil solution. Chaganti & Culman (2017) suggested that sodium adversely affects the structure of the soil. In contrast, calcium and magnesium facilitate soil cohesion and counteract soil degradation. Therefore, reducing the exchangeable Na⁺ content in the soil for perennial crops in the study area is necessary to ensure soil environmental quality and crop yield.
Finally, the analysis of trace elements in the soil of perennial crops in Vinh Long province showed that Mn and Zn concentrations in soil samples ranged from low to moderate, with average values of 46.73 mg/kg and 28.28 mg/kg, respectively. The highest concentrations of Mn and Zn were recorded in the Tra On district (93.70 mg/kg) and the Mang Thit district (90.93 mg/kg), respectively. This may be because Zn is a potent antagonist of P: when the soil is poor in P, the Zn content of the soil increases, and vice versa. Moreover, when the soil is too rich in P, its ability to supply Zn to plants is reduced, causing an imbalance of nutrients [Loan et al., 2016]. Therefore, it is necessary to maintain adequate Mn and Zn in the soil of the study area, avoiding both a nutrient imbalance when the content is too low and toxicity to plants when the concentration is too high.
In Vinh Long province, the perennial crop soils are typically silty to clayey, with relatively high density, porosity, and buffering capacity and low soil structure stability. They also typically have a relatively low soil pH. Low to moderate concentrations of OM and CEC, together with low Al³⁺ content, electrical conductivity, and total soluble salts, indicate that the area has not been salinized and is not toxic to plants. The soil in the research area is also rich in total N and P, with moderate to high levels of total K and S, although available P is only low to moderate. Base saturation, exchangeable cations, and trace elements are low to moderate. As a result, compared to other districts, the soils in the districts of Tra On, Mang Thit, and Vung Liem are quite favorable for vigorously developing perennial crops, notably citrus trees.

Clustering soil environmental quality in the study area
Cluster analysis was performed using the 27 physicochemical parameters of soil environmental quality measured in 2018 in the eight districts of Vinh Long province. At a Euclidean distance of 3, the soil sampling locations are divided into six clusters (Figure 2). Clusters 1, 2, 3, and 5 each contain only one monitoring location: Binh Tan, Mang Thit, Long Ho, and Vinh Long City. Thus, the upcoming monitoring program must monitor these districts independently. Cluster 4 includes two districts, Vung Liem and Tra On. The main reason is that both districts are located in the eastern part of Vinh Long province and have similar physical and chemical characteristics. In addition, these two districts have the highest concentrations of OM, total P, available P, total K and S, and Al³⁺, and the highest soil buffering capacity, compared to the remaining districts.
Meanwhile, cluster 6 includes Binh Minh and Tam Binh districts, which share similar soil environmental quality characteristics such as low total soluble salts, nutrients, and trace elements. Therefore, Tra On and Tam Binh districts were selected, based on soil characteristics and properties, to represent clusters 4 and 6 in future monitoring. Overall, the districts selected for long-term monitoring and representing the soil environmental quality characteristics of Vinh Long province are Tra On, Tam Binh, Binh Tan, Mang Thit, Long Ho, and Vinh Long City.
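The grouping of sites by cutting a dendrogram at a fixed Euclidean distance can be sketched as follows. This is a minimal pure-Python illustration, not the PRIMER procedure used in the study: the linkage rule is not stated in the text, so average linkage is assumed here, and the five example sites and their two coordinates are hypothetical values chosen only to show the mechanics.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, points):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster_at_distance(points, threshold):
    """Merge the two closest clusters until none are within `threshold`."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # every remaining pair is farther apart than the cut
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Hypothetical sites described by two standardized parameters each.
sites = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.0, 5.5), (10.0, 0.0)]
groups = cluster_at_distance(sites, threshold=3.0)
```

With these made-up coordinates the first two sites and the next two sites merge into pairs, while the last site stays in a cluster of its own, which mirrors how single-district clusters arise when a site is far from all others at the chosen cut.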
Figure 2. Results of soil environmental quality classification
Principal components are a new set of variables produced by the PCA approach [Shihab et al., 2020]. Each principal component is a linear combination of the original variables. Because all principal components are orthogonal to one another, there is no redundant information, and together they form an orthogonal basis for the data space. The first principal component is a single axis in that space; projecting each observation onto it yields a new variable whose variance is the largest among all possible choices of the first axis. The second principal component is another axis, perpendicular to the first. Projecting the observations onto this axis creates another new variable, whose variance is again the largest among all possible choices of the second axis. Table 3 presents the principal component analysis (PCA) results, which identify seven components that together explain 100% of the variation in soil environmental quality. PC1 to PC6 are the main components, explaining 97.3% of the variation with shares of 36.4%, 19.6%, 14.8%, 11.9%, 10.2%, and 4.4%, respectively. These PCs have eigenvalues greater than 1 and contribute significantly to the variation in the data. PC7 is regarded only as a secondary source and does not substantially affect the variation in soil environmental quality in the research area.
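The reported shares can be checked with simple arithmetic. The sketch below accumulates the percentages given above for PC1 to PC6 and derives the remainder left for PC7. The conversion to eigenvalues rests on an assumption that is not stated in the text: that the PCA was run on the correlation matrix of all 27 parameters, so that the eigenvalues sum to 27.

```python
# Percent of variance explained by PC1..PC6, as reported in the text.
explained = [36.4, 19.6, 14.8, 11.9, 10.2, 4.4]

cumulative = []
running = 0.0
for pct in explained:
    running += pct
    cumulative.append(round(running, 1))

# The seven components explain 100% in total, so PC7 takes the remainder.
pc7 = round(100.0 - cumulative[-1], 1)

# ASSUMPTION: PCA on the correlation matrix of all 27 parameters, so the
# eigenvalues sum to 27 and each eigenvalue is its share times 27.
n_variables = 27
eigenvalues = [round(p / 100.0 * n_variables, 2) for p in explained + [pc7]]
```

The six shares sum to 97.3%, leaving 2.7% for PC7. Under the stated assumption the implied eigenvalue of PC6 is about 1.19 and that of PC7 about 0.73, which is consistent with PC1 to PC6 being the components with eigenvalues greater than 1.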
PC1 is explained by the contribution of physical and chemical soil environmental quality factors, including density, porosity, and cation exchange capacity (CEC buffered at pH 8.1). Like PC1, PC2 through PC5 are influenced by physical traits, with contributions mainly from exchangeable cations, Al³⁺, total soluble salts, Zn, and other elements. Soil nutrients (total N, P, and K, total S, and available phosphorus) also significantly affect soil environmental quality. Meanwhile, Mn, unbuffered CEC, and saturated cations in the soil are the main contributors to PC6. The PCA results show that the sources affecting soil environmental quality in the perennial crop area of Vinh Long province are related to natural processes (pH(KCl), pH(H₂O), EC, density, porosity, texture composition, CEC, exchangeable Al³⁺, total soluble salts) and to nutrient sources (OM, total N, P, K, total S, available phosphorus, exchangeable and saturated cations, Mn, Zn). This outcome aligns with Abdel-Fattah et al. (2021), who suggested that EC, organic matter, CEC, NPK, and soil mechanical composition significantly influence soil environmental quality.

CONCLUSIONS
According to the soil environmental quality characterization of the perennial crop region of Vinh Long Province, the soil pH is low, the soil texture is primarily silty clay, and the soil tends to be compacted, with low porosity. The EC, total soluble salts, Al³⁺, and cation exchange capacity (CEC) were low. The buffer capacity is low to moderate. While the soil's NPK concentration ranged from moderate to rich, its organic matter content was low to moderate. Low to moderate concentrations of cations, trace elements, and available phosphorus can affect plant growth. Soils in the Tra On, Mang Thit, and Vung Liem districts are suitable for growing perennial crops.
The CA results suggest that the Tra On, Tam Binh, Binh Tan, Mang Thit, Long Ho, and Vinh Long City locations in the soil environmental quality monitoring program still represent the soil environmental characteristics of Vinh Long province. The PCA results showed that six main PCs explained 97.3% of the variation in soil environmental quality in the study area. Key parameters that need to be monitored continuously include pH(KCl), pH(H₂O), EC, density, porosity, texture composition, CEC, exchangeable Al³⁺, total dissolved salts, organic matter, total N, P, K, and S, available phosphorus, exchangeable and saturated cations, Mn, and Zn. Further studies must focus on assessing soil environmental characteristics over time and soil depth while determining the frequency of environmental monitoring of perennial crops in the study area.