Multivariate and Geostatistical Analyses of Groundwater Quality for Acid Rock Drainage at Waste Rock and Tailings Storage Site

A multi-disciplinary approach is indispensable for adequate acid rock drainage (ARD), mineral leaching impact, and groundwater management. Groundwater is a valuable resource, and it is critical to protect as well as mitigate the effects of pollution such as ARD in the mining environment. Mine waste storage facilities (waste rocks and tailings) are potential ARD sources capable of degrading groundwater reserves. This research investigated and reported the application of a case study of multivariate statistical and spatial variability of selected parameters associated with ARD in groundwater around WRD and TSF at mine sites. Water quality analysis data of seventy water samples from 10 boreholes located at the WRD and TSF mine were utilised in this study. The correlation matrix and principal components analysis was applied to the data set to determine the associated variability in groundwater in relation to ARD. Geostatistical analysis was used to produce contour maps to ARD principal components of the study site, using ordinary kriging of the best fit models. The application of multivariate statistical and geospatial analysis in groundwater quality assessment with coupled soil and groundwater modelling of flow and transport at waste rock dump and tailings storage sites provides an essential tool for exploratory data analysis, and spatial extent determination of the relationship between various data sets significant to acid rock drainage.


INTRODUCTION
Waste rock dumps (WRDs) and tailings storage facilities (TSFs) at a mine site are significant sources of acid rock drainage and potential groundwater contaminants. The central element associated with acid rock drainage in mining activities is sulphide, which can severely deteriorate the groundwater quality. The oxidation of sulphide minerals within the waste rock pile materials ensues in the generation of acidic solutions with a high level of toxic metals, which are potential groundwater contaminants (Sracek et  The magnitude of groundwater contamination during the operational phase of mining depends on factors such as groundwater-surface water interaction, soil and waste rock erosion intensity, levels of present toxic minerals and trace metals, sulphide mineral levels, available buffering agents, hydrogeological and climatic conditions (Wahsha et al., 2016Agboola et al., 2020Vriens et al., 2020;Ram et al., 2021). Groundwater hydrochemistry comprises variable characteristics, and therefore, the spatial dependence of water quality parameters is required to characterise the relationship between the hydrochemical parameters for effective mapping of groundwater attributes of a region (Allwright et al., 2013;Emenike et al., 2018;Jiang et al., 2020).
Statistical applications linked with spatial correlation are a helpful tool and modelling technique for creating groundwater quality monitoring maps. The application of geostatistical methods considers the spatial correlation of the regionalised variabilities by computing and modelling the semivariogram. Geostatistical methods can predict the concentration of regionalised variables, such as groundwater Multivariate and Geostatistical Analyses of Groundwater Quality for Acid Rock Drainage at Waste Rock and Tailings Storage Site parameters in non-sampled locations (Karami et al., 2018Adadzi, 2020Asghar et al., 2020).
The correlation coefficient can be described as a small correlation for -0.10 ≤ r ≤0.29, a medium correlation for -0.30 ≤ r ≤ 0.49, and a significant correlation for -0.50 ≤ r ≤ 1.0 (Pallant, 2011). Positive and negative indicate the direction of the relationship; positive suggests the increase in the other variable associated with the rise in one variable, and negative correlation indicates the decrease in the other variable related to the increase in one variable. The coefficient of determination (r 2 ), which explains the change in one variable as explained by the difference in the other variable, was calculated.
Principal component analysis (PCA) minimises information loss and reduces large data sets (King and Jackson, 1999; Jolliffe and Cadima, 2016). This study used principal component analysis to obtain the correlation between groundwater quality parameters. The first principal component explains much of the total variance in a data set, while the other members then explain the remainder of the variance. In PCA, only the components with an eigenvalue >1 were often selected and then subjected to the varimax rotation before being used for interpretation (Patil et al., 2008).
The PCA method consists of five main steps: 1. Original data matrix: where: x ij -in the matrix is the original measurement data; n -the monitoring station; p -each water quality parameter.
2. Standardisation of the original data with the Z-score standardisation formula to remove the impact of dimensionality (Belkhiri and Narany, 2015) as: where: x ij * -the standard variable; x j -the mean value for the jth indicator; s j -the standard deviation of the jth indicator.
3. Calculation of the correlation coefficient matrix R using standardised data to establish the correlation between the indicators (Equation 4) (Chen et al., 2015): ( , = 1,2, … , ) = 1 1 * + 2 2 * + ⋯ + * ( = 1,2, . . . , ) 4. Calculation of the eigenvalues and eigenvectors of the correlation coefficient matrix R to establish the number of principal components. The eigenvalues of the correlation coefficient matrix R are expressed by λ i (i = 1, 2 · · · n) and the eigenvectors are u i (u i = u i1 , u i2 , · · · u in ), (i = 1, 2 · · · n). The λ value corresponds to the variance of the principal component. Also, the variance value is positively correlated with the contribution of the principal components. The cumulated contribution rate of the first m principal components should be more than 80%, which means: ∑ m j=1 λj⁄∑ n j=1 λj ≥ 0.80 (Uddameri et al., 2014). The principal component is expressed by Equation (5).

5.
Weighing and summing the resulting principal components to obtain a comprehensive score function, as shown in Equation (6): ( , = 1,2, … , ) = 1 1 * + 2 2 * + ⋯ + * ( = 1,2, . . . , ) Applying geostatistical methods, the surfaces are connected if the sampling points are at different locations. Deterministic methods use mathematical functions to interpolate based on surrounding measurements directly. Geostatistical methods rely on both statistical and mathematical functions, including autocorrelation. The geostatistical tool ArcGIS (ESRI, 2019) was used to predict the distribution of groundwater quality within the study catchments. Kriging interpolation techniques and a semivariogram modelling approach were used to analyse the spatial distribution of parameters in groundwater associated with acid rock drainage. Semivariogram model types were considered for the selected groundwater quality parameters: exponential, linear, circular, spherical, stable, Gaussian and quadratic. Each parameter was analysed across various semivariograms. The most appropriate model of the semivariograms was selected by examining the spatial distribution of the dataset along with geostatistical features such as standard deviation, percentage error and skewness. A Kriging interpolation technique was used to interpolate readings without actual data. A spatial analysis of groundwater quality was conducted to identify variations in groundwater quality across vulnerable areas and sites. Kriging interpolation and semivariograms were the main geostatistical methods used in spatial analysis.
Geostatistical applications through either deterministic or stochastic methods have been applied in many studies to model and generate a spatial mapping of groundwater quality dynamics and patterns of major elements and trace metals. Amongst several methodologies, the kriging method has been used successfully to estimate the pollution patterns in groundwater, soil and the environment for designing proper management, remediation and preventive strategies ( The geostatistical simulation approach by kriging has been used to assess the uncertainty in the distribution of significant groundwater contaminants. The application of the ordinary kriging method in spatial modelling and mapping of groundwater quality distribution produces smoothing effects that emanate to accuracy in the interpretation of produced contaminant This study applied an integrated multivariate statistical and geostatistical analysis to investigate and determine the spatial models for major groundwater parameters associated with acid rock drainage at the waste rock dump and tailings storage site at a mine. Despite the availability of monitoring data at mine sites (WRD and TSF), the data alone is insufficient for assessment and the distribution, extent and magnitude of ARD impact at specific locations within the mine. The multivariate spatial distribution assessment ensures that attention is focused on specific locations of concern, resulting in time and cost components of environmental impact assessment and mitigation measures. The spatial distribution maps are a useful management tool for groundwater protection, remediation strategies and optimisation of groundwater-monitoring design at the mining site.

STUDY AREA
The study area is at a mean altitude of 1500 metres above mean sea level (mamsl) and has a temperate Highveld climate with warm to hot summers as well as mild to cold winters. The mean annual precipitation for the area at a 50% percentile is 286.9 mm, with mean yearly evaporation of 2600-2800 mm (Midgley et al., 1997). The mine is situated on a topographic high, mainly sloping east and northeast of the mining area ( Figure 1).
The WRD and a fine residue dam (FRD) are located to the west, and the TSFs are situated to the east of the open mine pit and cover a surface area of approximately 7.2 km 2 . Groundwater monitoring boreholes are located near the WRD, FRD and TSF to facilitate the monitoring of water chemistry and groundwater levels. The WRD is composed of rock materials that overlay the ore body and were displaced during mining. These rock materials mainly consist of dolomite from the excavated upper, middle, and lower dolomite members, as well as secondary dolomitisation. The WRD also contains remnants of the volcanoclastic kimberlite (Ekkerd and Stiefenhofer, 2003) discarded after removing economic minerals and fragments of Karoo and Transvaal Supergroup sedimentary rocks, xenoliths of basalt, mudstone, dolerite, and sandstone. The FRD, located adjacent to the WRD, is delivered in a slurry form to the deposit with about 49% moisture content. The return water from the slurry flows to the deposit pool and is then decanted in a controlled manner.
The regional geology of the study mine consists of 17.9 ha, 118 ± 2.8 Ma (Smith et al., 1985), and a Group-2 kimberlite (Smith, 1983a;Fraser and Hawkesworth, 1992), as presented in Figure 2. The geology of the mine at the open pit and sampling level 348 m below ground was described by Clement (1982) as a pipe through the Griqualand West suite, which consists of thick Proterozoic sedimentary rock suites, dolomites, banded iron formations, and shale. Large fragments of Karoo, lava, and dolerite deposits were observed in the tube and subsequently eroded. The study mine is directly underlain by dolomitic limestone with interbedded chert of the Ghaap Plateau Formation and Campbell Group, forming part of the Griqualand West Sequence of rocks.

MATERIALS AND METHODS
In this study, correlation matrix, principal component analysis and geospatial analysis were applied on the obtained groundwater quality data from 10 boreholes (70 water samples in total), taken quarterly from December 2018 to December 2020. The multivariate statistical analyses were performed using Origin-Pro statistical tools. The multivariate statistical analysis covers 14 groundwater parameters (EC, pH, Ca, Mg, Na, K, Cl, SO 4 , HCO 3 , NO 3 , Si, T. Hardness, TDS, and Turbidity), while the geospatial analysis focused on the most critical parameters, likely influencing ARD in the groundwater (SO 4 , pH, EC, Ca, Mg, Na, Cl, and HCO 3 ). Geostatistical modelling used a numerical model optimal for further analysis of each groundwater quality parameter. Semivariogram selection was made by visually reviewing the map and examining the distribution of groundwater throughout the study area, considering statistical characteristics such as standard deviation, percentage error, and skewness. The Kriging interpolation method analysed the selected groundwater quality parameters for all semivariograms. This procedure was repeated for all groundwater quality parameters to determine the optimal semivariogram for all data. The multivariate and spatial distribution assessment of the groundwater parameters associated with ARD ensures that attention is focused on specific locations of concern, resulting in time and cost components of environmental impact assessment and the application of mitigation and management measures.
The methodology was applied to the case study location, as illustrated in the schematic diagram of Figure 3.
The method was applied to the case study because of the distribution of sampling points and the number of samples, which require classification into several homogenous groups to reference the underlying significant hydrogeochemical anomalies related to ARD. The method involves subjecting the groundwater quality data to the following analyses: The methodology is based on a combination of the correlation matrix, principal component analysis, variogram modelling and kriging. In applying this method, the aim is the interpretation of specific anomalous values in the data, which could indicate the presence of acid rock drainage in groundwater. In my opinion, this can be done best if the interpretation of the anomalies is based upon an explanation of the chemical phenomena described by the data. In ideal situations, the groundwater parameters can be related directly to acid rock drainage influence. In practical applications, looking for the chemical interpretation of the parameters is generally helpful. However, it may not be possible to explain precisely at a particular stage of acid rock drainage development. The multivariate statistical analyses were performed using OriginPro software.
Correlation matrices are usually created using the Pearson method. On the basis of standardised data, R-mode factor analysis (FA) and principal component analysis (PCA) show the interaction of variables. In general, it seeks to simplify the complex and diverse relationships between a set of observed variables by revealing the common and unobserved factors associated with clearly unrelated variables.
The eigenvalues of the correlation/ covariance matrix represent the division of total variation that causes each principal component in groundwater relating to ARD property. The magnitude of the relationships between variables is used as indicators and as a covariance matrix standardised by setting all the variances equal to one. The correlation matrix calculation is a primary tool to identify the dominant correlating parameters/variables before the data is further subjected to principal component analysis.
PCA was mainly used for strongly correlated variables. PCA does not work well to reduce data when the relationships between variables are weak. PCA recommends a correlation coefficient greater than 0.3. Principal component variables were linear combinations of the original variables that produced the extracted eigenvectors. Scree plots and biplots were created to interpret the relationships between observations, while the biplots represented both loadings and scores. The correlation matrix and principal component analysis were used to identify the groundwater quality components associated with acid rock drainage linked to specific sampling locations. The clustering of groundwater parameters was then inferred to relate the potential for acid rock drainage influence on the groundwater quality.

Groundwater composition
The groundwater composition for the study area is based on the existing quarterly monitoring data obtained from the case study mine from December 2018 to June 2020. The results of the statistical representation of groundwater composition from the boreholes at the study location grouped according to the areas of the boreholes relative to the WRD, FRD and TSF are presented in Table 1.

Correlation matrix
This study used a Pearson correlation matrix to identify the relationships among the individual groundwater quality parameters/variables. The analysed correlation matrix for the groundwater samples is presented in Table 2.
The results indicate that the strongest correlated variables are SAR and Na (0.981), Mg and HCO 3   Note: *averages and standard deviations calculated from the last five sets of analytical data; # TH -total hardness. observed amongst relatively few groundwater quality variables at the study site. The presence of Na + controls the salinity load in groundwater, and then by Cland HCO 3 -. The strong positive correlation of Mg and HCO 3 (0.961) indicates carbonate buffering resulting in the weak correlation of SO 4 with pH (0.016). However, the strong positive correlation of SO 4 with Na (0.906) suggests acid rock drainage influence under saline conditions (neutral mine drainage). The strong positive correlation of Na + and Cl -(0.78) and the weak positive correlation between Ca 2+ and Mg 2+ (0.278) indicate these parameters have mixed sources of origin. Therefore, the acid rock drainage influence in the groundwater is likely due to the leachates from the WRDs and TSFs at the study location.

Principal component analysis
Principal component analysis was performed to identify the dominant groundwater component, which is inferred to acid rock drainage characteristics. Kaiser-Meyer-Olkin (KMO) suggests a value of 0.86, showing a perfect sample fit, and p <0.001, indicating a high sphericity significance Note: values > 0.7 represent strong positive factor loading.  (Table 3). The scree plot of the PCs presented in Figure 4 was used to identify the number of PCs retained to understand the structure of groundwater quality parameters. The calculated component loadings, cumulative percentage, and percentages of variance explained by each component are presented in Figure 4. The results indicate that 47.02% of the total variance represents the first principal component (PC). The PC1, PC2, PC3, and PC4 for groundwater quality data were 47.02%, 24.43%, 9.47% and 6.41%, respectively. The Biplot of PCA loadings scores for the dataset of groundwater quality parameters and sampling boreholes is presented in Figure 5. The primary principal component (PC1), explaining 47.02% of the data variance, is positively loaded with TDS, Cl, SO 4 , EC and Na, which is mainly distributed in boreholes E29 and E31 ( Figure 5).
PC1 shows a weak positive loading of SO 4 (0.308) and Na (0.295), indicating a moderate influence of secondary salts and leachate sulphide-bearing minerals at these locations. The PC2 shows that 24.43% of the total variance is positively loaded with HCO 3 , Mg, pH and TH, significantly distributed in boreholes E17, G8, and G12. The medium positive loading of Mg (0.413) and HCO 3 (0.431) indicates lithological influence and carbonate buffering in the groundwater at these locations. PC3 (9.47%) is positively loaded with Ca (E29) and Tbd (G2, G3, G4, Brits A and Brits B (E15), while PC4 (4.41%) is positively loaded with Ca (E29), Si (G2 and G3) and NO 3 (E29 and E31). PC3 and PC4 suggest significant carbonate weathering and silicate weathering, respectively, at these locations.
The PCA method uses an overall score to represent the critical groundwater quality parameters associated with each site within the study area. The study identified the groundwater quality components associated with acid rock drainage linked to specific sampling locations. The clustering of groundwater parameters indicates that the potential for acid rock drainage influence on groundwater quality is significant at boreholes E29 (located near TSF) and E31 (located in WRD). This is conspicuous in the biplot (Figure 5), showing a strong positive correlation and close clustering of SO 4 and Na within PC1 (47.02%). Sulphate and halite are strong indicators of anthropogenic influences of ARD in the groundwater at these locations. The biplot also indicated the evidence of carbonate buffering reactions at boreholes G8,

Geostatistical analysis
Geostatistical analysis was carried out with Ar-cGIS 10.8 to perform ordinary kriging for predictive surface output with no transformation. Predictive performance was assessed by cross-validation, examining the accuracy of the selected fitting models and parameters, as shown in Table 4.
All the groundwater quality parameters were evaluated by cross-validation as follows: • The Standardised Mean Error range closest to zero, • The Root Mean Square Standardised Error (RMSSE) range closest to 1, and • Standard errors are close to the root-meansquare prediction errors.
The results in Table 4 indicate the best fit semivariogram models based on the defined error analysis schemes selected for kriging interpolation to generate surface maps for each groundwater parameter associated with ARD at the study location. Semivariogram models (circular, spherical, tetraspherical, pentaspherical, exponential,    Gaussian rational quadratic, hole effect, K Bessel, J Bessel and stable) were tested for each parameter.
The best fit semivariogram model for each parameter-based cross-validation is presented in Table. By utilising the best model determined from the cross-validation process, the predicted groundwater quality parameters were used to produce the spatial distribution maps for SO 4 , pH, EC, Na, HCO 3 , Ca and Mg for the study, as presented in Figures 6 and 7.
The discussion on the spatial distribution of the groundwater quality parameters of the study area associated with acid rock drainage impact is discussed in Table 5. The generated map for each parameter indicates that the influence of ARD has a high magnitude at boreholes E31 and E29, located in the WRD and close to the TSF. This is because of the generally elevated concentrations of SO 4 , Na, Cl and EC, which are characteristics of potential ARD influence in the groundwater. Sulphate and EC are usually closely correlated, with the degree of correlation enhanced with increasing ARD contamination of groundwater (Heikkinen et al., 2002). Carbonated neutralisation of pyrites and the dissolution of calcite/dolomite were also observed in boreholes G2 and G8, contributing to alkalinity.

CONCLUSIONS
The application of multivariate and geostatistical analysis for identifying ARD in groundwater at mine sites provides a practical, cost-effective framework capable of pinpointing the areas of critical ARD impacts for implementing mitigation and remediation strategies. The multivariate statistical methods of the correlation matrix (CM) and principal component analysis (PCA) were integrated with geostatistical spatial analysis (GPA) to determine the potential localities of acid rock drainage influence at the study site. On the basis of the correlation matrix analysis, the strongest correlated groundwater quality parameters at the study location were Na, Cl, SO 4 , and HCO 3 , indicating the influence of secondary salts with leachate sulphide-bearing minerals accompanied by carbonate weathering. PCA shows that the primary principal component in the groundwater accounting for 47.02% of the groundwater parameter variance is SO 4 , Cl, Na, EC and TDS, which are mainly distributed at boreholes E29 and E31. The application of geostatistical analysis resulted in the generation of spatial distribution maps of the groundwater quality parameters for the study area. The spatial distribution maps indicate that the maximum SO 4 , Na, Cl and EC concentrations occurred at boreholes E29 and E31. The integrated CM-PCA-GPA showed elevated sulphate concentrations at circumneutral pH (Neutral Mine Drainage) at E29 near a tailings storage facility and E31 in a waste rock dump. The integrated CM-PCA-GSA benefits the ability to identify the patterns within multivariate groundwater hydrochemical data that are often difficult to characterise for acid mine drainage. The integrated CM-PCA-GSA application validates the proposed integrated hydrogeochemical analysis method by identifying the correlations and spatial distribution of groundwater parameters according to ARD influence on groundwater hydrogeochemistry. Multi-disciplinary research on acid rock drainage and mineral leaching from tailings and waste rock facilities at mining sites facilitates diverse prevention, mitigation, and management options.