Stacking Artificial Intelligence Models for Predicting Water Quality Parameters in Rivers

Scrutinizing the changes in the quality of river water is one of the main factors of monitoring the quality of natural flows, which plays a crucial role in the sustainable management of these ecosystems. The concentration of dissolved oxygen (DO) in river water is one of the most important indicators of quality management in such water bodies. From an environmental point of view, exceeding the permissible and natural decay capacity of pollutants in natural streams leads to a decrease in DO and, consequently, causes serious risks for the survival of aquatic life in related ecosystems. Hence, in the present study, 10 daily variables with the amount of dissolved oxygen on the same day were collected and evaluated from Allen County. Moreover, half of these variables were chosen as effective inputs to the model based on statistical analysis, so as to calculate the dissolved oxygen concentration parameter. Modeling with artificial intelligence approaches was implemented in the form of four individual methods: ANFIS-PSO, OS-ELM, Bagging-RF and Boosting CART, and two ensemble-stacking methods: SMA and Meta-learner MLP. The outcomes of estimating the DO with RMSE, MAE, GRI, r, and MBE criteria and marginal-scatter and subject profile diagrams were discussed. Moreover, the efficiency of the models in estimating the outlier of the observational data was scrutinized by subject profile diagram. Finally, it was found that the Meta-learner MLP model with RMSE of 0.965 mg/L had improvement in performance by 8.8%, 8.9%, 22.3%, 24.9% and 27.6%, respectively, compared to SMA, Boosting CART, Bagging-RF, ANFIS-PSO and OS-ELM methods. This remarkable improvement led to recommendations for using stacking techniques in water quality modeling and simulation.


Mohammad Almadani 1*, Marwan Kheimi 1

INTRODUCTION
Natural flows are one of the main sources of fresh water for diverse purposes (including drinking, agriculture, and industry). Rivers are therefore one of the basic foundations of sustainable and environmentally friendly development in human societies, while industrial and welfare developments have put increasing stress on river water quality, so that these vital and valuable resources are exposed to danger. Hence, given the major impact of human activities on the changes in river water quality, evaluating river flows with a water quality modeling approach is of great importance in studying water resources [Abazi et al., 2022; Lusiana et al., 2022; Rahutami et al., 2022]. In the present investigation, the quality of river water was studied with diverse water quality variables as model inputs, while the amount of dissolved oxygen acts as the output of the model. Choosing dissolved oxygen as the target parameter makes the model cover the whole set of reactions that affect the amount of oxygen. To put it another way, various factors alter the concentration of dissolved oxygen in the river: it increases through direct absorption from the atmosphere and algae photosynthesis, and it decreases due to consumption in chemical and biological reactions during the decay of the pollution load in the river, the oxidation of sediments and algae respiration [Benedini and Tsakiris, 2013]. Hence, the concentration of dissolved oxygen is a good indicator of the condition of the water body of the river, demonstrating the resultant effect of its physical, chemical and biological properties. For these reasons, the amount of dissolved oxygen in a natural flow was predicted in this study. Besides, physical variables such as water temperature and chemical variables such as phosphorus concentration were used in modeling. It is interesting to note that classifying water quality variables as physical, chemical or biological is not an easy task, since a number of them result from a set of physico-chemical-biological reactions and can be placed in several groups at the same time. The problem in this study is designed with such an approach in mind, so as to be in line with the comprehensiveness required in water resources management.
It is noteworthy that physical, mathematical, and numerical modeling has been widely used for simulating river pollution problems during the last decades [Schaffner et al., 2009; Kisi and Parmar, 2016; Drozdov et al., 2021; Zounemat-Kermani et al., 2021a]. Mathematical water quality modeling has proved to be a reliable and cost-effective approach to simulating pollutant distribution in surface waters and rivers, and it can be successfully employed in water resources planning and management. It should be noted that modeling is not a substitute for field observations, but it can be used as a proper alternative for simulating or understanding observations under certain circumstances.
On the other hand, the demand for increasing accuracy in modeling water quality issues has led to a focus on the implementation of artificial intelligence methods in this field. During the last decades, soft computing methods and machine learning models have been successfully used and developed for modeling different areas of hydroenvironment systems [Kim et al.,
Najah et al. [2009] investigated and predicted the water quality variables in Johor River (Malaysia). They developed 6 neural network architectures, and the ANN model was used to simulate and predict total dissolved solids, electrical conductivity and turbidity at main stream and tributary positions. Owing to the low prediction error, the outcomes of that research proved the reliability of the models used in estimating these parameters.
Singh et al. [2009] demonstrated the capability and power of ANNs in modeling dissolved oxygen (DO) and biochemical oxygen demand (BOD) by using data gathered monthly over a 10-year period in Gomti River (India). This study indicated that optimal networks would be able to capture the observed long-term trends of the DO and BOD variables in time and space. Najah et al. [2014] compared the ability of the ANFIS model to predict the amount of dissolved oxygen in the Johor River basin with that of the MLP network. For this purpose, four parameters (temperature, pH, nitrate concentration and ammonia nitrogen concentration) were adopted to create the input combinations for modeling. Sarkar and Pandey [2015] implemented an artificial neural network (ANN) to estimate the dissolved oxygen (DO) concentrations for Mathura city, located in India. Monthly datasets including flow discharge, pH, biochemical oxygen demand (BOD), water temperature, and DO were gathered for the analysis. The DO concentrations predicted by the ANN showed a high level of agreement with the measured values (Pearson's correlation coefficient > 0.9). Raheli et al. [2017] predicted dissolved oxygen and BOD in Langat River (Malaysia) through various models including the multilayer perceptron (MLP) and an MLP model integrated with the glow worm metaheuristic algorithm. The results demonstrated that, by involving an optimizer, the hybrid model was more efficient and accurate in estimating the water quality variables of the river.
Haghiabi et al. [2018] investigated the performance of some soft computing techniques including neural networks, group method of data handling (GMDH), and SVR for the prediction of water quality indices in rivers. They claimed that the results of ANN and SVR were suitable for predicting the water quality indices. Li et al. [2019] implemented a hybrid machine learning methodology embedding the metaheuristic firefly algorithm (FA) in support vector regression (SVR) for predicting water quality indicators. The outcomes of the study showed that the SVR-FA model acted appropriately and provided promising results for the prediction of the water quality index (WQI). Lu and Ma [2020] applied two hybrid tree-based soft computing models (extreme gradient boosting (XGBoost) and random forest (RF)) to predict the water quality in the Tualatin River (Oregon, USA). It was reported that the RF performed better than the other applied models in terms of the predicted values of DO, water temperature, and specific conductance. Moreover, stability analysis showed that the prediction stability of RF and XGBoost is higher than that of other benchmark models.
Varol [2020] scrutinized and assessed the effect of several stressors (such as agricultural runoff and untreated domestic sewage) on the water quality of Sürgü Stream (Turkey) with multivariate statistical techniques (MSTs) and water quality index. The majority of the studied qualitative parameters indicated significant spatial changes owing to the anthropogenic activities.
Pham et al. [2021] predicted the WQI of water in wetlands using three artificial intelligence models (adaptive neuro-fuzzy inference system (ANFIS), ANNs, and GMDH). The results indicated that the ANFIS (NSE = 0.9634 and MAE = 0.0219) performed better in predicting the WQI. Leong et al. [2021] applied the SVM machine learning model for predicting BOD and COD, as two WQI indices. They found that the SVM acted better than the traditional mathematical models.
By using reflectance calculated from remote sensing and synchronous measurements of dissolved oxygen levels and water temperature in water bodies from 22 degrees north latitude to 45 degrees north latitude, Guo et al. [2021] developed and validated support vector regression (SVR) models and examined the effects of five climatic factors on the long-term behavior of dissolved oxygen. The results indicated the capability and generalizability of the SVR models developed in that study, as well as their better performance in estimating dissolved oxygen compared with random forest and multiple linear regression methods.
Yu et al. [2022] presented a new method in which water quality data are decomposed into a number of subseries by the wavelet transform, recombined by fuzzy C-means clustering, and predicted with the bidirectional gated recurrent unit method. The proposed model was assessed with water quality data (including the dissolved oxygen variable) from Poyang Lake (located in China) and showed high accuracy in forecasting.
Using the variables of temperature and flow rate, Dehghani et al. [2022] predicted the amount of dissolved oxygen (DO) in Cumberland River (located in the United States). In that study, the time series were monthly. Support vector regression (SVR) was used for modelling by itself and in combination with the CSO, SSD, BWO and AIG algorithms. The four hybrid models performed better than the single model, increasing the accuracy of estimation by 1.75% to 6.52%.
These studies highlight the desirable capability of data-based methods in estimating and predicting the quality variables of surface water. By relying on the capacity of these methods, direct measurement of quality indicators can be reduced and the level of planning and quality management in natural flows can be improved. In other words, artificial intelligence models learn the relationships governing the processes in water bodies without the need for the basic equations; they therefore have great accuracy and power in assessing and estimating water quality conditions and are considered an effectual tool for determining the parameters of river water quality. In addition, successful and frequent implementation of ensemble techniques, including resampling methods (such as bagging and boosting), averaging and stacking, has been reported for simulating and predicting the defined goals in diverse fields of hydrology [Zounemat-Kermani et al., 2021a].
All of the aforementioned issues encouraged the authors of the present study to use new modeling tools in the field of water quality in order to use artificial intelligence models in a more innovative way. The comprehensive explanation is that in the present inquiry, the amount of dissolved oxygen in the river was predicted by two groups and, subsequently, their aggregation was conducted. The first group was network-based including ANFIS-PSO and OS-ELM, and the second group was a regression tree, consisting of two models including Bagging-RF and Boosting CART. So far, no such comparison has been conducted in the field of water quality. By stacking four models, as well as ensuring innovation in the methodology used, a more accurate assessment of the combinability of the models was provided. Stacking the models was implemented using two algorithms of averaging and MLP neural network, so that the analysis of this process and its effect on the predictive power of dissolved oxygen were completed. Such a comprehensive structure was adopted for the first time in applying methods and comparing the power of models individually, in groups and collectively.

Study area and data
The data of this study were gathered on a daily basis from January 1, 2016 to December 31, 2018 (1,096 recorded values of each variable) from the United States Geological Survey [USGS, 2022]. Figure 1 shows the geographical location of the measuring station in Allen County, Indiana (U.S.), with the following features: Hydrologic Unit Code 04100005, Latitude 41°10'59.1", Longitude 84°52'10.9" NAD83, drainage area 12.36 square miles, gage datum 723.46 feet above NAVD88.
Table 1 summarizes the statistics of the studied parameters. Among these parameters, DO (dissolved oxygen) is the target variable which, along with the other parameters, makes it possible to model the quality of the river water; predicting and estimating the DO concentration can be regarded as the response to the interaction between the water quality variables in the river flow field. In fact, analyzing dissolved oxygen along with the other water quality parameters (Cl, OP, NO3 + NO2, SSC, P, T, SC, pH and NH3 + NH4), as well as the flow rate (Q), forms the problem structure of this study.
As expected, given the changes in water quantity over time (Figure 2) and their effect on the self-purification potential of the river, remarkable changes in water quality are observed between low-water and high-water months: from July to September, the DO concentration was lower than in other months. This is in line with the changes of flow rate in those months (Figure 3) and reflects the relationship between the water quality variables and the flow over time. Therefore, to keep the temporal structure of the data (Table 1) from affecting the models and to prevent a time trend from entering the prediction of dissolved oxygen concentration, the chronological order of all data was shuffled randomly. The first 75% of the shuffled series was used for training the models and the final 25% for testing them.
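The shuffling and 75/25 partition described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `shuffle_and_split`, the seed, and the use of day indices in place of the real records are assumptions made for the example, not details reported in the study.

```python
import random


def shuffle_and_split(records, train_fraction=0.75, seed=42):
    """Randomly reorder the records, then split them into train and test parts."""
    shuffled = list(records)                # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # break the chronological order
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# 1,096 daily records, represented here only by their day index (hypothetical stand-in)
days = list(range(1096))
train, test = shuffle_and_split(days)
print(len(train), len(test))  # 822 274
```

With 1,096 daily records, this split leaves 822 records for training and 274 for testing.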

Methodology
This section introduces the methods used to predict the concentration of dissolved oxygen in the river: four individual models (ANFIS-PSO, OS-ELM, Bagging-RF and Boosting CART) and two stacking models (SMA and MLP meta-learner).

ANFIS-PSO
An adaptive neuro-fuzzy inference system (ANFIS) combines adaptive neural network and fuzzy logic algorithms to design a nonlinear mapping between the input and output spaces. In the learning phase, the parameters of the membership functions are modified until the computed values approach the actual values within an acceptable error rate. The major learning method in this system is back-propagation combined with the least squares error algorithm, which corrects the parameters by propagating the error value back toward the inputs. In order to achieve the desired framework of the neuro-fuzzy system, it is essential to fit the system's rules and membership functions [Jang, 1993]. Hence, in the present article, the particle swarm optimization (PSO) algorithm is adopted to obtain the membership functions and fuzzy rules in an optimal way; the algorithm searches for the optimal state by randomly creating candidate solutions. In this study, the initial population is 100 particles, the acceleration parameters c1 and c2 are equal to 1 and 2, respectively, with a search space of -10 to +10 in each iteration, the best membership function is of the Gaussian type, and subtractive clustering is adopted as the major partitioning technique.
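To illustrate how PSO searches a bounded parameter space, the sketch below runs a basic particle swarm with the settings quoted above (100 particles, c1 = 1, c2 = 2, bounds of -10 to +10). Several elements are assumptions for the sake of a runnable example: the inertia weight `w = 0.7`, the iteration count, reading c1 as the cognitive and c2 as the social coefficient, and above all the `sphere` objective, which stands in for the ANFIS training error that PSO would minimize in the study.

```python
import random


def pso_minimize(objective, dim, n_particles=100, n_iter=200,
                 c1=1.0, c2=2.0, w=0.7, lower=-10.0, upper=10.0, seed=0):
    """Basic global-best PSO; w (inertia) and n_iter are illustrative assumptions."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                       # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]      # best position seen by the swarm

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # pull toward own best
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # pull toward swarm best
                # keep the particle inside the search space
                pos[i][d] = min(upper, max(lower, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Stand-in objective; in ANFIS-PSO this would be the model's training error."""
    return sum(v * v for v in x)


best_x, best_val = pso_minimize(sphere, dim=3)
print(best_x, best_val)
```

In an ANFIS-PSO setting, each particle would encode the membership function and rule parameters, and the objective would return the prediction error of the resulting fuzzy system on the training data.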

OS-ELM
The extreme learning machine (ELM) is a single-hidden-layer feedforward neural network that determines the input weights randomly and the output weights analytically, except that it does not use a bias for the output neuron. The ELM model decreases the network learning time remarkably by using different algorithms in calculating weights and biases. Moreover, when a set of weighted input signals is applied to the network, the activation functions produce a response. The online sequential extreme learning machine (OS-ELM) can be trained with individual data or with blocks of data of variable or fixed size. This model adopts additive hidden nodes and radial basis function (RBF) nodes in a unified framework [Liang et al., 2006; Zounemat-Kermani et al., 2021b]. The present study uses a sigmoid activation function for the additive nodes, which allows the output matrix of the hidden layer to be calculated in the sequential learning algorithm.

Bagging-RF
In order to create a regression tree, recursive partitioning and multiple regressions are used. Starting from the root node, the decision process is repeated in each internal node according to the tree rule until the termination condition is satisfied. Each final node is connected to a simple regression model. At the end of the tree-growing process, pruning is used to improve the generalization capacity of the trees by reducing the complexity of the structure. In order to avoid similarity among the regression trees, the Bagging-RF model increases the diversity of the trees by creating different subsets of the training data, which is referred to as bagging. Bagging is performed through random sampling of the main data set with replacement. Hence, some data may be used more than once in learning branches, while ineffective data may be excluded from modeling. This makes the model more stable and reliable in the face of minor changes in input data and enhances its prediction accuracy [Breiman, 2001]. In the present study, the sample size, maximum number of nodes, maximum tree depth and minimum child node size are set to 1, 10000, 10 and 5, respectively.
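The bagging step described above, random sampling with replacement followed by averaging of the per-tree outputs, can be sketched in a few lines. The helper names are hypothetical, the tree predictions are made-up numbers, and reading the "sample size of 1" as a bag as large as the training set is an assumption; the 822 rows correspond to 75% of the 1,096 daily records.

```python
import random


def bootstrap_sample(n_rows, rng):
    """Draw n_rows row indices with replacement (one bag for one tree)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def bagged_prediction(tree_predictions):
    """Combine the predictions of the individual trees for one sample by averaging."""
    return sum(tree_predictions) / len(tree_predictions)


rng = random.Random(1)
n_rows = 822                                   # training partition size
bag = bootstrap_sample(n_rows, rng)
in_bag = set(bag)                              # rows drawn at least once
out_of_bag = [i for i in range(n_rows) if i not in in_bag]

print(len(bag), len(in_bag), len(out_of_bag))
print(bagged_prediction([7.9, 8.4, 8.1]))      # illustrative DO predictions in mg/L
```

Because sampling is done with replacement, some rows appear several times in a bag while others are left out entirely, which is what makes the individual trees differ from one another.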

Boosting CART
The classification and regression tree (CART) model is a binary tree that divides the problem space into segments [Fürnkranz et al., 2012]. This method creates its branches in a binary way, based on only one independent variable at a time, so that the information in each node is divided into two parts according to the condition defined in that node. In the Boosting CART model, several new learners are generated from the CART regression tree, and each learns from the previous learners, which creates a more powerful algorithm. In this inquiry, the maximum tree depth, the number of component models for boosting and the maximum number of surrogates in the pruning method are set to 5, 10 and 5, respectively. The Gini index is the impurity measure for splitting, and averaging is used as the combining rule.

SMA model (stacking)
The simple moving average (SMA) model predicts target values by averaging the available data. In stacking mode, this model takes the average of the values computed by the individual models at a given time as the target value at that time [Zounemat-Kermani et al., 2021a].
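As a minimal sketch of this averaging rule, the function below takes the predictions of the four individual models and returns their per-sample mean. The function name and the prediction values are illustrative placeholders, not results from the study.

```python
def stack_by_averaging(base_predictions):
    """Average the base-model predictions sample by sample.

    base_predictions maps a model name to its list of predictions;
    all lists are assumed to have the same length and order.
    """
    columns = list(base_predictions.values())
    n_models = len(columns)
    return [sum(values) / n_models for values in zip(*columns)]


# Made-up DO predictions (mg/L) for three test days
base = {
    "ANFIS-PSO": [8.0, 6.0, 10.0],
    "OS-ELM": [9.0, 7.0, 11.0],
    "Bagging-RF": [8.5, 6.5, 10.5],
    "Boosting CART": [8.5, 6.5, 10.5],
}
print(stack_by_averaging(base))  # [8.5, 6.5, 10.5]
```

This combiner has no trainable parameters, which makes it the simplest of the two stacking schemes considered here.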

Meta-learner MLP model (stacking)
The multilayer perceptron (MLP) neural network is built from a computational unit called the perceptron. A perceptron takes a vector of real-valued inputs and calculates a linear combination of these inputs. In this method, calculations are performed from the input of the network to its output and, afterwards, the obtained error values are propagated back into the previous layers in order to complete the learning process [Barzegar and Asghari Moghaddam, 2016]. In the stacking mode, the outputs of the individual models are connected and defined as inputs to the MLP neural network, which establishes the structure of a powerful meta-learner model. In the present study, the sigmoid activation function is used in the middle layer, the linear function is adopted in the output layer, and the Levenberg-Marquardt optimization algorithm is used for training.
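The forward pass of such a meta-learner, with a sigmoid hidden layer and a linear output neuron fed by the four base-model outputs, might look like the sketch below. The number of hidden neurons and all weight values are placeholders chosen for illustration; in the study the weights would be fitted with the Levenberg-Marquardt algorithm, which is not implemented here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mlp_meta_learner(base_outputs, hidden_weights, hidden_biases, out_weights, out_bias):
    """Forward pass: base-model predictions -> sigmoid hidden layer -> linear output."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, base_outputs)) + b)
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    return sum(v * h for v, h in zip(out_weights, hidden)) + out_bias


# Predictions of the four individual models for one day (made-up values, mg/L)
base_outputs = [8.0, 9.0, 8.5, 8.5]

# Placeholder parameters for a network with two hidden neurons
hidden_weights = [[0.1, 0.1, 0.1, 0.1],
                  [-0.2, 0.0, 0.1, 0.1]]
hidden_biases = [-3.0, 0.5]
out_weights = [6.0, 4.0]
out_bias = 2.0

print(mlp_meta_learner(base_outputs, hidden_weights, hidden_biases, out_weights, out_bias))
```

Unlike plain averaging, the meta-learner can weight the base models unequally and nonlinearly, at the cost of needing its own training step.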

EVALUATION CRITERIA
Comparing the efficiency of the models and interpreting their abilities requires error measurement criteria. In addition to the visual comparisons provided by the subject profile and marginal-scatter diagrams (Figures 5 and 6), the quantitative metrics in Table 3 help increase the accuracy of analyzing the modelling process.
In this research, the root mean square error (RMSE), mean absolute error (MAE), geometric reliability index (GRI), Pearson's correlation coefficient (r) and mean bias error (MBE) were used in order to scrutinize the results. RMSE, MAE and MBE are based on the deviation of the predicted values from the observed values; therefore, the lower their values, the more powerful the model, while for GRI and r the same holds as they approach 1. RMSE and MBE are two statistical measurements that have been widely used in environmental estimation models [Jacovides and Kontoyiannis, 1995]. Also, relative error measurements have a good level of reliability for analyzing positive data such as reported concentrations of a variable [Jachner et al., 2007]. RMSE does not differentiate between over-estimation and under-estimation, while positive and negative MBE values denote the model's tendency to over-predict and under-predict, respectively [Jacovides and Kontoyiannis, 1995]. GRI can be interpreted as a multiplicative factor: the predicted values lie within a factor of GRI of the corresponding observed values [Jachner et al., 2007].

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(DO_{c,i} - DO_{m,i}\right)^{2}} \tag{1}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|DO_{c,i} - DO_{m,i}\right| \tag{2}$$

$$\mathrm{GRI} = \frac{1 + \sqrt{\dfrac{1}{N}\sum_{i=1}^{N}\left(\dfrac{DO_{c,i} - DO_{m,i}}{DO_{c,i} + DO_{m,i}}\right)^{2}}}{1 - \sqrt{\dfrac{1}{N}\sum_{i=1}^{N}\left(\dfrac{DO_{c,i} - DO_{m,i}}{DO_{c,i} + DO_{m,i}}\right)^{2}}} \tag{3}$$

$$r = \frac{\sum_{i=1}^{N}\left(DO_{m,i} - \overline{DO}_{m}\right)\left(DO_{c,i} - \overline{DO}_{c}\right)}{\sqrt{\sum_{i=1}^{N}\left(DO_{m,i} - \overline{DO}_{m}\right)^{2}\sum_{i=1}^{N}\left(DO_{c,i} - \overline{DO}_{c}\right)^{2}}} \tag{4}$$

$$\mathrm{MBE} = \frac{1}{N}\sum_{i=1}^{N}\left(DO_{c,i} - DO_{m,i}\right) \tag{5}$$

In Equations 1 to 5, $DO_m$ and $\overline{DO}_m$ are the measured dissolved oxygen concentration and its mean, respectively, $DO_c$ and $\overline{DO}_c$ are the dissolved oxygen concentration calculated by the model and its mean, respectively, and $N$ is the number of pairs of actual and predicted data. Based on these equations, the difference criteria (RMSE, MAE and MBE) are expressed in the unit of the data and the relative criteria (GRI and r) are dimensionless.
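The five criteria can be computed with a few short functions, sketched below using only the standard library. The sign convention for MBE (calculated minus measured) follows the statement that positive values indicate over-prediction; the GRI expression is the commonly cited form of the geometric reliability index and should be treated as an assumption about the exact formula used in the study. The sample values are made up for illustration.

```python
import math


def rmse(measured, computed):
    return math.sqrt(sum((c - m) ** 2 for m, c in zip(measured, computed)) / len(measured))


def mae(measured, computed):
    return sum(abs(c - m) for m, c in zip(measured, computed)) / len(measured)


def mbe(measured, computed):
    """Positive values mean over-prediction, negative values under-prediction."""
    return sum(c - m for m, c in zip(measured, computed)) / len(measured)


def pearson_r(measured, computed):
    mean_m = sum(measured) / len(measured)
    mean_c = sum(computed) / len(computed)
    cov = sum((m - mean_m) * (c - mean_c) for m, c in zip(measured, computed))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_c = sum((c - mean_c) ** 2 for c in computed)
    return cov / math.sqrt(var_m * var_c)


def gri(measured, computed):
    """Geometric reliability index; requires positive data (c + m must not be zero)."""
    s = math.sqrt(
        sum(((c - m) / (c + m)) ** 2 for m, c in zip(measured, computed)) / len(measured)
    )
    return (1 + s) / (1 - s)


# Made-up measured and computed DO values (mg/L)
measured = [2.0, 4.0, 6.0]
computed = [3.0, 4.0, 5.0]
for name, fn in [("RMSE", rmse), ("MAE", mae), ("GRI", gri), ("r", pearson_r), ("MBE", mbe)]:
    print(name, fn(measured, computed))
```

A perfect model gives RMSE = MAE = MBE = 0 and GRI = r = 1, which matches the direction of improvement described for each criterion.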

FEATURE SELECTION PROCEDURE
In this research, two methods, Pareto optimization and best subset selection, were applied to construct the best input combination.

Unsupervised feature selection using Pareto optimization
Variables Q, P, T, SC and pH are the selected parameters; the degree of their effectiveness at the significance level of 15% is represented in Figure 4 by the Pareto method. In this figure, the reference line at a standardized effect of 1.44 marks the minimum value for a significant relationship between an input parameter and the output variable of the model. Among the chosen variables, temperature (T) and flow rate (Q) have the largest and smallest effects on DO, respectively.
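The reference value of 1.44 is consistent with a two-sided critical value at the 15% significance level. The short check below assumes that the standardized effects are t-type statistics and uses the normal approximation, which is reasonable for a sample of roughly a thousand records; both points are assumptions, since the text does not state how the reference line was obtained. The effect values passed to `is_significant` are hypothetical.

```python
from statistics import NormalDist

alpha = 0.15                                            # significance level from the text
reference_line = NormalDist().inv_cdf(1 - alpha / 2)    # two-sided critical value
print(round(reference_line, 2))  # 1.44


def is_significant(standardized_effect, threshold=reference_line):
    """An input is retained when its absolute standardized effect exceeds the line."""
    return abs(standardized_effect) > threshold


print(is_significant(2.3), is_significant(0.9))  # True False
```

Under this reading, any input whose bar in the Pareto chart crosses the reference line is treated as having a statistically significant relationship with DO.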

Best subset selection method
Starting from a series of 10 daily input variables (Q, Cl, OP, NO3 + NO2, SSC, P, T, SC, pH and NH3 + NH4), this paper evaluates models that calculate the output (DO) on the same day. This requires the definition of statistical and analytical frameworks. In Table 2, the input variables with the maximum effect on dissolved oxygen values are selected with respect to the minimum Mallows' Cp statistic and the highest correlation with DO (maximum R-squared); other parameters are not included in the modeling.

RESULTS AND DISCUSSION
In this section, the results are presented in the form of tables and graphs. Some explanations and clarifications are also provided in order to give the reader a deeper understanding of the numbers and figures.
In the subject profile diagrams (Figure 5), the outliers in the observational data are obvious. The data used to draw these graphs belong to the test phase. Scrutinizing them reveals that these DO values lie on either side of the graph; i.e., they are the lowest and highest values. The lowest outlier is a single dissolved oxygen concentration value whose corresponding parameters, i.e., Cl, OP, NO3 + NO2, SC and NH3 + NH4, are equal to 0.43 ton/d, 0.7 kg/d, 73 kg/d, 963 μS/cm and 1.18 kg/d. The maximum outlier comprises five data points, for which the average values of the mentioned variables are equal to 0.19 ton/d, 0.3 kg/d, 24 kg/d, 814 μS/cm and 0.25 kg/d.
The higher chloride load in outlier-Min compared to outlier-Max (0.43 vs. 0.19 ton/d) increases the possibility that agricultural runoff and municipal and industrial effluents entered the river on the day the outlier-Min data were gathered. The persistence of chloride in water can indicate such an event: chloride does not take part in chemical and biological reactions in the river, so its presence can demonstrate water pollution to some extent. The significant increase in nitrate (NO3), nitrite (NO2), ammonia (NH3) and ammonium (NH4) in outlier-Min is in accordance with the hypothesis that wastewater entered the river, and it points to the nitrogen cycle and its effect on oxidation and reduction processes as well as on the amount of dissolved oxygen. It therefore seems logical that the concentration of dissolved oxygen fell to 1.9 mg/L because it was consumed by the nitrogen compounds in the effluent. Specific conductance (SC) is the rate of electrical conduction through water-soluble salts. The SC is higher in outlier-Min than in outlier-Max, which is in accordance with the higher amount of chloride ions in this case. Moreover, specific conductance is directly related to total dissolved solids, which contain organic matter and nutrients, as well as metals.

Table 2. The effective input variables on the target parameter (DO) based on the maximum R-squared and minimum Mallows' Cp. [Table body not recoverable; the only surviving row reads 80.8 and 11.0 with all ten input variables marked.]
Note: The five selected parameters are Q, P, T, SC, and pH.
It should be noted that an outlier in this article means an actual DO value that lies far from the normal and expected data range, which may be owing to an external factor (such as pollution) and a change in the normal conditions of the water body. This view leads to two ways of reading the subject profile diagrams. To put it another way, these diagrams can be divided into two main parts: the major part consists of normal data, and the minor and most important part contains the outlier data. The importance of this view is reflected in tracking pollution on the days when the DO is severely reduced and abnormal fluctuations happen in the concentration of dissolved oxygen. The minimum outlier is compared with the maximum outlier because the data are of the same type; i.e., the river quality conditions are not normal and the data indicate days with a significant increase or decrease in the intensity of the effects of external factors. To complete this view, the maximum outlier indicates the days when the least pollution entered the river and its amount was less than the self-purification capacity of the river.
Also, in such an approach, the normal range of DO concentration can be interpreted as the equilibrium condition between the amount of pollutants and the self-purification capacity of the water body. This represents the value of subject profile diagrams (not yet seen in similar studies), which are highly in line with the RMSE and MAE error criteria in the test phase (Table 3). According to Figures 5b and 5c, the Meta-learner MLP model has the best performance and the Boosting CART method was more powerful in predicting DO for normal and outlier data. Also, the SMA technique, despite having RMSE = 1.058 mg/L and MAE = 0.807 mg/L, did not act satisfactorily in estimating outliers. In fact, more horizontal lines in the subject profile diagrams were associated with better performance of the models. The ANFIS-PSO and OS-ELM models, in addition to being weak in determining the outlier data, did not perform well overall, with RMSE = 1.284 mg/L and RMSE = 1.333 mg/L, respectively. Figure 5b shows that the Bagging-RF technique, unlike the other tree algorithm (Boosting CART), produces class-like groupings in its results, which accumulated and increased the error.
According to Table 3, the highest bias in the test phase can be observed in the network-based models, where the deviation of the computed values from the 1:1 line and the tendency to overestimate (with most points of the graph lying above the 1:1 line) are quite clear in Figure 6a. The symmetry of the points relative to the 1:1 line (Bagging-RF model, Figure 6b) causes the MBE values to approach 0. However, the concentration of the points on this line (Meta-learner MLP in Figure 6c; Boosting CART in Figure 6b), together with an insignificant MBE (MBE = -0.06), reveals the higher effectiveness of the model. This is in line with the high conformity of the box plots drawn for the Boosting CART and the Meta-learner MLP to the box plot of the observational data. It is interesting to note that the Boosting CART and Meta-learner MLP methods were mostly in line with the lower branch (the line connecting the lower whisker and Q1) and the upper branch (the line connecting Q3 and the upper whisker) of the box plot of the actual data, respectively. In the test phase of the superior models (Meta-learner MLP and Boosting CART), the criterion r indicated high correlation between the estimated and actual values (r = 0.950 and r = 0.940) and the GRI criterion showed the highest geometric similarity (GRI = 1.132 and GRI = 1.128).
According to Table 3, the OS-ELM and ANFIS-PSO methods (with the highest RMSE and MAE in the test phase) were the weakest models. These models had negative MBE in the training phase and underestimated the target values. However, in the test phase, only the most accurate models (Meta-learner MLP and Boosting CART) had negative mean bias error values. It is likely that the dissolved oxygen concentration data had a tendency to decrease due to natural purification and oxygen consumption. Therefore, even the best models were in a state of underestimation during the training and testing phases. Finally, in the present study, the RMSE criterion agreed most closely with the observed performance of the models in DO prediction, revealing the high validity of this measure for evaluating the models. Hence, the relative improvement in the performance of the best model compared to other models in the test phase is represented in Figure 7 and Table 4.
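The relative improvement can be reproduced from the test-phase RMSE values quoted in the text. The sketch below assumes that improvement is defined as the reduction in RMSE relative to the model being compared against, a definition inferred from the reported percentages rather than stated explicitly; only the RMSE values that appear in the text are used.

```python
def relative_improvement(rmse_best, rmse_other):
    """Percentage reduction in RMSE achieved by the best model over another model."""
    return 100.0 * (rmse_other - rmse_best) / rmse_other


rmse_meta_mlp = 0.965  # mg/L, Meta-learner MLP (test phase)
others = {"SMA": 1.058, "ANFIS-PSO": 1.284, "OS-ELM": 1.333}  # mg/L

for name, value in others.items():
    print(name, round(relative_improvement(rmse_meta_mlp, value), 1))
# SMA 8.8
# ANFIS-PSO 24.8
# OS-ELM 27.6
```

The values for SMA and OS-ELM match the reported 8.8% and 27.6%; the ANFIS-PSO result differs from the reported 24.9% in the last digit, presumably because the published RMSE values are rounded to three decimals.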

CONCLUSIONS
The RMSE criterion can be relied upon for ranking the models in the test phase (Meta-learner MLP is the most desirable model and OS-ELM is the least accurate method). At the same time, analyzing the scatter and subject profile diagrams allows a more accurate judgment: the outlier examination showed that Boosting CART had much more predictive power than SMA, even though these two models had almost the same RMSE. The superiority of the models was then discussed based on their category and type. Network-based models were less accurate than tree methods, which could be owing to the type of learning in tree algorithms. It should be noted that the network-based models showed little error in the training phase, but the effect of the type of learning led to the superiority of regression tree models in the test phase. Stacking also intensified the ensemble effect to the point that the simplest ensemble-stacking model performed better than the network-based and Bagging-RF models. Moreover, stacking with a neural network further improved modelling performance, which indicated not only the importance of ensemble modeling, but also the validity of combining models by stacking them.
Finally, it is suggested that in future studies, stacking models should be equipped with an optimization algorithm and the results obtained in two modes should be compared and evaluated, so as to predict the parameters of water quality. For instance, the MLP model should be integrated with particle swarm optimization (PSO) and gray wolf optimizer (GWO) algorithms.