Wastewater Pollutants Modeling Using Artificial Neural Networks

This study implemented and assessed an ANN approach for characterizing wastewater pollution. ANN-based models for predicting the Chemical and Biological Oxygen Demands (COD and BOD5) and Total Suspended Solids (TSS) concentrations in the effluent were built using a three-layered feed-forward back-propagation ANN in order to assess the performance of a wastewater treatment plant (WWTP). Two types of configurations were used: multiple-input single-output (MISO) and multiple-input multiple-output (MIMO). The study showed the superiority of MIMO according to the values of the correlation coefficient (R) and mean squared error (MSE), which were used as evaluation functions for the predicted models. The results also showed that the model built to predict BOD5 concentrations performed best among all the models, achieving a correlation coefficient of up to 0.99. Among the input combinations tested, the models whose inputs did not contain BOD5 performed best, which demonstrates that BOD5 has a larger influence than TSS and the other parameters on the values of R in the COD prediction models as well as in the other predicted models; consequently, the performance of the WWTP was greatly affected. This study demonstrated the value of using artificial neural networks to represent the complex and non-linear relationship between raw influent and treated effluent water quality measurements.


INTRODUCTION
Increasingly stringent limits on wastewater determinants require efficient and effective treatment of wastewater, which is dynamic and complex in nature, before it is released to water bodies [Güçlü and Sükrü, 2010].
Wastewater discharged from industrial and municipal uses is the main source of pollution of the aquatic environment because of the many kinds of chemicals it carries. Therefore, it is important to apply efficient monitoring and control methods to wastewater treatment systems [Dogan et al., 2008].
Building a reliable model of the treatment process in any treatment plant is important for predicting its efficiency and for establishing a basis for the process. Such a model must be highly precise and must handle nonlinearity, because the process is complicated by the organic pollutants present, which are difficult to model using conventional methods [Dogan et al., 2008].
During the past twenty years, modeling methods based on artificial neural networks have received great interest for modeling wastewater treatment and have been applied in various environmental fields. Wastewater treatment processes are complex, and neural network models have an outstanding ability to address nonlinear relationships. The technique requires no prior knowledge of the relationships between the variables and the processes to be modeled [Güçlü and Sükrü, 2010; Hong et al., 2007]. Improvements in artificial intelligence approaches have made them usable for modeling complex systems [Hanbay et al., 2006; Tumer and Serpil, 2015].
Neural networks are mathematical models that consist of a number of processing units linked together by weights. The tool aims to link an input data set to its corresponding output through a series of processing operations within the network [Kundu et al., 2013].
Several basic variables are used to evaluate the performance of a wastewater treatment plant: the Chemical Oxygen Demand (COD), the Biological Oxygen Demand (BOD5) and the Total Suspended Solids (TSS). These parameters are often used in the modeling of wastewater processing and treatment units [Tumer and Serpil, 2015].
A variety of papers have documented the use of ANNs to model environmental engineering problems. Tumer and Serpil (2015) used ANNs with various structures in MATLAB to model the Konya wastewater treatment plant. The study compared model efficiency using the mean squared error (MSE) and the correlation coefficient (R). The ANN model design was determined through repeated trial and error.
The goal of the study by Dogan et al. [2008] was to build an artificial neural network model for estimating BOD at wastewater treatment plants. To assess the influence of the parameters, many combinations and groups of data were used as network inputs. The evaluation showed that the developed model could be used efficiently to estimate BOD.
In 2010, Güçlü and Sükrü built a number of neural network models with the back-propagation (BP) training algorithm to forecast the concentrations of SS, MLSS and COD for the Ankara wastewater treatment facility. The root mean squared error (RMSE), mean absolute error (MAE) and mean absolute percentage error showed that the resulting model is efficient and can be implemented. More generally, the study emphasized that neural network modeling could be widely applied to simulation, process control and accurate efficiency forecasting for wastewater treatment plants.
A paper by Kundu et al. (2013) deals with the treatment of slaughterhouse wastewater. The experimental results were used to build a feed-forward BP ANN for predicting the combined removal efficiency of COD and ammonia nitrogen (NH4+-N), and to test and validate three types of ANN models.
Artificial intelligence models have been widely applied to many water and wastewater treatment problems, such as processing, forecasting and control. The purpose of this study is to discuss the design, implementation and evaluation of the ANN method for characterizing the pollution level of wastewater and evaluating the performance of a treatment plant receiving highly polluted influents. The critical operating parameters most commonly used for this purpose are BOD5, COD, TSS, total nitrogen (TN), temperature, pH, NO2, NO3, NH3 and PO4. The ANN models were created to simulate and predict the effluent COD, BOD5 and TSS concentrations and to support monitoring of the wastewater treatment facility.

ARTIFICIAL NEURAL NETWORK
The Artificial Neural Network (ANN) is a data processing system inspired by the brain's biological nervous system that attempts to capture the relationship between input and output data using internal equations [Delgrange et al., 1998; Tumer and Serpil, 2015].
Artificial neural networks consist of three or more layers: an input layer, one or more hidden layers and an output layer. Each layer contains many neurons, as shown in fig. (1). The size of the ANN is determined by the number of hidden layers. A neuron usually receives multiple inputs at the same time. Each input has its own relative weight; the weights are coefficients within the network that set the strength of the input connections, and these strengths are adjusted during training [Güçlü and Sükrü, 2010].
The weighted summation of all inputs is computed as the first step in the operation of a processing element (eq. (1)):

$$S = \sum_{i=1}^{n} W_i a_i \qquad (1)$$

where $W_i$ is the weight factor and $a_i$ is the value of the $i$-th of the $n$ inputs. Every neuron is connected to all the neurons in the next layer. The neural network receives the data through the input layer, while the output layer presents the output of the network. With the aid of the hidden layers, the network can capture complex relationships between input and output [Kundu et al., 2013].
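The weighted summation described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `weighted_sum` and the example numbers are assumptions introduced here, not values from the study.

```python
def weighted_sum(weights, inputs):
    """Return the sum of W_i * a_i over all inputs of one neuron."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    return sum(w * a for w, a in zip(weights, inputs))


# Illustrative (hypothetical) weights and input values for a single neuron.
example = weighted_sum([0.5, -0.25, 1.0], [2.0, 4.0, 1.0])
```

In a full network this sum is then passed through the neuron's transfer function before being sent to the next layer.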
The selection of the number of hidden layers depends on the difficulty of the problem. Usually one hidden layer is adequate for almost all problems. The number of neurons in the layer is determined by trial and error, beginning with the lowest value and gradually increasing it according to the nature of the problem [Kundu et al., 2013].
The simplest and hence most widely employed neural network architectures are the multilayer feed-forward networks. A feed-forward neural network is one in which connections run in just one direction, from input to output, without forming cycles, so information always flows from input to output [Mallikarjuna and Mise, 2019].
The output produced by the transfer function is propagated to the neurons in the next layer. The sigmoid function is a widely used transfer function. Learning in an ANN occurs through continuous modification of the neuron weights according to the error between the modeled and target output values [Kundu et al., 2013; Tumer and Serpil, 2015].
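As a rough illustration of how values propagate from input to output, the following sketch runs one forward pass through a three-layer network. It assumes a logistic sigmoid in the hidden layer, a linear output layer and bias terms, none of which are specified at this point in the text; the names `sigmoid` and `forward` are hypothetical.

```python
import math


def sigmoid(n):
    """Logistic sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-n))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: inputs -> hidden layer (sigmoid) -> linear outputs.

    w_hidden has one row of weights per hidden neuron, and w_out has one
    row of weights per output neuron. Bias terms are an assumption here.
    """
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w_out, b_out)
    ]


# With all-zero hidden weights every hidden neuron outputs sigmoid(0) = 0.5.
outputs = forward(
    x=[1.0, 2.0, 3.0],
    w_hidden=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    b_hidden=[0.0, 0.0],
    w_out=[[1.0, 1.0]],
    b_out=[0.0],
)
```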
Back propagation is essentially a gradient descent procedure that minimizes the network error function according to eq. (2), in which $e_i(j)$ and $t_i(j)$ are the estimated and target values, respectively, and $k$ denotes the number of training samples [Kundu et al., 2013].
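Using the symbols defined above, the network error function can be written in a sum-of-squares form. The following is a sketch under assumptions: the index $n$ for the number of output neurons and the absence of a constant scaling factor (some texts include a factor of 1/2) are not stated in the text.

```latex
E = \sum_{j=1}^{k} \sum_{i=1}^{n} \left[ e_i(j) - t_i(j) \right]^2
```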
The error is calculated from the discrepancy between the network output and the target for a given set of inputs and is then propagated backwards to modify the neuron weights. This iterative procedure is repeated until the weights have been adjusted to the point where the discrepancy between the predicted and true values at the output neuron is acceptably small. The procedure is repeated for all inputs and is referred to as ANN training [Mallikarjuna and Mise, 2019].
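The training loop described above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: it assumes a single input, one hidden layer of tanh units, a linear output, a sum-of-squares error and plain full-batch gradient descent. The toy data set (t = x squared), the learning rate, the epoch count and all names are hypothetical.

```python
import math
import random


def train(samples, n_hidden=3, lr=0.01, epochs=2000, seed=0):
    """Train a 1-input, 1-output network by back propagation.

    Returns the sum-of-squares error recorded at the start of each epoch.
    """
    rng = random.Random(seed)
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # input -> hidden
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # hidden -> output
    b2 = 0.0
    history = []
    for _ in range(epochs):
        g_w1 = [0.0] * n_hidden
        g_b1 = [0.0] * n_hidden
        g_w2 = [0.0] * n_hidden
        g_b2 = 0.0
        error = 0.0
        for x, t in samples:
            # Forward pass.
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(n_hidden)]
            e = sum(w2[j] * h[j] for j in range(n_hidden)) + b2
            error += (e - t) ** 2
            # Backward pass: derivative of (e - t)^2 with respect to e.
            delta = 2.0 * (e - t)
            for j in range(n_hidden):
                g_w2[j] += delta * h[j]
                back = delta * w2[j] * (1.0 - h[j] ** 2)  # through tanh
                g_w1[j] += back * x
                g_b1[j] += back
            g_b2 += delta
        history.append(error)
        # Gradient descent step on every weight and bias.
        for j in range(n_hidden):
            w1[j] -= lr * g_w1[j]
            b1[j] -= lr * g_b1[j]
            w2[j] -= lr * g_w2[j]
        b2 -= lr * g_b2
    return history


# Hypothetical toy data: learn t = x^2 on five points.
SAMPLES = [(x, x * x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
history = train(SAMPLES)
```

Comparing `history[0]` with `history[-1]` shows how far the error has fallen over training.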

AL-MUAMIRAH WASTEWATER TREATMENT PLANT
The Muamirah wastewater treatment plant (Figure 2) is situated approximately 10 km south of the town of Al-Hillah. The facility was built to treat the domestic and pre-treated industrial wastewater of Hillah city using a biological technique. The plant is designed to handle approximately 25,000 m³ of wastewater.

DATA COLLECTION
The average daily concentrations of BOD5, COD, NH3, TN, PO4, NO3, NO2, pH, TSS and temperature were gathered from the Muamirah wastewater treatment plant. The data sets represent average values from two years of measurements. Table (1) presents the statistical analysis of the measured data.
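Furthermore, the data were standardized about their mean value using the standard deviation, as follows:

$$x_{norm} = \frac{x - \mu}{\sigma}$$

where $\mu$ and $\sigma$ are the mean and standard deviation of the data to be normalized, $x$ is a measured value and $x_{norm}$ is its normalized value.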
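The adjustment about the mean using the standard deviation can be sketched as below. The function name `standardize` is hypothetical, and the use of the sample standard deviation (dividing by N - 1) is an assumption, since the text does not say which form was used.

```python
import statistics


def standardize(values):
    """Subtract the mean and divide by the standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation (assumption)
    return [(v - mu) / sigma for v in values]


# Illustrative numbers only, not plant measurements.
scaled = standardize([2.0, 4.0, 6.0])
```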

MODEL IMPLEMENTATION
A variety of steps have been taken throughout the creation of the model. Fig. (3) depicts them schematically [Güçlü and Sükrü, 2010].
All models in this analysis were implemented in MATLAB. In general, there are three modeling phases in the implementation of an ANN: training, testing and validation. The training data set is used to adjust the weights connecting the neurons. The test data set is used to evaluate the optimality and generalization capability of the developed model. The validation data set is used to evaluate the network geometry and model parameters. It is important to note that the validation set is not used in the model creation process [Güçlü and Sükrü, 2010]. During training, the MSE and the correlation coefficient (R) were tracked as monitoring and performance measures.
Overfitting is a problem that arises during neural network training when the error on the training set is driven to a very small value while the error on the test data presented to the network remains high. This indicates that the network has memorized the training examples but is unable to generalize to new cases. To avoid overfitting, trial and error guided by the training results must be used to determine appropriate numbers of nodes, hidden layers and epochs [Dogan et al., 2008].
MATLAB randomly partitions the input and target variables into three groups. In this paper, 70% of the data were allocated to training and 15% each to the validation and test sets.
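A random 70/15/15 partition of this kind can be sketched in Python as follows. This is not MATLAB's own routine: the function name `split_indices`, the fixed seed and the rounding rule are assumptions made for illustration.

```python
import random


def split_indices(n_samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle sample indices and cut them into train/validation/test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(train_frac * n_samples)
    n_val = round(val_frac * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


train_idx, val_idx, test_idx = split_indices(100)
```

Each index appears in exactly one group, so no sample is shared between the training, validation and test sets.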

Evaluation of model performance (Optimization of ANN Model)
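The performance of every artificial neural network model was assessed by computing the MSE between the modeled and target output data sets for both training and testing (eq. (5)), together with the coefficient of correlation (r):

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(\chi_i - \gamma_i\right)^2$$

$$r = \frac{\sum_{i=1}^{N}\left(\chi_i - \bar{\chi}\right)\left(\gamma_i - \bar{\gamma}\right)}{\sqrt{\sum_{i=1}^{N}\left(\chi_i - \bar{\chi}\right)^2 \sum_{i=1}^{N}\left(\gamma_i - \bar{\gamma}\right)^2}}$$

where $\chi_i$ is the target value, $\gamma_i$ is the forecasted value, $\bar{\chi}$ is the average of $\chi$, $\bar{\gamma}$ is the average of $\gamma$, and $N$ is the total number of model outputs.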
Lower MSE values are preferred, and a value of 0 implies that there is no error. The R value quantifies the relationship between outputs and targets; the greater the R value, the better. An R value of 1 indicates a close correlation, while a value of 0 indicates a random relationship [Hassen and Asmare, 2019].
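The two performance measures can be computed as in the sketch below, which uses the standard definitions of the mean squared error and the Pearson correlation coefficient. The function names and the example numbers are hypothetical.

```python
import math


def mse(targets, predictions):
    """Mean squared error between target and predicted values."""
    n = len(targets)
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / n


def correlation(targets, predictions):
    """Pearson correlation coefficient between targets and predictions.

    Raises ZeroDivisionError if either series is constant.
    """
    n = len(targets)
    mean_t = sum(targets) / n
    mean_p = sum(predictions) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(targets, predictions))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_p = sum((p - mean_p) ** 2 for p in predictions)
    return cov / math.sqrt(var_t * var_p)


# Illustrative numbers only: predictions are exactly twice the targets.
example_mse = mse([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
example_r = correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

The example shows why both measures are reported: predictions that are perfectly correlated with the targets (R of 1) can still carry a non-zero MSE.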

ANN Software and Network Properties
A feed-forward back-propagation neural network was used to implement the ANN model. Various numbers of hidden layers and of neurons within each hidden layer were tested to determine the best-fitting model: one, two and three hidden layers, with 5 and 6 neurons in each layer. These numbers were selected by trial and error, comparing the performance of the various configurations. The Levenberg-Marquardt, Polak-Ribiere Conjugate Gradient and Fletcher-Powell Conjugate Gradient algorithms were used for ANN model training. The BP algorithm is an approximation of steepest descent in which the correlation coefficient and MSE serve as performance functions.
The tansig transfer function at the hidden layer and the purelin transfer function at the output layer were used for all algorithms. The mathematical definitions of these transfer functions are given below.

$$\mathrm{tansig}(n) = \frac{2}{1 + e^{-2n}} - 1$$

$$\mathrm{purelin}(n) = n$$

where $n$ is the input to the function.
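For readers working outside MATLAB, the two transfer functions named above can be sketched in Python. The formulas follow MATLAB's documented definitions of tansig and purelin; the Python function names simply mirror the MATLAB ones and are not part of any Python library.

```python
import math


def tansig(n):
    """Hyperbolic tangent sigmoid: 2 / (1 + exp(-2n)) - 1.

    Mathematically identical to tanh(n). Very large negative n can
    overflow math.exp, so math.tanh is the safer choice in practice.
    """
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0


def purelin(n):
    """Linear transfer function: the output equals the input."""
    return n


hidden_value = tansig(0.5)
output_value = purelin(hidden_value)
```

Because tansig squashes its input into the range -1 to 1 while purelin leaves it unbounded, this pairing lets the output layer produce values on any scale.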
All developed neural networks shared the same general structure: one input layer, a variable number of hidden layers and one output layer. Many networks were trained with varying iteration counts (epochs) and hidden-layer node counts to determine the best architecture.
The data were normalized to the range 0-1 and were also standardized using the standard deviation (SD) [Dogan et al., 2008]. To achieve a better fit to the observed data, the three types of data (the two normalized sets and the original data) were run separately. All training functions included in the MATLAB neural network tool were applied and tested. Most of the training functions did not succeed with the original data or with the data normalized about the SD; since those runs were ineffective, they are not addressed in depth here. According to the MSE and R values, the best models were obtained from the data normalized to the range 0-1, as shown in table (2), which lists the best results together with the training functions that gave the lowest MSE and highest R values.
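Normalization to the range 0-1 is commonly done by min-max scaling, sketched below. The text does not give the exact formula used, so the min-max form, the function name `min_max_scale` and the example numbers are assumptions.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo = min(values)
    hi = max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return [(v - lo) / (hi - lo) for v in values]


# Illustrative numbers only, not plant measurements.
scaled_01 = min_max_scale([10.0, 20.0, 30.0])
```

Applying the scaling separately to each measured parameter keeps variables with large numeric ranges from dominating those with small ones.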

RESULTS AND DISCUSSION
On the basis of the pre-processed data of the raw influent and effluent wastewater, two types of configurations were constructed in this study to forecast the quality of the treated effluent. In this work, 6 models (3 MISO and 3 MIMO setups) were created and assessed.
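The difference between the two configurations lies only in how many effluent parameters a single network predicts. A minimal sketch follows, assuming each sample is stored as a dictionary of effluent values; the records, the numbers in them and the helper name `build_targets` are hypothetical placeholders, not plant data.

```python
# Hypothetical effluent records; the numbers are placeholders, not plant data.
RECORDS = [
    {"COD": 40.0, "BOD5": 12.0, "TSS": 18.0},
    {"COD": 35.0, "BOD5": 10.0, "TSS": 15.0},
]


def build_targets(records, output_names):
    """Return one target vector per sample for the chosen output parameters."""
    return [[rec[name] for name in output_names] for rec in records]


# MISO: the network has a single output neuron (here COD).
miso_targets = build_targets(RECORDS, ["COD"])

# MIMO: the same network predicts several outputs at once (here COD and BOD5).
mimo_targets = build_targets(RECORDS, ["COD", "BOD5"])
```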
As can be seen from table (2), the training functions produced satisfactory prediction outcomes, with correlation coefficient values ranging from 0.829 to 0.99. Figures (4-6) compare the actual values with those predicted by the neural network models. The figures show a high level of agreement between the experimental and predicted values.

COD predicted models
One MISO configuration model with nine input variables and three MIMO configuration models were created to estimate the COD of the treated effluent. Table (3) shows the statistical properties of the highest performing models from each arrangement.
Among the COD prediction models created, the MIMO model with COD and BOD5 as outputs demonstrated high generalization and predictive performance, with an R value of 0.9133, as shown in fig. (4). Even though the MISO model and the other MIMO models had lower R values than the COD and BOD5 model, high accuracy was confirmed by R values greater than 0.8 for all training and test data.

BOD5 predicted models
Models were established for each of the three generated networks (one MISO and two MIMO) to find the best network architecture for BOD5 prediction in the treated effluent. Table (4) displays the statistical properties of the different models.
According to the statistical characteristics of the two setups for BOD5 prediction, the MISO model was found to be the best, with an R value of 0.99. Compared with the other evaluated models, the MISO model therefore generalizes the data effectively and is likely to produce correct predictions when new data are presented. Figure 5 depicts the linear regression plots for the highest performing models from each setup. As Figure 5 demonstrates, the MISO model matches the actual data better than the other models (based on MSE values).

TSS predicted models
One MISO configuration model with nine input parameters and two MIMO configuration models were created to estimate the TSS of the treated effluent. Table (5) shows the statistical properties of the highest performing models from each arrangement.
The MIMO model with TSS, COD and BOD5 as outputs was the highest performing model for TSS prediction in the treated effluent, as shown in Table (5), with an R value of 0.9017. The regression plots between the actual and predicted output data, shown in Fig. (6), corroborated this outcome.

CONCLUSIONS
According to the performance findings, the MIMO model outperformed the MISO setup in terms of predictive performance, with R values greater than 0.9. The current study demonstrates the capability of ANNs for COD, BOD5 and TSS modeling, and on the basis of the results an ANN model appears to be a viable tool for predicting these parameters. The selection of the ANN design and input parameters, however, is critical for obtaining good estimation accuracy. The significance of each parameter can be interpreted from the model input combination that yielded the highest prediction accuracy: the influent BOD5 concentration was important in the treatment process with regard to the effluent COD concentration, while the effluent BOD5 concentration depended directly on the influent COD. The results show that BOD5 has a larger influence than TSS and the other parameters on the values of R in the COD prediction models. Among the input combinations tested, the models whose inputs did not contain BOD5 performed best; when BOD5 was among the input variables, the value of R was lower than in the models without it, as can be seen in tables (3-5). The effect was the opposite for the BOD5 prediction models: the model that included COD in the input data achieved the best performance and the highest R value, reaching 0.99, whereas R decreased in the models where COD was not included in the input, and TSS had no significant effect. The same behavior was observed in the TSS prediction models with respect to the presence or absence of BOD5 in the inputs, as the models without it achieved higher R values; tables (3 & 4) present those results.
This study demonstrated the value of neural networks for capturing the non-linearity and complexity of the relationship between raw influent and treated effluent water quality data. Such models can therefore aid plant control and monitoring.