Empirical Model for Estimating the Ecological Footprint in Ecuador Based on Demographic, Economic and Environmental Indicators

In this study, long-term trends in the ecological footprint (EF), biocapacity, GDP, population and CO2 emissions were identified for the period 1961–2016, together with the effect of demographic, economic and biocapacity indicators on Ecuador's EF. The long-term trend analysis was performed by means of the nonparametric Mann-Kendall test. A multiple linear regression model of the EF was developed with population, GDP, biocapacity and their logarithmic transformations as regressors. A backward elimination method was used in conjunction with the Akaike information criterion (AIC) to select the most suitable model, which was validated in terms of adjusted-R2, NSE, BIAS and RMSE. The results show significant annual increases (p<0.01) in EF (0.015 gha), total population (216,375 inhabitants), GDP ($1.2 billion) and CO2 emissions (718.6 kt). However, biocapacity has been declining (0.086 gha per year) at a faster rate than the ecological footprint has been increasing. In other words, in a few years, the country will be facing an ecological deficit. As for the empirical model of EF, a one-unit increase in the number of inhabitants, in the natural logarithm of biocapacity and in the natural logarithm of GDP increases EF by 1.68×10⁻⁷, 4.84 and 0.905 gha, respectively. Moreover, EF decreases by 0.6 gha each time biocapacity increases by one gha. Finally, this robust and easy-to-interpret model allows accurate EF predictions and can be a tool to better forecast environmental trends, allowing the development of sustainable projects in Ecuador.


INTRODUCTION
A nation's economic growth reduces poverty, provides infrastructure and improves people's lifestyles. However, economic growth per se is neither sustainable nor achievable, mainly because it stimulates the extraction and consumption of natural resources, the supply of which can no longer keep pace with current demand (Danish et al., 2019; Ahmed, 2020). For this reason, today's economic success must not come at the expense of the environment, since tomorrow's well-being depends on it (Anand & Sen, 2000). Therefore, a proper interpretation of the ecological footprint (EF) supports the development of instruments and public policies in urban, economic and environmental matters, helping the population to live within its ecological budget and to improve on existing models of sustainable cities such as Curitiba (Brazil) and Portland (USA) (Martinez, 2009).
EF is an instrument based on a system of indicators, usually used to study the impact that the consumption patterns of a population or an individual have on an ecosystem. Its underlying premise is that the Earth has a finite amount of biological production capable of sustaining all life on it (Galli et al., 2014; Lin et al., 2018; Mancini et al., 2018; Ulucak & Lin, 2017). Thus, EF reflects environmental degradation (Charfeddine & Mrabet, 2017; Mrabet et al., 2016) and the influence of human activities on the land (Destek et al., 2018). EF per person is based on the fact that the Earth has about 12.6 billion productive ecological hectares; after setting aside 12% of this area for the maintenance of biodiversity and dividing the remainder by the total world population, each individual is allotted an average ecological footprint of about 2.2 hectares. Nevertheless, this allocation is currently exceeded by 30%, and it is estimated that by 2050, when the world population will reach approximately 10 billion inhabitants, the productive ecological area will have decreased to 1 ha/inhabitant (Martínez, 2009; Borucke et al., 2013). An ecological deficit occurs when the consumption of resources and/or the production of human waste surpasses the Earth's capacity to generate these resources and/or absorb the waste (Badii, 2008). If EF is greater than the carrying capacity, there is an ecological deficit, i.e., the region is not self-sufficient because it consumes more resources than it has available; conversely, if EF does not exceed the carrying capacity of the region, the region can be considered sustainable or self-sufficient.
In this regard, according to the 2017 "Ecological Footprints of Nations" report, among the 184 countries examined, Qatar (14.7 gha/inhab), Luxembourg (12.8 gha/inhab), United Arab Emirates (8.9 gha/inhab), Bahrain (8.7 gha/inhab) and Trinidad and Tobago (8.2 gha/inhab) presented the highest ecological footprints. In turn, the lowest values were found in Eritrea (0.5 gha/inhab), Burundi, Yemen and Haiti (0.6 gha/inhab), and Timor-Leste and Afghanistan (0.7 gha/inhab). Countries with a high EF also have a high ecological deficit and are therefore importing biocapacity through international trade, depleting their national ecological assets and polluting the atmosphere with residual carbon dioxide emissions (Ahmed et al., 2020). Ecuador ranked 130th, with an ecological footprint of 1.7 gha/inhab. In other words, the country has considerable reserves: its biocapacity exceeds its ecological footprint by 14%. Ecuador is a producer and exporter of raw materials, supplying natural resources to other countries, which contributes to the ecological deficit. For example, in 2009, 119.6 million barrels of crude oil were exported, contributing 40% of the country's export footprint (MAE, 2013).
In analyzing the factors affecting EF, recent studies have focused on greenhouse gas emissions, renewable and non-renewable energy use, real income, urbanization and natural resource rents, by means of dynamic ordinary least squares (DOLS) estimators, dynamic seemingly unrelated cointegrating regression models, or fully modified ordinary least squares (FMOLS).
c) Total population is based on the de facto definition of population, which includes all residents, regardless of their legal or citizenship status, with the exception of refugees temporarily settled in the country of asylum, who are usually considered part of the population of their country of origin. The values displayed are mid-year estimates.
d) Gross domestic product (GDP) in USD at constant 2010 prices. It is the sum of the gross value added by all resident producers in the economy, plus any product taxes and minus any subsidies not included in the value of the products.
e) CO2 emissions in kt per PPP $ of GDP are those stemming from fossil fuel combustion and cement manufacturing. They include the carbon dioxide produced during the consumption of liquid, solid and gaseous fuels and from gas flaring.

Data and variables
All EF and biocapacity data were taken from the Global Footprint Network (http://data.footprintnetwork.org/), while the data for GDP, CO2 emissions and total population were taken from the World Bank (https://data.worldbank.org/). All the variables were transformed into natural logarithms (ln).
These variables have been widely used in EF analysis (Ahmed, 2020; Danish et al., 2020; Destek & Sinha, 2020; Kassouri, 2020; Zambrano-Monserrate et al., 2020). For the five variables, normality was checked with a Kolmogorov-Smirnov test, and Levene's test was applied to check whether the annual data have the same or very similar variance (homogeneity of variance) (Carroll & Schneider, 1985).
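The normality check described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the software used in the study: the function names and the sample values are hypothetical, and the normal distribution is fitted with the sample mean and standard deviation. When the parameters are estimated from the same sample, the standard critical values are conservative, so a Lilliefors-type correction would be needed for a strict test.

```python
import math
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(values):
    """Kolmogorov-Smirnov distance between the sample and a fitted normal."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF steps from i/n to (i+1)/n at x, so both sides count.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def ks_critical_value_5pct(n):
    """Asymptotic 5% critical value of the one-sample KS statistic."""
    return 1.358 / math.sqrt(n)


# Hypothetical annual series (illustrative values, not the study data).
series = [1.2, 1.4, 1.5, 1.6, 1.7, 1.7, 1.8, 1.9, 2.0, 2.3]
d = ks_statistic_normal(series)
print(f"D = {d:.3f}, 5% critical value = {ks_critical_value_5pct(len(series)):.3f}")
```

Normality would be rejected at the 5% level when D exceeds the critical value; the asymptotic value is only a rough guide for short series.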

Long-term trend analysis
Once the statistical analysis of the five variables was completed, the trends were determined on an annual scale using the nonparametric Mann-Kendall test (Mann, 1945; Kendall, 1975) at three statistical confidence levels: 90%, 95% and 99%. This test detects changes in the average of the observed data and does not assume independence between them, being useful for seasonal data and recommended for evaluating trends in environmental data series (Yu & Kao, 2007). The Mann-Kendall test is based on the calculation of the S-statistic, defined as follows:

$$S = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \operatorname{sgn}(x_j - x_i) \qquad (1)$$

where:

$$\operatorname{sgn}(x_j - x_i) = \begin{cases} 1 & \text{if } x_j - x_i > 0 \\ 0 & \text{if } x_j - x_i = 0 \\ -1 & \text{if } x_j - x_i < 0 \end{cases} \qquad (2)$$

where n is the sample size, and x_j and x_i are sequential data values. For large samples, the S-statistic is approximately normally distributed with a mean of zero and a variance of:

$$\operatorname{Var}(S) = \frac{n(n-1)(2n+5) - \sum_{t} t(t-1)(2t+5)}{18} \qquad (3)$$

where \sum_t indicates that the term t(t-1)(2t+5) is evaluated and summed over the groups of tied values in the series, t being the number of data values in each group. The standardized test statistic Z is then obtained as follows.

If S > 0: $$Z = \frac{S - 1}{\sqrt{\operatorname{Var}(S)}} \qquad (4)$$
If S = 0: $$Z = 0 \qquad (5)$$
If S < 0: $$Z = \frac{S + 1}{\sqrt{\operatorname{Var}(S)}} \qquad (6)$$

The null hypothesis of this method is that there is no trend in the series, and the alternative hypothesis is that the variable increases or decreases steadily over time (Güçlü, 2018). In addition to identifying trends, their magnitude was calculated using Sen's slope estimator (Theil-Sen estimator). This nonparametric method estimates the decrease or increase per unit of time of a linear trend as the median of the slopes between all pairs of observations (Sen, 1968; Theil, 1992).
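The Mann-Kendall statistic, its tie-corrected variance, the standardized Z value and Sen's slope can be sketched in a few lines. This is an illustrative implementation written for this text, not the code used by the authors; the function names are hypothetical, the p-value is the usual two-sided normal approximation, and the sample series is invented.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import median


def mann_kendall(x):
    """Return (S, Var(S), Z, two-sided p-value) for the series x."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):  # all pairs with i < j
        diff = x[j] - x[i]
        s += (diff > 0) - (diff < 0)  # sign of the difference
    # Tie correction: t is the size of each group of equal values
    # (groups of size 1 contribute zero).
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(x).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return s, var_s, z, p_value


def sens_slope(x, t=None):
    """Median of the slopes between all pairs of observations."""
    times = list(range(len(x))) if t is None else list(t)
    slopes = [
        (x[j] - x[i]) / (times[j] - times[i])
        for i, j in combinations(range(len(x)), 2)
    ]
    return median(slopes)


# Hypothetical annual series (illustrative values, not the study data).
series = [1.20, 1.16, 1.25, 1.31, 1.30, 1.42, 1.47, 1.55]
s, var_s, z, p = mann_kendall(series)
print(f"S = {s}, Z = {z:.2f}, p = {p:.4f}, slope = {sens_slope(series):.4f} per year")
```

A trend is significant at the 95% confidence level when the p-value is below 0.05; the sign of Sen's slope gives the direction of the trend and its value the change per time step.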

Principal component analysis
In multiple regression analysis, there is a set of predictor (independent) variables from which the EF (dependent variable) is to be estimated. However, several situations can occur: a) all predictors are essential for accurate prediction; b) some predictors may have no predictive value (and can therefore be eliminated); c) there are subsets of predictors, partially or completely separate, that provide a prediction as accurate as the full set of predictors (Hawkins, 1973). Therefore, it is important to evaluate whether there is exact multicollinearity between some of the predictors; this was done using the latent root regression technique (Hawkins, 1973; Jeffers, 1981). This technique is an extension of principal component analysis (PCA) that includes the dependent variable (EF) in the analysis. PCA is a data mining technique that linearly transforms the original set of variables into a smaller set of uncorrelated variables that retains most of the variance of the original set (Dunteman, 1989). The advantage of this method is its ability to identify multicollinearities (Hawkins, 1973). Principal components (PCs) are linear combinations of the original variables:

$$PC_i = a_{i1} x_1 + a_{i2} x_2 + \dots + a_{ip} x_p = \sum_{j=1}^{p} a_{ij} x_j \qquad (7)$$

where PC_i is the i-th principal component, a_{ij} is the coefficient of the original variable x_j in the principal component PC_i, and p is the number of original variables.

Multiple linear regression analysis
The EF prediction was made using a multiple linear regression (MLR) model, in which it is assumed that more than one (independent) variable has an influence on and/or is correlated with the value of a response (target) variable. This method has the advantage of using more information in the construction of the model and, therefore, more accurate estimates can be obtained (Vasallo, 2015). A linear model that relates a response variable y to a set of predictors can be written as:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \epsilon \qquad (8)$$

where \beta_0, \beta_1, ..., \beta_k are the regression coefficients; x_1, x_2, ..., x_k are the independent variables or predictors; and \epsilon is a random error term that represents random fluctuations or measurement errors.
In our case study, y represents the EF variable. The independent variables considered include biocapacity, population, CO2 emissions, GDP and/or some transformation of these variables. For exponential-type relationships, logarithmic transformations were applied. In order to determine which of these variables (or their transformations) yield the most robust and statistically significant model, the Akaike information criterion (AIC) was used within the stepwise selection methods. Under the principle of parsimony, models with fewer variables are preferred, so the use of variable selection methods is indispensable.
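As an illustration of how the coefficients of a multiple linear regression can be estimated, the sketch below solves the least-squares normal equations with plain Python. The function names (`solve_linear_system`, `fit_ols`, `predict`) and the sample values are hypothetical and are not taken from the study; in practice a statistical package would normally be used, and predictors of very different magnitudes (for example population and ln(GDP)) would be rescaled first to keep the system well conditioned.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] of y = b0 + sum(bj * xj)."""
    design = [[1.0] + list(row) for row in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y_i for d, y_i in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, row):
    """Fitted value for one observation."""
    return coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], row))


# Hypothetical observations: (biocapacity, ln(GDP)) -> EF. Illustrative only.
rows = [(4.2, 2.1), (4.0, 2.6), (3.9, 3.0), (3.7, 3.3), (3.5, 3.9), (3.4, 4.2)]
ef = [1.3, 1.5, 1.6, 1.8, 2.0, 2.2]
beta = fit_ols(rows, ef)
print("coefficients:", [round(b, 4) for b in beta])
print("fitted EF for the first year:", round(predict(beta, rows[0]), 4))
```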

Stepwise regression
Stepwise regression is a method of fitting regression models in which the choice of predictor variables is made by an automatic procedure (Efroymson, 1960). At each step, a variable is added to or removed from the set of explanatory variables, based on some prespecified criterion. F-tests or t-tests are generally used, but other criteria are possible as well, such as adjusted-R2, AIC, the Bayesian information criterion, Mallows' Cp and PRESS, among others.
Three main stepwise regression methods are distinguished: forward selection, backward elimination and bidirectional elimination. Both forward selection and bidirectional elimination can discard variables that are individually unpredictive but that, in conjunction with other variables, can contribute effectively to the model. On the other hand, backward elimination has the advantage of evaluating the combined predictive capacity of the variables, since the process begins with all variables included in the model. Backward elimination also removes the less important variables from the beginning and leaves only the most important ones in the model. However, one disadvantage of backward elimination is that once a variable is removed from the model, it cannot re-enter, even though a discarded variable may become significant later in the final model (Chowdhury & Turin, 2020). In general, there is no consensus as to which method is more appropriate (Royston et al., 2009). In this study, backward elimination was used, as it has been reported to give better results in terms of adjusted-R2. The Akaike information criterion (AIC) was used to determine which variables leave the model in each iteration of the algorithm.

Akaike Information Criterion
The AIC is a tool for comparing different models. The stepwise selection methods include or exclude variables, generating a different model in each iteration, and the AIC provides a criterion for selecting a balanced model (with neither too few nor too many variables). Including too few variables often fails to capture the true relationship, while including too many creates a generalization problem. No model can represent exactly the relationship that exists in the data; some information is always lost, especially when a limited number of variables is considered. The quality of a model is considered better the less information it loses, so the model selected is the one that minimizes this loss, that is, the one with the lowest AIC value (Akaike, 1974; Burnham & Anderson, 2004).
In general, the AIC is calculated as:

$$AIC = 2k - 2\ln(L) \qquad (9)$$

where k is the number of parameters in the statistical model and L is the maximum value of the likelihood function for the estimated model. L is a measure of model fit: the greater the L, the better the fit. For small samples, it is recommended to use a second-order AIC:

$$AIC_c = AIC + \frac{2k(k+1)}{n-k-1} \qquad (10)$$

where n is the number of observations.
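The way the AIC drives backward elimination can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes a Gaussian linear model, for which -2 ln(L) equals n ln(RSS/n) plus a constant that is the same for every candidate model and is therefore dropped, and it counts k as the number of regression coefficients including the intercept. All function names are hypothetical.

```python
import math


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def rss_of_fit(columns, y):
    """Residual sum of squares of an OLS fit with intercept."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y_i for d, y_i in zip(design, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    return sum((y_i - f) ** 2 for y_i, f in zip(y, fitted))


def aic_gaussian(rss, n, k):
    """AIC of a Gaussian linear model, up to an additive constant."""
    return n * math.log(rss / n) + 2 * k


def aicc(aic, n, k):
    """Second-order (small-sample) correction of the AIC."""
    return aic + 2 * k * (k + 1) / (n - k - 1)


def backward_elimination(predictors, y):
    """predictors maps a name to its list of values; returns (names, AIC)."""
    n = len(y)
    selected = list(predictors)

    def score(names):
        rss = rss_of_fit([predictors[name] for name in names], y)
        return aic_gaussian(rss, n, len(names) + 1)  # +1 for the intercept

    current = score(selected)
    while len(selected) > 1:
        # AIC of every model obtained by dropping exactly one variable.
        trials = [(score([m for m in selected if m != name]), name) for name in selected]
        best_score, to_drop = min(trials)
        if best_score >= current:  # no removal lowers the AIC: stop
            break
        selected.remove(to_drop)
        current = best_score
    return selected, current
```

A call such as `backward_elimination({"P": p, "B": b, "lnB": ln_b, "lnGDP": ln_gdp, "CO2": co2}, ef)`, with each value a list of annual observations, would return the retained variable names and the AIC of the final model. A perfect fit (RSS of zero) is not handled, since the logarithm is then undefined.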

Linear regression model validation
A validation of the multiple regression model was performed using the adjusted coefficient of determination (adjusted-R2), the root mean square error (RMSE), the Nash-Sutcliffe coefficient (NSE) and BIAS, defined below.
The RMSE is based on the observed and simulated data for a given period (Ćalasan et al., 2020) and can be expressed with the following equation:

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \qquad (11)$$

where y_i is the observed data, \hat{y}_i is the data obtained by the model, and n is the number of observations. The Nash-Sutcliffe coefficient (NSE) expresses the relative magnitude of the residual variance compared to the variance of the observed data and is calculated as one minus the ratio of the error variance of the simulated data to the variance of the observed data series. A perfect fit of the predictive model gives a value of one, whereas a negative value indicates that the model does not fit the observed data (Nash & Sutcliffe, 1970).

$$NSE = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad (12)$$

where y_i are the observed values of the dependent variable; \hat{y}_i are the values that result from the application of the model; and \bar{y} is the average of the observed values over the data period.
The relative bias (BIAS) indicates the average tendency of the simulated data to be greater or smaller than the observed data, reflecting the systematic over- or underestimation of the dependent variable by the predictive model (Rodrigues et al., 2020). A value of zero indicates a perfect fit. It is calculated as (Büchele et al., 2019):

$$BIAS = \frac{1}{n} \sum_{i=1}^{n} \frac{\hat{y}_i - y_i}{y_i} \qquad (13)$$

where y_i is the observed data, \hat{y}_i is the data obtained by the model, and n is the number of observations.
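The three validation measures can be computed directly from paired observed and simulated values, as in the sketch below. The function names are hypothetical, and the form used for the relative bias (the mean of the relative differences between simulated and observed values) is an assumption made for illustration; other definitions of relative bias exist.

```python
import math


def rmse(observed, simulated):
    """Root mean square error."""
    squared = [(o - s) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(squared) / len(observed))


def nse(observed, simulated):
    """Nash-Sutcliffe coefficient: 1 is a perfect fit, negative is a poor fit."""
    mean_obs = sum(observed) / len(observed)
    error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - error / spread


def relative_bias(observed, simulated):
    """Mean relative difference (assumed form); positive means overestimation."""
    ratios = [(s - o) / o for o, s in zip(observed, simulated)]
    return sum(ratios) / len(observed)


# Hypothetical observed and simulated EF values (illustrative only).
observed = [1.2, 1.4, 1.7, 1.9, 2.1]
simulated = [1.3, 1.4, 1.6, 2.0, 2.0]
print(f"RMSE = {rmse(observed, simulated):.3f}")
print(f"NSE  = {nse(observed, simulated):.3f}")
print(f"BIAS = {relative_bias(observed, simulated):.4f}")
```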

Descriptive statistics
The average EF for Ecuador from 1961 to 2016 was 1.74±0.3 gha per capita. EF reached a low of 1.16 gha per capita in 1963, whereas in 1999 it reached 2.3 gha per capita, with low annual variability. The average biocapacity was 3.89±0.3 gha, which was higher than EF and indicates an ecological reserve over the 56 years analyzed in the study (Table 1). Until 2016, Ecuador consumed ecological resources from its own production, without depleting its domestic ecological assets and with low atmospheric pollution from residual carbon dioxide emissions. In other words, it has not imported biocapacity through international trade. In addition, GDP, population and CO2 emissions were analyzed. GDP in USD at constant 2010 prices has been highly variable from year to year, with an average of USD 39.79 billion. The lowest value occurred in 1961, whereas in 2016 GDP reached USD 86.42 billion. Similarly, the total population had its minimum value in 1961 and its maximum in 2016; its average was 10,120,856 inhabitants, with moderately low variability. Finally, the average CO2 emissions were 18,372.76 kt; the lowest and highest carbon dioxide emissions were generated in 1962 and 2014, respectively. GDP and CO2 emissions showed great year-to-year variability (Table 1).

Long-term trend analysis
The long-term trend analysis using the nonparametric Mann-Kendall test showed that from 1961 to 2016 there was a significant positive trend (p<0.001) for EF, with an increase of 0.015 gha per capita/year. Similarly, significant increases (p<0.001) were found for GDP (USD 1.2 billion per year at constant 2010 prices), population (216,375 inhabitants/year) and CO2 emissions (718.6 kt/year). In contrast, a significant negative linear trend (p<0.01) was found for biocapacity, with a decrease of 0.086 gha per capita/year (Figure 1). Therefore, biocapacity was declining at a faster rate than EF was increasing. These results are consistent with Lin et al. (2018), who report that the global EF continues to exceed biocapacity, and with the investigation of MAE (2013). The increase in EF in Ecuador during the 56 years of study reflects the excessive use of natural resources from year to year (Destek et al., 2018) and its consequent impact on ecosystems and biodiversity (Galli et al., 2014). On the other hand, the increase in CO2 emissions is consistent with the work of Sánchez et al. (2020). Although MAE developed national climate change strategies in 2012, which proposed policies and guidelines to reduce or stabilize greenhouse gas emissions in the strategic productive and social sectors, CO2 emissions continued to increase. If the trends in these variables continue in subsequent years, Ecuador's natural reserves will be depleted and will be unable to regenerate, since human demand can exceed nature's reserves only temporarily (Wackernagel et al., 2018).
EF increased due to population growth (r=0.78; p≤0.01), the increased demand for resources per person, and the fact that the country is a producer and exporter of raw materials. Hence, it is using its biocapacity to supply resources to other countries (MAE, 2013), which in turn decreases the available biocapacity per person (r=0.86; p≤0.01). Similarly, CO2 emissions (r=0.76; p≤0.01) and GDP (r=0.74; p≤0.01) increase the ecological footprint; this is due to the country's economic growth (Nathaniel et al., 2020; Ahmed et al., 2020), which is associated with increasing levels of CO2 emissions (Rentería et al., 2016).

Empirical model
A correlation plot was made between all variables: EF, biocapacity, population (number of inhabitants), CO2 emissions and GDP. From this plot, exponential trends could be identified in population, biocapacity and GDP; consequently, ln(population), ln(biocapacity) and ln(GDP) were incorporated into the initial set of variables. A regression model was first tested with all variables, resulting in an adjusted-R2 of 0.8036 but with no significant variables under the t-test (all p-values were greater than 0.1). A better model was then chosen through the backward elimination algorithm using the AIC, resulting in:

$$EF = \beta_0 + 1.68 \times 10^{-7}\, P - 0.6\, B + 4.841\, \ln(B) + 0.905\, \ln(GDP)$$

where P is the total population, B is the biocapacity and \beta_0 is the intercept. Under this model, all variables are significant (p≤0.05) and an adjusted-R2 of 0.8129 is obtained. The CO2 variable does not remain in the final model; a possible explanation is that, because it is highly correlated with the other variables, it is already explained by them and its presence in the selected model is not required.
On the basis of the model, it can be concluded that for each new inhabitant (an increase of P by one unit), EF would increase by 1.68×10⁻⁷ gha; each time biocapacity increases by one gha, EF would decrease by 0.6 gha; if ln(B) increases by one unit, EF increases by 4.841 gha; and finally, when ln(GDP) increases by one unit, the footprint increases by 0.905 gha.
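To make these coefficients more tangible, the sketch below converts two of them into changes in EF for realistic increments, holding all other variables fixed. It uses only the coefficients reported in the text; the function names are hypothetical, and the increments chosen (the annual population trend reported in the trend analysis and a 1% rise in GDP) are illustrative. A one-unit increase in ln(GDP) corresponds to GDP being multiplied by e (about 2.718), so smaller relative changes give proportionally smaller effects.

```python
import math

# Coefficients as reported in the text. The intercept is not needed because
# only changes in EF, not EF levels, are computed here.
COEF_POPULATION = 1.68e-7  # gha per additional inhabitant
COEF_LN_GDP = 0.905        # gha per unit increase in ln(GDP)


def ef_change_from_population(extra_inhabitants):
    """Change in EF when population grows by the given number of people."""
    return COEF_POPULATION * extra_inhabitants


def ef_change_from_gdp_growth(growth_rate):
    """Change in EF when GDP grows by growth_rate (0.01 means 1%)."""
    return COEF_LN_GDP * math.log(1 + growth_rate)


# Annual population increase reported in the trend analysis.
print(f"{ef_change_from_population(216_375):.4f} gha")
# A 1% rise in GDP, chosen only as an illustration.
print(f"{ef_change_from_gdp_growth(0.01):.4f} gha")
```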
Once the model was established, all the assumptions required for a valid linear regression model were verified. The scatter plots of the model residuals against each of the predictors showed a point cloud randomly distributed around zero, with constant variability along the X-axis. The straight line in the Q-Q plot shows that the residuals follow a normal distribution, as expected. Likewise, when the residuals are plotted against the values fitted by the model, they are randomly distributed around zero, which confirms homoscedasticity. Model validation gives an NSE close to 1 (0.83), a small RMSE (0.12) and a BIAS close to zero (0.00015). This shows that the empirical model is robust and can represent the EF conditions observed for the period 1961 to 2016. These results are consistent with research on oil-exporting countries such as Ecuador (Kassouri & Altıntaş, 2020). The EF regression model can be a tool that allows the country to better forecast environmental trends and develop sustainable projects that aim to ensure the well-being of every individual within planetary constraints, as recommended by Wackernagel et al. (2018).

CONCLUSIONS
The long-term trend analysis was carried out using annual data for the period 1961-2016, and the results show significant increases in the demographic, economic and environmental indicators. The ecological footprint increases with population and economic growth, which generate higher CO2 emissions. In contrast, biocapacity is declining in Ecuador at a faster rate than its ecological footprint is increasing. This means that in a few years, the country will deplete its ecological assets. The empirical multiple linear regression model of the ecological footprint considered the total population, GDP in USD at constant 2010 prices, biocapacity and their logarithmic transformations as regressors. This robust and easy-to-interpret model enables accurate ecological footprint predictions and can be a tool to better forecast future environmental trends and develop sustainable projects in Ecuador.
This is an initial short-term model, useful for explaining the ecological footprint in Ecuador. Like others, this model can be improved in light of the new challenges that mankind faces due to its current environmental culture. In addition, new variables could be added to this proposal, considering the actions of the Ecuadorian economic, political and social entities that participate, either directly or indirectly, in the behavior of the ecological footprint.