Simulation of Favorable Habitats for Non-Gregarious Locust Pests in North Kazakhstan Based on Satellite Data for Preventive Measures

The paper considers the approaches and possibilities of using two types of simulation: the species distribution model and the ecological niche model. The study aimed to simulate favorable habitats and the potential spread of non-gregarious locust pests in North Kazakhstan based on satellite and ground data for preventive measures. The MaxEnt software was used to conduct the simulation. According to the species distribution model, high indicators of the habitat are predicted in the Pavlodar and Kostanay regions, on 69.9–100% of the studied territory. In the simulation of ecological niches for non-gregarious locust pests, class boundaries were determined for the transition from quantitative to qualitative indicators, from I (85–100%) to IV (0–50%), which rank the zones by the probability of pest attack from the highest to the lowest. According to the fundamental model, high indicators of the area of pest occurrence, that is, zones I and II, are located in the central and northern parts of the Pavlodar region. Here, the probability of non-gregarious locust occurrence in zones I and II with a ratio of 1:1 is observed in a slightly arid, moderately warm agro-climatic zone. In the southern part of the Kostanay region, the simulation predicts the probability of occurrence in zones I and II with a ratio of 1:2 in the moderately arid warm agro-climatic zone of this region. In the southern and southeastern parts of the Akmola region, the model predicts the probability of occurrence in zones I and II with a ratio of 1:3 in a slightly humid, moderately warm agro-climatic zone of the region. The considered species distribution model can be used as a modern tool for long-term forecasting of the spread of non-gregarious locust pests since it takes into account the peculiarities of the agricultural landscape. The fundamental niche model can be used in a long-term population forecast since it focuses more on the theoretical conditions of pest habitats.


INTRODUCTION
Studying the nature of the distribution of pests invading crops, in particular locusts, makes it possible to identify their ecological features, find stable population parameters, and build sound approaches to predicting their spatial distribution in the study area [Adu-Acheampong et al., 2017; Zhang et al., 2019; Klein et al., 2022]. Predictive simulation of the geographical distribution of the studied object has become an important tool in agroecology, since it uses prior information about the spatial distribution of species in the ecosystem and restricts predictive models to the nearest ecological niche, thus generating a forecast of possible areas of occurrence based on environmental conditions similar to those of the identified populated area [Latchininsky and Sivanpillai, 2010; Kambulin et al., 2015; Sagitov and Duisembekov, 2016]. In this case, the modern simulation approach deserves special attention. Such models are often referred to as species distribution models (SDM), habitat suitability models, or ecological niche models (ENM) [Sergeev, 2010; Merow et al., 2013; Orlov and Sheludkov, 2019]. SDM usually does not require an in-depth analysis of variables and simply provides a map of suitable habitat for a species. A predefined (or "customized") set of variables based on the biology of the species is usually used. SDM is a purely statistical approach that is loosely related to the natural features of the species. ENM is performed in the same way as SDM but includes an expanded set of factors [Malakhov and Zlatanov, 2020].
Meanwhile, there are differences between the SDM and ENM approaches. SDMs associate known cases of species occurrence with the environmental conditions characteristic of the sites where they were recorded in order to predict the places where populations could persist within the landscape; thus, only the key environmental variables are required for simulation [Latchininsky and Sivanpillai, 2010; Malakhov et al., 2018]. According to the ENM concept, the resistance of species to certain environmental factors limits their stability in the landscape, and an extensive database of environmental variables is required for simulation [Klein et al., 2022].
Both SDM and ENM approaches are trending in modern science [Aguilar and Lado, 2012]. They are performed using geographical information system (GIS) technology products, such as the MaxEnt software [Elith et al., 2011]. This simulation makes it possible to obtain information on the potential locust distribution and reproduction sites on digital maps of the studied territory, where monitoring and protective measures should be directed first. If this approach finds application in the practice of phytosanitary monitoring, it can be a good alternative to the basic logical model of pest forecasting, which is currently used by phytosanitary diagnostics and forecasts services [Malakhov and Zlatanov, 2020].
As current research in this area shows, SDM and ENM depend primarily on environmental factors, both abiotic and biotic [Phillips et al., 2006; Aguilar and Lado, 2012]. According to [Latchininsky and Sivanpillai, 2010; Malakhov et al., 2018; Malakhov and Zlatanov, 2020], an important place among these factors is given to the vegetation cover (the normalized difference vegetation index, NDVI), since it is the fodder base of phytophages, and to the meteorological parameters of the studied environment, which determine favorable habitat conditions for certain pest groups.
In studies on ENM of locusts and phytophages in general, great attention is paid to their global distribution to prevent invasions and mass reproduction on a global scale [Adu-Acheampong et al., 2017; Kimathi et al., 2020], which is also a strategic issue of phytosanitary safety. At the same time, some works raise questions about improving preventive pest management measures by ENM within countries and individual agricultural regions. Similar studies on locusts in Kazakhstan were conducted on gregarious locust species in their typical habitats [Malakhov et al., 2018; Malakhov and Zlatanov, 2020; Klein et al., 2022].
However, there are no similar studies on the complex of non-gregarious locusts, and due to the polyphagous nature of these pests, their study is very relevant for the agricultural regions of North Kazakhstan. In addition, the proposed approaches and methods can be used by other researchers in solving similar problems. In terms of the prevalence of non-gregarious locust pests, the regions of North Kazakhstan belong to the zone with a high degree of pest population. These pests damage grains, legumes, forage crops, and pasture lands [Kambulin, 2018]. According to the conducted observations, the number of these phytophages has increased in recent years, and pest infestation exceeding the economic harmfulness threshold (above 10 individuals per 1 sq. m) has been observed in many grain crops of North Kazakhstan.
According to the literature [Akmollaeva, 2004; Zhang et al., 2019], the complex of harmful non-gregarious locusts that destabilize the production of agricultural plants comprises 9-10 species. Among these, the agricultural areas of North Kazakhstan are inhabited by such species as Dociostaurus brevicollis (Ev.), Dociostaurus kraussi kraussi (Ingen.), the dark-winged grasshopper (Stauroderus scalaris (F.-W.)), the Siberian grasshopper (Aeropus sibiricus sibiricus (L.)), Pararcyptera microptera microptera (F.-W.), the lesser marsh grasshopper (Chorthippus albomarginatus albomarginatus (Deg.)), and Euchorthippus pulvinatus (F.-W.). From the practical point of view, all monitoring work on non-gregarious locusts is carried out simultaneously on the whole complex of species harmful to agriculture [Azhbenov, 2013; Azhbenov et al., 2015]. This is because these species occur in mixed populations, so for preventing their attack or simulating favorable habitats, conducting separate observations for each species in phytosanitary monitoring is of no practical importance.
The purpose of the study was to simulate favorable habitats and the potential spread of non-gregarious locust pests in North Kazakhstan based on satellite data for preventive measures against the damage from dangerous pests in agricultural areas of North Kazakhstan.

Area of the study
Since the natural and climatic conditions of North Kazakhstan are considered an optimal environment for the spread and harmfulness of all locust species [Kambulin, 2018], including non-gregarious species, four regions of North Kazakhstan were selected as the research area, namely, the Akmola, Pavlodar, Kostanay, and North Kazakhstan regions. The analyses were carried out for the period 1999-2021. Figure 1 shows the land cover classes of the studied territory, which also outline the territory where the study was conducted. The land cover layers are a very important criterion in determining the preferred habitats of non-gregarious locust pests since these data show vegetation, which is the food base for the pests under study. As can be seen from the data, the selected study areas are mainly represented by herbaceous vegetation and arable land, which is a very favorable condition for the habitat of locust pests. The source land cover database, built from high-quality vegetation cover plots and several auxiliary data sets, reaches 80% accuracy (Copernicus Global Land Cover Layers: CGLS-LC100 collection).

Objects of the study
The objects of the study were non-gregarious locust species. During the monitoring surveys of the fields, the authors used the methods developed by Sagitov and Duisembekov [2016], Azhbenov [2013], and Kambulin et al. [2015]. The ground data on the number of harmful non-gregarious locusts were collected jointly with specialists of the Republican State Institution "Republican Methodological Center for Phytosanitary Diagnostics and Forecasts" and cover coordinates. The input data of the model were created randomly as coordinate points based on ground survey reports for the study areas. The districts of the regions were ranked (from 1 to 6) according to the collected ground data on the areas inhabited by larvae of non-gregarious locusts. Based on this ranking of the districts, points were randomly created to train the model.
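The random creation of training points from ranked districts can be illustrated with a minimal sketch. The paper does not state how many points were drawn per rank or how district outlines were handled, so the linear weighting by rank, the use of rectangular bounding boxes, and the district names and coordinates below are all assumptions made only for illustration.

```python
import random

def sample_training_points(districts, points_per_rank=5, seed=42):
    """Draw random training points inside each district's bounding box.

    `districts` maps a district name to (rank, (lon_min, lat_min, lon_max, lat_max)).
    ASSUMPTION: the number of points grows linearly with the rank (1..6);
    the paper only says that points were created randomly from ranked districts.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    points = []
    for name, (rank, (lon_min, lat_min, lon_max, lat_max)) in districts.items():
        if not 1 <= rank <= 6:
            raise ValueError(f"rank of {name} must be between 1 and 6")
        for _ in range(rank * points_per_rank):
            lon = rng.uniform(lon_min, lon_max)
            lat = rng.uniform(lat_min, lat_max)
            points.append((name, lon, lat))
    return points

# Hypothetical districts with made-up bounding boxes (not taken from the paper).
districts = {
    "district_A": (6, (75.0, 52.0, 76.0, 53.0)),
    "district_B": (2, (63.0, 51.0, 64.0, 52.0)),
}
points = sample_training_points(districts)
```

In practice, points would be drawn inside the actual district polygons rather than bounding boxes, and the result would be exported as a longitude/latitude table for the modelling software.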

Using climate data and meteorological parameters to run the model
The satellite images from TERRA and Aqua (MODIS), Sentinel, and Landsat satellites were used as remote sensing data. The climate data were obtained from Bioclim sources [Booth et al., 2013]. According to the results of the analysis, the criteria of meteorological parameters at which locusts developed were clarified. The data from the Landsat and Sentinel satellites were multispectral images in the optical, infrared, near-infrared, and thermal ranges with a spatial resolution from 10 m to 60 m, with a periodicity of 3-16 days.
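The NDVI referred to in the Introduction is conventionally derived from the red and near-infrared bands of such multispectral imagery as (NIR - Red) / (NIR + Red). The sketch below shows this standard formula on plain nested lists; the function names and the reflectance values are hypothetical, and the paper does not describe its own NDVI processing chain.

```python
def ndvi(nir, red):
    """Standard NDVI for one pixel: (NIR - Red) / (NIR + Red).

    Returns None where the index is undefined (both reflectances are zero).
    """
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total

def ndvi_grid(nir_band, red_band):
    """Apply NDVI pixel by pixel to two equally sized 2D grids."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]

# Made-up surface reflectance values for a 2 x 2 pixel window.
nir_band = [[0.5, 0.3], [0.4, 0.0]]
red_band = [[0.1, 0.3], [0.2, 0.0]]
grid = ndvi_grid(nir_band, red_band)
```

Dense green vegetation gives values close to 1, bare soil gives values near 0, and pixels without valid reflectance stay undefined.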
Such indicators as the Palmer Drought Severity Index (PDSI) and solar radiation (12% out of 100% each) for the months of March-July 1999-2021 were taken as secondary factors. PDSI was added in this year of the study since a pattern was revealed between the years with the hydrothermal index (HTI). PDSI is calculated using monthly data on temperature and precipitation, as well as information on the moisture-holding capacity of soils. It takes into account the received moisture (precipitation) and the moisture stored in the soil, as well as the potential loss of moisture due to temperature influences. For many years, PDSI has been the only current drought index, and it is still very popular around the world. Spivak et al. (2011, 2017) used soil moisture capacity data to assess the risk of outbreaks of gregarious locusts and other pests. Table 1 shows data on key climatic and meteorological parameters. Such key factors as PDSI (March-July, 1999-2021), NDVI (June
Table 1. Climatic data defined as input parameters for the simulation of favorable habitats and the potential spread of non-gregarious locust pests
Note: The PDSI scale ranges from -10 (very dry) to +10 (very wet), with 0 being normal. Cronbach's Alpha: > 0.9 very good; > 0.8 good; > 0.7 sufficient; > 0.6 dubious; > 0.5 bad; < 0.5 insufficient.
Figure 2 shows the percentage contribution of the key input parameters (data input) for the ENM of non-gregarious locust pests. As can be seen from the data, precipitation, minimum temperature, and maximum temperature for the period from 1999 to 2021 were used most of all (24% out of 100% each) for monthly ENM. This is because these factors are the main ones in the study of pest bioecological features [Zhang et al., 2019].
In addition, soil moisture data with a resolution of 1 km were downloaded from the Soil Moisture and Ocean Salinity (SMOS) and Soil Moisture Active Passive (SMAP) resources (http://nsidc.org/data/smap), and the temperature of the Earth's surface was calculated from Landsat images. Terrain data were obtained from the open Shuttle Radar Topography Mission (SRTM) sources. According to the results of the analysis, ecological niches and migrations of locusts were determined [Jakob, 2001].

Simulation methods and tools
The basis for simulation was the concept of a multidimensional Hutchinson niche [Hutchinson, 1957; Colwell and Rangel, 2009; Orlov and Sheludkov, 2019]. The maximum entropy method implemented in the MaxEnt software [Phillips et al., 2006] was used as the simulation algorithm for the spatial distribution of grasshoppers. MaxEnt is a machine learning algorithm that predicts the presence of the species in geographic space from presence-only occurrence records, without requiring documented absences [Merow et al., 2013].
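The maximum entropy principle can be made concrete with a toy example. Under this principle the fitted distribution over map cells has the Gibbs form q(x) proportional to exp(sum of weight times feature), and the weights are chosen so that the model's mean feature values match those observed at the presence points. The sketch below fits such weights by plain gradient ascent; it omits the regularization and feature classes that the MaxEnt software applies, and the feature values, presence indices, and step settings are made up for illustration.

```python
import math

def gibbs_distribution(features, weights):
    """Probability of each cell: exp(w . f(x)) normalized over all cells."""
    scores = [sum(w * f for w, f in zip(weights, row)) for row in features]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def fit_maxent(features, presence_idx, steps=2000, lr=0.5):
    """Gradient ascent on the presence log-likelihood (no regularization).

    The gradient for each feature is its mean at the presence cells
    minus its expectation under the current Gibbs distribution.
    """
    n_feat = len(features[0])
    weights = [0.0] * n_feat
    empirical = [
        sum(features[i][j] for i in presence_idx) / len(presence_idx)
        for j in range(n_feat)
    ]
    for _ in range(steps):
        q = gibbs_distribution(features, weights)
        expected = [
            sum(q[i] * features[i][j] for i in range(len(features)))
            for j in range(n_feat)
        ]
        weights = [w + lr * (e - m) for w, e, m in zip(weights, empirical, expected)]
    return weights

# Toy landscape: five cells described by one environmental feature scaled to [0, 1].
features = [[0.0], [0.25], [0.5], [0.75], [1.0]]
presence_idx = [3, 4, 4]  # hypothetical presence records (cell indices)
weights = fit_maxent(features, presence_idx)
q = gibbs_distribution(features, weights)
```

After fitting, cells whose feature values resemble those of the presence records receive the highest probability, which is the behaviour the suitability maps in this study rely on.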
The ENM of the studied non-gregarious locust pests in the performed study covered three areas: 1) SDM, or realized niche simulation, based on selected climatic variables and tied more closely to the agricultural landscapes where the objects were discovered; 2) ENM, or fundamental niche simulation, based on an expanded set of climatic variables describing where the objects can theoretically be found; 3) verification of the two predictive models, SDM and ENM. The verification was carried out by comparing the simulations with the actual ground monitoring data for the studied objects. Comparisons were made between the ground-based monitoring data, the realized niche simulation launched in the first year of the study, and the fundamental niche simulation launched in the current year of the study. Figure 3 shows the realized niche simulation obtained as a result of the correlation model. The selected climatic indicators of air temperature, soil temperature, the humidity of the upper soil layer (5-10 cm), and precipitation of the cold season falling in the form of snow are most likely related to the thermoregulation of the surface soil layer in which the egg pods overwinter. These indicators directly affect the appearance of larvae in the spring, after the deposition of egg pods in the autumn of the previous year. In addition, an indicator of net ("pure") radiation at the Earth's surface was added, defined as "the difference between the total radiation flux from above and the total radiation flux from below." In other words, net radiation is the energy available at the soil surface.

SDM or realized niche simulation for non-gregarious locust pests in North Kazakhstan
In the realized niche simulation shown in Figure 3, the highest probability of attack by these pests is assigned to the Pavlodar and Kostanay regions (the majority of the studied zone falls within 69.9-100%). This is followed by the Akmola region, where the model predicts the probability of occurrence in most cases in the range of 42.2-78.6%. The North Kazakhstan region can be called a region with minimal exposure to attack by non-gregarious locusts, where the probability of occurrence in most cases ranges from 22.7 to 36.6%. Figure 4 shows the error curve. The higher the area under the curve (AUC) index, the better the model, while a value of 0.5 corresponds to a prediction no better than random and demonstrates the unsuitability of the chosen method. The predicted omission rate is a straight line. In some situations, the test omission line lies well below the predicted omission line: a common reason is that the test and training data are not independent, for example, if they were obtained from the same spatially autocorrelated data. The red (training) line shows the fit of the model to the training data. The blue (testing) line indicates the fit of the model to the testing data and is the real test of the predictive ability of the model. In the considered case, the indicators are quite high.
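The AUC discussed above can be computed directly from model scores as the share of (presence, background) pairs in which the presence point receives the higher score, with ties counted as one half. The following sketch uses this rank-based definition; the score values are invented, and the separate training and test evaluations reported by the software are not reproduced here.

```python
def auc(presence_scores, background_scores):
    """Probability that a random presence point outscores a random background point.

    Ties contribute 0.5, so a model that cannot separate the two groups gets 0.5.
    """
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))

# Made-up suitability scores at presence and background locations.
presence = [0.9, 0.8, 0.7]
background = [0.1, 0.4, 0.75, 0.2]
value = auc(presence, background)  # 11 of the 12 pairs are ranked correctly
```

Running the same function on held-out test presences instead of training presences gives the test AUC, which is the more honest measure of predictive ability.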

ENM or fundamental niche simulation of non-gregarious locust pests in North Kazakhstan
The model was launched with the basic settings. The optimal model is selected step by step, and the number of steps (maximum iterations) is set to 500 by default. This value is most often suitable only for simple models; for complex models with many factors, the parameter value needs to be increased. In the conducted study, the number of steps was set to 5,000. In addition, the cumulative output format was selected, which is most suitable when searching for the boundaries of species distribution. This type of result is proportional to the probability of the presence of the species if some additional conditions are met (Figure 5).
To predict a larger probability of the spread of non-gregarious locust pests, the number of climatic factors was increased, which made it possible to carry out the mechanistic simulation of the fundamental niche. Thus, a mechanistic model has been realized (the ENM model, i.e., the fundamental niche simulation). The input data were taken into account for all districts, and the input parameters were taken for 1999-2021. In the case of the ENM of non-gregarious locust pests, class boundaries were determined for the transition from quantitative to qualitative indicators, from I (85-100%) to IV (0-50%), which rank the zones by the probability of pest attack from the highest to the lowest. According to the launched ENM model (Figure 5), high indicators of the attack area are attributed to the central and northern parts of the Pavlodar region. Here, ENM in the majority of the territory shows the probability of an occurrence of non-gregarious locusts in zones I and II with a ratio of 1:1 (the zones are equal) in a slightly arid, moderately warm agro-climatic zone. In the southern part of the Kostanay region, the ENM model predicts the probability of an occurrence in zones I and II with a ratio of 1:2 in the majority of the territory (zone II being the dominant one) in the moderately arid warm agro-climatic zone of this region. In the southern and southeastern parts of the Akmola region, the model predicts the probability of an occurrence in zones I and II with a ratio of 1:3 (zone II is over-dominant) in the slightly humid, moderately warm agro-climatic zone of the region. Zones I and II are not observed in the North Kazakhstan region. In this regard, this region can be attributed to the areas with minimal exposure to non-gregarious locust pests.
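As an illustration of the cumulative output mentioned here: in the formulation of Phillips et al. [2006], the cumulative value of a cell is the sum of the raw probabilities of all cells whose raw value does not exceed that of the cell, multiplied by 100. The sketch below applies this definition to a handful of made-up raw values; it is not a reproduction of the software's internal code.

```python
def raw_to_cumulative(raw):
    """Convert raw MaxEnt-style values to cumulative percentages (0-100).

    Each cell receives 100 times the total probability of all cells whose
    raw value is less than or equal to its own; tied cells share one value.
    """
    total = sum(raw)
    probs = [r / total for r in raw]  # make sure the values sum to 1
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    cumulative = [0.0] * len(probs)
    running = 0.0
    i = 0
    while i < len(order):
        j = i
        # Add up every cell that ties with the current raw value.
        while j < len(order) and probs[order[j]] == probs[order[i]]:
            running += probs[order[j]]
            j += 1
        for k in range(i, j):
            cumulative[order[k]] = 100.0 * running
        i = j
    return cumulative

raw = [0.1, 0.4, 0.2, 0.3]      # hypothetical raw values for four cells
cum = raw_to_cumulative(raw)    # approximately [10, 100, 30, 60]
```

Because the most suitable cell always reaches 100 and low values mark the fringe of the predicted range, this scale is convenient for drawing distribution boundaries.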
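The transition from predicted percentages to zones and the zone I to zone II ratios can be sketched as follows. Only the boundaries of zone I (85-100%) and zone IV (0-50%) are given in the paper; the boundary between zones II and III is not stated, so it is passed in as a parameter, and the value of 70% used in the example is purely hypothetical, as are the cell values.

```python
def classify_zone(probability_pct, ii_iii_cut):
    """Map a predicted probability (in percent) to zone I, II, III, or IV.

    Zone I (85-100%) and zone IV (0-50%) follow the paper.
    ASSUMPTION: `ii_iii_cut` is a caller-supplied boundary between zones II
    and III, because the paper does not report it.
    """
    if not 0 <= probability_pct <= 100:
        raise ValueError("probability must be between 0 and 100")
    if not 50 < ii_iii_cut < 85:
        raise ValueError("the II/III boundary must lie between 50 and 85")
    if probability_pct >= 85:
        return "I"
    if probability_pct >= ii_iii_cut:
        return "II"
    if probability_pct > 50:
        return "III"
    return "IV"

def zone_counts(values, ii_iii_cut):
    """Count how many cells of a region fall into each zone."""
    counts = {"I": 0, "II": 0, "III": 0, "IV": 0}
    for v in values:
        counts[classify_zone(v, ii_iii_cut)] += 1
    return counts

# Hypothetical cell values for one region; 70 is an illustrative II/III boundary.
cells = [90, 88, 72, 80, 75, 71, 60, 40]
counts = zone_counts(cells, ii_iii_cut=70)
# Here zone I has 2 cells and zone II has 4, i.e. a 1:2 ratio.
```

With equal-area raster cells, the ratio of the zone I and zone II counts corresponds to the area ratios reported for each region.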
The basic measure for evaluating the quality of the model in MaxEnt is the area under the ROC curve (AUC). This predictive ability indicator is interpreted as the probability that randomly selected presence coordinates are ranked higher than randomly selected background coordinates. According to the AUC value, the simulation quality can be divided into five categories [Cory et al., 2013]: 0.9-1: "excellent", 0.8-0.9: "good", 0.7-0.8: "satisfactory", 0.6-0.7: "bad", <0.6: "very bad" (failed simulation). Figure 6 shows the error curve based on the simulation results, with AUC = 0.856. Thus, the obtained mechanistic model for ENM is adequate and falls within the "good" category of this scale.

Verification between two predictive models, SDM and ENM
In general, the obtained simulation data do not contradict the data of ground-based phytosanitary monitoring. The realized niche simulation, or SDM, predicts the probability of attack depending on the coordinates within the agricultural landscape where the harmful object was found. In contrast, the fundamental niche simulation does not tie the probability of attack to the agricultural landscape but predicts the possible attack of the object under study according to its general environmental requirements. This is the main reason why the fundamental niche simulation shows a greater probability of attack than the realized niche simulation (Table 2). During the study period of 2020-2021, the forecast values of the models were quite satisfactory, and the correlation coefficient of climatic parameters was 0.63:0.66. It is worth noting that the data set on environmental factors should be adequate for the objects under study. In the future, to study the influence of environmental factors on the formation of the area, as well as to improve the model, it is necessary to minimize the correlation of climatic parameters.
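The five-category AUC scale used in this section can be expressed as a small lookup function. Because adjacent categories share their boundary values (for example, 0.9 appears in both "excellent" and "good"), the sketch assigns a boundary value to the higher category; that tie-breaking choice is an assumption, not something the scale itself specifies.

```python
def auc_category(auc):
    """Return the quality category of a simulation for a given AUC value.

    ASSUMPTION: shared boundary values (0.9, 0.8, 0.7, 0.6) are assigned
    to the higher category.
    """
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must be between 0 and 1")
    if auc >= 0.9:
        return "excellent"
    if auc >= 0.8:
        return "good"
    if auc >= 0.7:
        return "satisfactory"
    if auc >= 0.6:
        return "bad"
    return "very bad"

category = auc_category(0.856)  # the AUC reported for the ENM model
```

On this scale the reported value of 0.856 lies in the 0.8-0.9 band.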

DISCUSSION
Natural agricultural lands are a reservoir for non-gregarious locusts [Childebaev, 2002]. In this connection, the immediate proximity of crops to pasture areas makes it favorable for the attack of these phytophages [Baibusenov et al., 2020]. This approach to the simulation of favorable locust pest habitats is an innovative method for further forecasting their potential distribution sites [Klein et al., 2022] since scientists claim [Van Huis et al., 2007] that a ground survey requires a large number of specialists to study the vast areas of distribution of the studied pests.
Sometimes, millions of hectares should be surveyed within a short period [Latchininsky et al., 2016]. Thus, modern digital maps of harmful locust pests obtained by ENM, which identify the most preferred places for their distribution, make it possible to focus primarily on the problem areas where the model shows a high probability of attack by the studied objects [Cressman, 2013; Klein et al., 2022]. On the other hand, this method can act as a preventive approach to detect the rise in the number of non-gregarious locust pests. Non-gregarious locust pests occur and cause harm to agricultural plants as a complex of species [Akmollaeva, 2004]. In addition, all monitoring and protective measures are carried out by plant protection services against the entire encountered complex, which includes about 10-12 harmful species [Childebaev, 2002]. In this regard, in the conducted studies, simulation was carried out on the set of these species for digital mapping of the potential places of their distribution.
According to scientists [Le Gall et al., 2019], ENM reflects the distribution of species with rough spatial resolution, in most cases based on abiotic and edaphic conditions. In this case, although there may be differences between the species of the non-gregarious locust complex in their requirements for environmental factors, in the authors' opinion, they are not significant and it is not advisable to conduct a separate analysis for each species. The authors adhere to this opinion by considering this issue more from the practical side as specialists in plant protection, that is, as applied science. On the other hand, for fundamental researchers, the question of conducting such studies for each species separately remains open.
Compared with earlier work, similar studies in this area were carried out by [Malakhov et al., [Cressman, 2013; Kimathi et al., 2020]. The authors of this paper conducted studies on ENM of non-gregarious locust pests, the bioecological features of which are radically different from those of the above-mentioned species. However, in general, all the principles and approaches, as well as the analysis of input data for simulation, followed a scheme similar to that of other researchers.
In phytosanitary monitoring and forecasting of the development and spread of harmful organisms, a gradual transition from the accepted classical methods to more modern approaches is necessary, since it will allow the creation of digital visualization of the spread of the studied phytophages for operational decision-making on crop protection. These results of studies on the simulation of favorable habitats and the potential spread of non-gregarious locust pests may allow prioritizing the areas for risk assessment, monitoring, and early warning measures for the development and spread of pests.

CONCLUSIONS
Summing up the results of this study, the following conclusions were drawn. According to the realized niche simulation (SDM), the highest probability of occurrence of these pests is assigned to the Pavlodar and Kostanay regions (the majority of the studied zone falls within 69.9-100%). This is followed by the Akmola region, where the model predicts the probability of occurrence in most cases in the range of 42.2-78.6%. The North Kazakhstan region can be considered a region with minimal exposure to attack by non-gregarious locusts, where the probability of occurrence in most cases ranges from 22.7 to 36.6%. The proposed model can be used as a modern alternative to logical models of long-term forecasting since it can predict the probability of the spread of phytophages within the landscape of agricultural areas. According to the launched ENM model, high indicators of the attack area are allocated to the central and northern parts of the Pavlodar region. Here, ENM in the majority of the territory shows the probability of an attack of non-gregarious locusts in zones I and II with a ratio of 1:1 (the zones are equal) in a slightly arid, moderately warm agro-climatic zone. In the southern part of the Kostanay region, the ENM model predicts the probability of an occurrence in zones I and II with a ratio of 1:2 in the majority of the territory (zone II being the dominant one) in the moderately arid warm agro-climatic zone of this region. In the southern and southeastern parts of the Akmola region, the model predicts the probability of an occurrence in zones I and II with a ratio of 1:3 (zone II is over-dominant) in the slightly humid, moderately warm agro-climatic zone of the region. Zones I and II are not observed in the North Kazakhstan region.
The presented ENM can be used for long-term forecasting of the probability of the spread of non-gregarious locust pests several years in advance, since this simulation does not tie the studied objects to the limits of the landscape where they were found but considers the more extensive parameters of the territories that they can theoretically inhabit. According to the results of the verification of the conducted studies, the forecast values of the models were quite satisfactory, and the correlation coefficient of climatic parameters was 0.63:0.66. It is worth noting that the data set on environmental factors was adequate for the studied objects. In the future, similar studies can be conducted throughout Kazakhstan to obtain a complete digital map of preferred locations for the spread of non-gregarious locust pests and to adequately plan the use of plant protection products.