Spring Row Crops Productivity Prediction Using Normalized Difference Vegetation Index

The results of statistical modelling for predicting the yields of spring row crops, namely maize, sorghum and soybean, from the values of the remotely sensed normalized difference vegetation index (NDVI) at critical stages of crop growth and development are presented. Spatial NDVI data obtained from the Sentinel-2 satellite were used to create the models. Quadratic regression analysis was applied to develop the yield models based on the true yield data of the crops obtained in 2017 and 2018 at the experimental field of the Institute of Irrigated Agriculture of NAAS, Ukraine. The results of statistical modelling revealed that the method is suitable for precise yield prediction, and that the best stages for NDVI screening for this purpose differ among the studied crops. The best prediction accuracy was obtained at the tasselling (VT) or silking (R1) stage for maize (mean absolute percentage error, MAPE, of 8.75%), at the second trifoliate (V2) stage for soybean (MAPE of 3.75%), and at the half bloom (S6) stage for sorghum (MAPE of 17.62%). The yield predictions by NDVI are reliable at a probability level of 95% (p < 0.05).


INTRODUCTION
Remote sensing of the environment is a rapidly developing and valuable branch of modern science. Currently, remote sensing is used in geoinformation system (GIS) technologies for mapping, better management of land, water and other natural resources, ecological monitoring, modelling and forecasting. It is also an important constituent of modern precision agriculture systems, where remote sensing provides specialists with data that are further fed into decision support systems for better management of agricultural land [Kustas and Norman, 1996;Herold et al., 2002;Rogan and Chen, 2004]. Remote sensing provides great opportunities for fast and precise evaluation of crop vegetation status in order to make the required corrections to agrotechnology and reach the maximum productivity of agricultural land [Liaghat and Balasundram, 2010;Mulla, 2013]. To perform these functions, different vegetation indices, which are calculated from the reflectance features of different land surfaces, are applied.
One of the most widely implemented vegetation indices is the Normalized Difference Vegetation Index (NDVI), which was first described by Rouse et al. (1974). This index was the first spatially derived index to be successfully applied for distinguishing vegetation cover and its condition. The index values are calculated by Eq. (1):

$$\mathrm{NDVI} = \frac{a_{nir} - a_{vis}}{a_{nir} + a_{vis}} \qquad (1)$$

where: $a_{nir}$ is the reflectance in the near-infrared range of the spectrum, and $a_{vis}$ is the reflectance in the visible red range of the spectrum [Carlson and Ripley, 1997].
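As a minimal sketch of Eq. (1), the function below computes NDVI for individual pixels. The band assignment is an assumption for illustration only: for Sentinel-2, near-infrared and red reflectances are commonly taken from bands B8 and B4, but the band choice and preprocessing used in this study are not stated here, and the reflectance values below are hypothetical.

```python
def ndvi(nir: float, red: float) -> float:
    """Return NDVI = (NIR - Red) / (NIR + Red) for one pixel, as in Eq. (1).

    nir: reflectance in the near-infrared band (a_nir)
    red: reflectance in the visible red band (a_vis)
    """
    total = nir + red
    if total == 0:
        raise ValueError("NIR + red reflectance must be non-zero")
    return (nir - red) / total


# Hypothetical (NIR, red) reflectance pairs, e.g. Sentinel-2 B8 and B4 (assumed bands)
pixels = [(0.45, 0.05), (0.30, 0.10), (0.20, 0.15)]
values = [round(ndvi(nir, red), 3) for nir, red in pixels]
print(values)
```

Dense green vegetation reflects strongly in the near infrared and absorbs red light, so its NDVI approaches 1, while sparse cover gives values near 0.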
However, the modern implementation of NDVI is not limited to detecting vegetation and describing its condition. Based on the reflectance peculiarities of green leaves, it is an indirect index of the potential photosynthetic activity of vegetation and, as a result, can be used to obtain reliable knowledge about the potential productivity of agrocenoses. The relationship between NDVI and the amount of absorbed photosynthetically active radiation (PAR) is direct and almost linear [Gamon et al., 1995]. Therefore, the strong connection between NDVI and PAR makes it possible to establish a link between NDVI and yielding capacity, as the latter directly depends on the amount of PAR effectively used by crops [Zhu et al., 2010;Raines, 2011].
The goal of our study was to determine the connection between the values of remotely sensed NDVI and yields of the studied spring row crops in order to provide precise early predictions of their productivity.

MATERIALS AND METHODS
The connection between NDVI values and yields was determined by quadratic regression analysis performed using Cramer's rule [Gong et al., 2002]. The inputs for developing the equations were the true yields of the spring row crops (maize, sorghum and soybean) and the corresponding spatial NDVI values obtained from combined Sentinel-2 and Sentinel-1 imagery at the main stages of crop growth, namely: V2 (second trifoliate) and R2 (full bloom) for soybean [McWilliams et al., 1999]; S3 (growing point differentiation) and S6 (half bloom) for sorghum [Roozeboom and Prasad, 2019]; VT (tasselling) and R1 (silking) for maize [Nafziger, 2013]. The true yields of the studied crops were obtained by combine harvesting of entire plots located at the experimental field of the Institute of Irrigated Agriculture of NAAS (within the area bounded by the key points with coordinates: 46°44'36.5"N 32°42'07.0"E; 46°44'39.5"N 32°42'32.0"E; 46°44'33.3"N 32°42'33.7"E; 46°44'30.3"N 32°42'08.5"E) in 2017-2018. The yields were calculated at the standard grain moisture content (14% for maize, 13.5% for sorghum and 12% for soybean). The coefficient of variation (CV) of yields was calculated as the ratio of the standard deviation (SD) to the mean value [Everitt and Skrondal, 2010]. The yields were linked to the corresponding NDVI values, and the data were processed using Microsoft Excel 365 software at the
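The quadratic regression with Cramer's rule can be sketched as follows, assuming the standard least-squares formulation: the model y = a·x² + b·x + c leads to a 3×3 system of normal equations in a, b and c, and Cramer's rule solves it as ratios of determinants. This is an illustration, not the authors' Excel workflow; the function names are hypothetical, and the data points are made up to lie exactly on y = 2x² + 3x + 1 so that the result is easy to check.

```python
from typing import List, Sequence, Tuple


def det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic_cramer(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = a*x**2 + b*x + c via the normal equations and Cramer's rule."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired (x, y) observations")

    def sx(k: int) -> float:
        # Sum of x^k over all observations
        return sum(xi ** k for xi in x)

    def sxy(k: int) -> float:
        # Sum of x^k * y over all observations
        return sum(xi ** k * yi for xi, yi in zip(x, y))

    # Normal equations: m @ [a, b, c] = rhs
    m: List[List[float]] = [[sx(4), sx(3), sx(2)],
                            [sx(3), sx(2), sx(1)],
                            [sx(2), sx(1), float(n)]]
    rhs = [sxy(2), sxy(1), sxy(0)]

    d = det3(m)
    if d == 0:
        raise ValueError("normal equations are singular (too few distinct x values)")

    coeffs = []
    for col in range(3):
        # Cramer's rule: replace column `col` by the right-hand side, divide determinants
        m_col = [[rhs[r] if j == col else m[r][j] for j in range(3)] for r in range(3)]
        coeffs.append(det3(m_col) / d)
    a, b, c = coeffs
    return a, b, c


# Hypothetical points lying exactly on y = 2x^2 + 3x + 1
ndvi_values = [0.2, 0.4, 0.6, 0.8]
yields = [2 * v ** 2 + 3 * v + 1 for v in ndvi_values]
a, b, c = fit_quadratic_cramer(ndvi_values, yields)
print(f"a = {a:.3f}, b = {b:.3f}, c = {c:.3f}")
```

For a 3×3 system Cramer's rule is cheap and transparent, which suits spreadsheet calculation; for larger or poorly conditioned systems a library least-squares solver would normally be preferred.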

RESULTS AND DISCUSSION
When analyzing the NDVI values at two stages of grain corn development, a remarkable feature was observed: the index values at both stages of crop growth (VT and R1) were similar, which resulted in comparable yield forecasting accuracy (Table 1).
The regression analysis made it possible to determine the strength of the connection between NDVI values and maize yields by the rule of thumb [Mukaka, 2012]: the connection is very high and positive, with a coefficient of correlation R = 0.9906 and a coefficient of determination R² = 0.9813. The linkage between the spatial index values and the yields of the crop is described by quadratic Eq. (2), where y is the yield of maize in t ha⁻¹, and x is the value of NDVI at the VT or R1 stage.
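To illustrate how the correlation strength is read, the sketch below computes R² of a fitted model as 1 - SS_res/SS_tot and takes R as its square root; this is an assumption, but it is consistent with the R and R² pairs reported in this section (for example, √0.9813 ≈ 0.9906). The band limits 0.9, 0.7, 0.5 and 0.3 follow the commonly cited table of Mukaka (2012); the function names are hypothetical.

```python
import math
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((yt - mean) ** 2 for yt in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def mukaka_strength(r: float) -> str:
    """Rule-of-thumb label for |r| (bands as commonly cited from Mukaka, 2012)."""
    size = abs(r)
    if size >= 0.9:
        return "very high"
    if size >= 0.7:
        return "high"
    if size >= 0.5:
        return "moderate"
    if size >= 0.3:
        return "low"
    return "negligible"


# R taken as sqrt(R^2) (assumption); reproduces the R values reported in the text
r_maize = math.sqrt(0.9813)       # maize, VT or R1 stage
r_sorghum_s3 = math.sqrt(0.7760)  # sorghum, S3 stage
print(round(r_maize, 4), mukaka_strength(r_maize))
print(round(r_sorghum_s3, 4), mukaka_strength(r_sorghum_s3))
```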
The approximation of the regression model yielded a MAPE below 10%, which proved the high accuracy and reliability of forecasting grain corn yield by NDVI values [Moreno et al., 2013].
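MAPE, which this section uses throughout to judge forecast accuracy, can be computed as the mean of |true - predicted| / true, expressed in percent. In the sketch below, the accuracy classes (below 10% highly accurate, 10-20% good, 20-50% reasonable, above 50% inaccurate) are an assumption that matches how the text labels its own MAPE values with reference to Moreno et al. (2013); the function names and the yields in the usage lines are hypothetical.

```python
from typing import Sequence


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    if any(yt == 0 for yt in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    return 100.0 * sum(abs((yt - yp) / yt) for yt, yp in zip(y_true, y_pred)) / len(y_true)


def forecast_class(mape_pct: float) -> str:
    """Assumed MAPE bands; the upper bound of each band is exclusive."""
    if mape_pct < 10:
        return "highly accurate"
    if mape_pct < 20:
        return "good"
    if mape_pct < 50:
        return "reasonable"
    return "inaccurate"


# Hypothetical true and predicted yields, t/ha
true_yields = [4.0, 5.0, 8.0]
predicted = [4.2, 4.5, 8.0]
error = mape(true_yields, predicted)
print(f"MAPE = {error:.2f}% ({forecast_class(error)})")
```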
For the other studied crops, the NDVI values at different growth stages differed, providing unequal accuracy of yield predictions. The quadratic regression model was least accurate for sorghum (Table 2).
The greatest discrepancy between the predicted and true yields was observed at the S3 stage: 22.01% on average. This makes the yield model for this stage only a reasonable forecast, which should not be used as guidance and cannot be applied in precision agriculture [Moreno et al., 2013]. The quadratic regression for the S3 stage is given by Eq. (3), where y is the yield of sorghum in t ha⁻¹, and x is the value of NDVI at the S3 stage. The coefficient of correlation R for this model is 0.8809 and R² is 0.7760, which still indicates a high positive correlation according to the rule of thumb [Mukaka, 2012].
The sorghum yield prediction using NDVI at the S6 stage is more accurate, with an average MAPE of 17.62%, which corresponds to a good forecast [Moreno et al., 2013]. The quadratic regression model is given by Eq. (4), where y is the yield of sorghum in t ha⁻¹, and x is the value of NDVI at the S6 stage. The coefficient of correlation R for the model is 0.9298 and R² is 0.8645, indicating a very high positive correlation according to the rule of thumb [Mukaka, 2012].
The regression analysis of the relationship between NDVI and soybean yields showed the highest correspondence between these parameters at the V2 stage of the crop, when the MAPE of the quadratic model averaged 3.75%, which indicates very high forecast accuracy according to Moreno et al. (2013) (Table 3).
The forecasting model can be expressed as Eq. (5):

$$y = -0.221x^2 + 9.220x - 2.338 \qquad (5)$$

where: y is the yield of soybean in t ha⁻¹, and x is the value of NDVI at the V2 stage. The coefficient of correlation R for this model is 0.9914 and R² is 0.9829, which is a very high positive correlation according to the rule of thumb [Mukaka, 2012].
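As a quick check of Eq. (5), the sketch below evaluates it for a few NDVI values. The NDVI inputs are illustrative only and the function name is hypothetical; the NDVI range observed at V2 in the study is not given in this section, so the equation should not be assumed valid outside the data it was fitted on.

```python
def soybean_yield_v2(ndvi_v2: float) -> float:
    """Soybean yield (t/ha) predicted from NDVI at the V2 stage by Eq. (5)."""
    return -0.221 * ndvi_v2 ** 2 + 9.220 * ndvi_v2 - 2.338


# Illustrative NDVI values, not observations from the study
for x in (0.5, 0.6, 0.7, 0.8):
    print(f"NDVI {x:.1f} -> {soybean_yield_v2(x):.2f} t/ha")
```

Because the quadratic coefficient is small relative to the linear one, the fitted curve is nearly linear over this interval, rising by roughly 0.9 t ha⁻¹ per 0.1 increase in NDVI.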
The quadratic regression model for soybean yields at the R2 stage is less accurate, with MAPE averaging 10.16%; however, this value still indicates that precise productivity prediction is possible for the crop [Moreno et al., 2013]. The model is described by Eq. (6), where y is the yield of soybean in t ha⁻¹, and x is the value of NDVI at the R2 stage. The coefficient of correlation R for this model is 0.9377 and R² is 0.8793, which is a very high positive correlation according to the rule of thumb [Mukaka, 2012].
Our results show that it is possible to predict crop yields by NDVI values with relatively high accuracy, which exceeds 90% for grain corn and soybean and is just above 80% for sorghum. The lower accuracy of sorghum yield forecasting was attributed to the higher variation of the input NDVI data used in our study: the CV for the S3 stage was the highest among the studied crops and reached 0.23. Overall, model performance tended to improve as NDVI fluctuations decreased (the closest prediction was obtained at the lowest CV of NDVI, 0.16, at the V2 stage of soybean).
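The two summary quantities used in this paragraph can be sketched as follows. CV is the ratio of SD to mean, as defined in the Materials and Methods; whether the sample or population SD was used is not stated, so the sample SD (Python's `statistics.stdev`) is an assumption. The accuracy figures quoted above are read here as 100% minus MAPE (e.g. 100 - 8.75 = 91.25% for grain corn), which is an interpretation rather than a formula given by the authors. The NDVI values and function names are hypothetical.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """CV = SD / mean, using the sample standard deviation (an assumption)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return statistics.stdev(values) / mean


def accuracy_from_mape(mape_pct: float) -> float:
    """Forecast accuracy read as 100% minus MAPE (interpretation, not the authors' formula)."""
    return 100.0 - mape_pct


# Hypothetical NDVI values at one growth stage
print(round(coefficient_of_variation([0.5, 0.6, 0.7]), 3))

# Best-model MAPE values reported in the text
for crop, m in (("grain corn", 8.75), ("soybean", 3.75), ("sorghum", 17.62)):
    print(crop, round(accuracy_from_mape(m), 2))
```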
Another study on soybean yield prediction by NDVI values also reported a strong non-linear dependence of crop productivity on NDVI, with the adjusted R² reaching 0.721 under the flexible Fourier transform modelling method [Xu and Katchova, 2019]. In the study by Bolton and Friedl (2013), the accuracy of soybean yield forecasting using MODIS NDVI data was very close to that of the above-mentioned prediction and slightly lower than in our study (R² reached 0.69). Another recent study also found that NDVI values are positively correlated with maize and soybean yields and might be used as inputs for yield prediction [Johnson, 2014]. NDVI data have also been used efficiently for large-scale grain corn yield prediction with regression models based on long-term datasets: the method provided reliable results 6-8 weeks before harvest [Nagy et al., 2018]. The regression analysis of corn yield linked to NDVI time series revealed a strong dependence of the crop on NDVI in the pre-silking period, making it possible to predict yield losses due to unfavourable conditions in this period [Wang et al., 2016]. One study reported very high accuracy of an empirical paired model "maize yield vs. NDVI at flowering stage", which deviated by just 3% from the true yields [Fernandez-Ordoñez and Soria-Ruiz, 2017]. A study on determining maize yields from NDVI sensed by unmanned aerial vehicles at different crop growth stages showed that the best yield prediction performance was obtained when R2 stage NDVI values were used as inputs [Maresma et al., 2020], while our study showed that model performance is good at the R1 stage.
Some scientists also reported a strong dependence of the "NDVI-maize yield" prediction model on the plant density of the crop [de Olivera et al., 2019], while this factor was not taken into account in our research. As for  Lykhovyd, 2020]. We consider this approach reasonable in some cases, in particular when it is difficult to obtain a highly reliable prediction using a single vegetation index, as for sorghum in our study, because introducing additional crop indices may significantly improve modelling performance. In addition, more advanced computation techniques can be useful for improving yield prediction models [Stas et al., 2016;Tiwari and Shukla, 2020]. However, complex computations with artificial neural networks (ANN) do not always perform much better than regression analysis: the NDVI-based ANN model of sugarcane yield prediction had an R² of 0.61, which cannot be considered a highly reliable forecast [Fernandes et al., 2017].

CONCLUSIONS
The statistical analysis of the yields of three spring row crops, namely grain corn, sorghum and soybean, in connection with the NDVI values obtained from Sentinel-2 remote sensing imagery at critical stages of crop growth revealed a high positive correlation between the spatial vegetation index and crop productivity. Quadratic regression analysis performed using Cramer's rule showed that the NDVI-based models for early yield prediction are suitable for precise yield forecasting at a probability level of 95% (p < 0.05). The MAPE values for the best prediction models are 8.75% for grain corn, 17.62% for sorghum and 3.75% for soybean. Therefore, NDVI should be used as a tool for early yield forecasting for both scientific and practical needs.
Considering the results of our study and the above-mentioned reports of other scientific groups, we concluded that, despite the large number of studies devoted to yield simulation using spatial vegetation indices, greater knowledge of how to apply these indices in precision agriculture is required to provide scientifically substantiated recommendations for practitioners.