Machine learning (ML) approaches are increasingly used in biodiversity monitoring. An important application is predicting biodiversity indicators such as species abundance, species occurrence or species richness from predictor sets containing, e.g., climatic and anthropogenic factors. Given the large number of ML methods available in the literature and the pace at which new ones are published, it is crucial to develop uniform evaluation procedures that allow sound and fair empirical studies. However, defining fair evaluation procedures is challenging: because of well-documented, intrinsic properties of biodiversity indicators, such as zero-inflation and over-dispersion, it is not trivial to design good sampling schemes for cross-validation or good evaluation metrics. Indeed, the classical Mean Squared Error (MSE) fails to capture subtle differences in the performance of different methods, particularly in the prediction of very small or very large values (e.g., zero counts or large counts). In this report, we illustrate this phenomenon by comparing ten statistical and machine learning models on the task of predicting waterbird abundance in the North-African area, based on geographical, meteorological and spatio-temporal factors. Our results highlight that different off-the-shelf evaluation metrics and cross-validation sampling approaches yield drastically different rankings of the models, and fail to support interpretable conclusions.
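The sensitivity of model rankings to the choice of metric can be seen on a toy example. The sketch below is not taken from the report: the counts, the two hypothetical models `A` and `B`, and the helper names (`mse`, `mae`, `rmsle`, `rank`) are all invented for illustration. It assumes a small zero-inflated, over-dispersed count vector and compares MSE with two other common metrics.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error (valid for non-negative values)."""
    return float(np.sqrt(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)))


# Made-up counts: mostly zeros, a few small counts, one very large count.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 2, 5, 100], dtype=float)

# Hypothetical model A: exact on zeros and small counts, underestimates the large count.
pred_a = np.array([0, 0, 0, 0, 0, 0, 0, 2, 5, 60], dtype=float)

# Hypothetical model B: exact on the large count, predicts 8 everywhere else.
pred_b = np.array([8, 8, 8, 8, 8, 8, 8, 8, 8, 100], dtype=float)

metrics = {"MSE": mse, "MAE": mae, "RMSLE": rmsle}
predictions = {"A": pred_a, "B": pred_b}

# scores[model][metric] holds the error of each model under each metric.
scores = {
    model: {name: fn(y_true, pred) for name, fn in metrics.items()}
    for model, pred in predictions.items()
}


def rank(metric_name):
    """Return model names ordered from lowest (best) to highest error."""
    return sorted(scores, key=lambda model: scores[model][metric_name])


for name in metrics:
    print(name, rank(name), {m: round(scores[m][name], 3) for m in scores})
```

On this toy data, MSE prefers model B because the single large count dominates the squared error, while MAE and RMSLE prefer model A, which predicts every zero correctly. This is only a caricature of the effect described above, but it shows why a single off-the-shelf metric can hide how a model behaves on zero counts versus large counts.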