Scaling multi-species occupancy models to large citizen science datasets

Citizen science datasets can be very large and promise to improve species distribution modelling, but detection is imperfect, risking bias when fitting models. In particular, observers may not detect species that are actually present. Occupancy models can estimate and correct for this observation process, and multi-species occupancy models exploit similarities in the observation process, which can improve estimates for rare species. However, the computational methods currently used to fit these models do not scale to large datasets. We develop approximate Bayesian inference methods and use graphics processing units (GPUs) to scale multi-species occupancy models to very large citizen science data. We fit multi-species occupancy models to one month of data from the eBird project consisting of 186,811 checklist records comprising 430 bird species. We evaluate the predictions on a spatially separated test set of 59,338 records, comparing two different inference methods -- Markov chain Monte Carlo (MCMC) and variational inference (VI) -- to occupancy models fitted to each species separately using maximum likelihood. We fitted models to the entire dataset using VI, and up to 32,000 records with MCMC. VI fitted to the entire dataset performed best, outperforming single-species models on both AUC (90.4% compared to 88.7%) and on log likelihood (-0.080 compared to -0.085). We also evaluate how well range maps predicted by the model agree with expert maps. We find that modelling the detection process greatly improves agreement and that the resulting maps agree as closely with expert maps as ones estimated using high quality survey data. Our results demonstrate that multi-species occupancy models are a compelling approach to model large citizen science datasets, and that, once the observation process is taken into account, they can model species distributions accurately.
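To make the idea of correcting for imperfect detection concrete, here is a minimal sketch of the textbook single-species occupancy likelihood for one site with repeat visits. It is an illustration only: the abstract does not give the paper's exact model specification, and this is neither the multi-species model nor the VI/MCMC machinery it describes. The function name `site_log_likelihood` and the parameters `psi`, `p`, and `y` are assumptions introduced for this example.

```python
import numpy as np


def site_log_likelihood(psi, p, y):
    """Log-likelihood of one site's detection history (hypothetical helper).

    psi: probability that the site is occupied, a scalar in (0, 1)
    p:   per-visit detection probabilities given presence, shape (n_visits,)
    y:   observed detections (1) or non-detections (0), shape (n_visits,)
    """
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    # Probability of the observed history if the species is present.
    lik_present = np.prod(p ** y * (1.0 - p) ** (1.0 - y))
    # If the species is absent, only an all-zero history is possible.
    lik_absent = 1.0 if y.sum() == 0 else 0.0
    # Marginalise over the unobserved occupancy state.
    return np.log(psi * lik_present + (1.0 - psi) * lik_absent)


# Two visits with no detections: absence and missed presence both contribute.
ll_no_detections = site_log_likelihood(0.6, [0.5, 0.5], [0, 0])
# One detection: the site must be occupied, so only the presence term remains.
ll_one_detection = site_log_likelihood(0.6, [0.5, 0.5], [1, 0])
```

The key point is the mixture in the last line of the function: an all-zero history does not imply absence, because a present species can be missed on every visit. A model that ignores detection would treat those zeros as true absences.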

