Deep learning is used in computer vision problems with important applications in several scientific fields. In ecology, for example, there is growing interest in deep learning for automating repetitive analyses on large numbers of images, such as animal species identification. However, there are challenges to the wide adoption of deep learning by the community of ecologists. First, there is a programming barrier, as most algorithms are written in Python while most ecologists are versed in R. Second, recent applications of deep learning in ecology have focused on computational aspects and simple tasks without addressing the underlying ecological questions or carrying out the statistical data analysis to answer these questions. Here, we showcase a reproducible R workflow integrating both deep learning and statistical models, using predator-prey relationships as a case study. We illustrate deep learning for the identification of animal species on images collected with camera traps, and quantify spatial co-occurrence using multispecies occupancy models. Despite the average classification performance of the model, ecological inference was similar whether we analysed the ground truth dataset or the classified dataset. This result calls for further work on the trade-offs between the time and resources allocated to training deep learning models and our ability to properly address key ecological questions with biodiversity monitoring. We hope that our reproducible workflow will be useful to ecologists and applied statisticians. All material (source of the R Markdown notebook and auxiliary files) is available from https://github.com/oliviergimenez/computo-deeplearning-occupany-lynx.