The ongoing biodiversity crisis calls for accurate estimation of animal density and abundance, for example to identify sources of biodiversity decline and to assess the effectiveness of conservation interventions. Camera traps, combined with abundance estimation methods, are often employed for this purpose. The distances between the camera and each observed animal that these methods require are traditionally derived in a laborious, fully manual or semi-automatic process. Both approaches require reference image material, which is difficult to acquire and not available for existing datasets. In this study, we propose a fully automatic approach to estimating camera-to-animal distances, based on monocular depth estimation (MDE) and without the need for reference image material. We leverage state-of-the-art relative MDE and a novel alignment procedure to estimate metric distances. We evaluate the approach on a zoo scenario dataset unseen during training. We achieve a mean absolute distance estimation error of 0.9864 meters at a precision of 90.3% and a recall of 63.8%, while completely eliminating the manual effort previously required of biodiversity researchers. The code will be made available.
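To illustrate what aligning relative depth to metric distances can involve, the following is a minimal sketch and not the alignment procedure proposed in this study, whose details are not given here. It assumes that relative depth values are related to metric depth by a single scale and shift per image, and that a few metric anchor values are available from some other source. The function names `fit_scale_shift`, `to_metric`, and `mean_absolute_error` are hypothetical.

```python
def fit_scale_shift(relative, metric):
    """Least-squares fit of metric ~= scale * relative + shift.

    Assumption for illustration only: an affine relation between
    relative and metric depth, fitted on a few anchor points.
    """
    n = len(relative)
    if n < 2 or n != len(metric):
        raise ValueError("need at least two paired samples")
    mean_r = sum(relative) / n
    mean_m = sum(metric) / n
    var_r = sum((r - mean_r) ** 2 for r in relative)
    if var_r == 0:
        raise ValueError("relative depths must not all be equal")
    cov = sum((r - mean_r) * (m - mean_m) for r, m in zip(relative, metric))
    scale = cov / var_r
    shift = mean_m - scale * mean_r
    return scale, shift


def to_metric(relative_value, scale, shift):
    """Map one relative depth value to meters using the fitted parameters."""
    return scale * relative_value + shift


def mean_absolute_error(predicted, actual):
    """Mean absolute difference between predicted and true distances."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need equally sized, non-empty sequences")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
```

Under these assumptions, the relative depth at an animal's location would be passed through `to_metric` to obtain a camera-to-animal distance in meters, and `mean_absolute_error` corresponds to the kind of error metric reported above.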