The manual processing and analysis of videos from camera traps is time-consuming and involves several steps, ranging from filtering falsely triggered footage to identifying and re-identifying individuals. In this study, we developed a pipeline that automatically analyzes camera trap videos to identify individuals without requiring manual interaction. The pipeline applies to solitary animal species with uniquely identifiable fur patterns, such as leopards (Panthera pardus). We assumed that the same individual was seen throughout one triggered video sequence. Under this assumption, multiple images could be assigned to an individual for the initial database filling without pre-labeling. The pipeline was based on well-established components from computer vision and deep learning, particularly convolutional neural networks (CNNs) and scale-invariant feature transform (SIFT) features. We augmented this basis with additional components that substitute for otherwise required human interaction. Clusters representing individuals were formed from the similarity between frames of the video material, which bypasses the open-set problem posed by the unknown total population. The pipeline was tested on a dataset of leopard videos collected by the Pan African Programme: The Cultured Chimpanzee (PanAf) and achieved a success rate of over 83% for correct matches between previously unknown individuals. The proposed pipeline can become a valuable tool for future conservation projects based on camera trap data, reducing the manual work of individual identification when labeled data is unavailable.
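The open-set clustering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each video has already been reduced to a list of per-frame feature vectors, and it swaps SIFT feature matching for plain cosine similarity on toy vectors. The names `cosine_similarity`, `video_similarity`, `cluster_videos`, and the `threshold` value are hypothetical and chosen only for illustration. Each video is treated as one individual, in line with the one-individual-per-sequence assumption, and a new cluster is opened whenever no existing cluster is similar enough, so the number of individuals does not need to be known in advance.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Video = List[Vector]  # one feature vector per frame (assumed precomputed)


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; a stand-in for SIFT match scores."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def video_similarity(video_a: Video, video_b: Video) -> float:
    """Best frame-to-frame similarity between two videos."""
    return max(cosine_similarity(fa, fb) for fa in video_a for fb in video_b)


def cluster_videos(videos: List[Video], threshold: float = 0.95) -> List[List[int]]:
    """Greedily group video indices into clusters, one cluster per individual.

    A video joins the most similar existing cluster if that similarity
    reaches the threshold; otherwise it starts a new cluster (open set).
    """
    clusters: List[List[int]] = []
    for idx, video in enumerate(videos):
        best_cluster, best_score = None, threshold
        for cluster in clusters:
            # Compare against every video already assigned to this cluster.
            score = max(video_similarity(video, videos[j]) for j in cluster)
            if score >= best_score:
                best_cluster, best_score = cluster, score
        if best_cluster is None:
            clusters.append([idx])  # previously unseen individual
        else:
            best_cluster.append(idx)
    return clusters
```

Calling `cluster_videos` on four toy videos whose frame vectors point in two clearly different directions groups them into two clusters. In a real setting the threshold would have to be tuned on the feature type and data at hand.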