We propose the first metric learning system for the recognition of great ape behavioural actions. Our triple-stream embedding architecture operates on camera trap videos taken directly in the wild and demonstrates that an explicit DensePose-C chimpanzee body part segmentation stream effectively complements traditional RGB appearance and optical flow streams. We evaluate system variants with different feature fusion techniques and long-tail recognition approaches. Results and ablations show performance improvements of ~12% in top-1 accuracy over previous results on the PanAf-500 dataset, which contains 180,000 manually annotated frames across nine behavioural actions. Furthermore, we provide a qualitative analysis of our findings and augment the metric learning system with long-tail recognition techniques, showing that average per-class accuracy, a critical measure in this domain, can be improved by ~23% compared to the literature on that dataset. Finally, since our embedding spaces are constructed to be metric, we provide the first data-driven visualisations of great ape behavioural action spaces, revealing their emerging geometry and topology. We hope this work sparks further interest in this vital application area of computer vision for the benefit of endangered great apes.
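To make the triple-stream design concrete, the sketch below illustrates (with NumPy, not the paper's actual implementation) how per-clip features from the three streams mentioned above, RGB appearance, optical flow, and DensePose-C body part segmentation, might be fused by concatenation and projected into an L2-normalised metric embedding space. All names, dimensions, and the random projection matrix are hypothetical placeholders for learned components.

```python
import numpy as np

# Hypothetical dimensions; the paper's actual feature and embedding
# sizes are not specified here.
FEAT_DIM = 128   # assumed per-stream feature size
EMBED_DIM = 64   # assumed embedding size

rng = np.random.default_rng(0)

# Stand-in for a learned fusion/projection layer (random for illustration).
W = rng.standard_normal((3 * FEAT_DIM, EMBED_DIM))

def fuse_and_embed(rgb, flow, densepose):
    """Concatenate the three stream features and map them to a
    unit-norm embedding, so distances in the space are well-behaved
    (a metric), as required for metric learning."""
    fused = np.concatenate([rgb, flow, densepose])  # shape (3 * FEAT_DIM,)
    z = fused @ W                                   # shape (EMBED_DIM,)
    return z / np.linalg.norm(z)                    # L2-normalise

# Dummy per-clip features for each stream.
rgb = rng.standard_normal(FEAT_DIM)
flow = rng.standard_normal(FEAT_DIM)
densepose = rng.standard_normal(FEAT_DIM)

emb = fuse_and_embed(rgb, flow, densepose)
print(emb.shape)  # (64,)
```

In a trained system the projection would be optimised with a metric learning objective (e.g. a triplet or contrastive loss) so that clips of the same behavioural action cluster together; concatenation is only one of the fusion techniques the abstract alludes to.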