Quantification of behavior is critical in applications ranging from neuroscience and veterinary medicine to animal conservation. A key step in behavioral analysis is extracting relevant keypoints on animals, known as pose estimation. However, reliable inference of poses currently requires domain knowledge and manual labeling effort to build supervised models. We present a series of technical innovations that enable a new method, collectively called SuperAnimal, to develop and deploy deep learning models that require zero additional human labels and no additional model training. SuperAnimal allows video inference on over 45 species with only two global classes of animal pose models. If the models need fine-tuning, we show that SuperAnimal models are 10$\times$ more data efficient and outperform prior transfer learning approaches. Moreover, we provide a new video-adaptation method to perform unsupervised refinement of videos, and we illustrate the utility of our model in behavioral classification. Collectively, this presents a data-efficient, plug-and-play solution for behavioral analysis.