Accurate animal pose estimation is an essential step towards understanding animal behavior, and can potentially benefit many downstream applications, such as wildlife conservation. Previous work focuses only on specific animals and ignores the diversity of animal species, which limits generalization ability. In this paper, we propose AP-10K, the first large-scale benchmark for mammal pose estimation, to facilitate research in animal pose estimation. AP-10K consists of 10,015 images collected and filtered from 23 animal families and 54 species, organized by taxonomic rank, with high-quality keypoint annotations that were labeled and checked manually. Based on AP-10K, we benchmark representative pose estimation models on the following three tracks: (1) supervised learning for animal pose estimation, (2) cross-domain transfer learning from human pose estimation to animal pose estimation, and (3) intra- and inter-family domain generalization for unseen animals. The experimental results provide sound empirical evidence for the superiority of learning from diverse animal species in terms of both accuracy and generalization ability. The benchmark opens new directions for future research in animal pose estimation. AP-10K is publicly available at https://github.com/AlexTheBad/AP10K.
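The third track (intra- and inter-family domain generalization) can be pictured as a data split in which the test animals never appear during training. The following is a minimal sketch of one plausible reading of that setup, not the benchmark's actual protocol: the record fields (`image`, `family`, `species`), the placeholder family and species names, and both helper functions are hypothetical and do not come from the AP-10K repository.

```python
# Hypothetical annotation records; the real AP-10K annotation layout may differ.
RECORDS = [
    {"image": "img_001.jpg", "family": "family_A", "species": "species_a1"},
    {"image": "img_002.jpg", "family": "family_A", "species": "species_a2"},
    {"image": "img_003.jpg", "family": "family_B", "species": "species_b1"},
    {"image": "img_004.jpg", "family": "family_B", "species": "species_b2"},
    {"image": "img_005.jpg", "family": "family_C", "species": "species_c1"},
]


def inter_family_split(records, held_out_family):
    """Train on every family except one; test on the held-out family."""
    train = [r for r in records if r["family"] != held_out_family]
    test = [r for r in records if r["family"] == held_out_family]
    return train, test


def intra_family_split(records, family, held_out_species):
    """Within one family, train on the other species and test on an unseen one."""
    in_family = [r for r in records if r["family"] == family]
    train = [r for r in in_family if r["species"] != held_out_species]
    test = [r for r in in_family if r["species"] == held_out_species]
    return train, test


if __name__ == "__main__":
    train, test = inter_family_split(RECORDS, "family_B")
    print(len(train), "training images;", len(test), "held-out images")
```

Under this reading, a model trained on the `train` list is evaluated only on the `test` list, so any accuracy it achieves there reflects generalization to animals it has not seen rather than memorization of the training species.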