Existing works on 2D pose estimation mainly focus on a certain category, e.g., human, animal, or vehicle. However, many application scenarios require detecting the poses/keypoints of unseen classes of objects. In this paper, we introduce the task of Category-Agnostic Pose Estimation (CAPE), which aims to create a pose estimation model capable of detecting the pose of any class of object given only a few samples with keypoint definitions. To achieve this goal, we formulate pose estimation as a keypoint matching problem and design a novel CAPE framework, termed POse Matching Network (POMNet). A transformer-based Keypoint Interaction Module (KIM) is proposed to capture both the interactions among different keypoints and the relationship between the support and query images. We also introduce the Multi-category Pose (MP-100) dataset, a 2D pose dataset of 100 object categories containing over 20K instances, which is well-designed for developing CAPE algorithms. Experiments show that our method outperforms other baseline approaches by a large margin. Code and data are available at https://github.com/luminxu/Pose-for-Everything.
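To make the keypoint-matching formulation concrete, the sketch below illustrates one plausible way a transformer-based interaction module could let support keypoint tokens attend to each other and to query image tokens before scoring each keypoint against every query location. This is a minimal illustration under our own assumptions (module names, dimensions, and the dot-product matching head are not taken from the paper or its released code).

```python
# Minimal sketch of a transformer-based keypoint interaction module in the
# spirit of POMNet's KIM. All names, dimensions, and the matching head are
# illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn

class KeypointInteractionSketch(nn.Module):
    def __init__(self, dim=256, num_heads=8, num_layers=3):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Projects refined keypoint tokens before similarity scoring.
        self.matching_head = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, support_kpt_feats, query_feats):
        """
        support_kpt_feats: (B, K, C)   features pooled at support keypoints
        query_feats:       (B, H*W, C) flattened query feature map
        returns:           (B, K, H*W) matching scores (keypoint heatmaps)
        """
        K = support_kpt_feats.shape[1]
        # Joint self-attention: keypoint tokens interact with each other
        # and with query image tokens in a single sequence.
        tokens = torch.cat([support_kpt_feats, query_feats], dim=1)
        tokens = self.encoder(tokens)
        kpt_tokens, query_tokens = tokens[:, :K], tokens[:, K:]
        # Dot-product matching between refined keypoint and query tokens.
        return torch.einsum(
            "bkc,bnc->bkn", self.matching_head(kpt_tokens), query_tokens)

# Usage: 17 support keypoints matched against an 8x8 query feature map.
kim = KeypointInteractionSketch()
heatmaps = kim(torch.randn(2, 17, 256), torch.randn(2, 64, 256))
print(heatmaps.shape)  # torch.Size([2, 17, 64])
```

Because the keypoints enter as generic tokens rather than category-specific output channels, such a matching design is what allows the model to handle arbitrary keypoint definitions at test time.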