Supervised keypoint localization methods rely on large, manually labeled image datasets in which objects can deform, articulate, or become occluded. However, creating such large-scale keypoint labels is time-consuming and costly, and often error-prone due to inconsistent labeling. We therefore desire an approach that learns keypoint localization from fewer but consistently annotated images. To this end, we present a novel formulation that learns semantically consistent keypoint definitions, even for occluded regions, across varying object categories. We use a few user-labeled 2D images as input examples and extend them via self-supervision on a larger unlabeled dataset. Unlike unsupervised methods, the few-shot images act as semantic shape constraints for object localization. Furthermore, we introduce 3D geometry-aware constraints that uplift keypoints to 3D, achieving more accurate 2D localization. Our general-purpose formulation paves the way for semantically conditioned generative modeling and attains competitive or state-of-the-art accuracy on several datasets, including human faces, eyes, animals, cars, and a never-before-seen mouth-interior (teeth) localization task not attempted by previous few-shot methods. Project page: https://xingzhehe.github.io/FewShot3DKP/
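
To make the training recipe concrete, below is a minimal sketch (not the authors' implementation) of how a few-shot supervised keypoint loss, a self-supervised consistency loss on unlabeled images, and a simplified 3D geometry-aware term could be combined. All names (`KeypointNet`, `make_rotation_warp`, `training_step`) and loss weights are illustrative assumptions; the equivariance loss is a common stand-in for the paper's richer self-supervision, and the depth term assumes orthographic projection rather than the paper's actual 3D formulation.

```python
# Hedged sketch of the few-shot + self-supervised + 3D-consistency training
# idea described in the abstract. Architecture and losses are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class KeypointNet(nn.Module):
    """Toy detector: image -> K 2D keypoints (normalized to [-1, 1]) plus a
    per-keypoint depth used to uplift them to 3D."""

    def __init__(self, k: int = 10):
        super().__init__()
        self.k = k
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head_xy = nn.Linear(64, k * 2)  # 2D keypoint coordinates
        self.head_z = nn.Linear(64, k)       # per-keypoint depth ("uplift")

    def forward(self, img):
        feat = self.backbone(img)
        xy = torch.tanh(self.head_xy(feat)).view(-1, self.k, 2)
        z = self.head_z(feat)                # (B, K)
        return xy, z


def make_rotation_warp(img, max_angle=0.4):
    """Randomly rotate images in-plane; return the warped batch and the 2x2
    rotation that maps warped-image coordinates back to the originals."""
    b = img.size(0)
    a = (torch.rand(b, device=img.device) - 0.5) * max_angle
    c, s = torch.cos(a), torch.sin(a)
    R = torch.stack([torch.stack([c, -s], -1),
                     torch.stack([s, c], -1)], dim=-2)        # (B, 2, 2)
    theta = torch.cat([R, torch.zeros(b, 2, 1, device=img.device)], dim=-1)
    grid = F.affine_grid(theta, list(img.shape), align_corners=False)
    return F.grid_sample(img, grid, align_corners=False), R


def training_step(model, labeled_img, gt_xy, unlabeled_img,
                  w_sup=1.0, w_equiv=1.0, w_3d=0.1):
    # Few-shot term: the handful of labeled images act as semantic shape
    # constraints that pin down what each keypoint means.
    pred_xy, _ = model(labeled_img)
    loss_sup = F.mse_loss(pred_xy, gt_xy)

    # Self-supervised term on unlabeled images: keypoints must move
    # consistently with a known 2D warp (standard equivariance loss; a
    # stand-in for the paper's self-supervision).
    warped, R = make_rotation_warp(unlabeled_img)
    xy, z = model(unlabeled_img)
    xy_w, z_w = model(warped)
    xy_back = torch.bmm(xy_w, R.transpose(1, 2))  # undo the known rotation
    loss_equiv = F.mse_loss(xy_back, xy)

    # 3D geometry-aware term (assumed, simplified form): under orthographic
    # projection an in-plane rotation spins the uplifted 3D points about the
    # optical axis, so the predicted per-keypoint depths should not change.
    loss_3d = F.mse_loss(z_w, z)

    return w_sup * loss_sup + w_equiv * loss_equiv + w_3d * loss_3d
```

In a realistic run, the supervised term would only ever see the few annotated examples, while the two unlabeled terms sweep over the full dataset; the weights `w_sup`, `w_equiv`, and `w_3d` are placeholders to be tuned, not values from the paper.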