Today's most popular approaches to keypoint detection involve very complex network architectures that aim to learn holistic representations of all keypoints. In this work, we take a step back and ask: Can we simply learn a local keypoint representation from the output of a standard backbone architecture? This would make the network simpler and more robust, particularly when large parts of the object are occluded. We demonstrate that this is possible by looking at the problem from the perspective of representation learning. Specifically, the keypoint kernels need to be chosen to optimize three types of distances in feature space: features of the same keypoint should be similar to each other, differ from those of other keypoints, and be distinct from features of the background clutter. We formulate this optimization within a framework that we call CoKe, which builds on supervised contrastive learning. CoKe requires several approximations to make representation learning feasible on large datasets. In particular, we introduce a clutter bank to approximate non-keypoint features, and a momentum update to compute the keypoint representations while the feature extractor is being trained. Our experiments show that CoKe achieves state-of-the-art results compared to approaches that jointly represent all keypoints holistically (Stacked Hourglass Networks, MSS-Net) as well as to approaches that are supervised with detailed 3D object geometry (StarMap). Moreover, CoKe is robust when objects are partially occluded and significantly outperforms related work on a range of diverse datasets (PASCAL3D+, MPII, ObjectNet3D).
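To make the training objective concrete, the sketch below shows one way a supervised contrastive keypoint loss with a momentum-updated keypoint bank and a clutter bank could be implemented in PyTorch. It is a minimal illustration under stated assumptions, not the authors' implementation; all names (KeypointContrastiveLoss, keypoint_bank, clutter_bank, temperature, momentum) and hyperparameter values are hypothetical.

```python
import torch
import torch.nn.functional as F

class KeypointContrastiveLoss(torch.nn.Module):
    """Hypothetical sketch of a CoKe-style objective: each keypoint feature is pulled
    toward its own keypoint prototype and pushed away from other keypoints and from
    a bank of background ("clutter") features."""

    def __init__(self, num_keypoints, feat_dim, momentum=0.9, temperature=0.07,
                 clutter_size=1024):
        super().__init__()
        # One prototype per keypoint, maintained by momentum updates rather than backprop.
        self.register_buffer(
            "keypoint_bank", F.normalize(torch.randn(num_keypoints, feat_dim), dim=1))
        # Bank of background features approximating the non-keypoint (clutter) statistics.
        self.register_buffer(
            "clutter_bank", F.normalize(torch.randn(clutter_size, feat_dim), dim=1))
        self.momentum = momentum
        self.temperature = temperature

    def forward(self, kp_feats, kp_labels, clutter_feats):
        """kp_feats: (N, D) backbone features at annotated keypoint locations,
        kp_labels: (N,) long tensor of keypoint indices,
        clutter_feats: (M, D) features sampled from background locations."""
        kp_feats = F.normalize(kp_feats, dim=1)
        clutter_feats = F.normalize(clutter_feats, dim=1)

        # Similarities to keypoint prototypes (positives + keypoint negatives)
        # and to the clutter bank (background negatives).
        logits_kp = kp_feats @ self.keypoint_bank.t() / self.temperature   # (N, K)
        logits_bg = kp_feats @ self.clutter_bank.t() / self.temperature    # (N, C)
        logits = torch.cat([logits_kp, logits_bg], dim=1)

        # Cross-entropy over [keypoint prototypes | clutter] realizes the three
        # distance constraints described in the abstract.
        loss = F.cross_entropy(logits, kp_labels)

        # Momentum update of the banks; no gradients flow through the banks.
        with torch.no_grad():
            for k in kp_labels.unique():
                mean_feat = kp_feats[kp_labels == k].mean(dim=0)
                self.keypoint_bank[k] = F.normalize(
                    self.momentum * self.keypoint_bank[k]
                    + (1 - self.momentum) * mean_feat, dim=0)
            # Simplified refresh of the clutter bank with newly sampled background
            # features (a FIFO queue would be a more typical choice).
            m = min(clutter_feats.shape[0], self.clutter_bank.shape[0])
            self.clutter_bank[:m] = clutter_feats[:m]
        return loss
```

In this sketch the loss is computed per keypoint feature against all prototypes plus the clutter bank, so a single cross-entropy term enforces similarity to the correct keypoint and dissimilarity to both other keypoints and background clutter.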