In this paper, we propose a method for discovering keypoints in a 2D image using image-level supervision. Recent unsupervised keypoint discovery methods reliably discover keypoints when the instances are aligned. However, when the target instances vary widely in viewpoint or appearance, the discovered keypoints fail to correspond semantically across images. Our work aims to discover keypoints even under such high viewpoint and appearance variation by using image-level supervision. Motivated by weakly-supervised learning, our method exploits image-level supervision to identify discriminative parts and infer the viewpoint of the target instance. To discover diverse parts, we adopt a conditional image generation approach using a pair of images with structural deformation. Finally, we enforce a viewpoint-based equivariance constraint using the keypoints obtained from image-level supervision to resolve the spatial correlation problem that consistently appears in images taken from various viewpoints. Our approach achieves state-of-the-art keypoint estimation performance in limited-supervision scenarios. Furthermore, the discovered keypoints are directly applicable to downstream tasks without requiring any keypoint labels.