Existing unsupervised methods for keypoint learning rely heavily on the assumption that a specific keypoint type (e.g. elbow, digit, abstract geometric shape) appears only once in an image. This greatly limits their applicability, as each instance must be isolated before the method can be applied, an issue that is never discussed or evaluated. We thus propose a novel method to learn Task-agnostic, UnSupervised Keypoints (TUSK) that can deal with multiple instances. To achieve this, instead of the commonly used strategy of detecting multiple heatmaps, each dedicated to a specific keypoint type, we use a single heatmap for detection and enable unsupervised learning of keypoint types through clustering. Specifically, we encode semantics into the keypoints by teaching them to reconstruct images from a sparse set of keypoints and their descriptors, where the descriptors are forced to form distinct clusters in feature space around learned prototypes. This makes our approach amenable to a wider range of tasks than any previous unsupervised keypoint method: we show experiments on multiple-instance detection and classification, object discovery, and landmark detection, all unsupervised, with performance on par with the state of the art while also handling multiple instances.
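To make the single-heatmap idea concrete, the following is a minimal sketch and not the TUSK implementation. It assumes a detection heatmap and per-keypoint descriptors are already available as NumPy arrays (in the method described above they would come from a learned network), picks several peaks from one heatmap with greedy non-maximum suppression, and labels each descriptor with its nearest prototype. The function names `detect_keypoints` and `assign_types`, the parameters `k` and `radius`, and the use of Euclidean distance are all illustrative assumptions.

```python
import numpy as np


def detect_keypoints(heatmap, k=4, radius=1):
    """Return up to k (row, col) peaks from ONE heatmap.

    Greedy non-maximum suppression: take the global maximum, mask a
    square neighbourhood around it, and repeat. Because all peaks come
    from the same map, several instances of a keypoint type can coexist.
    """
    h = heatmap.astype(float).copy()
    peaks = []
    for _ in range(k):
        r, c = np.unravel_index(np.argmax(h), h.shape)
        if not np.isfinite(h[r, c]):
            break  # every location has been suppressed
        peaks.append((int(r), int(c)))
        # Suppress the neighbourhood so the next peak lies elsewhere.
        h[max(r - radius, 0):r + radius + 1,
          max(c - radius, 0):c + radius + 1] = -np.inf
    return peaks


def assign_types(descriptors, prototypes):
    """Label each descriptor with the index of its nearest prototype.

    descriptors: (N, D) array, prototypes: (M, D) array.
    Euclidean distance is an assumption made for this sketch.
    """
    diffs = descriptors[:, None, :] - prototypes[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)  # shape (N, M)
    return dists.argmin(axis=1)


# Toy data (hypothetical): three blobs on a 6x6 heatmap.
heatmap = np.zeros((6, 6))
heatmap[1, 1], heatmap[4, 4], heatmap[1, 4] = 0.9, 0.8, 0.7
keypoints = detect_keypoints(heatmap, k=3)

# Two prototypes; the first and third descriptors share a keypoint type.
prototypes = np.array([[0.0, 0.0], [1.0, 1.0]])
descriptors = np.array([[0.1, 0.0], [0.9, 1.1], [0.2, -0.1]])
types = assign_types(descriptors, prototypes)
```

In this toy example two of the three detected keypoints receive the same type label, which is the situation a one-heatmap-per-type design cannot represent. The reconstruction objective and the loss that pulls descriptors toward prototypes are omitted here.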