Knowing where an object's keypoints lie in an image can assist fine-grained classification and identification, particularly for objects, such as wild animals, whose pose varies widely and strongly affects their visual appearance. However, supervised training of a keypoint detection network requires annotating a large image dataset for each animal species, which is labor-intensive. To reduce the need for labeled data, we propose to simultaneously learn keypoint heatmaps and pose-invariant keypoint representations in a semi-supervised manner, using a small set of labeled images together with a larger set of unlabeled images. Keypoint representations are learned with a semantic keypoint consistency constraint that forces the keypoint detection network to learn similar features for the same keypoint across the dataset. Pose invariance is achieved by pulling the keypoint representations of an image and its augmented copies closer together in feature space. Our semi-supervised approach significantly outperforms previous methods on several benchmarks for human and animal body landmark localization.
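The two representation constraints described above can be illustrated with a minimal sketch. The abstract does not give the exact form of either loss, so the squared Euclidean distance, the per-keypoint batch mean, the nested-list layout `reps[i][k]`, and the function names below are all assumptions made for illustration, not the method's actual formulation.

```python
from typing import List

Vector = List[float]
# Hypothetical layout: reps[i][k] is the feature vector of keypoint k in image i.
Reps = List[List[Vector]]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pose_invariance_loss(reps: Reps, reps_aug: Reps) -> float:
    """Mean squared distance between each keypoint's features in an image
    and in its augmented copy (assumed form of the invariance term)."""
    total, count = 0.0, 0
    for image, image_aug in zip(reps, reps_aug):
        for vec, vec_aug in zip(image, image_aug):
            total += squared_distance(vec, vec_aug)
            count += 1
    return total / count


def semantic_consistency_loss(reps: Reps) -> float:
    """Mean squared distance of each keypoint's features to that keypoint's
    mean over the batch (assumed form of the consistency term)."""
    num_images = len(reps)
    num_keypoints = len(reps[0])
    total = 0.0
    for k in range(num_keypoints):
        dim = len(reps[0][k])
        mean = [
            sum(reps[i][k][d] for i in range(num_images)) / num_images
            for d in range(dim)
        ]
        total += sum(squared_distance(reps[i][k], mean) for i in range(num_images))
    return total / (num_images * num_keypoints)
```

Both terms shrink to zero when, respectively, augmentation leaves every keypoint's features unchanged and every image assigns the same features to a given keypoint. In a semi-supervised setup such terms could be computed on unlabeled images and added to a supervised heatmap loss on the labeled ones.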