Understanding objects in terms of their individual parts is important because it enables a precise understanding of an object's geometric structure and improves recognition when the object is seen in a novel pose or under partial occlusion. However, manually annotating parts in large-scale datasets is time-consuming and expensive. In this paper, we aim to discover object parts in an unsupervised manner, i.e., without ground-truth part or keypoint annotations. Our approach builds on the intuition that objects of the same class in a similar pose should have their parts aligned at similar spatial locations. We exploit the property that neural network features are largely invariant to nuisance variables, and that the main remaining source of variation between images of the same object category is object pose. Specifically, given a training image, we find a set of similar images that show instances of the same object category in the same pose, through an affine alignment of their corresponding feature maps. The average of the aligned feature maps serves as a pseudo ground-truth annotation for supervised training of the deep network backbone. During inference, part detection is simple and fast, requiring no extra modules or overhead beyond a feed-forward neural network. Our experiments on several datasets from different domains verify the effectiveness of the proposed method. For example, we achieve 37.8 mAP on VehiclePart, which is at least 4.2 points higher than previous methods.
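The pseudo ground-truth step described above (warp the feature maps of similar images into the frame of the training image, then average them) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `warp_affine` and `average_aligned`, the single-channel 2D maps, the nearest-neighbour resampling, and the assumption that the affine transforms are already known are all hypothetical simplifications chosen for illustration. How the transforms are estimated is not specified in the abstract and is not shown here.

```python
from typing import List, Sequence, Tuple

Map2D = List[List[float]]
# Hypothetical parameterisation: (a, b, tx, c, d, ty)
Affine = Tuple[float, float, float, float, float, float]


def warp_affine(src: Map2D, inv: Affine) -> Map2D:
    """Resample `src` with nearest-neighbour lookup.

    `inv` maps each output coordinate (x, y) to a source coordinate:
        xs = a*x + b*y + tx,   ys = c*x + d*y + ty
    Output cells whose source coordinate falls outside `src` are set to 0.
    """
    h, w = len(src), len(src[0])
    a, b, tx, c, d, ty = inv
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            xs = round(a * x + b * y + tx)
            ys = round(c * x + d * y + ty)
            if 0 <= xs < w and 0 <= ys < h:
                out[y][x] = src[ys][xs]
    return out


def average_aligned(maps: Sequence[Map2D], inverses: Sequence[Affine]) -> Map2D:
    """Warp every map with its transform and return the element-wise mean."""
    warped = [warp_affine(m, t) for m, t in zip(maps, inverses)]
    h, w, n = len(warped[0]), len(warped[0][0]), len(warped)
    return [[sum(wm[y][x] for wm in warped) / n for x in range(w)]
            for y in range(h)]


# Toy example: two 3x3 "part activation" maps whose peaks differ by one
# column. Shifting the second map by one column aligns the two peaks.
anchor = [[0.0] * 3 for _ in range(3)]
anchor[1][1] = 1.0
neighbour = [[0.0] * 3 for _ in range(3)]
neighbour[1][2] = 1.0

identity = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0)
shift_x = (1.0, 0.0, 1.0, 0.0, 1.0, 0.0)  # output (x, y) reads source (x + 1, y)

pseudo_target = average_aligned([anchor, neighbour], [identity, shift_x])
print(pseudo_target[1][1])  # 1.0: both peaks coincide after alignment
```

In this toy case the averaged map keeps a single sharp peak because the two inputs agree once aligned. Without the shift, the average would contain two half-strength peaks, which suggests why alignment matters before averaging.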