It is natural to represent objects in terms of their parts. This has the potential to improve the performance of algorithms for object recognition and segmentation, and can also help with downstream tasks such as activity recognition. Research on part-based models, however, is hindered by the lack of datasets with per-pixel part annotations. This is partly due to the difficulty and high cost of annotating object parts, so such annotation has rarely been done except for humans (for which there is a large literature on part-based models). To help address this problem, we propose PartImageNet, a large, high-quality dataset with part segmentation annotations. It consists of $158$ classes from ImageNet with approximately $24,000$ images. PartImageNet is unique in offering part-level annotations on a general set of classes, including non-rigid, articulated objects, while being an order of magnitude larger than existing part datasets (excluding datasets of humans). It can be utilized for many vision tasks, including Object Segmentation, Semantic Part Segmentation, Few-shot Learning, and Part Discovery. We conduct comprehensive experiments that study these tasks and establish a set of baselines. The dataset and scripts are released at https://github.com/TACJu/PartImageNet.