The iMet 2020 dataset is a valuable resource for fine-grained art attribute recognition, but we believe it has yet to reach its full potential. We document the dataset's unique properties and observe that many of its attribute labels are noisier than the dataset description implies. The labels also frequently exhibit semantic relationships (e.g., equivalence, mutual exclusion, subsumption, and overlap with uncertainty), which we believe are underutilized. We propose an approach to cleaning and structuring the iMet 2020 labels and discuss the implications and value of doing so. We further demonstrate the benefits of this approach through several experiments. Our code and cleaned labels are available at https://github.com/sunniesuhyoung/iMet2020cleaned.