Computer Vision is driven by the many datasets which can be used for training or evaluating novel methods. However, each dataset has a different set of class labels, its own visual definition of classes, images following a specific distribution, its own annotation protocol, etc. In this paper we explore the automatic discovery of visual-semantic relations between labels across datasets. We want to understand how the instances of a certain class in one dataset relate to the instances of another class in another dataset. Are they in an identity, parent/child, or overlap relation? Or is there no link between them at all? To find relations between labels across datasets, we propose methods based on language, on vision, and on a combination of both. Our methods can effectively discover label relations across datasets and the type of each relation. We use these results for a deeper inspection of why instances relate, to find missing aspects of a class, and to create finer-grained annotations. We conclude that label relations cannot be established by looking at the names of classes alone, as they depend strongly on how each of the datasets was constructed.
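To make the four relation types concrete, the following is a minimal sketch that assigns a relation to two labels from the overlap of their instance sets. It is an illustration only, not the method proposed in the paper: the function `label_relation`, the `Relation` enum, and the thresholds `high` and `low` are hypothetical, and it assumes that the instances of both labels can be expressed as IDs in a shared pool (in practice this membership would have to be estimated, for example from model predictions).

```python
from enum import Enum


class Relation(Enum):
    IDENTITY = "identity"
    PARENT = "parent"    # label a is a parent of label b (a contains b)
    CHILD = "child"      # label a is a child of label b (b contains a)
    OVERLAP = "overlap"
    NONE = "none"


def label_relation(a_instances, b_instances, high=0.9, low=0.1):
    """Guess the relation of label a to label b from shared instance IDs.

    `high` and `low` are hypothetical thresholds chosen for illustration.
    """
    a, b = set(a_instances), set(b_instances)
    if not a or not b:
        return Relation.NONE

    shared = len(a & b)
    frac_a = shared / len(a)  # fraction of a's instances that are also b
    frac_b = shared / len(b)  # fraction of b's instances that are also a

    if frac_a >= high and frac_b >= high:
        return Relation.IDENTITY
    if frac_b >= high:
        return Relation.PARENT   # nearly all of b lies inside a
    if frac_a >= high:
        return Relation.CHILD    # nearly all of a lies inside b
    if max(frac_a, frac_b) > low:
        return Relation.OVERLAP
    return Relation.NONE


if __name__ == "__main__":
    animals = range(100)     # hypothetical instance IDs
    dogs = range(30)
    print(label_relation(animals, dogs))
    print(label_relation(dogs, animals))
```

The asymmetric fractions are what separate parent/child from overlap: a child label has almost all of its instances inside the parent, while the reverse fraction stays small.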