Deep supervised models have an unprecedented capacity to absorb large quantities of training data. Hence, training on multiple datasets becomes a method of choice for achieving strong generalization in usual scenes and graceful performance degradation in edge cases. Unfortunately, different datasets often have incompatible labels. For instance, the Cityscapes road class subsumes all driving surfaces, while Vistas defines separate classes for road markings, manholes, etc. Furthermore, many datasets have overlapping labels. For instance, pickups are labeled as trucks in VIPER, cars in Vistas, and vans in ADE20k. We address this challenge by considering labels as unions of universal visual concepts. This allows seamless and principled learning on multi-domain dataset collections without requiring any relabeling effort. Our method achieves competitive within-dataset and cross-dataset generalization, as well as the ability to learn visual concepts which are not separately labeled in any of the training datasets. Experiments reveal competitive or state-of-the-art performance on two multi-domain dataset collections and on the WildDash 2 benchmark.
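The idea of treating a dataset-specific label as a union of universal concepts can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the concept names, the label-to-concept mapping, and the helper `partial_label_nll` are all hypothetical. The probability of an observed dataset label is modeled as the sum of the predicted probabilities of its member universal concepts, and the loss is the negative log of that sum.

```python
import math

# Hypothetical universal taxonomy (names are illustrative).
UNIVERSAL = ["road", "marking", "manhole", "car", "pickup", "van", "truck"]
IDX = {c: i for i, c in enumerate(UNIVERSAL)}

# Each (dataset, label) pair maps to the set of universal concepts it covers,
# e.g. the VIPER "truck" label also subsumes pickups (assumed mapping).
LABEL_TO_CONCEPTS = {
    ("cityscapes", "road"): {"road", "marking", "manhole"},
    ("viper", "truck"): {"truck", "pickup"},
    ("vistas", "car"): {"car", "pickup"},
}

def softmax(logits):
    """Numerically stable softmax over universal-concept logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def partial_label_nll(logits, dataset, label):
    """Negative log-likelihood of a dataset label modeled as a union:
    P(label) = sum of softmax probabilities of its member concepts."""
    probs = softmax(logits)
    p = sum(probs[IDX[c]] for c in LABEL_TO_CONCEPTS[(dataset, label)])
    return -math.log(p)

# With a uniform prediction over 7 concepts, the VIPER "truck" label
# (2 member concepts) receives probability 2/7.
loss = partial_label_nll([0.0] * len(UNIVERSAL), "viper", "truck")
```

Because the loss only requires the union's total probability, pixels labeled with a coarse class still provide a valid training signal for the finer universal concepts, which is how concepts never separately labeled in any single dataset can be learned.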