We explore semantic segmentation beyond the conventional, single-dataset homogeneous training and bring forward the problem of Heterogeneous Training of Semantic Segmentation (HTSS). HTSS involves simultaneous training on multiple heterogeneous datasets, i.e., datasets with conflicting label spaces and different (weak) annotation types from the perspective of semantic segmentation. The HTSS formulation exposes deep networks to a larger and previously unexplored aggregation of information that can potentially enhance semantic segmentation in three directions: i) performance: increased segmentation metrics on seen datasets, ii) generalization: improved segmentation metrics on unseen datasets, and iii) knowledgeability: an increased number of recognizable semantic concepts. To research these benefits of HTSS, we propose a unified framework that incorporates heterogeneous datasets in a single-network training pipeline following the established FCN standard. Our framework first curates heterogeneous datasets to bring them into a common format and then trains a single-backbone FCN on all of them simultaneously. To achieve this, it transforms weak annotations, which are incompatible with semantic segmentation, into per-pixel labels, and hierarchizes their label spaces into a universal taxonomy. The trained HTSS models demonstrate performance and generalization gains over a wide range of datasets and extend the inference label space to hundreds of semantic classes.
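The label-space unification step above can be illustrated with a minimal sketch: each dataset's class names are mapped into one shared universal taxonomy, so that ground-truth label ids from conflicting label spaces become comparable and a single-backbone network can be trained on all datasets at once. All dataset names, class names, and mappings below are hypothetical examples, not the paper's actual taxonomy.

```python
# Minimal sketch of merging conflicting per-dataset label spaces into a
# universal taxonomy. Class and dataset names are illustrative assumptions.

UNIVERSAL = ["road", "sidewalk", "person", "rider", "vehicle", "vegetation"]

# Each dataset's own label space maps (possibly many-to-one) into the
# universal taxonomy; e.g. both "car" and "truck" collapse to "vehicle".
DATASET_TO_UNIVERSAL = {
    "dataset_a": {"road": "road", "sidewalk": "sidewalk",
                  "person": "person", "rider": "rider", "car": "vehicle"},
    "dataset_b": {"road": "road", "pedestrian": "person",
                  "bicyclist": "rider", "truck": "vehicle",
                  "vegetation": "vegetation"},
}

def remap(dataset: str, label: str) -> int:
    """Map a dataset-specific class name to its universal taxonomy index."""
    return UNIVERSAL.index(DATASET_TO_UNIVERSAL[dataset][label])

# Conflicting per-dataset names now land on consistent universal ids:
print(remap("dataset_a", "car"))         # -> 4 (vehicle)
print(remap("dataset_b", "truck"))       # -> 4 (vehicle)
print(remap("dataset_b", "pedestrian"))  # -> 2 (person)
```

In the full framework this remapping is applied per pixel to every ground-truth mask before training, which is what allows a single segmentation head over the universal label space.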