Ensembles of predictions are known to perform better than individual predictions taken separately. However, for tasks that require heavy computational resources, \textit{e.g.} semantic segmentation, training an ensemble of learners separately is hardly tractable. In this work, we propose to leverage the performance boost offered by ensemble methods to enhance semantic segmentation, while avoiding the traditional heavy training cost of ensembles. Our self-ensemble framework takes advantage of the multi-scale feature set produced by feature pyramid network methods to feed independent decoders, thus creating an ensemble within a single model. As in a conventional ensemble, the final prediction is the aggregation of the predictions made by each learner. In contrast to previous works, our model can be trained end-to-end, alleviating the traditional cumbersome multi-stage training of ensembles. Our self-ensemble framework outperforms the current state-of-the-art on the benchmark datasets ADE20K, Pascal Context and COCO-Stuff-10K for semantic segmentation and is competitive on Cityscapes. Code will be available at github.com/WalBouss/SenFormer.
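To make the mechanism concrete, below is a minimal PyTorch sketch of the self-ensemble idea described above: each scale of the feature pyramid feeds its own independent decoder, and the learners' predictions are aggregated into the final output. The \texttt{SelfEnsembleHead} name, the $1 \times 1$-convolution decoders, and the mean aggregation are illustrative assumptions for this sketch, not the paper's actual transformer-decoder design.

\begin{verbatim}
import torch
import torch.nn as nn

class SelfEnsembleHead(nn.Module):
    """Minimal sketch of a self-ensemble: one independent decoder
    per FPN scale; per-learner predictions are upsampled to a common
    resolution and aggregated into the ensemble output."""

    def __init__(self, in_channels, num_classes):
        super().__init__()
        # One lightweight decoder per pyramid level (hypothetical
        # 1x1-conv stand-ins for the paper's transformer decoders).
        self.decoders = nn.ModuleList(
            nn.Conv2d(c, num_classes, kernel_size=1) for c in in_channels
        )

    def forward(self, fpn_features):
        # fpn_features: list of tensors [(B, C_i, H_i, W_i), ...]
        # from a feature pyramid network.
        target_size = fpn_features[0].shape[-2:]
        logits = [
            nn.functional.interpolate(
                decoder(feats), size=target_size,
                mode="bilinear", align_corners=False,
            )
            for decoder, feats in zip(self.decoders, fpn_features)
        ]
        # Ensemble prediction: aggregate (here, average) the
        # learners' logits into a single segmentation map.
        return torch.stack(logits, dim=0).mean(dim=0)

# Usage on dummy FPN outputs at four scales:
head = SelfEnsembleHead(in_channels=[256, 256, 256, 256], num_classes=150)
feats = [torch.randn(1, 256, 128 // 2**i, 128 // 2**i) for i in range(4)]
print(head(feats).shape)  # torch.Size([1, 150, 128, 128])
\end{verbatim}

Because the aggregation is differentiable, a loss applied to the ensemble output (or to each learner) back-propagates through all decoders jointly, which is what allows the single-model ensemble to be trained end-to-end.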