This report describes the winning solution to the Robust Vision Challenge (RVC) semantic segmentation track at ECCV 2022. Our method adopts the FAN-B-Hybrid model as the encoder and uses SegFormer as the segmentation framework. The model is trained on a composite dataset consisting of images from 9 datasets (ADE20K, Cityscapes, Mapillary Vistas, ScanNet, VIPER, WildDash 2, IDD, BDD, and COCO) with a simple dataset balancing strategy. All the original labels are projected to a 256-class unified label space, and the model is trained with a cross-entropy loss. Without significant hyperparameter tuning or any specific loss weighting, our solution ranks first on all the test semantic segmentation benchmarks from multiple domains (ADE20K, Cityscapes, Mapillary Vistas, ScanNet, VIPER, and WildDash 2). The proposed method can serve as a strong baseline for the multi-domain segmentation task and benefit future work. Code will be available at https://github.com/lambert-x/RVC_Segmentation.
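The core preprocessing step described above is projecting each dataset's original labels into a single 256-class unified label space before computing the cross-entropy loss. A minimal sketch of such a projection, using per-dataset lookup tables, could look like the following. The tiny mappings, the `build_lut` helper, and the ignore index of 255 are illustrative assumptions; the actual unified label space is defined by the RVC benchmark.

```python
import numpy as np

# Reserved "ignore" id in the unified space (assumed value, illustrative only).
UNIFIED_IGNORE = 255

def build_lut(mapping, num_src_classes, ignore=UNIFIED_IGNORE):
    """Build a lookup table projecting source-dataset class ids to unified ids.

    Any source class not present in `mapping` is sent to the ignore id,
    so it contributes nothing to the cross-entropy loss.
    """
    lut = np.full(num_src_classes, ignore, dtype=np.int64)
    for src_id, unified_id in mapping.items():
        lut[src_id] = unified_id
    return lut

# Toy example: a 3-class source dataset whose classes map to
# hypothetical unified ids 10, 42, and 7.
lut = build_lut({0: 10, 1: 42, 2: 7}, num_src_classes=3)

label_map = np.array([[0, 1], [2, 0]])  # ground truth in source label space
unified = lut[label_map]                # projected into the unified space
# unified == [[10, 42], [7, 10]]
```

With every dataset projected this way, a single segmentation head with 256 output channels can be trained on the pooled data with an ordinary cross-entropy loss that ignores the reserved id.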