We propose to harness the potential of simulation for the semantic segmentation of real-world self-driving scenes in a domain generalization setting: the segmentation network is trained without any data from the target domains and is tested on those unseen target domains. To this end, we propose a new approach combining domain randomization and pyramid consistency to learn a model with high generalizability. First, we randomize the synthetic images with the visual styles of real images drawn from auxiliary datasets, in order to effectively learn domain-invariant representations. Second, we further enforce pyramid consistency across the differently "stylized" copies of an image and within each image, in order to learn domain-invariant and scale-invariant features, respectively. Extensive experiments are conducted on the generalization from GTA and SYNTHIA to Cityscapes, BDDS, and Mapillary, and our method achieves results superior to the state-of-the-art techniques. Remarkably, our generalization results are on par with, or even better than, those obtained by state-of-the-art simulation-to-real domain adaptation methods, which access the target domain data at training time.
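To make the pyramid consistency idea concrete, the following is a minimal PyTorch-style sketch of such a loss. It is an illustration under stated assumptions, not the paper's exact formulation: the function name `pyramid_consistency_loss`, the pyramid grid sizes, and the use of an L1 discrepancy against the cross-style mean prediction are all hypothetical choices made for this sketch.

```python
import torch
import torch.nn.functional as F

def pyramid_consistency_loss(logits_per_style, pool_sizes=(1, 2, 4, 8)):
    """Sketch of a pyramid consistency loss across stylized copies of one image.

    logits_per_style: list of K tensors, each of shape (B, C, H, W), holding the
        segmentation logits for K differently "stylized" copies of the same
        synthetic image, all produced by the shared segmentation network.
    pool_sizes: spatial grid sizes of the pyramid levels (assumed values).
    """
    probs = [F.softmax(logits, dim=1) for logits in logits_per_style]
    loss = 0.0
    for size in pool_sizes:
        # Pool each style's prediction map down to this pyramid level.
        pooled = [F.adaptive_avg_pool2d(p, size) for p in probs]
        # Cross-style mean prediction at this level.
        mean = torch.stack(pooled).mean(dim=0)
        # Penalize each style's deviation from the mean (L1 here; the exact
        # discrepancy measure is an assumption of this sketch).
        loss = loss + sum(F.l1_loss(p, mean) for p in pooled) / len(pooled)
    return loss / len(pool_sizes)

if __name__ == "__main__":
    # Toy example: K=3 stylized copies, batch 2, 19 classes, 64x64 maps.
    logits = [torch.randn(2, 19, 64, 64) for _ in range(3)]
    print(pyramid_consistency_loss(logits))
```

In training, one would forward the K stylized copies of each synthetic image through the shared network and add this consistency term, weighted by a hyperparameter, to the usual cross-entropy loss on the synthetic labels.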