Although recent semantic segmentation methods have made remarkable progress, they still rely on large amounts of annotated training data, which are often infeasible to collect in the autonomous driving scenario. Previous works usually tackle this issue with Unsupervised Domain Adaptation (UDA), which entails training a network on synthetic images and applying the model to real ones while minimizing the discrepancy between the two domains. Yet, these techniques do not consider additional information that may be obtained from other tasks. Differently, we propose to exploit self-supervised monocular depth estimation to improve UDA for semantic segmentation. On the one hand, we deploy depth to realize a plug-in component which can inject complementary geometric cues into any existing UDA method. On the other hand, we rely on depth to generate a large and varied set of samples to Self-Train the final model. Our whole proposal achieves state-of-the-art performance (58.8 mIoU) on the GTA5->CS benchmark. Code is available at https://github.com/CVLAB-Unibo/d4-dbst.