Image-based learning methods for autonomous vehicle perception tasks require large quantities of labelled real-world data to train properly without overfitting, and such data is often prohibitively costly to collect. While leveraging simulated data can help mitigate these costs, networks trained in the simulation domain usually fail to perform adequately when applied to images from the real domain. Recent advances in domain adaptation indicate that a shared latent space assumption can help bridge the gap between the simulation and real domains, allowing the predictive capabilities of a network to transfer from the simulation domain to the real domain. We demonstrate that a twin VAE-based architecture with a shared latent space and auxiliary decoders is able to bridge the sim2real gap without requiring any paired ground-truth data in the real domain. Using paired ground-truth data only in the simulation domain, this architecture can produce perception outputs such as depth and segmentation maps for real images. We compare this method against networks trained in a fully supervised manner to demonstrate the merit of these results.
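The twin-VAE design described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the layer sizes, latent dimension, and the names `TwinVAE`, `enc_sim`, `enc_real`, `dec_depth`, and `dec_seg` are all assumptions chosen for clarity. The key structural ideas it shows are (1) two domain-specific encoders mapping into one shared latent space, (2) domain-specific reconstruction decoders, and (3) auxiliary decoders for depth and segmentation that operate on the shared latent code and would be supervised only with simulation-domain labels.

```python
# Hypothetical sketch of a twin VAE with a shared latent space and
# auxiliary task decoders; all names and hyperparameters are illustrative.
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Maps a 3x32x32 image to the mean/log-variance of a latent Gaussian."""

    def __init__(self, in_ch=3, z_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 4, 2, 1), nn.ReLU(),   # 32x32 -> 16x16
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),      # 16x16 -> 8x8
            nn.Flatten(),
        )
        self.mu = nn.Linear(64 * 8 * 8, z_dim)
        self.logvar = nn.Linear(64 * 8 * 8, z_dim)

    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.logvar(h)


class Decoder(nn.Module):
    """Maps a latent code back to an image-shaped tensor (out_ch channels)."""

    def __init__(self, out_ch=3, z_dim=64):
        super().__init__()
        self.fc = nn.Linear(z_dim, 64 * 8 * 8)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),  # 8x8 -> 16x16
            nn.ConvTranspose2d(32, out_ch, 4, 2, 1),         # 16x16 -> 32x32
        )

    def forward(self, z):
        h = self.fc(z).view(-1, 64, 8, 8)
        return self.net(h)


class TwinVAE(nn.Module):
    """Two domain-specific encoder/decoder pairs sharing one latent space,
    plus auxiliary depth and segmentation decoders on the shared code."""

    def __init__(self, z_dim=64, n_classes=10):
        super().__init__()
        self.enc_sim = Encoder(3, z_dim)    # simulation-domain encoder
        self.enc_real = Encoder(3, z_dim)   # real-domain encoder
        self.dec_sim = Decoder(3, z_dim)    # simulation reconstruction
        self.dec_real = Decoder(3, z_dim)   # real reconstruction
        self.dec_depth = Decoder(1, z_dim)  # auxiliary: depth map
        self.dec_seg = Decoder(n_classes, z_dim)  # auxiliary: segmentation

    def reparam(self, mu, logvar):
        # Standard VAE reparameterisation trick.
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def forward(self, x, domain):
        enc = self.enc_sim if domain == "sim" else self.enc_real
        dec = self.dec_sim if domain == "sim" else self.dec_real
        mu, logvar = enc(x)
        z = self.reparam(mu, logvar)
        # Auxiliary heads read the *shared* latent code, so depth and
        # segmentation predictions are available for either domain even
        # though their losses would only use simulation ground truth.
        return {
            "recon": dec(z),
            "depth": self.dec_depth(z),
            "seg": self.dec_seg(z),
            "mu": mu,
            "logvar": logvar,
        }
```

Because the auxiliary decoders see only the shared latent code, a real image encoded by `enc_real` can be passed through `dec_depth` and `dec_seg` at test time, which is how the sim-supervised task heads transfer to the real domain under the shared latent space assumption.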