Generating large-scale synthetic data in simulation is a feasible alternative to collecting/labelling real data for training vision-based deep learning models, albeit models trained on synthetic data often fail to generalize to the physical world due to modelling inaccuracies. In this paper, we present a domain-invariant representation learning (DIRL) algorithm to adapt deep models to the physical environment with a small amount of real data. Existing approaches mitigate only the covariate shift by aligning the marginal distributions across the domains while assuming the conditional distributions to be domain-invariant, which can lead to ambiguous transfer in real scenarios. We propose to jointly align the marginal (input domains) and the conditional (output labels) distributions to mitigate the covariate and the conditional shift across the domains with adversarial learning, and combine it with a triplet distribution loss to make the conditional distributions disjoint in the shared feature space. Experiments on digit domains yield state-of-the-art performance on challenging benchmarks, while sim-to-real transfer of object recognition for vision-based decluttering with a mobile robot improves from 26.8% to 91.0%, resulting in 86.5% grasping accuracy across a wide variety of objects. Code and supplementary details are available at https://sites.google.com/view/dirl
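To make the joint alignment idea concrete, below is a minimal PyTorch sketch of the three loss terms described above, assuming a gradient-reversal formulation for the adversarial terms. The module names, feature dimensions, margin, loss weights, and the use of labeled real data are illustrative assumptions, not the paper's exact architecture or hyperparameters.

```python
# A minimal sketch of DIRL-style joint marginal/conditional alignment plus a
# triplet distribution loss. Architecture and hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer: identity forward, negated gradient backward,
    so the encoder learns to fool the domain discriminators."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

class DIRLSketch(nn.Module):
    def __init__(self, feat_dim=256, num_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(784, feat_dim), nn.ReLU())
        self.classifier = nn.Linear(feat_dim, num_classes)
        # Marginal discriminator: sim vs. real from features alone.
        self.domain_disc = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 2))
        # Conditional discriminator: also sees predicted class probabilities.
        self.cond_disc = nn.Sequential(
            nn.Linear(feat_dim + num_classes, 128), nn.ReLU(), nn.Linear(128, 2))

    def forward(self, x):
        z = self.encoder(x)
        return z, self.classifier(z)

def triplet_distribution_loss(z, y, margin=1.0):
    """Pull same-class features together and push different classes apart,
    keeping the class-conditional distributions disjoint in the shared space."""
    dist = torch.cdist(z, z)  # pairwise feature distances
    same = (y.unsqueeze(0) == y.unsqueeze(1)).float()
    same.fill_diagonal_(0.0)  # exclude trivial self-pairs
    pos = (dist * same).sum() / same.sum().clamp(min=1)
    neg = (dist * (1 - same)).sum() / (1 - same).sum().clamp(min=1)
    return F.relu(pos - neg + margin)

def dirl_losses(model, x_sim, y_sim, x_real, y_real, lam=0.1):
    z_sim, logits_sim = model(x_sim)
    z_real, logits_real = model(x_real)
    z = torch.cat([z_sim, z_real])
    d = torch.cat([torch.zeros(len(z_sim)), torch.ones(len(z_real))]).long()

    cls_loss = F.cross_entropy(logits_sim, y_sim) + F.cross_entropy(logits_real, y_real)
    # Marginal alignment: match feature distributions across domains.
    marg_loss = F.cross_entropy(model.domain_disc(grad_reverse(z, lam)), d)
    # Conditional alignment: match per-class feature distributions.
    p = torch.cat([logits_sim, logits_real]).softmax(dim=1)
    cond_loss = F.cross_entropy(
        model.cond_disc(torch.cat([grad_reverse(z, lam), p.detach()], dim=1)), d)
    trip_loss = triplet_distribution_loss(z, torch.cat([y_sim, y_real]))
    return cls_loss + marg_loss + cond_loss + trip_loss
```

The gradient reversal trick stands in for the paper's adversarial training; equivalently, the discriminators and encoder could be trained with alternating min-max updates.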