A robot's deployment environment often involves perceptual changes that differ from those it experienced during training. Standard practices such as data augmentation attempt to bridge this gap by augmenting source images to extend the support of the training distribution so that it better covers what the agent might experience at test time. In many cases, however, the test-time distribution shift cannot be known a priori, making these schemes infeasible. In this paper, we introduce a general approach, called Invariance Through Latent Alignment (ILA), that improves the test-time performance of a visuomotor control policy in deployment environments with unknown perceptual variations. ILA performs unsupervised adaptation at deployment time by matching the distribution of latent features on the target domain to the agent's prior experience, without relying on paired data. Although the idea is simple, we show that it leads to surprising improvements in a variety of challenging adaptation scenarios, including changes in lighting conditions, scene content, and camera poses. We present results on calibrated control benchmarks in simulation (the distractor control suite) and on a physical robot in a sim-to-real setup.
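To make the idea of unpaired latent distribution matching concrete, the following is a minimal illustrative sketch, not the ILA objective itself, which the abstract does not specify. It swaps in a simple closed-form moment-matching (whitening and re-coloring) transform that maps target-domain latents so that their mean and covariance agree with those of source-domain latents. The function names `fit_latent_alignment` and `_sqrt_and_inv_sqrt`, and the assumption that latents are available as NumPy arrays of shape `(num_samples, latent_dim)`, are hypothetical choices made for this example.

```python
import numpy as np


def _sqrt_and_inv_sqrt(cov, eps=1e-6):
    """Return the symmetric square root and inverse square root of a PSD matrix."""
    vals, vecs = np.linalg.eigh(cov)
    vals = np.clip(vals, eps, None)  # guard against tiny or negative eigenvalues
    sqrt = (vecs * np.sqrt(vals)) @ vecs.T
    inv_sqrt = (vecs / np.sqrt(vals)) @ vecs.T
    return sqrt, inv_sqrt


def fit_latent_alignment(source_z, target_z, eps=1e-6):
    """Fit an affine map from target latents to the source latent distribution.

    Only first and second moments are matched (an assumption of this sketch).
    The two batches do not need to be paired or even the same size.
    """
    mu_s = source_z.mean(axis=0)
    mu_t = target_z.mean(axis=0)
    cov_s = np.cov(source_z, rowvar=False)
    cov_t = np.cov(target_z, rowvar=False)

    sqrt_s, _ = _sqrt_and_inv_sqrt(cov_s, eps)
    _, inv_sqrt_t = _sqrt_and_inv_sqrt(cov_t, eps)

    # Whiten with the target statistics, then re-color with the source statistics.
    transform = inv_sqrt_t @ sqrt_s

    def align(z):
        return (z - mu_t) @ transform + mu_s

    return align
```

In use, one would encode stored training observations and unlabeled deployment observations with the same frozen encoder, call `fit_latent_alignment(source_z, target_z)`, and apply the returned `align` function to deployment-time latents before passing them to the policy. Matching only two moments is a deliberately simple stand-in; a learned alignment can capture shifts that an affine map cannot.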