This work seeks to explain the recent success of domain randomization and data augmentation in the sim2real setting. We explain this success through the lens of causal inference, framing domain randomization and data augmentation as interventions on the environment that encourage invariance to irrelevant features. Such interventions include visual perturbations that have no effect on reward or dynamics. Training under them encourages the learning algorithm to be robust to these types of variation and to attend to the true causal mechanisms for solving the task. This connection leads to two key findings: (1) perturbations to the environment do not have to be realistic, but need only show variation along dimensions that also vary in the real world, and (2) an explicit invariance-inducing objective improves generalization in sim2sim and sim2real transfer settings over data augmentation or domain randomization alone. We demonstrate the capability of our method by performing zero-shot transfer of a robot arm reach task on a 7DoF Jaco arm learning from pixel observations.
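To make the idea of an explicit invariance-inducing objective concrete, the following is a minimal sketch and not the method of this work, whose exact objective is not given here. It assumes a toy linear encoder, a per-channel color rescaling as the task-irrelevant visual perturbation, and a squared-distance penalty between features of clean and perturbed observations. All names (`color_jitter`, `encode`, `invariance_penalty`) are hypothetical.

```python
import numpy as np

def color_jitter(obs, rng, scale=0.2):
    """Hypothetical task-irrelevant intervention: rescale each color channel.

    obs has shape (batch, height, width, channels) with values in [0, 1].
    """
    gains = 1.0 + rng.uniform(-scale, scale, size=(1, 1, 1, obs.shape[-1]))
    return np.clip(obs * gains, 0.0, 1.0)

def encode(obs, weights):
    """Toy linear encoder (an assumption): flatten pixels and project."""
    flat = obs.reshape(obs.shape[0], -1)
    return flat @ weights

def invariance_penalty(obs, weights, rng, scale=0.2):
    """Mean squared distance between features of clean and perturbed frames."""
    z_clean = encode(obs, weights)
    z_aug = encode(color_jitter(obs, rng, scale), weights)
    return float(np.mean(np.sum((z_clean - z_aug) ** 2, axis=1)))

rng = np.random.default_rng(0)
obs = rng.uniform(0.0, 1.0, size=(4, 8, 8, 3))  # batch of 4 fake 8x8 RGB frames
weights = rng.normal(size=(8 * 8 * 3, 16)) / np.sqrt(8 * 8 * 3)
penalty = invariance_penalty(obs, weights, rng)
```

In a training loop, a term like `penalty` would presumably be added to the task loss with a weighting coefficient, so that the encoder is pushed to ignore the perturbed dimensions rather than only being exposed to them.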