As learning-based approaches progress towards automating robot controller design, transferring learned policies to new domains with different dynamics (e.g., sim-to-real transfer) still demands manual effort. This paper introduces SimGAN, a framework that tackles domain adaptation by identifying a hybrid physics simulator whose simulated trajectories match those from the target domain, using a learned discriminative loss to address the limitations of manual loss design. Our hybrid simulator combines neural networks with traditional physics simulation to balance expressiveness and generalizability, and alleviates the need for a carefully selected parameter set in system identification (System ID). Once the hybrid simulator has been identified via adversarial reinforcement learning, it can be used to refine policies for the target domain without collecting more data. We show that our approach outperforms multiple strong baselines on six robotic locomotion tasks for domain adaptation.
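To make the idea of a hybrid simulator trained against a discriminator more concrete, the following is a minimal sketch, not the SimGAN implementation. It assumes a toy 1-D point mass whose friction coefficient is produced by a small learned function of the state, and a logistic discriminator that scores transitions. All names (`learned_friction`, `hybrid_step`, `rollout`, `simulator_reward`), the toy dynamics, and the linear parameterizations are hypothetical choices made for illustration only.

```python
import numpy as np


def learned_friction(state, weights):
    """Hypothetical learned component: maps the state to a friction
    coefficient in (0, 1) through a linear layer and a sigmoid."""
    z = weights[0] * state[0] + weights[1] * state[1] + weights[2]
    return 1.0 / (1.0 + np.exp(-z))


def hybrid_step(state, action, weights, dt=0.01, mass=1.0, g=9.81):
    """One step of a toy hybrid simulator: analytic point-mass dynamics
    with a state-dependent friction coefficient from the learned part."""
    pos, vel = state
    mu = learned_friction(state, weights)
    # Smoothed Coulomb friction that opposes the direction of motion.
    friction = -mu * mass * g * np.tanh(10.0 * vel)
    acc = (action + friction) / mass
    vel_next = vel + dt * acc           # semi-implicit Euler integration
    pos_next = pos + dt * vel_next
    return np.array([pos_next, vel_next])


def rollout(weights, actions, state0=(0.0, 0.0)):
    """Simulate a trajectory for a fixed action sequence."""
    state = np.array(state0, dtype=float)
    traj = [state]
    for a in actions:
        state = hybrid_step(state, a, weights)
        traj.append(state)
    return np.stack(traj)


def simulator_reward(state, action, next_state, disc_weights):
    """Assumed adversarial reward for the simulator: log D(s, a, s'),
    where D is a logistic discriminator estimating the probability that
    a transition came from the target domain."""
    x = np.concatenate([state, [action], next_state])
    z = x @ disc_weights[:-1] + disc_weights[-1]
    prob_target = 1.0 / (1.0 + np.exp(-z))
    return np.log(prob_target)
```

In a full pipeline, the learned weights would be updated by a reinforcement learning algorithm to maximize this reward while the discriminator is trained to separate simulated from target-domain transitions; both training loops are omitted here.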