Policies trained in simulation often fail when transferred to the real world because of the "reality gap": the simulator cannot accurately capture the dynamics and visual properties of the real world. Current approaches to this problem, such as domain randomization, require prior knowledge and engineering to decide how much to randomize system parameters, so that the learned policy is robust to sim-to-real transfer without being overly conservative. We propose a method for automatically tuning simulator system parameters to match the real world using only raw RGB images of the real world, without the need to define rewards or estimate state. Our key insight is to reframe the auto-tuning of parameters as a search problem in which we iteratively shift the simulation system parameters toward the real-world system parameters. We propose a Search Param Model (SPM) that, given a sequence of observations and actions and a set of system parameters, predicts whether the given parameters are higher or lower than the true parameters used to generate the observations. We evaluate our method on multiple robotic control tasks in both sim-to-sim and sim-to-real transfer, demonstrating significant improvement over naive domain randomization. Project videos and code are available at https://yuqingd.github.io/autotuned-sim2real/
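To make the search framing concrete, here is a minimal sketch of an SPM-driven tuning loop, assuming the SPM outputs, per parameter, a probability that the current simulator value is higher than the true one. Everything here is hypothetical and for illustration only: `make_oracle_spm` is a toy stand-in that peeks at hidden true parameters and flips its answer with some noise (a real SPM would be a learned network conditioned on RGB observations and actions), and the update rule `param -= step * (2 * p_higher - 1)` with a decaying step is our assumption, not necessarily the rule used by the authors.

```python
import random
from typing import Callable, Dict

# Hypothetical SPM interface: maps simulator parameters to, per parameter,
# the probability that the simulator value is HIGHER than the true value
# that generated the (implicitly captured) real-world rollout.
SPMFn = Callable[[Dict[str, float]], Dict[str, float]]


def make_oracle_spm(true_params: Dict[str, float], noise: float = 0.1,
                    seed: int = 0) -> SPMFn:
    """Toy stand-in for a trained SPM (illustration only): compares against
    hidden true parameters and flips each answer with probability `noise`."""
    rng = random.Random(seed)

    def spm(sim_params: Dict[str, float]) -> Dict[str, float]:
        probs = {}
        for name, value in sim_params.items():
            higher = value > true_params[name]
            if rng.random() < noise:
                higher = not higher  # simulate classifier error
            probs[name] = 1.0 if higher else 0.0
        return probs

    return spm


def autotune(spm: SPMFn, init_params: Dict[str, float], step: float = 0.1,
             decay: float = 0.98, num_iters: int = 200,
             num_queries: int = 8) -> Dict[str, float]:
    """Iteratively shift simulator parameters in the direction the SPM votes."""
    params = dict(init_params)
    for _ in range(num_iters):
        # Average several SPM queries (e.g. over several real rollouts).
        p_higher = {k: 0.0 for k in params}
        for _ in range(num_queries):
            for k, p in spm(params).items():
                p_higher[k] += p / num_queries
        for k in params:
            # 2*p - 1 lies in [-1, 1]; positive means "sim value too high",
            # so we decrease it, scaled by how confident the votes are.
            params[k] -= step * (2.0 * p_higher[k] - 1.0)
        step *= decay  # shrink the step so the search settles
    return params


true_params = {"mass": 1.0, "friction": 0.5}
spm = make_oracle_spm(true_params, noise=0.1, seed=0)
estimate = autotune(spm, {"mass": 2.0, "friction": 0.1})
print(estimate)  # values should end up close to true_params
```

Scaling each move by the averaged vote makes the step shrink automatically when the SPM is uncertain (votes near 0.5), and the decaying step keeps the estimate from oscillating indefinitely around the true value.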