When learning policies for robot control, the required real-world data is typically prohibitively expensive to acquire, so learning in simulation is a popular strategy. Unfortunately, such policies often do not transfer to the real world due to a mismatch between simulation and reality, called the 'reality gap'. Domain randomization methods tackle this problem by randomizing the physics simulator (source domain) during training according to a distribution over domain parameters, in order to obtain more robust policies that are able to overcome the reality gap. Most domain randomization approaches sample the domain parameters from a fixed distribution. This solution is suboptimal in the context of sim-to-real transferability, since it yields policies that have been trained without explicitly optimizing for the reward on the real system (target domain). Additionally, a fixed distribution assumes prior knowledge about the uncertainty over the domain parameters. In this paper, we propose Bayesian Domain Randomization (BayRn), a black-box sim-to-real algorithm that solves tasks efficiently by adapting the domain parameter distribution during learning, given sparse data from the real-world target domain. BayRn uses Bayesian optimization to search the space of source domain distribution parameters for those that yield a policy maximizing the real-world objective, allowing for adaptive distributions during policy optimization. We experimentally validate the proposed approach in sim-to-sim as well as in sim-to-real experiments, comparing against three baseline methods on two robotic tasks. Our results show that BayRn is able to perform sim-to-real transfer while significantly reducing the required prior knowledge.
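The nested structure described above (an outer Bayesian optimization over the domain distribution parameters, an inner policy optimization in the randomized simulator, and a real-world return as the outer objective) can be sketched as follows. This is a toy illustration under our own assumptions, not the authors' implementation: the scalar "mass" domain parameter, the single-gain policy, the closed-form inner "training" step standing in for an RL run, the Gaussian domain distribution whose mean is the only searched parameter, and the choice of a Gaussian process with an RBF kernel plus an upper-confidence-bound (UCB) acquisition are all hypothetical, as are the names `train_policy_in_sim`, `target_return` and `bayrn_sketch`.

```python
import math
import random

# Hypothetical toy setup (not from the paper): the domain parameter is a scalar
# mass, the "policy" is a single gain k, and one episode returns -(k - mass)^2,
# so the best gain equals the mass of the system the policy runs on.
REAL_MASS = 2.0  # target-domain parameter, unknown to the learner


def train_policy_in_sim(phi, n_domains=50, std=0.1, seed=0):
    """Inner loop: train on source domains with masses drawn from N(phi, std^2).

    Maximizing the mean return -(k - m_i)^2 over the sampled domains has the
    closed-form solution k = mean(m_i); this stands in for an RL training run.
    """
    rng = random.Random(seed)
    masses = [rng.gauss(phi, std) for _ in range(n_domains)]
    return sum(masses) / len(masses)


def target_return(k):
    """Outer-loop objective: return of policy k on the (toy) real system."""
    return -(k - REAL_MASS) ** 2


def rbf(a, b, length=0.5):
    """Squared-exponential kernel with unit signal variance."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(mat):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(mat)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = mat[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def forward_solve(low, b):
    """Solve low @ x = b for x by forward substitution."""
    x = []
    for i, row in enumerate(low):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def fit_gp(xs, ys, noise=1e-4):
    """GP with a constant prior mean (the mean of ys) and a small noise term."""
    mu0 = sum(ys) / len(ys)
    gram = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
            for i, a in enumerate(xs)]
    low = cholesky(gram)
    w = forward_solve(low, [y - mu0 for y in ys])  # L^{-1} (y - mu0)
    return xs, mu0, low, w


def predict(model, x):
    """Posterior mean and standard deviation of the GP at x."""
    xs, mu0, low, w = model
    v = forward_solve(low, [rbf(a, x) for a in xs])  # L^{-1} k(X, x)
    mean = mu0 + sum(vi * wi for vi, wi in zip(v, w))
    var = max(rbf(x, x) - sum(vi * vi for vi in v), 0.0)
    return mean, math.sqrt(var)


def bayrn_sketch(init_phis=(0.5, 3.5), n_iter=8, beta=2.0):
    """Outer loop: Bayesian optimization over the distribution parameter phi."""
    grid = [0.05 * i for i in range(81)]  # candidate phi values in [0, 4]
    history = []
    for phi in init_phis:
        history.append((phi, target_return(train_policy_in_sim(phi))))
    for _ in range(n_iter):
        model = fit_gp([p for p, _ in history], [r for _, r in history])

        def ucb(x):
            mean, std = predict(model, x)
            return mean + beta * std  # upper-confidence-bound acquisition

        phi = max(grid, key=ucb)
        history.append((phi, target_return(train_policy_in_sim(phi))))
    best_phi, best_ret = max(history, key=lambda pr: pr[1])
    return best_phi, best_ret, history


if __name__ == "__main__":
    best_phi, best_ret, _ = bayrn_sketch()
    print(f"best phi = {best_phi:.2f}, target return = {best_ret:.4f}")
```

Each outer iteration costs one "real-world" evaluation, which is why a sample-efficient optimizer is used for the outer loop; in this toy the search should settle near a distribution mean close to `REAL_MASS`, since only then does the policy trained in simulation perform well on the target system.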