Soft robots are becoming extremely popular thanks to their intrinsic safety in contact interactions and their adaptability. However, their potentially infinite number of degrees of freedom makes modeling a daunting task, and in many cases only an approximate description is available. This challenge makes reinforcement learning (RL) based approaches inefficient when deployed in realistic scenarios, due to the large domain gap between models and the real platform. In this work, we demonstrate, for the first time, how Domain Randomization (DR) can solve this problem by enhancing RL policies with: i) higher robustness w.r.t. environmental changes; ii) higher affordability of learned policies when the target model differs significantly from the training model; iii) higher effectiveness of the policy, which can even autonomously learn to exploit the environment to increase the robot's capabilities (environmental constraint exploitation). Moreover, we introduce a novel algorithmic extension of previous adaptive domain randomization methods for the automatic inference of dynamics parameters of deformable objects. We provide results on four different tasks and two soft robot designs, opening interesting perspectives for future research on reinforcement learning for closed-loop soft robot control.
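To make the core idea of Domain Randomization concrete, the sketch below shows the canonical training loop structure: at the start of every episode, the dynamics parameters of the simulated robot are resampled from chosen ranges, so the policy is trained against a distribution of models rather than a single approximate one. All parameter names and ranges here are invented for illustration and are not the ones used in this work.

```python
import random

# Hypothetical dynamics-parameter ranges for a simulated soft robot.
# In practice these would be physical quantities of the simulator
# (e.g. segment stiffness, damping, link masses); the values below
# are placeholder relative scales chosen for this sketch only.
PARAM_RANGES = {
    "stiffness": (0.5, 2.0),
    "damping":   (0.8, 1.2),
    "mass":      (0.9, 1.1),
}

def sample_dynamics(rng: random.Random) -> dict:
    """Draw one randomized dynamics configuration for an episode."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in PARAM_RANGES.items()}

def train_with_dr(num_episodes: int, seed: int = 0) -> list:
    """Return the randomized configurations used over a training run.

    In a real pipeline, each configuration would parameterize the
    simulator before rolling out the current policy and applying
    an RL update; here we only collect the sampled configurations.
    """
    rng = random.Random(seed)
    return [sample_dynamics(rng) for _ in range(num_episodes)]

configs = train_with_dr(100)
```

Adaptive DR methods, such as the extension mentioned above, go one step further: instead of fixing `PARAM_RANGES` a priori, the sampling distribution itself is updated from data so that the randomized dynamics match the observed behavior of the target system.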