Learning policies in simulation is a promising way to reduce human effort when training robot controllers. This is especially true for soft robots, which are more adaptive and safe but also more difficult to model and control accurately. The sim2real gap is the main barrier to successfully transferring policies from simulation to a real robot. System identification can reduce this gap, but traditional identification methods require extensive manual tuning. Data-driven alternatives can tune dynamical models directly from data but are often data hungry, which in turn demands human effort for data collection. This work proposes a data-driven, end-to-end differentiable simulator focused on the exciting but challenging domain of tensegrity robots. To the best of the authors' knowledge, this is the first differentiable physics engine for tensegrity robots that supports cable, contact, and actuation modeling. The aim is to develop a reasonably simplified, data-driven simulation that can learn approximate dynamics from limited ground truth data. The dynamics must be accurate enough to generate policies that can be transferred back to the ground truth system. As a first step in this direction, the current work demonstrates sim2sim transfer, where the unknown physical model of MuJoCo acts as the ground truth system. Two tensegrity robots, a 6-bar and a 3-bar tensegrity, are used for evaluation and for learning locomotion policies. The results indicate that, when the differentiable engine is used for training, only 0.25\% of the ground truth data is needed to obtain a policy that works on the ground truth system, compared with training the policy directly on the ground truth system.