Reinforcement learning (RL) enables an agent to learn by trial and error while interacting with a dynamic environment. Traditionally, RL has been used to learn and predict robotic manipulation skills that live in Euclidean space, such as positions, velocities, and forces. However, robotics commonly involves non-Euclidean data, such as orientation or stiffness, and neglecting their geometric nature can adversely affect learning performance and accuracy. In this paper, we propose a novel RL framework based on Riemannian geometry and show how it can be applied to learn manipulation skills with a specific geometric structure (e.g., the robot's orientation in the task space). The proposed framework is suitable for any policy representation and is independent of the choice of algorithm. Specifically, we propose to parameterize and learn the policy on the tangent space and then map the learned actions back to the appropriate manifold (e.g., the S^3 manifold of unit quaternions for orientation). This introduces geometrically grounded pre- and post-processing steps into the typical RL pipeline, which opens the door for algorithms designed for Euclidean spaces to learn from non-Euclidean data without modification. Experimental results, obtained both in simulation and on a real robot, support our hypothesis that learning on the tangent space is more accurate and converges to a better solution than approximating the non-Euclidean data in Euclidean space.
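To make the tangent-space mapping concrete, the following is a minimal sketch of the exponential and logarithmic maps for the S^3 manifold of unit quaternions, taken at the identity element. The quaternion layout (w, x, y, z) and the function names exp_map/log_map are illustrative assumptions, not names from the paper; a full implementation would also handle base points other than the identity.

```python
import numpy as np

def exp_map(u, eps=1e-8):
    """Map a tangent vector u in R^3 (at the identity) onto the unit-quaternion
    manifold S^3, returned as q = (w, x, y, z)."""
    norm = np.linalg.norm(u)
    if norm < eps:
        # Near-zero tangent vector maps to the identity quaternion.
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate(([np.cos(norm)], np.sin(norm) * u / norm))

def log_map(q, eps=1e-8):
    """Map a unit quaternion q = (w, x, y, z) back to a tangent vector in R^3
    at the identity (inverse of exp_map for rotations below pi)."""
    w, v = q[0], q[1:]
    norm_v = np.linalg.norm(v)
    if norm_v < eps:
        return np.zeros(3)
    return np.arccos(np.clip(w, -1.0, 1.0)) * v / norm_v

# Illustrative use: a Euclidean policy outputs a 3-D action in the tangent
# space, which is then projected onto S^3 to obtain a valid orientation.
tangent_action = np.array([0.1, -0.2, 0.05])   # hypothetical policy output
orientation_q = exp_map(tangent_action)        # unit quaternion on S^3
recovered = log_map(orientation_q)             # back to the tangent space
```

In this sketch the pre-processing step (log_map) brings manifold-valued observations into the flat tangent space where a standard Euclidean RL algorithm operates, and the post-processing step (exp_map) projects the learned actions back onto the manifold, so the surrounding RL algorithm itself needs no changes.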