Model-Based Lookahead Reinforcement Learning for In-Hand Manipulation

In-hand manipulation, like many other dexterous tasks, remains a difficult challenge in robotics because it combines complex system dynamics with the need to control and manoeuvre varied objects using the hand's actuators. This work applies a previously developed hybrid Reinforcement Learning (RL) framework to in-hand manipulation and verifies that it can improve task performance. The framework combines concepts from model-free and model-based RL: a trained policy is guided by a dynamic model and a value function through trajectory evaluation, as in Model Predictive Control. Its performance is evaluated by comparing it with the unguided policy, that is, the same policy the framework guides. To this end, tests are performed with both fully-actuated and under-actuated simulated robotic hands manipulating different objects on a given task. Generalization is also tested in two ways: by changing properties, such as density and size, of the objects on which the policy and dynamic model were trained, and by guiding a policy trained on one object to perform the same task on a different one. The results show that, given a policy with high average reward and an accurate dynamic model, the hybrid framework improves performance on in-hand manipulation tasks in most test cases, even when the object properties are changed. However, this improvement comes at a higher computational cost, owing to the complexity of trajectory evaluation.
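The guiding step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `lookahead_action`, the Gaussian perturbation used to generate candidate trajectories, the discount factor, the access to a reward function during planning, and the one-dimensional toy models are all assumptions made for illustration. The sketch rolls candidate action sequences through a dynamic model, scores each by its discounted rewards plus the value function at the final state, and returns the first action of the best trajectory.

```python
import numpy as np


def lookahead_action(state, policy, dynamics, reward, value,
                     horizon=5, n_candidates=64, noise_std=0.1,
                     gamma=0.99, rng=None):
    """Return (first action, score) of the best imagined trajectory.

    All arguments are hypothetical stand-ins: `policy` is the trained
    policy, `dynamics` the learned dynamic model, `value` the learned
    value function, and `reward` a reward function assumed to be known.
    """
    rng = np.random.default_rng() if rng is None else rng
    best_score, best_action = -np.inf, None
    for k in range(n_candidates):
        s = np.asarray(state, dtype=float)
        score, first_action = 0.0, None
        # Candidate 0 follows the unperturbed policy, so the selected
        # trajectory never scores worse (under the model) than the policy's.
        std = 0.0 if k == 0 else noise_std
        for t in range(horizon):
            a = policy(s)
            a = a + std * rng.standard_normal(np.shape(a))
            if t == 0:
                first_action = a
            s_next = dynamics(s, a)
            score += gamma ** t * reward(s, a, s_next)
            s = s_next
        # Bootstrap the remaining return with the value function.
        score += gamma ** horizon * value(s)
        if score > best_score:
            best_score, best_action = score, first_action
    return best_action, best_score


# Toy 1-D placeholders (assumed, not from the paper): drive the state to 0.
policy = lambda s: -0.5 * s                      # imperfect "trained" policy
dynamics = lambda s, a: s + a                    # stand-in dynamic model
reward = lambda s, a, s_next: -float(np.sum(s_next ** 2))
value = lambda s: -float(np.sum(s ** 2))         # stand-in value function

action, score = lookahead_action(np.array([1.0]), policy, dynamics,
                                 reward, value,
                                 rng=np.random.default_rng(0))
```

Each control step in this sketch costs `horizon * n_candidates` model evaluations instead of a single policy call, which is one way to see why trajectory evaluation raises the computational cost.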
