Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning

Achieving human-level dexterity is an important open problem in robotics. However, dexterous hand manipulation tasks, even those at the level of a human infant, are challenging to solve through reinforcement learning (RL). The difficulty lies in the high number of degrees of freedom and the cooperation required among heterogeneous agents (e.g., the joints of the fingers). In this study, we propose the Bimanual Dexterous Hands Benchmark (Bi-DexHands), a simulator featuring two dexterous hands, dozens of bimanual manipulation tasks, and thousands of target objects. Tasks in Bi-DexHands are designed to match different levels of human motor skills according to the cognitive science literature. We built Bi-DexHands in Isaac Gym, which enables highly efficient RL training, reaching 30,000+ FPS on a single NVIDIA RTX 3090. We provide a comprehensive benchmark of popular RL algorithms under different settings, including single-agent and multi-agent RL, offline RL, multi-task RL, and meta RL. Our results show that PPO-type on-policy algorithms can master simple manipulation tasks comparable to those performed by human infants up to 48 months old (e.g., catching a flying object, opening a bottle), while multi-agent RL can further help to master manipulations that require skilled bimanual cooperation (e.g., lifting a pot, stacking blocks). Despite the success on individual tasks, existing RL algorithms fail in most of the multi-task and few-shot learning settings when required to acquire multiple manipulation skills, which calls for more substantial development from the RL community. Our project is open-sourced at https://github.com/PKU-MARL/DexterousHands.
