In this paper, we present a novel method for achieving dexterous manipulation of complex objects while simultaneously securing the object without the use of passive support surfaces. We posit that a key obstacle to training such policies in a Reinforcement Learning framework is the difficulty of exploring the problem's state space, since its accessible regions form a complex structure along manifolds of a high-dimensional space. To address this challenge, we use two versions of the non-holonomic Rapidly-exploring Random Trees (RRT) algorithm: one version is more general but requires explicit use of the environment's transition function, while the second uses manipulation-specific kinematic constraints to attain better sample efficiency. In both cases, we use states found via sampling-based exploration to generate reset distributions that enable training control policies under full dynamic constraints via model-free Reinforcement Learning. We show that these policies are effective on manipulation problems of higher difficulty than previously shown, and also transfer effectively to real robots. Videos of the real-hand demonstrations can be found on the project website: https://sbrl.cs.columbia.edu/
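To make the core idea concrete, the following is a minimal, hypothetical sketch of the first (more general) variant described above: a kinodynamic RRT that expands nodes through an environment's transition function and collects the reached states, which can then serve as a reset distribution for model-free RL training. All names (`SimpleEnv`, `rrt_explore`) and the toy dynamics are illustrative assumptions, not details from the paper.

```python
import random

class SimpleEnv:
    """Toy 1-D double-integrator stand-in for a manipulation environment.
    State = (position, velocity); action = acceleration."""
    def step(self, state, action, dt=0.1):
        pos, vel = state
        vel = vel + action * dt
        pos = pos + vel * dt
        return (pos, vel)

def rrt_explore(env, start, n_iters=2000, actions=(-1.0, 0.0, 1.0), seed=0):
    """Kinodynamic RRT: sample a target, find the nearest tree node, and
    expand it through the environment's transition function (the variant
    that requires explicit access to that function)."""
    rng = random.Random(seed)
    tree = [start]
    for _ in range(n_iters):
        target = (rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0))
        dist = lambda s: (s[0] - target[0]) ** 2 + (s[1] - target[1]) ** 2
        nearest = min(tree, key=dist)
        # Try each candidate action; keep the successor closest to the target.
        best = min((env.step(nearest, a) for a in actions), key=dist)
        tree.append(best)
    # The set of reached states doubles as a reset distribution for RL.
    return tree

env = SimpleEnv()
reset_states = rrt_explore(env, start=(0.0, 0.0))
# An RL training loop would then begin each episode from a state drawn
# uniformly from reset_states, instead of a single fixed initial state.
```

Resetting episodes from tree nodes spread across the reachable manifold is what lets the policy encounter hard-to-reach configurations early in training, rather than relying on undirected exploration to discover them.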