We introduce a sample-efficient method for learning state-dependent stiffness control policies for dexterous manipulation. The ability to control stiffness facilitates safe and reliable manipulation by providing compliance and robustness to uncertainties. Most current reinforcement learning approaches to robotic manipulation focus exclusively on position control, largely due to the difficulty of learning high-dimensional stiffness control policies. This difficulty can be partially mitigated through policy guidance such as imitation learning, but expert stiffness control demonstrations are often expensive or infeasible to record. We therefore present an approach to learn Stiffness Control from Augmented Position control Experiences (SCAPE) that bypasses this difficulty by transforming position control demonstrations into approximate, suboptimal stiffness control demonstrations. The suboptimality of the augmented demonstrations is then addressed with complementary techniques that help the agent learn safely from both the demonstrations and reinforcement learning. Through simulation and experiments on a robotic testbed, we show that the proposed approach efficiently learns safe manipulation policies and outperforms learned position control policies as well as several other baseline learning algorithms.
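To make the augmentation idea concrete, the following minimal Python sketch pairs each recorded position target with a nominal stiffness gain so that a position-control demonstration can stand in as an approximate stiffness-control demonstration. The (position, stiffness) action layout and the constant K_DEFAULT are illustrative assumptions for this sketch, not the exact formulation used by SCAPE.

    import numpy as np

    # Assumed nominal stiffness applied at every augmented step (hypothetical value).
    K_DEFAULT = 3.0

    def augment_position_demo(position_targets: np.ndarray) -> np.ndarray:
        """Convert a (T, n_joints) position-control demo into a (T, 2*n_joints)
        approximate stiffness-control demo: [position targets | stiffness gains]."""
        stiffness = np.full_like(position_targets, K_DEFAULT)
        return np.concatenate([position_targets, stiffness], axis=-1)

    # Example: a 3-step demo for a 2-joint system becomes a suboptimal
    # stiffness-control demo that a demonstration-guided RL agent can later refine.
    demo = np.array([[0.0, 0.1], [0.1, 0.2], [0.2, 0.3]])
    augmented_demo = augment_position_demo(demo)
    print(augmented_demo.shape)  # (3, 4)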