Dexterous in-hand manipulation with a multi-fingered anthropomorphic hand is extremely difficult because of the high-dimensional state and action spaces and the rich contact patterns between the fingers and the object. Although deep reinforcement learning has made some progress and demonstrated strong potential for manipulation, it still faces challenges such as large-scale data collection and high sample complexity. In particular, even a slight change in the scene typically requires re-collecting vast amounts of data and running many iterations of fine-tuning. Humans, by contrast, can quickly transfer learned manipulation skills to different scenarios with little supervision. Inspired by this flexible transfer-learning capability, we propose a novel progressive transfer learning framework (PTL) for dexterous in-hand manipulation that efficiently reuses the collected trajectories and the dynamics model trained in the source scene. The framework uses progressive neural networks to transfer the dynamics model, training it on samples chosen by a new sample selection method based on the dynamics properties, rewards, and scores of the trajectories. Experimental results on contact-rich anthropomorphic hand manipulation tasks show that our method learns in-hand manipulation skills efficiently and effectively in the new scene with only a few online attempts and some adjustment learning. Compared with learning from scratch, our method reduces training time cost by 95%.
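To illustrate what "progressive neural networks for dynamics model transfer" can mean in practice, here is a minimal sketch. It assumes the lateral-connection design of progressive networks. A frozen column stands in for the source-trained dynamics model, and a new column for the target scene receives the source column's hidden activations through lateral adapters. The class name `ProgressiveDynamicsModel`, the layer sizes, and the choice to predict a state delta are hypothetical illustrations, not details taken from PTL.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def init_layer(rng, n_in, n_out):
    # Small random weights; biases start at zero.
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


class ProgressiveDynamicsModel:
    """Hypothetical two-column progressive network predicting the next state."""

    def __init__(self, state_dim, action_dim, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        n_in = state_dim + action_dim
        # Column 1: stands in for the source-trained dynamics model; kept frozen.
        self.source = [init_layer(rng, n_in, hidden),
                       init_layer(rng, hidden, hidden),
                       init_layer(rng, hidden, state_dim)]
        # Column 2: new column for the target scene.
        self.target = [init_layer(rng, n_in, hidden),
                       init_layer(rng, hidden, hidden),
                       init_layer(rng, hidden, state_dim)]
        # Lateral adapters that feed source hidden activations into the target column.
        self.lateral = [rng.normal(0.0, 0.1, size=(hidden, hidden)),
                        rng.normal(0.0, 0.1, size=(hidden, state_dim))]

    def _source_features(self, x):
        # Forward pass through the frozen source column's hidden layers only.
        (w1, b1), (w2, b2), _ = self.source
        h1 = relu(x @ w1 + b1)
        h2 = relu(h1 @ w2 + b2)
        return h1, h2

    def predict(self, state, action):
        x = np.concatenate([state, action], axis=-1)
        s1, s2 = self._source_features(x)
        (w1, b1), (w2, b2), (w3, b3) = self.target
        u2, u3 = self.lateral
        h1 = relu(x @ w1 + b1)
        # Each target layer combines its own input with the source layer below it.
        h2 = relu(h1 @ w2 + b2 + s1 @ u2)
        delta = h2 @ w3 + b3 + s2 @ u3
        # Predict the next state as current state plus a learned delta (an assumption).
        return state + delta

    def trainable_parameters(self):
        # Only the target column and lateral adapters would be updated during transfer.
        params = [p for layer in self.target for p in layer]
        return params + list(self.lateral)
```

Because the source column is never updated, what it learned in the source scene is preserved. The lateral adapters let the new column reuse those features instead of relearning the dynamics from scratch, which is one plausible reason such a design needs far less target-scene data.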