Task automation of surgical robots has the potential to improve surgical efficiency. Recent reinforcement learning (RL) based approaches provide scalable solutions to surgical automation, but they typically require extensive data collection to solve a task when no prior knowledge is given. This issue is known as the exploration challenge, and it can be alleviated by providing expert demonstrations to an RL agent. Yet how to use demonstration data effectively to improve exploration efficiency remains an open challenge. In this work, we introduce Demonstration-guided EXploration (DEX), an efficient reinforcement learning algorithm that aims to overcome the exploration problem in surgical automation by using expert demonstrations. To exploit demonstrations effectively, our method assigns higher estimated values to expert-like behaviors to facilitate productive interactions, and adopts non-parametric regression to enable such guidance at states unobserved in the demonstration data. Extensive experiments on 10 surgical manipulation tasks from SurRoL, a comprehensive surgical simulation platform, demonstrate significant improvements in the exploration efficiency and task success rates of our method. We also deploy the learned policies to the da Vinci Research Kit (dVRK) platform to show their effectiveness on a real robot. Code is available at https://github.com/med-air/DEX.
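To make the idea of non-parametric guidance at unobserved states concrete, the following is a minimal sketch and not the authors' implementation. It assumes a kernel-weighted k-nearest-neighbor regressor over demonstration states and a squared-distance guidance term; the function names `knn_expert_action` and `guidance_bonus`, and the parameters `k`, `bandwidth`, and `scale`, are hypothetical and chosen only for illustration.

```python
import math


def knn_expert_action(state, demo_states, demo_actions, k=3, bandwidth=1.0):
    """Estimate an expert-like action at `state` by kernel-weighted k-NN regression.

    Hypothetical helper: the regressor and its parameters are assumptions,
    not taken from the DEX code base.
    """
    if not demo_states or len(demo_states) != len(demo_actions):
        raise ValueError("demo_states and demo_actions must be non-empty and aligned")
    # Squared Euclidean distance from the query state to each demonstration state.
    dists = [sum((s - q) ** 2 for s, q in zip(ds, state)) for ds in demo_states]
    # Indices of the k closest demonstration states (stable sort keeps input order on ties).
    nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]
    # Gaussian kernel weights, shifted by the smallest distance for numerical stability.
    d_min = dists[nearest[0]]
    weights = [math.exp(-(dists[i] - d_min) / (2 * bandwidth ** 2)) for i in nearest]
    total = sum(weights)
    dim = len(demo_actions[0])
    # Weighted average of the neighbors' actions, one action dimension at a time.
    return [
        sum(w * demo_actions[i][j] for w, i in zip(weights, nearest)) / total
        for j in range(dim)
    ]


def guidance_bonus(agent_action, expert_action, scale=1.0):
    """Negative squared distance: larger when the agent acts more like the expert estimate."""
    return -scale * sum((a - e) ** 2 for a, e in zip(agent_action, expert_action))
```

With one-dimensional demonstrations at states 0, 1, and 10 whose actions equal the states, calling `knn_expert_action([0.5], ...)` with `k=2` averages the two equidistant neighbors and returns `[0.5]`, even though state 0.5 never appears in the demonstrations. A term like `guidance_bonus` could then be added to a value or policy objective so that actions closer to the estimate score higher; how DEX actually combines such a term with its actor-critic updates is not specified in this abstract.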