Cooperative Assistance in Robotic Surgery through Multi-Agent Reinforcement Learning

Paul Maria Scheikl, Balázs Gyenes, Tornike Davitashvili, Rayan Younis, André Schulze, Beat P. Müller-Stich, Gerhard Neumann, Martin Wagner, Franziska Mathis-Ullrich

Source: arXiv. Accepted at the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2021).

Cognitive cooperative assistance in robot-assisted surgery holds the potential to increase quality of care in minimally invasive interventions. Automation of surgical tasks promises to reduce the mental exertion and fatigue of surgeons. In this work, multi-agent reinforcement learning is demonstrated to be robust to the distribution shift introduced by pairing a learned policy with a human team member. Multi-agent policies are trained directly from images in simulation to control multiple instruments in a sub task of the minimally invasive removal of the gallbladder. These agents are evaluated individually and in cooperation with humans to demonstrate their suitability as autonomous assistants. Compared to human teams, the hybrid teams with artificial agents perform better considering completion time (44.4% to 71.2% shorter) as well as number of collisions (44.7% to 98.0% fewer). Path lengths, however, increase under control of an artificial agent (11.4% to 33.5% longer). A multi-agent formulation of the learning problem was favored over a single-agent formulation on this surgical sub task, due to the sequential learning of the two instruments. This approach may be extended to other tasks that are difficult to formulate within the standard reinforcement learning framework. Multi-agent reinforcement learning may shift the paradigm of cognitive robotic surgery towards seamless cooperation between surgeons and assistive technologies.
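
The percentages in the abstract are relative changes of the hybrid teams against the human-team baseline. A minimal sketch of that arithmetic follows, assuming the usual definition (value - baseline) / baseline. The function name and the input numbers are hypothetical illustrations and are not taken from the paper.

```python
def relative_change(baseline: float, value: float) -> float:
    """Return the signed percentage change of `value` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (value - baseline) / baseline * 100.0


# Hypothetical completion times in seconds, for illustration only.
# These are NOT measurements reported in the paper.
human_team_time = 50.0
hybrid_team_time = 20.0

change = relative_change(human_team_time, hybrid_team_time)
print(f"completion time change: {change:+.1f}%")
```

A negative result corresponds to a "shorter" or "fewer" figure in the abstract, and a positive result to a "longer" one, as with the path lengths.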
