Deep Reinforcement Learning for Haptic Shared Control in Unknown Tasks

From arXiv. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

Recent years have seen growing interest in haptic shared control (HSC) for teleoperated systems. In HSC, virtual guiding forces reduce the user's control effort and shorten execution time across various tasks, making it a good alternative to direct teleoperation. Despite this good performance, HSC raises a new question: how to design the guiding forces. The challenge therefore lies in developing controllers that provide the optimal guiding forces for the task being performed. This work addresses the challenge with TAHSC (Task Agnostic Haptic Shared Controller), which combines a controller based on the deep deterministic policy gradient (DDPG) algorithm to provide the assistance with a convolutional neural network (CNN) to perform task detection. The agent learns to minimize the time the human takes to execute the desired task while simultaneously minimizing their resistance to the provided feedback. This resistance thus tells the learning algorithm which direction the human is trying to follow, in this case while performing a pick-and-place task. Results demonstrate the successful application of the proposed approach, which learns a custom policy for each user who was asked to test the system. It exhibits stable convergence and helps the user complete the task in the least time possible.
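The learning objective described above (less time, less resistance to the feedback) could be expressed as a per-step reward along the following lines. This is only a minimal sketch under stated assumptions, not the reward used in the paper: the abstract does not define resistance, so here it is assumed to be the component of the human force that opposes the guiding force, and the names `resistance`, `step_reward`, `dt`, `w_time`, and `w_resist` are hypothetical.

```python
import numpy as np


def resistance(human_force, guiding_force):
    """Assumed definition: magnitude of the human force opposing the guidance."""
    h = np.asarray(human_force, dtype=float)
    g = np.asarray(guiding_force, dtype=float)
    g_norm = np.linalg.norm(g)
    if g_norm == 0.0:
        # No guidance applied, so there is nothing to resist.
        return 0.0
    # Signed projection of the human force onto the guidance direction.
    along = float(np.dot(h, g) / g_norm)
    # Only a negative projection (pushing against the guidance) counts.
    return max(0.0, -along)


def step_reward(human_force, guiding_force, dt=0.01, w_time=1.0, w_resist=0.1):
    """Hypothetical per-step reward: penalize elapsed time and resistance.

    dt is the control period in seconds; w_time and w_resist are
    illustrative weights, not values from the paper.
    """
    return -(w_time * dt + w_resist * resistance(human_force, guiding_force))
```

Under these assumptions, an agent such as DDPG that maximizes the summed reward is pushed toward guiding forces that finish the task sooner and that the user does not fight against.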
