While classical control theory offers state-of-the-art solutions in many problem scenarios, it is often desirable to go beyond the structure of such solutions and surpass their limitations. To this end, residual policy learning (RPL) offers a formulation to improve existing controllers with reinforcement learning (RL) by learning an additive "residual" to the output of a given controller. However, the applicability of such an approach depends strongly on the structure of the controller. Often, internal feedback signals of the controller prevent an RL algorithm from adequately changing the policy and, hence, from learning the task. We propose a new formulation that addresses these limitations by additionally modifying the feedback signals to the controller with an RL policy, and show superior performance of our approach on a contact-rich peg-insertion task under position and orientation uncertainty. In addition, we use a recent Cartesian impedance control architecture as the control framework, which may be available to us only as a black box, assuming no knowledge of its input/output structure, and show the difficulties of standard RPL in this setting. Furthermore, we introduce an adaptive curriculum for the given task that gradually increases the task difficulty in terms of position and orientation uncertainty. A video showing the results can be found at https://youtu.be/SAZm_Krze7U .
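To illustrate the RPL formulation the abstract describes, the following is a minimal sketch, not the authors' implementation: the executed action is the output of a fixed base controller plus a learned additive residual. The controller gain, the linear residual, and all function names here are illustrative assumptions.

```python
import numpy as np

def base_controller(obs):
    """Hypothetical fixed base controller: proportional feedback toward a
    target state (a stand-in for, e.g., a Cartesian impedance controller)."""
    target = np.zeros_like(obs)
    return 1.5 * (target - obs)

def residual_policy(obs, theta):
    """Hypothetical learned residual: a linear map as a stand-in for an
    RL policy network with parameters theta."""
    return theta @ obs

def rpl_action(obs, theta):
    # Residual policy learning: the action sent to the plant is the base
    # controller's output plus the learned additive "residual".
    return base_controller(obs) + residual_policy(obs, theta)

obs = np.array([0.1, -0.2])
theta = np.zeros((2, 2))  # untrained residual: zero correction
# With a zero residual, the combined policy reduces to the base controller.
print(rpl_action(obs, theta))
```

Note that standard RPL only shifts the controller's output; the formulation proposed in the abstract additionally lets the RL policy modify the feedback signals entering the controller, which this sketch does not capture.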