Reinforcement learning (RL) has become widely adopted in robot control. Despite many successes, one major persisting problem is that data efficiency can be very low. One solution is interactive feedback, which has been shown to speed up RL considerably. As a result, an abundance of different feedback strategies exists; however, they have primarily been tested on discrete grid-world and small-scale optimal control scenarios. In the literature, there is no consensus about which feedback frequency is optimal or at which point in training feedback is most beneficial. To resolve these discrepancies, we isolate and quantify the effect of feedback frequency in robotic tasks with continuous state and action spaces. The experiments encompass inverse kinematics learning for robotic manipulator arms of different complexity. We show that seemingly contradictory reported phenomena occur at different complexity levels. Furthermore, our results suggest that no single ideal feedback frequency exists. Rather, feedback frequency should be changed as the agent's proficiency in the task increases.