This paper introduces a novel method of adding intrinsic bonuses to a task-oriented reward function in order to efficiently facilitate the search performed by reinforcement learning. Although various bonuses have been designed to date, they are analogous to the depth-first and breadth-first search algorithms in graph theory. This paper therefore first designs a bonus corresponding to each of these two search strategies. A heuristic gain scheduling is then applied to the designed bonuses, inspired by iterative deepening search, which is known to inherit the advantages of both search algorithms. The proposed method is expected to allow the agent to efficiently reach the best solution in deeper states by gradually exploring unknown states. Experiments on three locomotion tasks with dense rewards and three simple tasks with sparse rewards show that the two types of bonuses complementarily improve performance on different tasks. Moreover, when they are combined with the proposed gain scheduling, all tasks are accomplished with high performance.
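As a rough illustration of the idea rather than the paper's actual formulation, the following Python sketch shows how a heuristically scheduled gain might blend a breadth-first-style bonus and a depth-first-style bonus into the task-oriented reward. The function name, the linear schedule, and the specific bonus inputs are assumptions introduced only for this example.

```python
def scheduled_reward(task_reward, bonus_breadth, bonus_depth, step, total_steps):
    """Hypothetical reward shaping with a gain schedule inspired by iterative deepening.

    Early in training the breadth-first-like bonus dominates (broad exploration of
    shallow, unknown states); as training progresses the weight gradually shifts to
    the depth-first-like bonus (pushing toward solutions in deeper states).
    """
    beta = min(step / total_steps, 1.0)   # schedule progress in [0, 1]
    gain_breadth = 1.0 - beta             # decays over training
    gain_depth = beta                     # grows over training
    return task_reward + gain_breadth * bonus_breadth + gain_depth * bonus_depth


# Example usage with placeholder values for one transition:
r = scheduled_reward(task_reward=0.5, bonus_breadth=0.2, bonus_depth=0.1,
                     step=3000, total_steps=10000)
```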