In many real-world applications of reinforcement learning (RL), performing actions consumes certain types of resources that cannot be replenished within an episode. Typical applications include robotic control with limited energy and video games with consumable items. In tasks with non-replenishable resources, we observe that popular RL methods such as Soft Actor-Critic suffer from poor sample efficiency. The major reason is that they tend to exhaust resources quickly, so subsequent exploration is severely restricted by the absence of resources. To address this challenge, we first formalize the aforementioned problem as resource-restricted reinforcement learning, and then propose a novel resource-aware exploration bonus (RAEB) to make reasonable use of resources. An appealing feature of RAEB is that it significantly reduces unnecessary resource-consuming trials while effectively encouraging the agent to explore unvisited states. Experiments demonstrate that the proposed RAEB significantly outperforms state-of-the-art exploration strategies in resource-restricted reinforcement learning environments, improving sample efficiency by up to an order of magnitude.
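The abstract does not give RAEB's exact formulation, so the following is only a minimal, hypothetical sketch of what a resource-aware exploration bonus could look like: a count-based novelty term that is damped for resource-consuming actions as the remaining budget depletes. The class name ResourceAwareBonus, the hyperparameter beta, and the scaling rule are illustrative assumptions, not the paper's method.

```python
# Hypothetical sketch of a resource-aware exploration bonus.
# All names and the exact functional form are assumptions for
# illustration; the paper's actual RAEB formulation is not shown here.
from collections import defaultdict

import numpy as np


class ResourceAwareBonus:
    """Count-based novelty bonus scaled by the remaining resource budget.

    Intuition matching the abstract: reward visiting novel states, but
    shrink the bonus for resource-consuming actions when the remaining
    budget is low, discouraging unnecessary resource-consuming trials
    early in the episode.
    """

    def __init__(self, beta=0.1):
        self.beta = beta                    # bonus scale (assumed hyperparameter)
        self.visit_counts = defaultdict(int)

    def __call__(self, state_key, consumes_resource, remaining, budget):
        self.visit_counts[state_key] += 1
        # Standard count-based novelty term, 1 / sqrt(N(s)).
        novelty = 1.0 / np.sqrt(self.visit_counts[state_key])
        if consumes_resource:
            # Damp resource-consuming exploration as the budget depletes
            # (assumed linear scaling; the actual rule may differ).
            novelty *= remaining / budget
        return self.beta * novelty


# Usage: add the bonus to the extrinsic reward before the RL update.
bonus_fn = ResourceAwareBonus(beta=0.1)
r_bonus = bonus_fn(state_key=(3, 5), consumes_resource=True,
                   remaining=7, budget=10)
print(r_bonus)  # shaped exploration bonus added to the environment reward
```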