Conventional reinforcement learning (RL) typically selects a primitive action at each timestep. By using a proper macro action, defined as a sequence of primitive actions, an agent can instead bypass intermediate states and reach a farther state in a single decision, which facilitates its learning process. The question we investigate is what beneficial properties macro actions possess. In this paper, we unveil two such properties: reusability and transferability. Reusability means that a macro action generated with one RL method can be reused to train agents with another RL method, while transferability means that a macro action can be utilized to train agents in similar environments with different reward settings. In our experiments, we first generate macro actions along with RL methods, and then provide a set of analyses to reveal the reusability and transferability of the generated macro actions.
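For concreteness, a macro action can be viewed as an open-loop sequence of primitive actions executed within a single decision step. The snippet below is a minimal sketch of this idea, not the paper's implementation; it assumes a classic Gym-style environment whose `step` returns a 4-tuple, and the helper `run_macro` and the example macro `[0, 0, 1]` are hypothetical names chosen for illustration.

```python
import gym

def run_macro(env, macro, gamma=0.99):
    """Execute a macro action (a fixed sequence of primitive actions)
    and return the resulting state, the aggregated discounted reward,
    and whether the episode terminated mid-macro."""
    total_reward, discount = 0.0, 1.0
    obs, done = None, False
    for a in macro:
        obs, reward, done, _ = env.step(a)  # classic Gym 4-tuple API
        total_reward += discount * reward
        discount *= gamma
        if done:  # stop early if the episode ends inside the macro
            break
    return obs, total_reward, done

env = gym.make("CartPole-v1")
env.reset()
# A hypothetical macro for CartPole: push left twice, then right once.
obs, reward, done = run_macro(env, macro=[0, 0, 1])
```

From the agent's perspective, choosing this macro counts as one action, so the intermediate states inside the loop are skipped over during decision making.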