The aim of reinforcement learning (RL) is to enable an agent to achieve a final goal, and most RL studies have focused on improving learning efficiency so that this goal is reached faster. However, it is very difficult to modify the intermediate route an RL model takes toward the final goal; that is, in existing studies the agent cannot be controlled to achieve other sub-goals along the way. If the agent could pass through sub-goals on the way to its destination, RL could be applied and studied in a much wider range of fields. In this study, I propose a methodology that uses memory editing to achieve user-defined sub-goals as well as the final goal. Memory editing is performed to generate various sub-goals and to give the agent an additional reward, and the sub-goals are learned separately from the final goal. I set up two simple environments and various scenarios in the test environments. As a result, the agent almost always passed through the sub-goals as well as the final goal while under control. Moreover, the agent could be induced to visit novel states in the environments indirectly. I expect this methodology to be useful in fields that need to control an agent across a variety of scenarios.
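The abstract does not give pseudocode, so the following is only a minimal sketch of one plausible reading of "memory editing": stored transitions are relabeled with a user-defined sub-goal and granted an additional reward when the visited state matches it, in the spirit of hindsight-style relabeling. All names here (`Transition`, `edit_memory`, `SUBGOAL_BONUS`) are assumptions for illustration, not the paper's actual implementation.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Transition:
    state: int
    action: int
    reward: float
    next_state: int
    goal: int

SUBGOAL_BONUS = 1.0  # assumed size of the additional reward

def edit_memory(buffer, subgoal):
    """Return an edited copy of the replay buffer: every transition is
    relabeled with the sub-goal, and transitions reaching it get a bonus."""
    edited = []
    for t in buffer:
        bonus = SUBGOAL_BONUS if t.next_state == subgoal else 0.0
        edited.append(replace(t, goal=subgoal, reward=t.reward + bonus))
    return edited

# Usage: experience collected while heading for the final goal (state 9)
# is reused to teach the agent to pass through sub-goal state 4 as well.
buffer = [Transition(s, 0, 0.0, s + 1, goal=9) for s in range(9)]
edited = edit_memory(buffer, subgoal=4)
```

Learning the relabeled transitions alongside (but separately from) the original ones would then let the agent be steered through the sub-goal without degrading its policy for the final goal.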