The aim of path planning is to find a route by which an agent can reach a goal from a starting point. Because the possible routes vary with many factors, it is important that the agent be able to reach a variety of goals. Most studies, however, have dealt with a single goal predefined by the user. In the present study, I propose a novel reinforcement learning (RL) framework for a fully controllable agent in path planning. To this end, I propose bi-directional memory editing, which yields diverse bi-directional trajectories of the agent; the agent's behavior and sub-goals are then trained with goal-conditioned RL. To move the agent in various directions, I utilize a dedicated sub-goal network, separated from the policy network. Finally, I present a reward-shaping scheme that reduces the number of steps the agent needs to reach the goal. In the experiments, the agent was able to reach various goals that it had never visited during training. We also confirmed that the agent could perform difficult missions, such as a round trip, and that it took shorter routes when reward shaping was used.
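To make the idea of bi-directional memory editing concrete, here is a minimal sketch of one way it could work, assuming a grid-world agent with invertible actions. The function names (`bidirectional_edit`, `reverse_action`) and the transition format are illustrative assumptions, not the paper's actual implementation: from one recorded trajectory, both forward and reversed goal-relabeled transitions are produced for the replay buffer, so the agent sees routes in both directions.

```python
# Hypothetical sketch of bi-directional memory editing: one trajectory is
# relabeled into forward transitions (goal = final state) and reversed
# transitions (goal = initial state), in the spirit of goal-conditioned RL.

def reverse_action(action):
    # Assumed invertible grid actions: 0=up, 1=down, 2=left, 3=right.
    return {0: 1, 1: 0, 2: 3, 3: 2}[action]

def bidirectional_edit(trajectory):
    """trajectory: list of (state, action, next_state) tuples.
    Returns goal-relabeled transitions for both directions."""
    edited = []
    goal_fwd = trajectory[-1][2]   # final state serves as the forward goal
    goal_bwd = trajectory[0][0]    # initial state serves as the backward goal
    for state, action, next_state in trajectory:
        # Forward transition, relabeled with the trajectory's end state.
        edited.append((state, action, next_state, goal_fwd))
        # Reversed transition: swap states, invert the action, goal = start.
        edited.append((next_state, reverse_action(action), state, goal_bwd))
    return edited
```

Under this sketch, a two-step rightward trajectory such as `[(0, 3, 1), (1, 3, 2)]` yields four goal-conditioned transitions, two per direction, which could then feed the policy network and the sub-goal network described above.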