Deep reinforcement learning has recently become a popular approach to visual navigation tasks in the computer vision and robotics communities. In most cases, the reward function has a binary structure: a large positive reward is provided when the agent reaches the goal state, and a negative step penalty is assigned in every other state of the environment. Such a sparse signal makes the learning process challenging, especially in large environments, where a long sequence of actions must be taken to reach the target. We introduce a reward shaping mechanism that gradually adjusts the reward signal based on the distance to the goal. Detailed experiments conducted in the AI2-THOR simulation environment demonstrate the efficacy of the proposed approach for object-goal navigation tasks.
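To make the idea of distance-based reward shaping concrete, the following is a minimal sketch, not the paper's actual formulation: it assumes a simple difference-of-distances shaping term added to the sparse binary reward, with hypothetical hyperparameters `goal_reward`, `step_penalty`, and `shaping_scale`.

```python
def shaped_reward(prev_dist: float, curr_dist: float, reached_goal: bool,
                  goal_reward: float = 10.0, step_penalty: float = -0.01,
                  shaping_scale: float = 1.0) -> float:
    """Illustrative distance-based reward shaping (assumed form, not the paper's exact rule).

    The sparse binary reward (large positive at the goal, small step penalty
    elsewhere) is augmented with a term proportional to the reduction in
    distance to the goal at each step.
    """
    if reached_goal:
        return goal_reward
    # Positive when the agent moves closer to the goal, negative when it moves away.
    shaping = shaping_scale * (prev_dist - curr_dist)
    return step_penalty + shaping
```

Under this assumed form, the agent still receives the large terminal reward at the goal, but intermediate steps that reduce the distance to the goal are rewarded more than those that do not, densifying the otherwise sparse signal.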