In this paper, we present an active vision method for a humanoid soccer-playing robot based on deep reinforcement learning. The proposed method adaptively optimises the robot's viewpoint to acquire the most useful landmarks for self-localisation while keeping the ball in its field of view. Active vision is critical for humanoid decision-making robots with a limited field of view. Several probabilistic entropy-based approaches to the active vision problem have been proposed previously, but they depend heavily on the accuracy of the self-localisation model. In this research, we instead formulate the problem as an episodic reinforcement learning problem and employ a Deep Q-learning method to solve it. The proposed network requires only the raw camera images to move the robot's head toward the best viewpoint, and achieves a competitive 80% success rate in reaching that viewpoint. We implemented the proposed method on a humanoid robot simulated in the Webots simulator. Our evaluations and experimental results show that the proposed method outperforms entropy-based methods in the RoboCup context in cases with high self-localisation errors.
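To make the Q-learning formulation concrete, the following is a minimal sketch of the two core pieces such an agent needs: epsilon-greedy selection over discrete head movements, and the Q-learning bootstrap target. The action set, the linear stand-in for the paper's deep network, and all function names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical discrete head actions (pan/tilt steps); the paper's exact
# action set is not specified here.
ACTIONS = ["pan_left", "pan_right", "tilt_up", "tilt_down", "hold"]

def q_values(weights, image):
    """Linear stand-in for the deep Q-network: flatten the raw camera
    image and map it to one Q-value per head action."""
    return weights @ image.ravel()

def select_action(weights, image, epsilon, rng):
    """Epsilon-greedy head movement: explore a random viewpoint change
    with probability epsilon, otherwise take the highest-valued action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(ACTIONS)))
    return int(np.argmax(q_values(weights, image)))

def td_target(reward, next_q, gamma, done):
    """Standard Q-learning target r + gamma * max_a' Q(s', a'),
    truncated at the end of an episode (episodic setting)."""
    return reward if done else reward + gamma * float(np.max(next_q))
```

In a full DQN, `q_values` would be a convolutional network over the raw image and `td_target` would be computed from a replay buffer with a target network; the sketch only shows the update structure.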