Deep neural networks (DNNs) can approximate value functions or policies for reinforcement learning, making reinforcement learning algorithms more powerful. However, some DNNs, such as convolutional neural networks (CNNs), either cannot extract enough information or take too long to obtain enough features from the inputs under the specific circumstances of certain reinforcement learning tasks. For example, the input data of Google Research Football, a reinforcement learning environment that trains agents to play football, is a small map of the players' locations. The information is contained not only in the coordinates of the players but also in the relationships between different players, and a CNN either fails to extract enough of this relational information or takes too long to train. To address this issue, this paper proposes a deep Q-learning network (DQN) with a graph neural network (GNN) as its model. The GNN transforms the input data into a graph that better represents the football players' locations, so that it extracts more information about the interactions between players. With two GNNs approximating its local and target value functions, this DQN lets players learn from their experience by using the value functions to evaluate the prospective value of each intended action. The proposed model demonstrates the power of GNNs in the football game by outperforming other deep reinforcement learning (DRL) models with significantly fewer training steps.
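The sketch below illustrates the kind of architecture the abstract describes: player coordinates become graph nodes, a simple message-passing network pools node embeddings into a state vector, and a DQN-style local/target network pair produces Q-values per action. This is a minimal illustration under assumed details, not the paper's actual implementation; the graph construction (fully connected), the GCN-style layer, and all hyperparameters (NUM_PLAYERS, HIDDEN_DIM, NUM_ACTIONS) are illustrative placeholders.

```python
import torch
import torch.nn as nn

NUM_PLAYERS = 22      # both 11-player teams; placeholder value
NODE_FEATS  = 2       # (x, y) coordinates taken from the location map
HIDDEN_DIM  = 64      # illustrative embedding size
NUM_ACTIONS = 19      # size of Google Research Football's default action set

class GraphLayer(nn.Module):
    """One round of mean-aggregation message passing: H' = relu(A_norm @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, h, adj):
        # adj is a row-normalized adjacency matrix (self-loops included),
        # so adj @ h averages each node's neighborhood features.
        return torch.relu(self.linear(adj @ h))

class GNNQNetwork(nn.Module):
    """Approximates Q(s, .): node embeddings are pooled into one state vector."""
    def __init__(self):
        super().__init__()
        self.gnn1 = GraphLayer(NODE_FEATS, HIDDEN_DIM)
        self.gnn2 = GraphLayer(HIDDEN_DIM, HIDDEN_DIM)
        self.head = nn.Linear(HIDDEN_DIM, NUM_ACTIONS)

    def forward(self, coords, adj):
        h = self.gnn1(coords, adj)
        h = self.gnn2(h, adj)
        state = h.mean(dim=0)     # mean-pool node embeddings into a state vector
        return self.head(state)   # one Q-value per discrete action

def fully_connected_adj(n):
    """Fully connected player graph with self-loops, row-normalized."""
    adj = torch.ones(n, n)
    return adj / adj.sum(dim=1, keepdim=True)

# Two networks, mirroring the local and target value functions of standard DQN.
local_net  = GNNQNetwork()
target_net = GNNQNetwork()
target_net.load_state_dict(local_net.state_dict())

coords   = torch.rand(NUM_PLAYERS, NODE_FEATS)  # stand-in for real player positions
adj      = fully_connected_adj(NUM_PLAYERS)
q_values = local_net(coords, adj)               # shape: (NUM_ACTIONS,)
action   = int(q_values.argmax())               # greedy action selection
```

In training, the local network would be updated against bootstrapped targets computed by the frozen target network, which is periodically synchronized, exactly as in standard DQN; only the function approximator is swapped from a CNN to a GNN.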