Deep RL approaches owe much of their success to the ability of deep neural networks to generate useful internal representations. Nevertheless, they suffer from high sample complexity, and starting with a good input representation can have a significant impact on performance. In this paper, we exploit the fact that the underlying Markov decision process (MDP) can be viewed as a graph, which enables us to incorporate topological information for effective state representation learning. Motivated by the recent success of node representations in several graph-analytic tasks, we specifically investigate the capability of node representation learning methods to effectively encode the topology of the underlying MDP in Deep RL. To this end, we perform a comparative analysis of several models chosen from four different classes of representation learning algorithms for policy learning in grid-world navigation tasks, which are representative of a large class of RL problems. We find that all embedding methods outperform the commonly used matrix representation of grid-world environments in all of the studied cases. Moreover, graph-convolution-based methods are outperformed by simpler random-walk-based methods and graph linear autoencoders.
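
To make the idea of treating the MDP as a graph concrete, the following is a minimal sketch and not the implementation used in the paper. It assumes a deterministic, four-connected grid world in which every free cell is a state and every legal move is an undirected edge. The function names `grid_world_adjacency` and `random_walks` are hypothetical and chosen only for illustration. The walks it produces are the kind of input a random-walk-based embedding method would consume; the embedding model itself is left out.

```python
import numpy as np


def grid_world_adjacency(walls):
    """Build the state graph of a grid world (illustrative assumption).

    walls: 2D boolean array, True where a cell is blocked.
    Returns the adjacency matrix over free cells and a cell -> node-id map.
    """
    free_cells = [(int(r), int(c)) for r, c in np.argwhere(~walls)]
    index = {cell: i for i, cell in enumerate(free_cells)}
    adjacency = np.zeros((len(free_cells), len(free_cells)))
    for (r, c), i in index.items():
        # Checking only "down" and "right" visits each undirected edge once.
        for dr, dc in ((1, 0), (0, 1)):
            j = index.get((r + dr, c + dc))
            if j is not None:
                adjacency[i, j] = adjacency[j, i] = 1.0
    return adjacency, index


def random_walks(adjacency, walk_length, walks_per_node, rng):
    """Sample uniform random walks starting from every node."""
    walks = []
    for start in range(adjacency.shape[0]):
        for _ in range(walks_per_node):
            walk = [start]
            for _ in range(walk_length - 1):
                neighbours = np.flatnonzero(adjacency[walk[-1]])
                if neighbours.size == 0:
                    break  # isolated state, nothing to walk to
                walk.append(int(rng.choice(neighbours)))
            walks.append(walk)
    return walks


# Hypothetical 3x3 grid world with a single wall in the centre.
walls = np.zeros((3, 3), dtype=bool)
walls[1, 1] = True

adjacency, index = grid_world_adjacency(walls)
walks = random_walks(adjacency, walk_length=5, walks_per_node=2,
                     rng=np.random.default_rng(0))
```

In this sketch each state is identified by a node id rather than by its position in a grid-shaped matrix. A node embedding learned from such walks would replace that id with a dense vector before it is passed to the policy network.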