Neural algorithmic reasoning studies the problem of learning algorithms with neural networks, especially with graph architectures. A recent proposal, XLVIN, reaps the benefits of using a graph neural network that simulates the value iteration algorithm in deep reinforcement learning agents. It allows model-free planning without access to privileged information about the environment, which is usually unavailable. However, XLVIN only supports discrete action spaces, and is hence not trivially applicable to most tasks of real-world interest. We extend XLVIN to continuous action spaces by discretization, and evaluate several selective expansion policies to deal with the resulting large planning graphs. Our proposal, CNAP, demonstrates how neural algorithmic reasoning can make a measurable impact in higher-dimensional continuous control settings, such as MuJoCo, bringing gains in low-data settings and outperforming model-free baselines.
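To make the discretization step concrete, the following is a minimal sketch that assumes uniform binning applied independently to each action dimension. The abstract does not specify the scheme CNAP uses, so the helper names (`make_bins`, `to_continuous`), the bounds, and the bin count are all illustrative assumptions rather than the authors' method.

```python
from typing import List, Sequence


def make_bins(low: float, high: float, n_bins: int) -> List[float]:
    """Return n_bins evenly spaced values covering [low, high] (assumed scheme)."""
    if n_bins < 2:
        raise ValueError("n_bins must be at least 2")
    step = (high - low) / (n_bins - 1)
    return [low + i * step for i in range(n_bins)]


def to_continuous(indices: Sequence[int],
                  bins: Sequence[Sequence[float]]) -> List[float]:
    """Map one discrete bin index per action dimension to a continuous action."""
    return [dim_bins[i] for i, dim_bins in zip(indices, bins)]


# Hypothetical setup: a 3-dimensional action space bounded in [-1, 1],
# with 5 bins per dimension (both numbers are illustrative assumptions).
n_dims, n_bins = 3, 5
action_bins = [make_bins(-1.0, 1.0, n_bins) for _ in range(n_dims)]

# Choosing bin 0, 2 and 4 in the three dimensions yields one continuous action.
action = to_continuous([0, 2, 4], action_bins)

# The number of joint discrete actions grows exponentially with dimensionality.
n_joint_actions = n_bins ** n_dims
```

Under these assumptions, even a coarse grid of 5 bins over 3 dimensions gives 125 joint actions per state, which illustrates why the planning graph grows quickly and why selective expansion policies become relevant.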