Can we learn how to explore unknown spaces efficiently? To answer this question, we study the problem of Online Graph Exploration, the online version of the Traveling Salesperson Problem. We reformulate graph exploration as a reinforcement learning problem and apply Direct Future Prediction (Dosovitskiy and Koltun, 2017) to solve it. As the graph is discovered online, the corresponding Markov Decision Process entails a dynamic state space, namely the observable graph, and a dynamic action space, namely the nodes forming the graph's frontier. To the best of our knowledge, this is the first attempt to solve online graph exploration in a data-driven way. We conduct experiments on six data sets of procedurally generated graphs and three real city road networks. We demonstrate that our agent can learn strategies superior to many well-known graph traversal algorithms, confirming that exploration can be learned.
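The setting described above can be made concrete with a minimal sketch: the agent observes only the visited nodes and their incident edges, and at each step chooses a node from the frontier (observed but unvisited nodes) to move to. As an illustrative policy we use the classical nearest-neighbor rule, i.e. always move to the closest frontier node, which is one of the well-known traversal baselines such a learned agent would be compared against. The dict-of-dicts graph representation and all names here are assumptions for illustration, not the paper's implementation.

```python
import heapq

def explore_nearest_frontier(adj, start):
    """Greedy nearest-frontier exploration of a weighted graph.

    adj: dict mapping node -> {neighbor: edge weight} for the FULL graph;
    the agent, however, only ever uses edges incident to visited nodes,
    modeling the online (partially observed) setting.
    Returns (visit order, total traversal cost).
    """
    visited = {start}
    frontier = set(adj[start]) - visited  # observed but unvisited nodes
    current, tour, cost = start, [start], 0.0

    while frontier:
        # Dijkstra over the observed subgraph: we never expand frontier
        # nodes, since their outgoing edges are still unknown.
        dist = {current: 0.0}
        heap = [(0.0, current)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue
            if u not in visited:
                continue
            for v, w in adj[u].items():
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))

        # Policy: move to the nearest frontier node (ties broken by name).
        target = min(frontier, key=lambda n: (dist.get(n, float("inf")), n))
        cost += dist[target]
        current = target
        visited.add(target)
        tour.append(target)
        frontier = (frontier | set(adj[target])) - visited

    return tour, cost
```

For example, on a 4-cycle A-B-C-D with unit edge weights, starting at A the agent sees frontier {B, D}, greedily walks A→B→C→D, and incurs total cost 3. A learned policy replaces the `min(...)` selection with a value predicted by the network.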