We tackle the problem of goal-directed graph construction: given a starting graph, a budget of modifications, and a global objective function, the aim is to find a set of edges whose addition to the graph yields the maximum improvement in the objective (e.g., communication efficiency). This problem arises in many networks of great societal importance, such as transportation and critical infrastructure networks. We identify two significant shortcomings of existing methods. Firstly, they focus exclusively on network topology and ignore spatial information; however, in many real-world networks, nodes are embedded in space, which yields different global objectives and governs the range and density of realizable connections. Secondly, existing RL methods scale poorly to large networks because training a model is expensive and because both the action space and the cost of evaluating global objectives grow with network size. In this work, we formulate the problem as a deterministic MDP. We adopt the Monte Carlo Tree Search framework for planning in this domain, prioritizing the optimality of final solutions over the speed of policy evaluation. We propose several improvements over the standard UCT algorithm for this family of problems, addressing their single-agent nature, the trade-off between the cost of an edge and its contribution to the objective, and an action space linear in the number of nodes. We demonstrate the suitability of this approach for improving the global efficiency and attack resilience of a variety of synthetic and real-world networks, including Internet backbone networks and metro systems. On the largest networks tested, our approach improves these metrics by 24% relative to UCT and scales better than previous methods.
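To make the problem statement concrete, the sketch below scores candidate edges on a small spatial graph. It is an illustration under stated assumptions, not the method described above: global efficiency is taken to be the mean inverse hop-count distance over node pairs, the cost of an edge is taken to be the Euclidean distance between its endpoints, and the MCTS planner is replaced by an exhaustive one-step search. The function names `global_efficiency` and `best_single_edge` are hypothetical.

```python
import math
from collections import deque
from itertools import combinations


def global_efficiency(n, edges):
    """Mean of 1/d(u, v) over ordered node pairs, using hop-count distances.

    Assumption: unweighted shortest paths; unreachable pairs contribute 0.
    """
    adj = {u: set() for u in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = 0.0
    for src in range(n):
        # Breadth-first search gives hop-count distances from src.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(1.0 / d for node, d in dist.items() if node != src)
    return total / (n * (n - 1))


def best_single_edge(positions, edges, budget):
    """Return the affordable missing edge with the largest efficiency gain.

    Assumption: an edge costs the Euclidean distance between its endpoints.
    This is a one-step exhaustive baseline, not a tree search.
    """
    n = len(positions)
    existing = {frozenset(e) for e in edges}
    base = global_efficiency(n, edges)
    best, best_gain = None, 0.0
    for u, v in combinations(range(n), 2):
        if frozenset((u, v)) in existing:
            continue
        if math.dist(positions[u], positions[v]) > budget:
            continue  # spatial cost exceeds the remaining budget
        gain = global_efficiency(n, edges + [(u, v)]) - base
        if gain > best_gain:
            best, best_gain = (u, v), gain
    return best, best_gain


if __name__ == "__main__":
    # Four nodes on a line, connected as a path.
    positions = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    edges = [(0, 1), (1, 2), (2, 3)]
    print(best_single_edge(positions, edges, budget=2.0))
```

Even in this toy setting, the spatial budget excludes the long edge between the two end nodes, which shows how node positions restrict the set of realizable connections. Each candidate requires a full recomputation of the objective, which hints at why exhaustive evaluation becomes expensive on large networks.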