From logistics to the natural sciences, combinatorial optimisation on graphs underpins numerous real-world applications. Reinforcement learning (RL) has shown particular promise in this setting as it can adapt to specific problem structures and does not require pre-solved instances for these (often NP-hard) problems. However, state-of-the-art (SOTA) approaches typically suffer from severe scalability issues, primarily due to their reliance on expensive graph neural networks (GNNs) at each decision step. We introduce ECORD, a novel RL algorithm that alleviates this expense by restricting the GNN to a single pre-processing step, before entering a fast-acting exploratory phase directed by a recurrent unit. Experimentally, ECORD achieves a new SOTA for RL algorithms on the Maximum Cut problem, whilst also providing orders-of-magnitude improvements in speed and scalability. Compared to the nearest competitor, ECORD reduces the optimality gap by up to 73% on 500-vertex graphs at a decreased wall-clock time. Moreover, ECORD retains strong performance when generalising to larger graphs with up to 10,000 vertices.
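Below is a minimal PyTorch sketch of the decomposition the abstract describes: a GNN computes vertex embeddings once as a pre-processing step, after which every decision step uses only a lightweight recurrent update. All module names, dimensions, the single mean-aggregation message-passing layer, and the greedy per-step vertex selection are illustrative assumptions, not the authors' implementation.

```python
# Sketch of the ECORD-style decomposition described above: the GNN runs ONCE
# to pre-compute per-vertex embeddings, then each decision step uses only a
# cheap recurrent update. Names and layer choices are illustrative assumptions.
import torch
import torch.nn as nn

class OneShotGNN(nn.Module):
    """Single round of mean-aggregation message passing (pre-processing only)."""
    def __init__(self, in_dim: int, emb_dim: int):
        super().__init__()
        self.lin = nn.Linear(2 * in_dim, emb_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        neigh = adj @ x / deg                      # mean over neighbours
        return torch.relu(self.lin(torch.cat([x, neigh], dim=-1)))

class RecurrentActor(nn.Module):
    """Per-step vertex scoring from static embeddings plus a recurrent state."""
    def __init__(self, emb_dim: int, hid_dim: int):
        super().__init__()
        self.gru = nn.GRUCell(emb_dim, hid_dim)
        self.score = nn.Sequential(
            nn.Linear(emb_dim + hid_dim, hid_dim), nn.ReLU(),
            nn.Linear(hid_dim, 1),
        )

    def step(self, emb: torch.Tensor, h: torch.Tensor):
        # Score every vertex against the current recurrent state.
        n = emb.size(0)
        q = self.score(torch.cat([emb, h.expand(n, -1)], dim=-1)).squeeze(-1)
        a = int(q.argmax())                        # greedy choice (exploration omitted)
        h = self.gru(emb[a].unsqueeze(0), h)       # fold the chosen action into the state
        return a, h

# Toy rollout on a random 6-vertex graph: the GNN is called once, then the
# decision loop runs with no further GNN evaluations.
n, in_dim, emb_dim, hid_dim = 6, 4, 16, 32
adj = (torch.rand(n, n) < 0.5).float().triu(1)
adj = adj + adj.T
emb = OneShotGNN(in_dim, emb_dim)(torch.randn(n, in_dim), adj)
actor, h = RecurrentActor(emb_dim, hid_dim), torch.zeros(1, hid_dim)
for _ in range(n):
    a, h = actor.step(emb, h)
    print("flip vertex", a)
```

Because the per-step cost in this sketch is independent of any GNN forward pass, the decision loop scales with the recurrent unit and scoring MLP alone, which is consistent with the speed and scalability gains the abstract claims.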