Combinatorial optimisation problems framed as mixed integer linear programmes (MILPs) are ubiquitous across a range of real-world applications. The canonical branch-and-bound algorithm seeks to exactly solve MILPs by constructing a search tree of increasingly constrained sub-problems. In practice, its solve time depends heavily on heuristics, such as the choice of the next variable to constrain ('branching'). Recently, machine learning (ML) has emerged as a promising paradigm for branching. However, prior works have struggled to apply reinforcement learning (RL), citing sparse rewards, difficult exploration, and partial observability as significant challenges. Instead, leading ML methodologies resort to approximating high-quality handcrafted heuristics with imitation learning (IL), which precludes the discovery of novel policies and requires expensive data labelling. In this work, we propose retro branching: a simple yet effective approach to RL for branching. By retrospectively deconstructing the search tree into multiple paths, each contained within a sub-tree, we enable the agent to learn from shorter trajectories with more predictable next states. In experiments on four combinatorial tasks, our approach enables learning-to-branch without any expert guidance or pre-training. We outperform the current state-of-the-art RL branching algorithm by 3-5x and come within 20% of the best IL method's performance on MILPs with 500 constraints and 1000 variables, with ablations verifying that our retrospectively constructed trajectories are essential to achieving these results.
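To make the core idea concrete, the following is a minimal sketch (not the authors' exact construction) of retrospectively decomposing a solved branch-and-bound search tree into root-to-leaf paths, each of which can serve as a shorter RL trajectory. The node ids, the `children` map, and the `select_child` rule are hypothetical placeholders introduced purely for illustration.

```python
# A hedged sketch, assuming the search tree is available after solving as a
# dict mapping node id -> list of child node ids. Each returned path is one
# retrospective trajectory; which child continues the current path (and which
# children seed new sub-tree paths) is an illustrative design choice here.

def retrospective_trajectories(children, root, select_child=None):
    """Decompose a solved B&B tree into root-to-leaf paths (trajectories)."""
    if select_child is None:
        # Example heuristic: follow the child with the largest sub-tree.
        def subtree_size(n):
            return 1 + sum(subtree_size(c) for c in children.get(n, []))
        select_child = lambda kids: max(kids, key=subtree_size)

    trajectories, roots = [], [root]
    while roots:
        node, path = roots.pop(), []
        while True:
            path.append(node)
            kids = children.get(node, [])
            if not kids:
                break  # reached a leaf; the path (trajectory) is complete
            nxt = select_child(kids)
            roots.extend(k for k in kids if k != nxt)  # start new paths later
            node = nxt
        trajectories.append(path)
    return trajectories


# Usage example on a toy tree: the full episode is split into three
# shorter trajectories, e.g. [[0, 1, 3], [4], [2]].
tree = {0: [1, 2], 1: [3, 4], 2: [], 3: [], 4: []}
print(retrospective_trajectories(tree, root=0))
```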