In line with the growing trend of using machine learning to help solve combinatorial optimisation problems, one promising idea is to improve node selection within a mixed integer programming (MIP) branch-and-bound tree by using a learned policy. Previous work using imitation learning indicates the feasibility of acquiring a node selection policy, by learning an adaptive node searching order. In contrast, our imitation learning policy is focused solely on learning which of a node's children to select. We present an offline method to learn such a policy in two settings: one that comprises a heuristic by committing to pruning of nodes; one that is exact and backtracks from a leaf to guarantee finding the optimal integer solution. The former setting corresponds to a child selector during plunging, while the latter is akin to a diving heuristic. We apply the policy within the popular open-source solver SCIP, in both heuristic and exact settings. Empirical results on five MIP datasets indicate that our node selection policy leads to solutions significantly more quickly than the state-of-the-art precedent in the literature. While we do not beat the highly-optimised SCIP state-of-practice baseline node selector in terms of solving time on exact solutions, our heuristic policies have a consistently better optimality gap than all baselines, if the accuracy of the predictive model is sufficient. Further, the results also indicate that, when a time limit is applied, our heuristic method finds better solutions than all baselines in the majority of problems tested. We explain the results by showing that the learned policies have imitated the SCIP baseline, but without the latter's early plunge abort. Our recommendation is that, despite the clear improvements over the literature, this kind of MIP child selector is better seen in a broader approach using learning in MIP branch-and-bound tree decisions.
翻译:随着利用机器学习帮助解决组合优化问题的趋势不断增长,一个有希望的想法是通过使用学习的政策,在混合整数编程(MIP)分支和树上改进节点选择。以前使用模仿学习的工作表明通过学习适应节点搜索顺序获得节点选择政策的可行性。相比之下,我们的模仿学习政策完全侧重于学习节点的孩子选择哪个。我们展示了一种脱机方法,在两种情况下学习这样的政策:一种是超速的节点;一种是精确的和反向的,从叶子中确保找到最佳整数解决方案。以前使用模仿学习的学习表明通过学习适应性节点的节点选择政策是否可行。在超速和精确的环境下,我们所选择的节点选择政策比当前最短的节点选择要快得多;在文献中,我们没有更精确地解释儿童选择选择方法,而后一种更精确的方法则是在快速的源码上应用开放源码 SCIP 。