Constraint programming is known as an efficient approach for solving combinatorial problems. Important design choices in a solver are the branching heuristics, which are designed to lead the search to the best solutions in a minimum amount of time. However, developing these heuristics is a time-consuming process that requires problem-specific expertise. This observation has motivated many efforts to use machine learning to automatically learn efficient heuristics without expert intervention. To the best of our knowledge, this remains an open research question. Although several generic variable-selection heuristics are available in the literature, the options for a generic value-selection heuristic are much scarcer. In this paper, we propose to tackle this issue by introducing a generic learning procedure that can be used to obtain a value-selection heuristic inside a constraint programming solver. This is achieved through the combination of a deep Q-learning algorithm, a tailored reward signal, and a heterogeneous graph neural network architecture. Experiments on graph coloring, maximum independent set, and maximum cut problems show that our framework is able to find solutions close to optimality without requiring a large number of backtracks, while remaining generic.