Backtracking search algorithms are often used to solve the Constraint Satisfaction Problem (CSP). The efficiency of backtracking search depends greatly on the variable ordering heuristic. Currently, the most commonly used heuristics are hand-crafted based on expert knowledge. In this paper, we propose a deep reinforcement learning-based approach to automatically discover new variable ordering heuristics that are better adapted to a given class of CSP instances. We show that directly optimizing the search cost is hard to bootstrap, and instead propose to optimize the expected cost of reaching a leaf node in the search tree. To capture the complex relations among the variables and constraints, we design a representation scheme based on a Graph Neural Network that can process CSP instances with different sizes and constraint arities. Experimental results on random CSP instances show that the learned policies outperform classical hand-crafted heuristics in terms of minimizing the search tree size, and can effectively generalize to instances that are larger than those used in training.
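To make concrete where a variable ordering heuristic enters backtracking search, the following is a minimal sketch in Python. It is not the learned policy or the implementation from this paper: the function names (`solve`, `lexicographic`, `smallest_domain`), the binary-constraint encoding, and the toy instance are all assumptions made for illustration. The solver performs plain chronological backtracking without constraint propagation, so the domain sizes seen by the heuristic are static, and it counts visited nodes as a simple proxy for search tree size.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Domains = Dict[str, Set[int]]
# Hypothetical encoding: a binary constraint is (x, y, allowed (vx, vy) pairs).
Constraint = Tuple[str, str, Set[Tuple[int, int]]]
Heuristic = Callable[[List[str], Domains], str]


def consistent(assignment: Dict[str, int], constraints: List[Constraint]) -> bool:
    """Check every constraint whose two variables are both assigned."""
    for x, y, allowed in constraints:
        if x in assignment and y in assignment:
            if (assignment[x], assignment[y]) not in allowed:
                return False
    return True


def lexicographic(unassigned: List[str], domains: Domains) -> str:
    """Baseline ordering: pick the alphabetically first unassigned variable."""
    return min(unassigned)


def smallest_domain(unassigned: List[str], domains: Domains) -> str:
    """Classical hand-crafted ordering: pick the variable with the fewest values."""
    return min(unassigned, key=lambda v: (len(domains[v]), v))


def solve(
    domains: Domains, constraints: List[Constraint], choose_var: Heuristic
) -> Tuple[Optional[Dict[str, int]], int]:
    """Chronological backtracking; returns (solution or None, nodes visited)."""
    assignment: Dict[str, int] = {}
    nodes = 0

    def search() -> bool:
        nonlocal nodes
        if len(assignment) == len(domains):
            return True
        unassigned = [v for v in domains if v not in assignment]
        var = choose_var(unassigned, domains)  # the heuristic plugs in here
        for value in sorted(domains[var]):
            nodes += 1  # one search-tree node per tried value
            assignment[var] = value
            if consistent(assignment, constraints) and search():
                return True
            del assignment[var]
        return False

    found = search()
    return (dict(assignment) if found else None), nodes


# Toy instance (illustrative only): a < b < c with c fixed to 3.
less_than = {(i, j) for i in range(1, 4) for j in range(1, 4) if i < j}
toy_domains: Domains = {"a": {1, 2, 3}, "b": {1, 2, 3}, "c": {3}}
toy_constraints: List[Constraint] = [("a", "b", less_than), ("b", "c", less_than)]

for heuristic in (lexicographic, smallest_domain):
    solution, visited = solve(toy_domains, toy_constraints, heuristic)
    print(heuristic.__name__, solution, visited)
```

In this sketch the heuristic is just a function from the unassigned variables and their domains to the next variable to branch on. A learned policy of the kind described above would occupy the same slot, replacing the hand-written rule with a model that scores variables from a representation of the current search state.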