An Integer Linear Programming Framework for Mining Constraints from Data

Structured output prediction problems (e.g., sequential tagging, hierarchical multi-class classification) often involve constraints over the output label space. These constraints interact with the learned models to filter out infeasible solutions and help build accountable systems. However, useful as they are, constraints are usually hand-crafted rules. This raises a question: can we mine constraints and rules from data with a learning algorithm? In this paper, we present a general framework for mining constraints from data. In particular, we treat inference in structured output prediction as an integer linear programming (ILP) problem. Then, given the coefficients of the objective function and the corresponding solution, we mine the underlying constraints by estimating the outer and inner polytopes of the feasible set. We verify the proposed constraint mining algorithm on various synthetic and real-world applications and demonstrate that it successfully identifies the feasible set at scale. In particular, we show that our approach can learn to solve 9x9 Sudoku puzzles and minimum spanning tree problems from examples, without being given the underlying rules. Our algorithm can also be integrated with a neural network model to learn the hierarchical label structure of a multi-label classification task. In addition, we provide a theoretical analysis of the tightness of the polytopes and the reliability of the mined constraints.
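
To make the outer/inner polytope idea concrete, here is a minimal sketch of one natural reading of it, not the paper's implementation. It assumes inference maximizes a linear objective c·y over an unknown feasible set of binary vectors. Under that assumption, every observed pair (c, y*) certifies that c·y ≤ c·y* holds for all feasible y, so the intersection of these half-spaces contains the feasible set (an outer estimate), while the observed solutions themselves are known to be feasible (an inner estimate). The toy "exactly one label is active" feasible set, the cost vectors, and all function names are hypothetical choices made for illustration.

```python
from itertools import product


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def solve_hidden_ilp(c, feasible):
    """Stand-in oracle for the unknown ILP: argmax of c.y over the hidden feasible set."""
    return max(feasible, key=lambda y: dot(c, y))


def mine_outer_constraints(examples):
    """Turn each (c, y_star) pair into the valid inequality c.y <= c.y_star."""
    return [(c, dot(c, y_star)) for c, y_star in examples]


def in_outer(y, constraints, tol=1e-9):
    """Check whether y satisfies every mined inequality."""
    return all(dot(c, y) <= b + tol for c, b in constraints)


# Hypothetical hidden rule: exactly one of three labels is active.
hidden = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

# Hypothetical objective coefficients; only (c, y_star) pairs are shown to the miner.
costs = [(1, 1, -5), (1, -5, 1), (-5, 1, 1), (-1, -2, -3),
         (3, 1, 2), (1, 3, 2), (1, 2, 3)]
examples = [(c, solve_hidden_ilp(c, hidden)) for c in costs]

constraints = mine_outer_constraints(examples)

# Inner estimate: solutions observed so far (all are feasible by construction).
inner = {y_star for _, y_star in examples}

# Outer estimate: binary vectors that violate none of the mined inequalities.
outer = [y for y in product((0, 1), repeat=3) if in_outer(y, constraints)]

print("inner:", sorted(inner))
print("outer:", sorted(outer))
```

In this toy case the two estimates coincide, so the feasible set is pinned down exactly. With fewer or less informative examples the outer estimate would generally be larger than the inner one, and the gap between them indicates how much of the feasible set remains undetermined.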
