Identifying discrete patterns in binary data is an important dimensionality reduction tool in machine learning and data mining. In this paper, we consider the problem of low-rank binary matrix factorisation (BMF) under Boolean arithmetic. Due to the hardness of this problem, most previous attempts rely on heuristic techniques. We formulate the problem as a mixed integer linear program and use a large scale optimisation technique of column generation to solve it without the need of heuristic pattern mining. Our approach focuses on accuracy and on the provision of optimality guarantees. Experimental results on real world datasets demonstrate that our proposed method is effective at producing highly accurate factorisations and improves on the previously available best known results for 15 out of 24 problem instances.
翻译:在机器学习和数据挖掘中,识别二元数据中的离散模式是一个重要的维度减少工具。在本文中,我们考虑了在布林算术中低级别二元矩阵因子化的问题。由于这一问题的难度,大多数先前的尝试都依赖于脂质技术。我们将这一问题作为一个混合整数线程序来表述,并使用大规模生成柱子的优化技术来解决这个问题,而无需使用超自然模式的采矿。我们的方法侧重于准确性和提供最佳性保障。关于真实世界数据集的实验结果表明,我们所提议的方法在为24个问题案例中的15个案例提供高度准确的因子化和改进以前已知的最佳结果方面是有效的。