Identifying discrete patterns in binary data is an important dimensionality reduction tool in machine learning and data mining. In this paper, we consider the problem of low-rank binary matrix factorisation (BMF) under Boolean arithmetic. Due to the NP-hardness of this problem, most previous attempts rely on heuristic techniques. We formulate the problem as a mixed integer linear program and use a large scale optimisation technique of column generation to solve it without the need of heuristic pattern mining. Our approach focuses on accuracy and on the provision of optimality guarantees. Experimental results on real world datasets demonstrate that our proposed method is effective at producing highly accurate factorisations and improves on the previously available best known results for 15 out of 24 problem instances.
翻译:在机器学习和数据挖掘中,识别二元数据中的离散模式是一个重要的维度减少工具。在本文中,我们考虑了在布林算术中低级别二元矩阵因子化的问题。由于这一问题的NP-硬性,大多数先前的尝试都依赖超自然技术。我们将这一问题作为一个混合的线性程序来描述,并使用大规模生成柱子的优化技术来解决这个问题,而不需要超自然模式采矿。我们的方法侧重于准确性和提供最佳性保证。关于真实世界数据集的实验结果表明,我们所提议的方法能够有效地产生高度准确的因子化,并改进24个问题案例中15个案例以前已知的最佳结果。