Boolean matrix factorization (BMF) approximates a given binary input matrix as the product of two smaller binary factors. Unlike binary matrix factorization based on standard arithmetic, BMF employs the Boolean OR and AND operations for the matrix product, which improves interpretability and reduces the approximation error. It is also used in role mining and computer vision. In this paper, we first propose algorithms for BMF that perform alternating optimization (AO) of the factor matrices, where each subproblem is solved via integer programming (IP). We then design different approaches to further enhance AO-based algorithms by selecting an optimal subset of rank-one factors from multiple runs. To address the scalability limits of IP-based methods, we introduce new greedy and local-search heuristics. We also construct a new C++ data structure for Boolean vectors and matrices that is significantly faster than existing ones and is of independent interest, allowing our heuristics to scale to large datasets. We illustrate the performance of all our proposed methods and compare them with the state of the art on various real datasets, both with and without missing data, including applications in topic modeling and imaging.


翻译:布尔矩阵分解(BMF)将给定的二元输入矩阵近似表示为两个较小二元因子的乘积。与基于标准算术的二元矩阵分解不同,BMF在矩阵乘积运算中采用布尔OR与AND操作,这提升了可解释性并降低了近似误差。该方法亦应用于角色挖掘与计算机视觉领域。本文首先提出一种BMF算法,通过交替优化(AO)因子矩阵实现,其中每个子问题均通过整数规划(IP)求解。随后,我们设计了多种方法以进一步优化基于AO的算法,通过从多次运行中筛选最优的秩一因子子集实现。为应对基于IP方法的可扩展性限制,我们引入了新的贪心与局部搜索启发式策略。同时,我们构建了一种新的C++数据结构用于处理布尔向量与矩阵,其性能显著优于现有方案,具备独立研究价值,使得我们的启发式方法能够扩展至大规模数据集。我们展示了所有提出方法的性能,并在多种真实数据集(包括含缺失数据与完整数据的情形)上与当前最优技术进行比较,涵盖主题建模与图像处理等应用场景。

0
下载
关闭预览

相关内容

Top
微信扫码咨询专知VIP会员