Boolean matrix factorization (BMF) approximates a given binary input matrix as the product of two smaller binary factors. Unlike binary matrix factorization based on standard arithmetic, BMF employs the Boolean OR and AND operations for the matrix product, which improves interpretability and reduces the approximation error. It is also used in role mining and computer vision. In this paper, we first propose algorithms for BMF that perform alternating optimization (AO) of the factor matrices, where each subproblem is solved via integer programming (IP). We then design different approaches to further enhance AO-based algorithms by selecting an optimal subset of rank-one factors from multiple runs. To address the scalability limits of IP-based methods, we introduce new greedy and local-search heuristics. We also construct a new C++ data structure for Boolean vectors and matrices that is significantly faster than existing ones and is of independent interest, allowing our heuristics to scale to large datasets. We illustrate the performance of all our proposed methods and compare them with the state of the art on various real datasets, both with and without missing data, including applications in topic modeling and imaging.


翻译:布尔矩阵分解(BMF)将给定的二元输入矩阵近似表示为两个较小二元因子的乘积。与基于标准算术的二元矩阵分解不同,BMF采用布尔OR和AND运算进行矩阵乘积计算,这提升了可解释性并降低了近似误差。该方法亦应用于角色挖掘与计算机视觉领域。本文首先提出了基于因子矩阵交替优化(AO)的BMF算法,其中每个子问题通过整数规划(IP)求解。随后,我们设计了多种方法以进一步提升基于AO的算法性能,包括从多次运行结果中选取最优的秩一因子子集。为应对基于IP方法在可扩展性上的局限,我们引入了新的贪心与局部搜索启发式算法。同时,我们构建了一种用于布尔向量与矩阵的新型C++数据结构,其速度显著优于现有方案,具有独立研究价值,使得我们的启发式算法能够扩展至大规模数据集。我们在多种真实数据集上(包括含缺失数据与完整数据的情形)展示了所有提出方法的性能,并与当前最优技术进行了对比,涵盖主题建模与图像处理等应用场景。

0
下载
关闭预览

相关内容

Top
微信扫码咨询专知VIP会员