Boolean Matrix Factorization (BMF) aims to approximate a given binary matrix by the Boolean product of two low-rank binary matrices. Binary data is ubiquitous, and representing data by binary matrices is common in medicine, natural language processing, bioinformatics, computer graphics, and many other fields. Unfortunately, BMF is computationally hard, so heuristic algorithms are used to compute Boolean factorizations. Very recently, a theoretical breakthrough was obtained independently by two research groups: Ban et al. (SODA 2019) and Fomin et al. (Trans. Algorithms 2020) show that BMF admits an efficient polynomial-time approximation scheme (EPTAS). However, despite their theoretical importance, the double-exponential dependence of the running times on the rank makes these algorithms impractical to implement. The primary research question motivating our work is whether the theoretical advances on BMF could lead to practical algorithms. Our main conceptual contribution is the following: while the EPTAS for BMF is a purely theoretical advance, the general approach behind these algorithms can serve as a basis for designing better heuristics. We also use this strategy to develop new algorithms for the related $\mathbb{F}_p$-Matrix Factorization problem. Here, given a matrix $A$ over a finite field GF($p$), where $p$ is a prime, and an integer $r$, the objective is to find a matrix $B$ over the same field with GF($p$)-rank at most $r$ that minimizes some norm of $A-B$. Our empirical study on synthetic and real-world data demonstrates the advantage of the new algorithms over previous work on BMF and $\mathbb{F}_p$-Matrix Factorization.
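To make the two objectives concrete, the following minimal sketch contrasts the Boolean product (OR of ANDs) with the product over GF($p$) (sums taken modulo $p$) and counts how many entries of a target matrix each product misses. It is only an illustration of the problem statement, not the algorithms of this work. The function names `boolean_product`, `gf_p_product`, and `mismatches` are hypothetical, and using the number of differing entries as the error measure is an assumption, since the text only says "some norm".

```python
def boolean_product(U, V):
    """Boolean product of binary matrices U (m x r) and V (r x n).

    Entry (i, j) is 1 if U[i][k] and V[k][j] are both 1 for some k.
    """
    r, n = len(V), len(V[0])
    return [[int(any(row[k] and V[k][j] for k in range(r))) for j in range(n)]
            for row in U]


def gf_p_product(U, V, p):
    """Matrix product over GF(p): ordinary product with entries reduced mod p."""
    r, n = len(V), len(V[0])
    return [[sum(row[k] * V[k][j] for k in range(r)) % p for j in range(n)]
            for row in U]


def mismatches(A, B):
    """Number of entries where A and B differ (assumed error measure)."""
    return sum(a != b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


# Hypothetical rank-2 factors and a 3 x 3 target matrix.
U = [[1, 0],
     [1, 1],
     [0, 1]]
V = [[1, 1, 0],
     [0, 1, 1]]
A = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 0]]

bool_B = boolean_product(U, V)   # 1 OR 1 = 1 in the middle entry
gf2_B = gf_p_product(U, V, 2)    # 1 + 1 = 0 (mod 2) in the middle entry

print(mismatches(A, bool_B), mismatches(A, gf2_B))
```

The same factors give different products under the two algebras: in the middle entry the Boolean product keeps a 1, while over GF(2) the two contributions cancel. This is why BMF and $\mathbb{F}_p$-Matrix Factorization are treated as distinct problems.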