Nonnegative matrix factorization (NMF) often relies on the separability condition for tractable algorithm design. Separability-based NMF is mainly handled by two types of approaches, namely, greedy pursuit and convex programming. A notable convex NMF formulation is the so-called self-dictionary multiple measurement vectors (SD-MMV) model, which can work without knowing the matrix rank a priori and is arguably more resilient to error propagation than greedy pursuit. However, convex SD-MMV incurs a large memory cost that scales quadratically with the problem size. This memory challenge has persisted for a decade and has been a major obstacle to applying convex SD-MMV to big data analytics. This work proposes a memory-efficient algorithm for convex SD-MMV. Our algorithm capitalizes on the special update rules of a classic algorithm from the 1950s, namely, the Frank-Wolfe (FW) algorithm. It is shown that, under reasonable conditions, the FW algorithm solves the noisy SD-MMV problem with a memory cost that grows linearly with the amount of data. To handle noisier scenarios, a smoothed group sparsity regularizer is proposed to improve robustness while maintaining the low memory footprint with guarantees. The proposed approach is the first algorithmic framework for convex SD-MMV-based NMF with linear memory complexity. The method is tested on two unsupervised learning tasks, namely, text mining and community detection, to showcase its effectiveness and memory efficiency.
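To make the memory argument concrete, the following is a minimal sketch of a Frank-Wolfe iteration on a self-dictionary problem. It is an illustration under stated assumptions, not the algorithm analyzed in this work. The objective 0.5*||X - XC||_F^2 with each column of C constrained to the probability simplex, the zero initialization, the step size 2/(t+2), the function name `fw_sdmmv`, and the `block` parameter are all assumptions introduced here. The regularized variant is not shown. The point of the sketch is that each FW step adds at most one new nonzero entry per column of C, so only the rows of C that have been selected need to be stored.

```python
import numpy as np

def fw_sdmmv(X, n_iters=200, block=256):
    """Frank-Wolfe sketch for min_C 0.5*||X - X C||_F^2, columns of C on the simplex.

    Hypothetical helper for illustration. Only rows of C that FW has selected
    are stored, so C costs O(N * len(active)) memory instead of O(N^2).
    """
    M, N = X.shape
    active = []                       # row indices of C that have been selected
    C_act = np.zeros((0, N))          # stored rows of C, aligned with `active`
    for t in range(n_iters):
        gamma = 2.0 / (t + 2.0)       # assumed step size; equals 1 at t = 0
        idx = np.array(active, dtype=int)
        R = X[:, idx] @ C_act - X     # residual X C - X, shape (M, N)
        picks = np.empty(N, dtype=int)
        for s in range(0, N, block):  # gradient is formed one column block at a time
            G = X.T @ R[:, s:s + block]            # (N, <=block) slice of the gradient
            picks[s:s + block] = np.argmin(G, axis=0)
        C_act *= (1.0 - gamma)        # shrink the current iterate
        for i in np.unique(picks):    # move each column toward its selected vertex
            i = int(i)
            if i not in active:
                active.append(i)
                C_act = np.vstack([C_act, np.zeros((1, N))])
            C_act[active.index(i), picks == i] += gamma
    return active, C_act

# Toy separable data (assumed setup): the first K columns of X are the vertices.
rng = np.random.default_rng(0)
K, M, N = 5, 10, 200
W = rng.dirichlet(np.ones(M), size=K).T                                # M x K
H = np.hstack([np.eye(K), rng.dirichlet(np.ones(K), size=N - K).T])    # K x N
X = W @ H
active, C_act = fw_sdmmv(X)
print("selected rows:", sorted(active), "stored block:", C_act.shape)
```

In this sketch the working memory is the M x N residual, one N x `block` gradient slice, and the stored rows of C. The full N x N matrix C is never formed. How many rows become active on noisy data depends on conditions that the sketch does not check.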