Learning the joint probability of random variables (RVs) is the cornerstone of statistical signal processing and machine learning. However, direct nonparametric estimation of a high-dimensional joint probability is in general impossible, due to the curse of dimensionality. Recent work has proposed recovering the joint probability mass function (PMF) of an arbitrary number of RVs from three-dimensional marginals, leveraging the algebraic properties of low-rank tensor decomposition and the (unknown) dependence among the RVs. Nonetheless, accurately estimating three-dimensional marginals can still be costly in terms of sample complexity, which degrades the practical performance of this line of work in the sample-starved regime. Using three-dimensional marginals also involves challenging tensor decomposition problems whose tractability is unclear. This work puts forth a new framework for learning the joint PMF using only pairwise marginals, which naturally require fewer samples to estimate than third-order ones. A coupled nonnegative matrix factorization (CNMF) framework is developed, and its joint PMF recovery guarantees are analyzed under various conditions. Our method also features a Gram--Schmidt (GS)-like algorithm that exhibits competitive runtime performance. Under reasonable conditions, the algorithm is shown to provably recover the joint PMF up to a bounded error in a finite number of iterations. It is also shown that a recently proposed economical expectation maximization (EM) algorithm is guaranteed to improve upon the GS-like algorithm's output, thereby further improving accuracy and efficiency. Experiments on real data showcase the effectiveness of the proposed approach.
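To make the role of pairwise marginals concrete, the following is a minimal sketch rather than the paper's algorithm. It assumes the latent-variable (naive Bayes) model commonly used in the low-rank tensor line of work, in which a hidden variable H with prior `lam` generates each RV independently through a conditional PMF matrix `A_n`. The abstract does not spell this model out, so the model, the function names, the dimensions, and the rank are all illustrative assumptions. Under this model, every pairwise marginal factors as `A_j @ diag(lam) @ A_k.T`, which is the kind of shared-factor structure a coupled nonnegative factorization can exploit.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_model(dims, rank):
    """Draw a random latent-variable model (illustrative assumption)."""
    lam = rng.dirichlet(np.ones(rank))  # Pr(H = f), shape (rank,)
    # A_n[i, f] = Pr(Z_n = i | H = f); each column is a PMF.
    factors = [rng.dirichlet(np.ones(d), size=rank).T for d in dims]
    return lam, factors

def joint_pmf(lam, factors):
    """Joint PMF of three RVs as a rank-F nonnegative tensor."""
    A1, A2, A3 = factors
    return np.einsum("f,if,jf,kf->ijk", lam, A1, A2, A3)

def pairwise_marginal(lam, factors, j, k):
    """Pr(Z_j, Z_k) written directly in terms of the shared factors."""
    return factors[j] @ np.diag(lam) @ factors[k].T

def empirical_pairwise(samples, j, k, dims):
    """Estimate Pr(Z_j, Z_k) by counting co-occurrences in the samples."""
    counts = np.zeros((dims[j], dims[k]))
    np.add.at(counts, (samples[:, j], samples[:, k]), 1)
    return counts / len(samples)

dims = (4, 3, 5)
lam, factors = random_model(dims, rank=2)
P = joint_pmf(lam, factors)

# Marginalizing the joint over Z_3 matches the factored form.
X12_from_joint = P.sum(axis=2)
X12_from_factors = pairwise_marginal(lam, factors, 0, 1)

# Draw samples from the joint and estimate the same pairwise marginal.
flat = rng.choice(P.size, size=20000, p=P.ravel())
samples = np.stack(np.unravel_index(flat, dims), axis=1)
X12_hat = empirical_pairwise(samples, 0, 1, dims)
```

The empirical table `X12_hat` has only `dims[0] * dims[1]` cells to fill, whereas a three-dimensional marginal has `dims[0] * dims[1] * dims[2]`. This is the intuition behind the lower sample cost of pairwise statistics. Recovering `lam` and the `A_n` from such tables is the harder step that the CNMF framework addresses, and it is not attempted here.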