Large-scale undirected weighted networks are commonly encountered in big-data-related research fields. Such a network can naturally be quantified as a symmetric high-dimensional and incomplete (SHDI) matrix for implementing big data analysis tasks. A symmetric non-negative latent-factor-analysis (SNL) model is able to efficiently extract latent factors (LFs) from an SHDI matrix. However, it relies on a constraint-combination training scheme, which makes it inflexible. To address this issue, this paper proposes an unconstrained symmetric non-negative latent-factor-analysis (USNL) model. Its main idea is two-fold: 1) the output LFs are separated from the decision parameters by integrating a nonnegative mapping function into an SNL model; and 2) stochastic gradient descent (SGD) is adopted to implement unconstrained model training while ensuring the nonnegativity of the output LFs. Empirical studies on four SHDI matrices generated from real big data applications demonstrate that a USNL model achieves higher prediction accuracy for missing data than an SNL model, as well as highly competitive computational efficiency.
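The core idea in the abstract, training unconstrained decision parameters while a nonnegative mapping produces the output LFs, can be sketched numerically. The sketch below assumes an element-wise squaring map `U = X**2` as the nonnegative mapping and plain SGD on the observed entries of a synthetic symmetric matrix; the paper's actual mapping function, learning-rate schedule, and datasets may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 30, 4

# Synthetic symmetric nonnegative low-rank target matrix.
U_true = rng.random((n, d))
M = U_true @ U_true.T

# SHDI matrices are sparse: keep ~30% of the upper-triangular entries.
idx = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < 0.3]

# Unconstrained decision parameters; the output LFs are X**2 >= 0.
X = rng.random((n, d))
lr = 0.01

for epoch in range(500):
    for t in rng.permutation(len(idx)):
        i, j = idx[t]
        u_i, u_j = X[i] ** 2, X[j] ** 2          # nonnegative output LFs
        err = u_i @ u_j - M[i, j]
        # Chain rule through the squaring map: d(x**2)/dx = 2x.
        X[i] -= lr * err * u_j * 2 * X[i]
        X[j] -= lr * err * u_i * 2 * X[j]

U = X ** 2  # final output LFs, nonnegative by construction
rmse = np.sqrt(np.mean([(U[i] @ U[j] - M[i, j]) ** 2 for i, j in idx]))
```

Because nonnegativity is enforced by the mapping rather than by a constraint on the training variables, `X` can be updated with an ordinary, unconstrained SGD step, which is the flexibility the abstract claims over the constraint-combination scheme.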