Regularized and Smooth Double Core Tensor Factorization for Heterogeneous Data

We introduce a general tensor model suited to data-analytic tasks on {\em heterogeneous} datasets, in which groups of observations share joint low-rank structure while discriminative structure separates the groups. To capture both kinds of structure, we propose a double core tensor (DCOT) factorization model together with a family of smoothing loss functions. The smoothing loss lets the model estimate its factors accurately even when entries are missing. Regularized versions of the DCOT factorization are solved with a linearized ADMM method that avoids large tensor operations and large memory requirements. We further establish theoretically the global convergence of this method, together with the consistency of the estimated model parameters. The effectiveness of the DCOT model is illustrated on several real-world examples, including image completion, recommender systems, subspace clustering, and module detection in heterogeneous multi-modal omics data, where it provides more insightful decompositions than conventional tensor methods.
