In this paper, we develop a method, which we call OnlineGCP, for computing the Generalized Canonical Polyadic (GCP) tensor decomposition of streaming data. GCP differs from the traditional canonical polyadic (CP) tensor decomposition in that it allows the CP model to be fit by minimizing an arbitrary objective function. This can provide better fits and more interpretable models when the observed tensor data are strongly non-Gaussian. In the streaming case, tensor data are observed gradually over time, and the algorithm must incrementally update a GCP factorization with limited access to prior data. We extend the GCP formalism to the streaming context as follows: we derive a GCP optimization problem to be solved as new tensor data are observed; formulate a tunable history term that balances reconstruction of recently observed data against reconstruction of data observed in the past; develop a scalable solution strategy based on segregated solves using stochastic gradient descent methods; describe a software implementation that provides performance and portability across contemporary CPU and GPU architectures and integrates with MATLAB for enhanced usability; and demonstrate the utility and performance of the approach and software on several synthetic and real tensor data sets.
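To make the streaming update concrete, the following is a minimal NumPy sketch of one step for a three-way tensor observed one time slice at a time. It is an illustration under stated assumptions, not the OnlineGCP implementation: the Bernoulli-logit loss, the proximal penalty `mu/2 * ||A - A_old||^2` used as a stand-in for the history term, and all function and parameter names (`streaming_step`, `sampled_grad`, `mu`, `lr`, `n_inner`, `n_samples`) are hypothetical choices that do not come from the abstract. The segregated solve is modeled as two stages: first fit the temporal weights of the new slice with the factors held fixed, then update the factors with those weights held fixed.

```python
import numpy as np

def bernoulli_logit_loss(X, M):
    """Sum over entries of f(x, m) = log(1 + exp(m)) - x * m (assumed example loss)."""
    return float(np.sum(np.logaddexp(0.0, M) - X * M))

def bernoulli_logit_grad(x, m):
    """Elementwise derivative of the loss above with respect to m."""
    return 1.0 / (1.0 + np.exp(-m)) - x

def sampled_grad(X, M, grad_f, n_samples, rng):
    """Stochastic estimate of dF/dM from uniformly sampled entries.

    Unsampled entries are zero; sampled ones are rescaled so the estimate
    is unbiased. With n_samples == X.size this is the exact gradient.
    """
    idx = rng.choice(X.size, size=n_samples, replace=False)
    G = np.zeros(X.size)
    G[idx] = grad_f(X.ravel()[idx], M.ravel()[idx]) * (X.size / n_samples)
    return G.reshape(X.shape)

def streaming_step(X_t, A, B, grad_f, mu=1.0, lr=0.01, n_inner=50,
                   n_samples=None, rng=None):
    """One streaming update for a new slice X_t modeled as A @ diag(c) @ B.T.

    Hypothetical sketch: `mu` weights a proximal penalty that keeps the
    factors close to their previous values (an assumed history term).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_samples = X_t.size if n_samples is None else n_samples

    # Stage 1: fit the temporal weights c for the new slice, factors fixed.
    c = np.zeros(A.shape[1])
    for _ in range(n_inner):
        G = sampled_grad(X_t, (A * c) @ B.T, grad_f, n_samples, rng)
        c = c - lr * np.einsum("ij,ir,jr->r", G, A, B)

    # Stage 2: update the factors with c fixed, penalizing drift from the past.
    A_old, B_old = A, B
    for _ in range(n_inner):
        G = sampled_grad(X_t, (A * c) @ B.T, grad_f, n_samples, rng)
        grad_A = G @ (B * c) + mu * (A - A_old)
        grad_B = G.T @ (A * c) + mu * (B - B_old)
        A, B = A - lr * grad_A, B - lr * grad_B
    return A, B, c

# Usage: stream five synthetic binary slices through the update.
rng = np.random.default_rng(0)
I, J, R = 15, 12, 3
A_true = 0.5 * rng.standard_normal((I, R))
B_true = 0.5 * rng.standard_normal((J, R))
A = 0.5 * rng.standard_normal((I, R))
B = 0.5 * rng.standard_normal((J, R))
for t in range(5):
    logits = (A_true * rng.standard_normal(R)) @ B_true.T
    X_t = (rng.random((I, J)) < 1.0 / (1.0 + np.exp(-logits))).astype(float)
    A, B, c_t = streaming_step(X_t, A, B, bernoulli_logit_grad,
                               n_samples=90, rng=rng)
```

In this sketch a larger `mu` keeps the factors closer to what was learned from earlier slices, while a smaller `mu` lets them adapt faster to the newest slice. Swapping in another elementwise loss only requires a different `grad_f`.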