Modern scientific studies often collect data sets in the form of tensors, which call for innovative statistical analysis methods. In particular, there is a pressing need for tensor clustering methods to understand the heterogeneity in the data. We propose a tensor normal mixture model (TNMM) approach that enables probabilistic interpretation and computational tractability. Our statistical model leverages the tensor covariance structure to reduce the number of parameters for parsimonious modeling, and at the same time explicitly exploits the correlations for better variable selection and clustering. We propose a doubly-enhanced expectation-maximization (DEEM) algorithm to perform clustering under this model. Both the E-step and the M-step are carefully tailored for tensor data in order to balance statistical accuracy and computational cost in high dimensions. Theoretical studies confirm that DEEM achieves consistent clustering even when the dimension of each mode of the tensors grows exponentially with the sample size. Numerical studies show that DEEM compares favorably with existing methods.
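To make the parsimony argument concrete, the sketch below assumes the standard tensor normal specification, in which vec(X) has the Kronecker-structured covariance Σ_M ⊗ ... ⊗ Σ_1. It (i) counts free covariance parameters for an unstructured versus a separable covariance, (ii) evaluates the matrix-normal (order-2 tensor) log-density without ever forming the Kronecker product, and (iii) runs a plain EM E-step for a mixture whose clusters share Σ_1 and Σ_2. The function names are hypothetical, the shared-covariance setup is an illustrative assumption, and the E-step is the textbook one, not the doubly-enhanced E-step of DEEM.

```python
import numpy as np


def covariance_param_counts(dims):
    """Free covariance parameters for a tensor of shape `dims`:
    unstructured p x p covariance vs. separable Sigma_M kron ... kron Sigma_1."""
    p = int(np.prod(dims))
    unstructured = p * (p + 1) // 2
    # Each mode-k factor is a symmetric p_k x p_k matrix; rescaling the factors by
    # constants whose product is 1 leaves the Kronecker product unchanged, which
    # removes (M - 1) degrees of freedom.
    separable = sum(d * (d + 1) // 2 for d in dims) - (len(dims) - 1)
    return unstructured, separable


def matrix_normal_logpdf(X, M, S1, S2):
    """Log-density of X ~ MN(M, S1, S2), i.e. vec(X) ~ N(vec(M), S2 kron S1),
    with S1 the p1 x p1 row covariance and S2 the p2 x p2 column covariance."""
    p1, p2 = X.shape
    R = X - M
    _, logdet1 = np.linalg.slogdet(S1)
    _, logdet2 = np.linalg.slogdet(S2)
    # tr(S2^{-1} R^T S1^{-1} R): only p1 x p1 and p2 x p2 solves are needed,
    # never the (p1*p2) x (p1*p2) Kronecker product.
    quad = np.trace(np.linalg.solve(S2, R.T) @ np.linalg.solve(S1, R))
    return -0.5 * (p1 * p2 * np.log(2 * np.pi) + p2 * logdet1 + p1 * logdet2 + quad)


def e_step(Xs, pis, means, S1, S2):
    """Textbook EM E-step (NOT the DEEM-enhanced version): posterior cluster
    probabilities for matrices Xs under a mixture sharing S1 and S2."""
    n, K = len(Xs), len(pis)
    log_w = np.empty((n, K))
    for i, X in enumerate(Xs):
        for k in range(K):
            log_w[i, k] = np.log(pis[k]) + matrix_normal_logpdf(X, means[k], S1, S2)
    log_w -= log_w.max(axis=1, keepdims=True)  # stabilise before exponentiating
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)


# Demo: compare against the brute-force vectorised normal density.
rng = np.random.default_rng(0)
p1, p2 = 3, 4
A = rng.standard_normal((p1, p1))
S1 = A @ A.T + p1 * np.eye(p1)
B = rng.standard_normal((p2, p2))
S2 = B @ B.T + p2 * np.eye(p2)
X = rng.standard_normal((p1, p2))
M = np.zeros((p1, p2))

v = (X - M).flatten(order="F")  # column-stacking vec()
Sigma = np.kron(S2, S1)
_, logdet = np.linalg.slogdet(Sigma)
full = -0.5 * (v.size * np.log(2 * np.pi) + logdet + v @ np.linalg.solve(Sigma, v))

print(np.isclose(matrix_normal_logpdf(X, M, S1, S2), full))  # True
print(covariance_param_counts((10, 10, 10)))  # (500500, 163)
```

For a 10 × 10 × 10 tensor, the separable structure cuts the covariance from 500,500 free parameters to 163, and the density evaluation scales with the mode dimensions rather than with their product.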