Building models and methods for complex data is an important task in many scientific and application areas. Many modern datasets exhibit dependencies among observations as well as variables, which gives rise to the challenging problem of analyzing high-dimensional matrix-variate data with unknown dependence structures. To address this challenge, Kalaitzis et al. (2013) proposed the Bigraphical Lasso (BiGLasso), an estimator for precision matrices of matrix normals based on the Cartesian product of graphs. Subsequently, Greenewald, Zhou and Hero (GZH 2019) introduced a multiway tensor generalization of the BiGLasso estimator, known as the TeraLasso estimator. In this paper, we provide sharp rates of convergence in the Frobenius and operator norms for both the BiGLasso and TeraLasso estimators of inverse covariance matrices, improving upon the rates presented in GZH 2019. In particular, (a) we strengthen the bounds on the relative errors in the operator and Frobenius norms by a factor of approximately $\log p$; (b) crucially, this improvement allows finite-sample estimation errors in both norms to be derived for the two-way Kronecker sum model. This closes the gap between the low single-sample error for the two-way model empirically observed in GZH 2019 and the theoretical bounds therein. The two-way regime is particularly significant since it is the setting of common and generic applications in practice. Normality is not needed in our proofs; instead, we consider subgaussian ensembles and derive tight concentration-of-measure bounds using tensor unfolding techniques. The proof techniques may be of independent interest in the analysis of tensor-valued data.
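To fix ideas, the Cartesian-product structure underlying BiGLasso corresponds to a Kronecker sum of the two factor precision matrices, $\Omega = \Psi \oplus \Theta = \Psi \otimes I_n + I_m \otimes \Theta$. The following is a minimal numerical sketch of that structure; the factor matrices are hypothetical example values chosen for illustration, not data or code from the paper:

```python
import numpy as np

def kron_sum(Psi, Theta):
    """Kronecker sum Psi ⊕ Theta = Psi ⊗ I_n + I_m ⊗ Theta,
    the Cartesian-product precision structure of the two-way model."""
    m, n = Psi.shape[0], Theta.shape[0]
    return np.kron(Psi, np.eye(n)) + np.kron(np.eye(m), Theta)

# Hypothetical positive-definite factors: row-wise (m = 2) and
# column-wise (n = 3) dependence of a 2 x 3 matrix-variate observation.
Psi = np.array([[2.0, 0.5],
                [0.5, 2.0]])
Theta = np.array([[3.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 3.0]])

# Joint precision matrix of the vectorized observation: (mn) x (mn) = 6 x 6.
Omega = kron_sum(Psi, Theta)
```

Because the eigenvalues of a Kronecker sum are all pairwise sums of the factors' eigenvalues, positive definiteness of `Psi` and `Theta` carries over to `Omega`, while the number of free parameters stays of order $m^2 + n^2$ rather than $(mn)^2$.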
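As a rough illustration of the tensor unfolding operation invoked in the concentration arguments, the sketch below implements a generic mode-$k$ matricization (fibers of mode $k$ become columns); this is the standard construction, not the paper's specific notation:

```python
import numpy as np

def unfold(T, mode):
    """Mode-`mode` unfolding of tensor T: move the chosen mode to the
    front and flatten the remaining modes into columns."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

# A small 2 x 3 x 4 example tensor.
T = np.arange(24).reshape(2, 3, 4)

U1 = unfold(T, 1)   # mode-1 unfolding, shape (3, 8)
```

Unfolding reduces statements about a tensor-valued sample to statements about matrices, which is what lets standard subgaussian concentration bounds be applied mode by mode.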