Building models and methods for complex data is an important task in many scientific and application areas. Many modern datasets exhibit dependencies among observations as well as among variables. This gives rise to the challenging problem of analyzing high-dimensional matrix-variate data with unknown dependence structures. To address this challenge, Kalaitzis et al. (2013) proposed the Bigraphical Lasso (BiGLasso), an estimator for precision matrices of matrix-normal distributions based on the Cartesian product of graphs. Subsequently, Greenewald, Zhou and Hero (GZH 2019) introduced a multiway tensor generalization of the BiGLasso estimator, known as the TeraLasso estimator. In this paper, we provide sharp rates of convergence in the Frobenius and operator norms for both the BiGLasso and TeraLasso estimators of inverse covariance matrices. This improves upon the rates presented in GZH 2019. In particular, (a) we strengthen the bounds for the relative errors in the operator and Frobenius norms by a factor of approximately $\log p$; and (b) crucially, this improvement allows finite-sample estimation errors in both norms to be derived for the two-way Kronecker sum model. This closes the gap between the low single-sample error for the two-way model observed empirically in GZH 2019 and the theoretical bounds therein. The two-way regime is particularly significant because it is the setting of the most common and generic applications in practice. Normality is not needed in our proofs; instead, we consider subgaussian ensembles and derive tight concentration-of-measure bounds using tensor unfolding techniques. The proof techniques may be of independent interest for the analysis of tensor-valued data.
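To make the two-way Kronecker sum model concrete, the following minimal sketch builds a precision matrix of the form $\Psi_1 \oplus \Psi_2 = \Psi_1 \otimes I_{p_2} + I_{p_1} \otimes \Psi_2$ and checks that its off-diagonal support matches the edge set of the Cartesian product of the two factor graphs. The factor matrices `psi1` and `psi2`, the helper names, and the node-indexing convention are illustrative assumptions, not taken from the paper; the sketch illustrates only the model structure, not the BiGLasso or TeraLasso estimation procedure.

```python
def identity(n):
    """n x n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def kron(a, b):
    """Kronecker product of two dense matrices given as lists of rows."""
    rows_b, cols_b = len(b), len(b[0])
    return [
        [a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b]
         for j in range(len(a[0]) * cols_b)]
        for i in range(len(a) * rows_b)
    ]


def kronecker_sum(psi1, psi2):
    """Kronecker sum: psi1 (+) psi2 = psi1 (x) I_{p2} + I_{p1} (x) psi2."""
    p1, p2 = len(psi1), len(psi2)
    left = kron(psi1, identity(p2))
    right = kron(identity(p1), psi2)
    return [[l + r for l, r in zip(row_l, row_r)]
            for row_l, row_r in zip(left, right)]


def edges(m):
    """Off-diagonal support of a symmetric matrix, as pairs (i, j) with i < j."""
    n = len(m)
    return {(i, j) for i in range(n) for j in range(i + 1, n) if m[i][j] != 0.0}


def cartesian_product_edges(e1, p1, e2, p2):
    """Edges of the Cartesian product graph; node (a, b) has index a * p2 + b."""
    out = set()
    for a, c in e1:              # an edge in graph 1, copied for every node b of graph 2
        for b in range(p2):
            out.add((a * p2 + b, c * p2 + b))
    for b, d in e2:              # an edge in graph 2, copied for every node a of graph 1
        for a in range(p1):
            out.add((a * p2 + b, a * p2 + d))
    return out


# Hypothetical toy factors: a 2-node graph with one edge and a 3-node path graph.
psi1 = [[2.0, -1.0],
        [-1.0, 2.0]]
psi2 = [[2.0, -1.0, 0.0],
        [-1.0, 2.0, -1.0],
        [0.0, -1.0, 2.0]]

omega = kronecker_sum(psi1, psi2)  # 6 x 6 precision matrix
product_edges = cartesian_product_edges(edges(psi1), 2, edges(psi2), 3)
```

The pure-Python helpers keep the sketch dependency-free; with NumPy, the same construction would typically use `numpy.kron` and `numpy.eye`. Note that the number of free parameters grows with $p_1^2 + p_2^2$ rather than $(p_1 p_2)^2$, which is what makes estimation from very few matrix-variate samples plausible in this model.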
Title (translated): Fast Convergence Rates for the Tensor Graphical Lasso Estimator