Efficient matrix trace estimation is essential for scalable computation of log-determinants, matrix norms, and distributional divergences. In many large-scale applications, the matrices involved are too large to store or access in full, making even a single matrix-vector (mat-vec) product infeasible. Instead, one often has access only to small subblocks of the matrix or localized matrix-vector products on restricted index sets. Hutch++ achieves optimal convergence rate but relies on randomized SVD and assumes full mat-vec access, making it difficult to apply in these constrained settings. We propose the Block-Orthonormal Stochastic Lanczos Quadrature (BOLT), which matches Hutch++ accuracy with a simpler implementation based on orthonormal block probes and Lanczos iterations. BOLT builds on the Stochastic Lanczos Quadrature (SLQ) framework, which combines random probing with Krylov subspace methods to efficiently approximate traces of matrix functions, and performs better than Hutch++ in near flat-spectrum regimes. To address memory limitations and partial access constraints, we introduce Subblock SLQ, a variant of BOLT that operates only on small principal submatrices. As a result, this framework yields a proxy KL divergence estimator and an efficient method for computing the Wasserstein-2 distance between Gaussians - both compatible with low-memory and partial-access regimes. We provide theoretical guarantees and demonstrate strong empirical performance across a range of high-dimensional settings.
翻译:高效的矩阵迹估计对于对数行列式、矩阵范数及分布散度的可扩展计算至关重要。在许多大规模应用中,所涉及的矩阵规模过大,无法完整存储或访问,甚至单次矩阵-向量(mat-vec)乘积也难以实现。通常只能获取矩阵的小型子块或受限索引集上的局部矩阵-向量乘积。Hutch++算法虽能达到最优收敛速率,但其依赖随机化SVD并假设具备完整的mat-vec访问能力,难以适用于此类受限场景。本文提出块正交随机Lanczos求积法(BOLT),该方法通过基于正交块探针与Lanczos迭代的更简洁实现,达到了与Hutch++相当的精度。BOLT建立在随机Lanczos求积法(SLQ)框架之上,该框架将随机探针与Krylov子空间方法相结合以高效逼近矩阵函数的迹,并在近平坦谱区域表现优于Hutch++。针对内存限制与部分访问约束,我们进一步提出子块SLQ——一种仅操作于小型主子矩阵的BOLT变体。该框架最终推导出代理KL散度估计器及计算高斯分布间Wasserstein-2距离的高效方法,二者均兼容低内存与部分访问场景。我们提供了理论保证,并在多种高维设定中展示了优异的实证性能。