The cumulative empirical spectral measure (CESM) $\Phi[\mathbf{A}] : \mathbb{R} \to [0,1]$ of a $n\times n$ symmetric matrix $\mathbf{A}$ is defined as the fraction of eigenvalues of $\mathbf{A}$ less than a given threshold, i.e., $\Phi[\mathbf{A}](x) := \sum_{i=1}^{n} \frac{1}{n} {\large\unicode{x1D7D9}}[ \lambda_i[\mathbf{A}]\leq x]$. Spectral sums $\operatorname{tr}(f[\mathbf{A}])$ can be computed as the Riemann--Stieltjes integral of $f$ against $\Phi[\mathbf{A}]$, so the task of estimating CESM arises frequently in a number of applications, including machine learning. We present an error analysis for stochastic Lanczos quadrature (SLQ). We show that SLQ obtains an approximation to the CESM within a Wasserstein distance of $t \: | \lambda_{\text{max}}[\mathbf{A}] - \lambda_{\text{min}}[\mathbf{A}] |$ with probability at least $1-\eta$, by applying the Lanczos algorithm for $\lceil 12 t^{-1} + \frac{1}{2} \rceil$ iterations to $\lceil 4 ( n+2 )^{-1}t^{-2} \ln(2n\eta^{-1}) \rceil$ vectors sampled independently and uniformly from the unit sphere. We additionally provide (matrix-dependent) a posteriori error bounds for the Wasserstein and Kolmogorov--Smirnov distances between the output of this algorithm and the true CESM. The quality of our bounds is demonstrated using numerical experiments.
翻译:累积的经验光谱量度 (CESM) $[mathbfn{mathbf{A}] :\ mathbb{R} 至[0,1美元] 美元乘以數據基值的分數, 即 $\\ mathbf{A} 美元[mathbf{A}] (x) =====sum ⁇ i=1} 离子 [mathb{R} {R} 至[0] :\mathb} 數據光谱度[0] 數據光量[0]x1D9} [lamda_i] [mathbbff{A}\leqx] 美元乘以數據基值的分數量, 可以用數據數據數據數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數數