We prove that, for mean-field location-scale variational families, black-box variational inference (BBVI) with the reparameterization gradient converges at a rate with almost no explicit dependence on the dimension of the problem. Specifically, for a $d$-dimensional strongly log-concave and log-smooth target, the number of iterations needed for BBVI with a sub-Gaussian family to reach a solution $\epsilon$-close to the global optimum scales only as $\mathrm{O}(\log d)$ in the dimension. This is a significant improvement over the $\mathrm{O}(d)$ dependence of full-rank location-scale families. For heavy-tailed families, we prove a weaker $\mathrm{O}(d^{2/k})$ dependence, where $k$ is the number of finite moments of the family. Additionally, if the Hessian of the target log-density is constant, the complexity is free of any explicit dimension dependence. We also prove that our bound on the gradient variance, which is key to our result, cannot be improved using only spectral bounds on the Hessian of the target log-density.
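To make the setting concrete, the following is a minimal sketch of BBVI with the reparameterization gradient for a mean-field Gaussian (location-scale) family on a strongly log-concave, log-smooth target. The quadratic target, function names, step size, and iteration count below are illustrative assumptions for this sketch, not the paper's setup or implementation.

```python
import numpy as np

def bbvi_meanfield(grad_logpi, d, n_iters=5000, step=5e-3, n_mc=1, seed=0):
    """Sketch of BBVI with the reparameterization gradient for a mean-field
    Gaussian family q(z) = N(m, diag(exp(2 * rho))); ascends the ELBO by SGD."""
    rng = np.random.default_rng(seed)
    m = np.zeros(d)      # location parameters
    rho = np.zeros(d)    # log-scales, so sigma = exp(rho) stays positive
    for _ in range(n_iters):
        g_m, g_rho = np.zeros(d), np.zeros(d)
        for _ in range(n_mc):
            u = rng.standard_normal(d)   # base noise u ~ N(0, I)
            sigma = np.exp(rho)
            z = m + sigma * u            # reparameterization z = m + sigma * u
            g = grad_logpi(z)            # grad of log pi(z): the only target access
            g_m += g
            g_rho += g * u * sigma       # chain rule: dz/drho = sigma * u
        # The Gaussian entropy adds +1 to each coordinate of the rho-gradient.
        m += step * (g_m / n_mc)
        rho += step * (g_rho / n_mc + 1.0)
    return m, np.exp(rho)

# Illustrative strongly log-concave, log-smooth target:
# log pi(z) = -0.5 z^T A z + b^T z with diagonal SPD A, so the target
# N(A^{-1} b, A^{-1}) itself lies inside the mean-field family.
d = 10
A_diag = np.linspace(1.0, 4.0, d)
b = np.ones(d)
m, sigma = bbvi_meanfield(lambda z: -A_diag * z + b, d)
print(np.max(np.abs(m - b / A_diag)))          # location error (should be small)
print(np.max(np.abs(sigma - A_diag ** -0.5)))  # scale error (should be small)
```

Because the test target has a diagonal Hessian, the global optimum of the mean-field family coincides with the target, which makes convergence easy to check numerically; the paper's complexity bounds concern this family on general strongly log-concave targets.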