We prove that, given a mean-field location-scale variational family, black-box variational inference (BBVI) with the reparametrization gradient converges at an almost dimension-independent rate. Specifically, for a $d$-dimensional strongly log-concave and log-smooth target, the number of iterations required for BBVI with a sub-Gaussian family to obtain a solution $\epsilon$-close to the global optimum has a dimension dependence of only $\mathrm{O}(\log d)$. This is a significant improvement over the $\mathrm{O}(d)$ dependence of full-rank location-scale families. For heavy-tailed families, we prove a weaker $\mathrm{O}(d^{2/k})$ dependence, where $k$ is the number of finite moments of the family. Additionally, if the Hessian of the target log-density is constant, the complexity is free of any explicit dimension dependence. We also prove that our bound on the gradient variance, which is key to our result, cannot be improved using only spectral bounds on the Hessian of the target log-density.
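For concreteness, the following is a minimal sketch (not the paper's code) of the setting analyzed above: BBVI with the reparametrization gradient over a mean-field Gaussian family, which is a sub-Gaussian location-scale family. The target (a standard Gaussian, hence strongly log-concave and log-smooth with constant Hessian), the dimension `d`, the step size, and the iteration count are all illustrative assumptions.

```python
import jax
import jax.numpy as jnp

d = 10  # illustrative dimension

# Hypothetical strongly log-concave, log-smooth target: standard Gaussian log-density.
def log_target(x):
    return -0.5 * jnp.sum(x ** 2)

# Mean-field location-scale family q(x) = N(m, diag(exp(2 s))).
# Single-sample negative ELBO via the reparametrization x = m + exp(s) * eps;
# the entropy of q is sum(s) up to an additive constant, which does not affect gradients.
def negative_elbo(params, eps):
    m, s = params
    x = m + jnp.exp(s) * eps
    return -(log_target(x) + jnp.sum(s))

grad_fn = jax.jit(jax.grad(negative_elbo))  # reparametrization gradient via autodiff

key = jax.random.PRNGKey(0)
params = (jnp.zeros(d), jnp.zeros(d))  # (location m, log-scale s)
step_size = 1e-2  # illustrative choice

for t in range(1000):
    key, sub = jax.random.split(key)
    eps = jax.random.normal(sub, (d,))  # base noise of the location-scale family
    g_m, g_s = grad_fn(params, eps)
    params = (params[0] - step_size * g_m,
              params[1] - step_size * g_s)
```

The per-iteration cost is linear in $d$; the result above concerns how many such iterations are needed, which for this family scales only as $\mathrm{O}(\log d)$.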