We consider the problem of sampling from a strongly log-concave density in $\mathbb{R}^d$ and prove an information-theoretic lower bound on the number of stochastic gradient queries of the log density needed. Several popular sampling algorithms (including many Markov chain Monte Carlo methods) operate by using stochastic gradients of the log density to generate a sample; our results establish an information-theoretic limit for all these algorithms. We show that for every algorithm, there exists a well-conditioned strongly log-concave target density for which the distribution of points generated by the algorithm is at least $\varepsilon$ away from the target in total variation distance unless the number of gradient queries is $\Omega(\sigma^2 d/\varepsilon^2)$, where $\sigma^2 d$ is the variance of the stochastic gradient. Our lower bound follows by combining the ideas of Le Cam deficiency, routinely used in the comparison of statistical experiments, with standard information-theoretic tools used in lower bounding Bayes risk functions. To the best of our knowledge, our results provide the first nontrivial dimension-dependent lower bound for this problem.
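To make the query model in the abstract concrete, the following is a minimal sketch of a stochastic gradient oracle and one sampler that uses it. Everything here is an illustrative assumption and not the construction used in the lower bound: the target is taken to be the Gaussian $N(\mu, I)$, the noise is taken to be Gaussian with per-coordinate variance $\sigma^2$ (so the total variance is $\sigma^2 d$), the sampler is stochastic gradient Langevin dynamics, and the names `StochasticGradientOracle` and `sgld_sample` are hypothetical.

```python
import math
import random


class StochasticGradientOracle:
    """Noisy gradient oracle for the potential f(x) = ||x - mu||^2 / 2.

    The target density is proportional to exp(-f), i.e. N(mu, I).
    Each query returns grad f(x) plus N(0, sigma^2 I) noise, so the
    total noise variance is sigma^2 * d (an assumed noise model).
    """

    def __init__(self, mu, sigma, rng):
        self.mu = list(mu)
        self.sigma = sigma
        self.rng = rng
        self.num_queries = 0  # number of gradient queries made so far

    def query(self, x):
        self.num_queries += 1
        return [xi - mi + self.sigma * self.rng.gauss(0.0, 1.0)
                for xi, mi in zip(x, self.mu)]


def sgld_sample(oracle, d, n_queries, step, rng):
    """Run n_queries steps of stochastic gradient Langevin dynamics from 0."""
    x = [0.0] * d
    for _ in range(n_queries):
        g = oracle.query(x)  # exactly one oracle query per step
        x = [xi - step * gi + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
             for xi, gi in zip(x, g)]
    return x


rng = random.Random(0)
d, sigma, n = 5, 2.0, 200
oracle = StochasticGradientOracle(mu=[1.0] * d, sigma=sigma, rng=rng)
sample = sgld_sample(oracle, d, n_queries=n, step=0.05, rng=rng)
```

The counter `num_queries` is the quantity the lower bound constrains: the abstract's result says that any algorithm interacting with such an oracle, not only this one, needs $\Omega(\sigma^2 d/\varepsilon^2)$ queries on some well-conditioned target to reach total variation error $\varepsilon$.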