We introduce kernel thinning, a simple algorithm for generating better-than-Monte-Carlo approximations to distributions $\mathbb{P}$ on $\mathbb{R}^d$. Given $n$ input points, a suitable reproducing kernel $\mathbf{k}$, and $\mathcal{O}(n^2)$ time, kernel thinning returns $\sqrt{n}$ points with comparable integration error for every function in the associated reproducing kernel Hilbert space. With high probability, the maximum discrepancy in integration error is $\mathcal{O}_d(n^{-\frac{1}{2}}\sqrt{\log n})$ for compactly supported $\mathbb{P}$ and $\mathcal{O}_d(n^{-\frac{1}{2}} \sqrt{(\log n)^{d+1}\log\log n})$ for sub-exponential $\mathbb{P}$. In contrast, an equal-sized i.i.d. sample from $\mathbb{P}$ suffers $\Omega(n^{-\frac14})$ integration error. Our sub-exponential guarantees resemble the classical quasi-Monte Carlo error rates for uniform $\mathbb{P}$ on $[0,1]^d$ but apply to general distributions on $\mathbb{R}^d$ and a wide range of common kernels. We use our results to derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Mat\'ern, and B-spline kernels and present two vignettes illustrating the practical benefits of kernel thinning over i.i.d. sampling and standard Markov chain Monte Carlo thinning.
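For concreteness, below is a minimal Python sketch of the repeated kernel-halving idea behind kernel thinning: each round splits the current points into two balanced coresets via a self-balancing signed walk and keeps one, so $\log_2(n)/2$ rounds reduce $n$ points to $\sqrt{n}$. The function names (`kernel_halve`, `kernel_thin`), the fixed `threshold`, and the simplified acceptance rule are illustrative assumptions, not the paper's exact procedure, which uses adaptive, failure-probability-dependent thresholds and an additional refinement step.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Pairwise Gaussian kernel matrix: k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2 * sigma**2))

def kernel_halve(X, kernel, threshold=1.0, rng=None):
    """Split X (even length) into two size-n/2 coresets with a self-balancing
    signed walk over consecutive pairs: each pair contributes one point to
    each coreset, with the assignment biased to cancel the running kernel
    discrepancy between the two halves. (Simplified sketch: fixed threshold.)"""
    rng = np.random.default_rng() if rng is None else rng
    n = len(X)
    assert n % 2 == 0, "halving expects an even number of points"
    K = kernel(X, X)
    signs = np.zeros(n)  # +1: first coreset, -1: second coreset, 0: unprocessed
    for i in range(0, n, 2):
        # alpha = <psi, k(x_i, .) - k(x_{i+1}, .)>_k, where psi is the signed
        # kernel sum over all points assigned so far (current discrepancy).
        alpha = signs @ (K[:, i] - K[:, i + 1])
        # Bias the coin toward the assignment that shrinks the discrepancy.
        p_plus = 0.5 * (1.0 - np.clip(alpha / threshold, -1.0, 1.0))
        s = 1.0 if rng.random() < p_plus else -1.0
        signs[i], signs[i + 1] = s, -s
    return X[signs > 0], X[signs < 0]

def kernel_thin(X, kernel, rounds, rng=None):
    """Halve `rounds` times, keeping one coreset per round; with
    rounds = log2(n)/2 this returns sqrt(n) of the n input points."""
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(rounds):
        X, _ = kernel_halve(X, kernel, rng=rng)
    return X

def mmd(X, Y, kernel):
    """Maximum mean discrepancy between the empirical measures of X and Y."""
    val = kernel(X, X).mean() + kernel(Y, Y).mean() - 2.0 * kernel(X, Y).mean()
    return np.sqrt(max(val, 0.0))  # clamp tiny negative floating-point error

rng = np.random.default_rng(0)
X = rng.standard_normal((1024, 2))  # n = 1024 i.i.d. draws from N(0, I_2)
coreset = kernel_thin(X, gaussian_kernel, rounds=5, rng=rng)  # 1024 / 2^5 = 32 = sqrt(1024)
iid_sub = X[rng.choice(len(X), len(coreset), replace=False)]  # equal-sized i.i.d. baseline
print(mmd(coreset, X, gaussian_kernel), mmd(iid_sub, X, gaussian_kernel))
```

On typical runs the thinned coreset's MMD to the full sample is noticeably smaller than the i.i.d. subsample's, mirroring the $n^{-\frac{1}{2}}$-versus-$n^{-\frac14}$ gap in the guarantees above; the authors' goodpoints package provides the full algorithm.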