The celebrated Monte Carlo method estimates an expensive-to-compute quantity by random sampling. Bandit-based Monte Carlo optimization is a general technique for computing the minimum of many such expensive-to-compute quantities by adaptive random sampling. The technique converts an optimization problem into a statistical estimation problem which is then solved via multi-armed bandits. We apply this technique to solve the problem of high-dimensional $k$-nearest neighbors, developing an algorithm which we prove is able to identify exact nearest neighbors with high probability. We show that under regularity assumptions on a dataset of $n$ points in $d$-dimensional space, the complexity of our algorithm scales logarithmically with the dimension of the data as $O\left((n+d)\log^2 \left(\frac{nd}{\delta}\right)\right)$ for error probability $\delta$, rather than linearly as in exact computation requiring $O(nd)$. We corroborate our theoretical results with numerical simulations, showing that our algorithm outperforms both exact computation and state-of-the-art algorithms such as kGraph, NGT, and LSH on real datasets.
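To make the adaptive-sampling idea concrete, the following is a minimal sketch (not the paper's exact algorithm) of a bandit-based nearest-neighbor search: each candidate point is an arm, pulling an arm samples random coordinates to get an unbiased estimate of its squared distance to the query, and arms are eliminated via Hoeffding-style confidence bounds. The function name `bandit_nn` and the parameters `sigma`, `batch`, and the elimination rule are our illustrative assumptions; the paper's actual confidence intervals and constants may differ.

```python
import numpy as np

def bandit_nn(X, q, delta=0.01, sigma=1.0, batch=32, rng=None):
    """Return the index of the (with high probability) nearest neighbor of q
    among the rows of X, using successive elimination over coordinate samples.

    Illustrative sketch only: sigma is an assumed scale bound on the
    coordinate-wise squared gaps, and the confidence radius is a generic
    Hoeffding-style choice.
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    active = np.arange(n)           # surviving candidate indices ("arms")
    sums = np.zeros(n)              # running sums of sampled squared gaps
    pulls = np.zeros(n, dtype=int)  # number of coordinates sampled per arm

    while len(active) > 1 and pulls[active[0]] < d:
        # Pull every active arm: sample a shared batch of random coordinates
        # and accumulate squared gaps, an unbiased estimate of the mean
        # squared distance per coordinate.
        coords = rng.integers(0, d, size=batch)
        gaps = (X[np.ix_(active, coords)] - q[coords]) ** 2
        sums[active] += gaps.sum(axis=1)
        pulls[active] += batch

        # Confidence bounds on each arm's mean squared gap.
        means = sums[active] / pulls[active]
        radius = sigma * np.sqrt(
            2 * np.log(4 * n * pulls[active] ** 2 / delta) / pulls[active]
        )
        ucb, lcb = means + radius, means - radius

        # Eliminate arms whose lower bound exceeds the best upper bound;
        # they cannot be the minimizer.
        active = active[lcb <= ucb.min()]

    if len(active) == 1:
        return active[0]
    # Few survivors left after sampling up to d coordinates each:
    # fall back to exact distances among them.
    dists = ((X[active] - q) ** 2).sum(axis=1)
    return active[np.argmin(dists)]
```

Under the regularity assumptions in the abstract, most arms are eliminated after only a logarithmic number of coordinate samples, which is the source of the $O\left((n+d)\log^2\left(\frac{nd}{\delta}\right)\right)$ scaling versus the $O(nd)$ cost of exact computation.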