Maximum Inner Product Search (MIPS) is a ubiquitous task in machine learning applications such as recommendation systems. Given a query vector and $n$ atom vectors in $d$-dimensional space, the goal of MIPS is to find the atom that has the highest inner product with the query vector. Existing MIPS algorithms scale at least as $O(\sqrt{d})$ in the dimension, which becomes computationally prohibitive in high-dimensional settings. In this work, we present BanditMIPS, a novel randomized MIPS algorithm whose complexity is independent of $d$. BanditMIPS estimates the inner product for each atom by subsampling coordinates, and it adaptively evaluates more coordinates for the more promising atoms. The adaptive sampling strategy is motivated by multi-armed bandits. We provide theoretical guarantees that BanditMIPS returns the correct answer with high probability, while improving the complexity in $d$ from $O(\sqrt{d})$ to $O(1)$. We also perform experiments on four synthetic and real-world datasets and demonstrate that BanditMIPS outperforms prior state-of-the-art algorithms. For example, on the MovieLens dataset ($n = 4{,}000$, $d = 6{,}000$), BanditMIPS is 20$\times$ faster than the next best algorithm while returning the same answer. BanditMIPS requires no preprocessing of the data and includes a hyperparameter that practitioners may use to trade off accuracy and runtime. We also propose a variant of our algorithm, named BanditMIPS-$\alpha$, which achieves further speedups by employing non-uniform sampling across coordinates. Finally, we demonstrate how known preprocessing techniques can be used to further accelerate BanditMIPS, and discuss applications to Matching Pursuit and Fourier analysis.
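To make the idea of adaptive coordinate subsampling concrete, the following is a minimal sketch of a successive-elimination loop in the spirit of the description above. It is not the paper's implementation. The function name `bandit_mips_sketch`, the parameters `delta`, `sigma`, and `batch`, the Hoeffding-style confidence radius, and sampling coordinates uniformly with replacement are all assumptions made for illustration. Each atom is treated as an arm whose mean reward is its inner product with the query divided by $d$, so the arm with the largest mean is also the MIPS answer.

```python
import math
import random


def bandit_mips_sketch(atoms, query, delta=0.01, sigma=1.0, batch=16, seed=0):
    """Illustrative successive elimination over atoms (hypothetical helper).

    Returns (index of the surviving atom, coordinates sampled per surviving atom).
    `sigma` is an assumed sub-Gaussian scale for the coordinate-wise products
    and `delta` is an assumed error-probability knob; neither value is taken
    from the paper.
    """
    rng = random.Random(seed)
    n, d = len(atoms), len(query)
    sums = [0.0] * n          # running sum of sampled products per atom
    pulls = 0                 # coordinates sampled so far (shared by all survivors)
    alive = list(range(n))    # atoms not yet eliminated

    while len(alive) > 1 and pulls < d:
        # Sample one batch of coordinates and reuse it for every surviving atom.
        coords = [rng.randrange(d) for _ in range(batch)]
        for i in alive:
            sums[i] += sum(atoms[i][j] * query[j] for j in coords)
        pulls += batch

        # Hoeffding-style confidence radius under the assumed scale `sigma`.
        radius = sigma * math.sqrt(2.0 * math.log(1.0 / delta) / pulls)
        means = {i: sums[i] / pulls for i in alive}
        best_lower = max(means.values()) - radius

        # Keep only atoms whose upper bound still reaches the best lower bound.
        alive = [i for i in alive if means[i] + radius >= best_lower]

    if len(alive) == 1:
        return alive[0], pulls

    # Sampling budget reached: fall back to exact inner products for survivors.
    def exact(i):
        return sum(a * q for a, q in zip(atoms[i], query))

    return max(alive, key=exact), pulls
```

When one atom is clearly better than the rest, the loop stops after a number of sampled coordinates that depends on the gap between atoms and on `sigma`, not on $d$. When the gaps are small, the sketch falls back to exact computation over the remaining candidates. Lowering `delta` widens the confidence intervals, trading runtime for a lower chance of eliminating the true maximizer.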