Top-k maximum inner product search (MIPS) is a central task in many machine learning applications. This paper extends top-k MIPS with a budgeted setting, that asks for the best approximate top-k MIPS given a limit of B computational operations. We investigate recent advanced sampling algorithms, including wedge and diamond sampling to solve it. Though the design of these sampling schemes naturally supports budgeted top-k MIPS, they suffer from the linear cost from scanning all data points to retrieve top-k results and the performance degradation for handling negative inputs. This paper makes two main contributions. First, we show that diamond sampling is essentially a combination between wedge sampling and basic sampling for top-k MIPS. Our theoretical analysis and empirical evaluation show that wedge is competitive (often superior) to diamond on approximating top-k MIPS regarding both efficiency and accuracy. Second, we propose a series of algorithmic engineering techniques to deploy wedge sampling on budgeted top-k MIPS. Our novel deterministic wedge-based algorithm runs significantly faster than the state-of-the-art methods for budgeted and exact top-k MIPS while maintaining the top-5 precision at least 80% on standard recommender system data sets.
翻译:顶部最大内部产品搜索( MIPS) 在许多机器学习应用程序中是一项核心任务。 本文扩展了顶部MIPS, 并设定了预算设置, 要求使用最接近最接近的顶部MIPS, 并限制B计算作业。 我们调查了最近先进的取样算法, 包括 wedge 和 钻石取样方法, 以解决这个问题。 虽然这些取样方法的设计自然支持了预算编列的最高一级MIPS, 但是在扫描所有数据点以检索最高一级结果和处理负投入的性能退化方面, 它们的线性成本是线性成本。 本文做出了两个主要贡献。 首先, 我们显示钻石取样基本上是将上部MIPS的混合取样和基本取样结合起来。 我们的理论分析和经验评估表明, 边部在效率及准确性上对顶层MIPS 的相似性能( 通常优于) 。 其次, 我们提出了一系列算法工程技术, 以在预算最高一级MIPS 上部署混合取样。 我们的新型确定性边基算算法比预算及精确度最低80 标准 。