A regret minimizing set Q is a small-size representation of a much larger database P such that user queries executed on Q return answers whose scores are not much worse than those on the full dataset. In particular, a k-regret minimizing set has the property that the regret ratio between the score of the top-1 item in Q and the score of the top-k item in P is minimized, where the score of an item is the inner product of the item's attributes with a user's weight (preference) vector. The problem is challenging because we want to find a single representative set Q whose regret ratio is small with respect to all possible user weight vectors. We show that k-regret minimization is NP-complete for all dimensions d >= 3. This settles an open problem from Chester et al. [VLDB 2014] and resolves the complexity status of the problem for all d: the problem is known to admit a polynomial-time solution for d <= 2. In addition, we propose two new approximation schemes for regret minimization, both with provable guarantees, one based on coresets and the other based on hitting sets. We also carry out an extensive experimental evaluation and show that our schemes compute regret-minimizing sets comparable in size to those of the greedy algorithm proposed in [VLDB 2014], but our schemes are significantly faster and scale to large datasets.
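To make the definitions concrete, the following is a minimal sketch (not the paper's algorithm) of how the k-regret ratio of a candidate set Q can be estimated: the score of an item is its inner product with a weight vector, and the ratio compares the best score in Q against the k-th best score in P. The exact ratio is a maximum over all weight vectors; the sketch below only samples random vectors as a stand-in, and all array shapes and names are illustrative assumptions.

```python
import numpy as np

def k_regret_ratio(P, Q, k, weights):
    """Estimate the worst-case k-regret ratio of subset Q over sampled weight vectors.

    P: (n, d) array of all items; Q: (m, d) array of the representative set;
    k: rank used in P; weights: (w, d) array of nonnegative weight vectors.
    The score of an item is the inner product of its attributes with a weight vector.
    """
    worst = 0.0
    for w in weights:
        scores_P = P @ w
        top_k_P = np.sort(scores_P)[-k]   # score of the k-th best item in P
        top_1_Q = (Q @ w).max()           # score of the best item in Q
        if top_k_P > 0:
            regret = max(0.0, (top_k_P - top_1_Q) / top_k_P)
            worst = max(worst, regret)
    return worst

# Illustrative usage: 1000 random items in d = 3 dimensions, a random
# 10-item subset, and 500 sampled weight vectors (a sampling heuristic,
# not the exact regret ratio over all possible weight vectors).
rng = np.random.default_rng(0)
P = rng.random((1000, 3))
Q = P[rng.choice(len(P), 10, replace=False)]
W = rng.random((500, 3))
print(k_regret_ratio(P, Q, k=1, weights=W))
```

The regret-minimization problem itself asks for a small Q that keeps this ratio low simultaneously for every possible weight vector, which is what makes it hard; the sketch only evaluates a fixed Q against a finite sample.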