A regret minimizing set Q is a small-size representation of a much larger database P such that user queries executed on Q return answers whose scores are not much worse than those on the full dataset. In particular, a k-regret minimizing set has the property that the regret ratio between the score of the top-1 item in Q and the score of the top-k item in P is minimized, where the score of an item is the inner product of the item's attributes with a user's weight (preference) vector. The problem is challenging because we want to find a single representative set Q whose regret ratio is small with respect to all possible user weight vectors. We show that k-regret minimization is NP-complete for all dimensions d >= 3. This settles an open problem from Chester et al. [VLDB 2014] and resolves the complexity status of the problem for all d: the problem is known to admit a polynomial-time solution for d <= 2. In addition, we propose two new approximation schemes for regret minimization, both with provable guarantees, one based on coresets and the other based on hitting sets. We also carry out an extensive experimental evaluation and show that our schemes compute regret-minimizing sets comparable in size to those of the greedy algorithm proposed in [VLDB 2014], but our schemes are significantly faster and scale to large datasets.
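To make the definitions concrete, the following is a minimal sketch (not the paper's algorithm) of how the k-regret ratio of a candidate set Q can be estimated: the score of an item is its inner product with a weight vector, and the ratio compares the best score in Q against the k-th best score in P. The exact ratio is a maximum over all weight vectors; the sketch below only samples random vectors as a stand-in, and all array shapes and names are illustrative assumptions.

```python
import numpy as np

def k_regret_ratio(P, Q, k, weights):
    """Estimate the worst-case k-regret ratio of subset Q over sampled weight vectors.

    P: (n, d) array of all items; Q: (m, d) array of the representative set;
    k: rank used in P; weights: (w, d) array of nonnegative weight vectors.
    The score of an item is the inner product of its attributes with a weight vector.
    """
    worst = 0.0
    for w in weights:
        scores_P = P @ w
        top_k_P = np.sort(scores_P)[-k]   # score of the k-th best item in P
        top_1_Q = (Q @ w).max()           # score of the best item in Q
        if top_k_P > 0:
            regret = max(0.0, (top_k_P - top_1_Q) / top_k_P)
            worst = max(worst, regret)
    return worst

# Illustrative usage: 1000 random items in d = 3 dimensions, a random
# 10-item subset, and 500 sampled weight vectors (a sampling heuristic,
# not the exact regret ratio over all possible weight vectors).
rng = np.random.default_rng(0)
P = rng.random((1000, 3))
Q = P[rng.choice(len(P), 10, replace=False)]
W = rng.random((500, 3))
print(k_regret_ratio(P, Q, k=1, weights=W))
```

The regret-minimization problem itself asks for a small Q that keeps this ratio low simultaneously for every possible weight vector, which is what makes it hard; the sketch only evaluates a fixed Q against a finite sample.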