Assisting end users to identify desired results from a large dataset is an important problem for multi-criteria decision making. To address this problem, top-k and skyline queries have been widely adopted, but they both have inherent drawbacks, i.e., the user either has to provide a specific utility function or faces many results. The k-regret minimization query is proposed, which integrates the merits of top-k and skyline queries. Due to the NP-hardness of the problem, the k-regret minimization query is time consuming and the greedy framework is widely adopted. However, formal theoretical analysis of the greedy approaches for the quality of the returned results is still lacking. In this paper, we first fill this gap by conducting a nontrivial theoretical analysis of the approximation ratio of the returned results. To speed up query processing, a sampling-based method, StocPreGreed,, is developed to reduce the evaluation cost. In addition, a theoretical analysis of the required sample size is conducted to bound the quality of the returned results. Finally, comprehensive experiments are conducted on both real and synthetic datasets to demonstrate the efficiency and effectiveness of the proposed methods.
翻译:协助终端用户确定大型数据集的预期结果是多标准决策的一个重要问题。为了解决这一问题,已经广泛采纳了顶点和天线查询,但两者都有内在的缺点,即用户必须提供具体的实用功能或面临许多结果。提出了Kregret最小化查询,该查询结合了顶点和天线查询的优点。由于问题NP的难度,Kregret尽量减少查询耗时且广泛采用贪婪框架。然而,仍然缺乏关于对所返回结果的质量的贪婪方法的正式理论分析。在本文中,我们首先通过对所返回结果的近似率进行非三重理论分析来填补这一差距。为了加快查询处理,正在开发一种基于取样的方法,即StocpreGreed,以降低评价费用。此外,对所要求的抽样规模进行了理论分析,以限制所返回结果的质量。最后,对真实和合成数据组进行了全面试验,以显示所提议方法的效率和有效性。