Uncertainty arises naturally in many application domains due to, e.g., data entry errors and ambiguity in data cleaning. Prior work in incomplete and probabilistic databases has investigated the semantics and efficient evaluation of ranking and top-k queries over uncertain data. However, most approaches deal with top-k and ranking in isolation and represent uncertain input data and query results using separate, incompatible data models. We present an efficient approach for under- and over-approximating the results of ranking, top-k, and window queries over uncertain data. Our approach integrates well with existing techniques for querying uncertain data, is efficient, and is, to the best of our knowledge, the first to support windowed aggregation. We design algorithms for physical operators for uncertain sorting and windowed aggregation, and implement them in PostgreSQL. We evaluate our approach on synthetic and real-world datasets, demonstrating that it outperforms all competitors and often produces more accurate results.
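To make "under- and over-approximating" ranking results concrete, here is a minimal sketch under an assumption the abstract does not state: each tuple's sort key is only known to lie in an interval `[lo, hi]`. Under that assumption, a tuple `u` certainly precedes `t` in ascending order only if `hi_u < lo_t`, and possibly precedes it if `lo_u <= hi_t` (ties broken against `t`). Counting these gives sound bounds on each tuple's position, from which certain top-k answers (an under-approximation) and possible top-k answers (an over-approximation) follow. The function names `rank_bounds` and `topk_approx` are hypothetical; this is an illustration, not the paper's sort operator.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

# Hypothetical representation: (lo, hi) bounds on an uncertain sort key.
Interval = Tuple[float, float]


def rank_bounds(keys: List[Interval]) -> List[Tuple[int, int]]:
    """Return (min_pos, max_pos), 0-based, for each tuple under ascending sort."""
    his = sorted(hi for _, hi in keys)
    los = sorted(lo for lo, _ in keys)
    bounds = []
    for lo, hi in keys:
        # Tuples whose upper bound is strictly below lo must precede this one.
        min_pos = bisect_left(his, lo)
        # Tuples whose lower bound is <= hi may precede it; subtract the tuple itself.
        max_pos = bisect_right(los, hi) - 1
        bounds.append((min_pos, max_pos))
    return bounds


def topk_approx(keys: List[Interval], k: int) -> Tuple[List[int], List[int]]:
    """Return (certain, possible) indices of tuples in the top-k."""
    bounds = rank_bounds(keys)
    certain = [i for i, (_, mx) in enumerate(bounds) if mx < k]  # under-approximation
    possible = [i for i, (mn, _) in enumerate(bounds) if mn < k]  # over-approximation
    return certain, possible


if __name__ == "__main__":
    keys = [(1, 2), (3, 5), (4, 6), (7, 8)]
    print(rank_bounds(keys))     # [(0, 0), (1, 2), (1, 2), (3, 3)]
    print(topk_approx(keys, 2))  # ([0], [0, 1, 2])
```

Here tuples 1 and 2 have overlapping key intervals, so neither is certainly in the top-2, but both possibly are. Sorting the bounds once keeps the sketch at O(n log n).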