Although the retrieval effectiveness of different queries is mutually independent, the evaluation of query performance prediction (QPP) systems has been carried out by measuring rank correlation over an entire set of queries. Such a listwise approach has a number of disadvantages, notably that it does not support the common requirement of assessing QPP for individual queries. In this paper, we propose a pointwise QPP framework that allows us to evaluate the quality of a QPP system for individual queries by measuring the deviation between each predicted value and the corresponding true value, and then aggregating the results over a set of queries. Our experiments demonstrate that this new approach leads to smaller variance in QPP evaluation across a range of different target metrics and retrieval models.
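The contrast between the conventional listwise evaluation and the pointwise framework described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's actual protocol: the sample scores, the use of absolute error as the per-query deviation, and mean aggregation are all assumptions made here for illustration.

```python
from scipy.stats import kendalltau

# Hypothetical per-query data: QPP predictions and the true values of the
# target retrieval metric (e.g. average precision) for five queries.
predicted = [0.42, 0.18, 0.77, 0.33, 0.60]
true_ap   = [0.39, 0.25, 0.70, 0.30, 0.65]

# Listwise evaluation: a single rank correlation computed over the whole
# query set, so no score is available for any individual query.
tau, _ = kendalltau(predicted, true_ap)
print(f"listwise Kendall's tau over all queries: {tau:.3f}")

# Pointwise evaluation: a per-query deviation (absolute error here, which is
# an assumption; other deviation measures could be plugged in) that can be
# inspected query by query and then aggregated over the set.
per_query_dev = [abs(p - t) for p, t in zip(predicted, true_ap)]
mean_dev = sum(per_query_dev) / len(per_query_dev)
print("per-query deviations:", [round(d, 3) for d in per_query_dev])
print(f"aggregated (mean) deviation: {mean_dev:.3f}")
```

In this sketch the pointwise view exposes which individual queries are poorly predicted, whereas the listwise correlation only summarizes agreement over the whole set.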