System-oriented IR evaluations are limited to rather abstract understandings of real user behavior. As a solution, simulating user interactions provides a cost-efficient way to support system-oriented experiments with more realistic directives when no interaction logs are available. While there are several user models for simulated clicks or result list interactions, very few attempts have been made toward query simulation, and it has not been investigated whether such simulations can reproduce properties of real queries. In this work, we validate simulated user query variants with the help of TREC test collections, in reference to real user queries that were made for the corresponding topics. In addition, we introduce a simple yet effective method that gives better reproductions of real queries than the established methods. Our evaluation framework validates the simulations with regard to retrieval performance, reproducibility of topic score distributions, shared task utility, effort and effect, and query term similarity when compared with real user query variants. While the retrieval effectiveness, the statistical properties of the topic score distributions, and the economic aspects are close to those of real queries, it remains challenging to simulate exact term matches and later query reformulations.
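To make the query term similarity criterion mentioned above concrete, the following is a minimal sketch, not the paper's actual code: it assumes term similarity is instantiated as the Jaccard overlap between query term sets and that, for each simulated variant, the best match among the real variants of the same topic is scored. All function names and the example data are hypothetical.

```python
# Hedged illustration of validating simulated query variants against
# real ones via term overlap. Assumes whitespace tokenization and
# Jaccard similarity; the paper's framework may use other measures.
from statistics import mean


def jaccard(query_a: str, query_b: str) -> float:
    """Jaccard similarity of the term sets of two queries."""
    a, b = set(query_a.lower().split()), set(query_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def term_similarity(real_variants: list[str], simulated_variants: list[str]) -> float:
    """For each simulated query, take its highest Jaccard overlap with
    any real variant of the same topic, then average over all simulated
    queries. Higher values mean the simulation reproduces real terms better."""
    return mean(
        max(jaccard(sim, real) for real in real_variants)
        for sim in simulated_variants
    )


# Hypothetical query variants for a single TREC topic.
real = ["effects of global warming", "global warming consequences"]
simulated = ["global warming effects", "climate change impact"]

print(f"mean best-match term similarity: {term_similarity(real, simulated):.2f}")
```

Under this instantiation, a simulator that generates realistic retrieval scores but picks different vocabulary than real users would score well on effectiveness-based measures yet poorly here, which mirrors the gap in exact term matches noted in the abstract.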