Response to Moffat's Comment on "Towards Meaningful Statements in IR Evaluation: Mapping Evaluation Measures to Interval Scales"

Moffat recently commented on our previous work. That work focused on how grounding our evaluation methodology in the theory of measurement can improve our knowledge and understanding of the evaluation measures we use in IR, and how it can shed light on the different types of scales adopted by those measures; we also provided evidence, through extensive experimentation, on the impact of the different types of scales on statistical analyses, as well as on the impact of departing from their assumptions. Moreover, we investigated, for the first time in IR, the concept of meaningfulness, i.e. the invariance of the experimental statements and inferences you draw, and proposed it as a way to ensure more valid and generalizable results. Moffat's comments (i) build on misconceptions about the representational theory of measurement, such as what an interval scale actually is and what axioms it has to comply with, and (ii) totally miss the central concept of meaningfulness. Therefore, we reply to Moffat's comments by properly framing them in the representational theory of measurement and in the concept of meaningfulness. All in all, we can only reiterate what we have said several times: the goal of this research line is to theoretically ground our evaluation methodology - and IR is a field where it is extremely challenging to make any theoretical advances - in order to aim for more robust and generalizable inferences - something we currently lack in the field. There may be other and better ways to achieve this objective, and such proposals could emerge from an open discussion in the field and from the work of others. On the other hand, reducing everything to a dispute over what is (or pretends to be) an interval scale, or over whether all or none of the evaluation measures are interval scales, may be more of a barrier than a help in progressing towards this goal.
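To make the notion of meaningfulness more concrete, the following is a minimal sketch, not taken from the paper, of what invariance of a statement can look like. It assumes two hypothetical systems with made-up per-topic scores and checks whether the statement "system A has a higher mean score than system B" survives (a) a positive affine transformation, which is admissible for an interval scale, and (b) a strictly increasing but non-affine transformation, which is admissible only for an ordinal scale. The scores, the function name `a_beats_b_on_mean`, and the chosen transformations are all illustrative assumptions.

```python
import math
from statistics import mean

# Hypothetical per-topic scores for two systems (illustrative values only,
# not data from the paper or from any real experiment).
system_a = [0.1, 0.9]
system_b = [0.4, 0.5]


def a_beats_b_on_mean(transform):
    """Return True if system A has the higher mean after applying `transform`."""
    mean_a = mean([transform(score) for score in system_a])
    mean_b = mean([transform(score) for score in system_b])
    return mean_a > mean_b


def identity(x):
    return x


def affine(x):
    # Positive affine map: an admissible transformation for an interval scale.
    return 2.0 * x + 3.0


def monotone(x):
    # Strictly increasing but non-affine: admissible only for an ordinal scale.
    return math.sqrt(x)


original = a_beats_b_on_mean(identity)
after_affine = a_beats_b_on_mean(affine)
after_monotone = a_beats_b_on_mean(monotone)

print("A > B on the original scores:      ", original)
print("A > B after affine transformation: ", after_affine)
print("A > B after monotone transformation:", after_monotone)
```

With these assumed scores, the comparison of means is preserved under the affine map but reverses under the square root, even though the square root keeps every individual score in the same order. This is only a toy illustration of why a statement about means is regarded as meaningful on an interval scale and not on a merely ordinal one.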
