Moffat recently commented on our previous work. That work focused on how grounding our evaluation methodology in the theory of measurement can improve our knowledge and understanding of the evaluation measures we use in IR, and how it can shed light on the different types of scales adopted by those measures; we also provided evidence, through extensive experimentation, on the impact of the different types of scales on statistical analyses, as well as on the impact of departing from their assumptions. Moreover, we investigated, for the first time in IR, the concept of meaningfulness, i.e. the invariance of the experimental statements and inferences one draws, and proposed it as a way to ensure more valid and generalizable results. Moffat's comments (i) build on misconceptions about the representational theory of measurement, such as what an interval scale actually is and what axioms it has to comply with, and (ii) totally miss the central concept of meaningfulness. Therefore, we reply to Moffat's comments by properly framing them in the representational theory of measurement and in the concept of meaningfulness. All in all, we can only reiterate what we have said several times: the goal of this research line is to theoretically ground our evaluation methodology (and IR is a field where it is extremely challenging to make any theoretical advances) in order to aim for more robust and generalizable inferences, something we currently lack in the field. Possibly there are other and better ways to achieve this objective, and such proposals could emerge from an open discussion in the field and from the work of others. On the other hand, reducing everything to a dispute over what is (or pretends to be) an interval scale, or whether all or none of the evaluation measures are interval scales, may be more of a barrier than a help in progressing towards this goal.