Unfortunately, the official English (sub)task results reported in the NTCIR-14 WWW-2, NTCIR-15 WWW-3, and NTCIR-16 WWW-4 overview papers are incorrect due to noise in the official qrels files; this paper reports results based on the corrected qrels files. The noise was caused by a fatal bug in the backend of our relevance assessment interface. More specifically, at WWW-2, WWW-3, and WWW-4, two versions of pool files were created for each English topic: a PRI ("prioritised") file, which uses the NTCIRPOOL script to prioritise likely relevant documents, and a RND ("randomised") file, which randomises the pooled documents. This was done to study the effect of document ordering on relevance assessors. However, the programmer who wrote the interface backend assumed that the combination of a topic ID and a document rank in the pool file uniquely determines a document ID; this assumption is incorrect because each topic has two versions of the pool file, in which the same rank generally refers to different documents. As a result, all the PRI-based relevance labels for the WWW-2 test collection are incorrect (while all the RND-based relevance labels are correct), and all the RND-based relevance labels for the WWW-3 and WWW-4 test collections are incorrect (while all the PRI-based relevance labels are correct). The bug was finally discovered at the NTCIR-16 WWW-4 task, when the first seven authors of this paper served as Gold assessors (i.e., topic creators who define what is relevant) and closely examined their disagreements with Bronze assessors (i.e., non-topic-creators; non-experts). We would like to apologise to the WWW participants and the NTCIR chairs for the inconvenience and confusion caused by this bug.
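To illustrate why a (topic ID, rank) pair cannot identify a document when two pool files exist, the following is a minimal sketch in Python. It is not the actual interface backend: the pool contents, document IDs, labels, and the function names `resolve_buggy` and `resolve_fixed` are all hypothetical, and the sketch simply assumes that ranks are resolved against a single pool file regardless of which version the assessor saw.

```python
from typing import Dict, List, Tuple

# Hypothetical pool files for one topic: list position + 1 is the rank.
PRI_POOL: Dict[str, List[str]] = {"0001": ["docA", "docB", "docC"]}
RND_POOL: Dict[str, List[str]] = {"0001": ["docC", "docA", "docB"]}
POOLS: Dict[str, Dict[str, List[str]]] = {"PRI": PRI_POOL, "RND": RND_POOL}


def resolve_buggy(topic_id: str, rank: int, pool: Dict[str, List[str]]) -> str:
    """Assumed faulty lookup: (topic ID, rank) is resolved against one
    fixed pool file, ignoring which version the assessor actually saw."""
    return pool[topic_id][rank - 1]


def resolve_fixed(topic_id: str, rank: int, version: str) -> str:
    """Lookup keyed on (pool version, topic ID, rank)."""
    return POOLS[version][topic_id][rank - 1]


# Hypothetical judgements made while viewing the RND ordering:
# (topic ID, rank, pool version shown to the assessor, relevance label).
judgements: List[Tuple[str, int, str, int]] = [
    ("0001", 1, "RND", 2),  # the assessor actually judged docC
    ("0001", 2, "RND", 0),  # the assessor actually judged docA
    ("0001", 3, "RND", 1),  # the assessor actually judged docB
]

# Resolving RND ranks against the PRI file attaches labels to the wrong documents.
buggy_qrels = {resolve_buggy(t, r, PRI_POOL): lab for t, r, _v, lab in judgements}
fixed_qrels = {resolve_fixed(t, r, v): lab for t, r, v, lab in judgements}

print("buggy:", buggy_qrels)
print("fixed:", fixed_qrels)
```

In this toy case every label lands on a different document than the one the assessor judged, which matches the pattern described above: labels from one pool version are wrong while labels from the other version remain correct.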