T2Ranking: A Large-Scale Chinese Benchmark for Passage Ranking

From arXiv. This resource paper has been accepted at the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023).

Passage ranking involves two stages, passage retrieval and passage re-ranking, both of which are important and challenging topics for academia and industry in the area of Information Retrieval (IR). However, the commonly used datasets for passage ranking usually focus on the English language. For non-English scenarios, such as Chinese, the existing datasets are limited in data scale, lack fine-grained relevance annotation, and suffer from false negative issues. To address this problem, we introduce T2Ranking, a large-scale Chinese benchmark for passage ranking. T2Ranking comprises more than 300K queries and over 2M unique passages from real-world search engines. Expert annotators are recruited to provide 4-level graded relevance scores (fine-grained) for query-passage pairs instead of binary relevance judgments (coarse-grained). To ease the false negative issues, more passages with higher diversity are considered when performing relevance annotations, especially in the test set, to ensure a more accurate evaluation. Apart from the textual query and passage data, other auxiliary resources are also provided, such as query types and the XML files of the documents from which the passages are generated, to facilitate further studies. To evaluate the dataset, commonly used ranking models are implemented and tested on T2Ranking as baselines. The experimental results show that T2Ranking is challenging and that there is still scope for improvement. The full data and all code are available at https://github.com/THUIR/T2Ranking/
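To make the difference between 4-level graded relevance and binary judgments concrete, here is a minimal sketch. It assumes NDCG as the evaluation metric, which is a common choice for graded labels; the abstract does not say which metrics the benchmark uses. The label lists, the 0-3 scale encoding, and the binarization threshold are all hypothetical values chosen for illustration, not data from T2Ranking.

```python
import math


def dcg(labels, k):
    """Discounted cumulative gain over the top-k labels, using exponential gain."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(labels[:k]))


def ndcg(labels, k):
    """DCG divided by the DCG of the ideal (descending) ordering of the same labels."""
    ideal = dcg(sorted(labels, reverse=True), k)
    return dcg(labels, k) / ideal if ideal > 0 else 0.0


def binarize(labels, threshold=1):
    """Collapse graded labels to 0/1. The threshold is an assumption for this sketch."""
    return [1 if rel >= threshold else 0 for rel in labels]


# Hypothetical labels (0 = irrelevant ... 3 = most relevant) for the passages
# that two rankers returned, in ranked order, for the same query.
ranking_a = [3, 2, 1, 0]  # best passage first
ranking_b = [1, 2, 3, 0]  # best passage only in third place

graded_a = ndcg(ranking_a, 4)
graded_b = ndcg(ranking_b, 4)
binary_a = ndcg(binarize(ranking_a), 4)
binary_b = ndcg(binarize(ranking_b), 4)

print("graded:", round(graded_a, 4), round(graded_b, 4))
print("binary:", round(binary_a, 4), round(binary_b, 4))
```

With binary labels the two rankings score the same, because each places three "relevant" passages ahead of the irrelevant one. With graded labels the ranker that puts the most relevant passage first scores higher, which is the kind of distinction fine-grained annotation is meant to expose.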


