CSFCube -- A Test Collection of Computer Science Research Articles for Faceted Query by Example

Query by Example is a well-known information retrieval task in which the user chooses a document as the search query, and the goal is to retrieve relevant documents from a large collection. However, a document often covers multiple aspects of a topic. To address this scenario, we introduce the task of faceted Query by Example, in which users can also specify a finer-grained aspect in addition to the input query document. We focus on the application of this task in scientific literature search. As one solution to this problem, we envision models that can retrieve scientific papers analogous to a query paper along specifically chosen rhetorical structure elements. In this work, the rhetorical structure elements, which we refer to as facets, indicate the "background", "method", or "result" aspects of a scientific paper. We introduce and describe an expert-annotated test collection to evaluate models trained to perform this task. Our test collection consists of a diverse set of 50 query documents drawn from computational linguistics and machine learning venues. We carefully followed the annotation guidelines used by TREC for depth-k pooling (k = 100 or 250), and the resulting collection consists of graded relevance scores with high annotation agreement. The data is freely available for research purposes.
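As a rough illustration of depth-k pooling: in the standard TREC procedure, the top k results from each participating system's ranking for a query are merged, and only that pool of candidates is passed to annotators for relevance judgments. The following is a minimal Python sketch under that assumption. The function name `depth_k_pool` and the toy runs (`system_a`, `system_b`, document ids `d1`...`d9`) are hypothetical and are not taken from the CSFCube release.

```python
from typing import Dict, List, Set


def depth_k_pool(runs: Dict[str, List[str]], k: int) -> Set[str]:
    """Build the judgment pool for ONE query (hypothetical helper).

    runs maps a system name to its ranked list of document ids,
    best first. The pool is the union of each system's top-k ids.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    pool: Set[str] = set()
    for ranking in runs.values():
        # Slicing past the end is safe: short rankings contribute all ids.
        pool.update(ranking[:k])
    return pool


# Toy example with two hypothetical systems and k = 2.
runs = {
    "system_a": ["d3", "d1", "d7", "d2"],
    "system_b": ["d1", "d5", "d3", "d9"],
}
print(sorted(depth_k_pool(runs, k=2)))  # ['d1', 'd3', 'd5']
```

Because documents retrieved by several systems are judged only once, the pool is usually much smaller than the number of systems times k. Documents outside the pool are typically treated as unjudged rather than as confirmed non-relevant.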
