Medical systematic review query formulation is a highly complex task performed by trained information specialists. The complexity stems from the reliance on lengthy Boolean queries, which express a detailed research question. To aid query formulation, information specialists use a set of exemplar documents, called `seed studies', prior to query formulation. Seed studies help verify the effectiveness of a query before the full assessment of retrieved studies. Beyond this use of seeds, specific IR methods can exploit seed studies to guide both automatic query formulation and new retrieval models. One major limitation of work to date is that these methods exploit `pseudo seed studies' through retrospective use of included studies (i.e., relevance assessments). However, we show that pseudo seed studies are not representative of the real seed studies used by information specialists. Hence, we provide a test collection with real-world seed studies used to assist with the formulation of queries. To support our collection, we provide an analysis, previously not possible, of how seed studies impact retrieval, and we perform several experiments using seed-study-based methods to compare the effectiveness of using seed studies versus pseudo seed studies. We make our test collection and the results of all of our experiments and analysis available at http://github.com/ielab/sysrev-seed-collection