This work describes our two approaches for the background linking task of the TREC 2020 News Track. The main objective of this task is to recommend a list of relevant articles that a reader should consult to understand the context of the query article and gain background information on it. Our first approach focuses on building an effective search query by combining weighted keywords extracted from the query document, and uses BM25 for retrieval. The second approach leverages SBERT (Reimers and Gurevych) to learn contextual representations of the query and perform semantic search over the corpus. We empirically show that employing a language model helps our approach capture both the context and the background of the query article. The proposed approaches are evaluated on the TREC 2018 Washington Post dataset, and our best model outperforms the TREC median as well as the highest-scoring model of 2018 in terms of the nDCG@5 metric. We further propose a diversity measure to evaluate how effectively the various approaches retrieve a diverse set of documents, which could motivate researchers to introduce diversity into their recommended lists. We have open-sourced our implementation on GitHub and plan to submit our runs for the background linking task in TREC 2020.
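The first approach described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the paper's implementation: the toy corpus, keyword weights, and BM25 parameters (k1, b) are placeholder assumptions, and the keyword extraction and weighting scheme in the actual system is more involved.

```python
import math
from collections import Counter

def bm25_scores(query_terms, corpus, k1=1.2, b=0.75):
    """Score each document against a weighted-keyword query with BM25.

    query_terms: dict mapping term -> weight (keywords extracted from
                 the query article; weights are illustrative here).
    corpus:      list of tokenized candidate documents.
    """
    n_docs = len(corpus)
    avgdl = sum(len(doc) for doc in corpus) / n_docs
    # Document frequency of each term across the corpus.
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    scores = []
    for doc in corpus:
        tf = Counter(doc)
        score = 0.0
        for term, weight in query_terms.items():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            )
            # The keyword weight scales the term's BM25 contribution.
            score += weight * idf * norm
        scores.append(score)
    return scores

# Toy corpus of candidate background articles (tokenized).
corpus = [
    "election results spark debate in congress".split(),
    "new species of frog discovered in rainforest".split(),
    "congress debates election reform bill".split(),
]
# Hypothetical weighted keywords extracted from the query article.
query = {"election": 2.0, "congress": 1.0}

scores = bm25_scores(query, corpus)
best = max(range(len(corpus)), key=scores.__getitem__)
```

The second approach replaces this lexical scoring with cosine similarity between SBERT embeddings of the query article and the candidates, ranking by semantic rather than keyword overlap.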