Queries with similar information needs tend to have similar document clicks, especially in biomedical literature search engines, where queries are generally short and the top documents account for most of the total clicks. Motivated by this, we present a novel architecture for biomedical literature search, namely Log-Augmented DEnse Retrieval (LADER), a simple plug-in module that augments a dense retriever with the click logs retrieved from similar training queries. Specifically, LADER uses a dense retriever to find both documents and training queries that are similar to the given query. Then, LADER scores the relevant (clicked) documents of the similar queries, weighting them by the similarity of those queries to the input query. The final document scores of LADER are the average of (1) the document similarity scores from the dense retriever and (2) the aggregated document scores from the click logs of similar queries. Despite its simplicity, LADER achieves new state-of-the-art (SOTA) performance on TripClick, a recently released benchmark for biomedical literature retrieval. On the frequent (HEAD) queries, LADER outperforms the best previous retrieval model by 39% relative NDCG@10 (0.338 vs. 0.243). LADER also performs better on the less frequent (TORSO) queries, with an 11% relative NDCG@10 improvement over the previous SOTA (0.303 vs. 0.272). On the rare (TAIL) queries, where similar queries are scarce, LADER still compares favorably to the previous SOTA method (NDCG@10: 0.310 vs. 0.295). On all queries, LADER can improve the performance of a dense retriever by 24%-37% relative NDCG@10 without requiring additional training, and further improvement is expected from more logs. Our regression analysis shows that queries that are more frequent, have higher entropy of query similarity, and have lower entropy of document similarity tend to benefit more from log augmentation.
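The scoring procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `lader_scores`, the use of an inner product as the similarity measure, the summation used to aggregate click-log scores, the absence of any score normalization, and the `top_k_queries` parameter are all assumptions made for illustration, and the toy vectors and click logs are invented.

```python
from collections import defaultdict


def dot(u, v):
    """Inner product of two equal-length vectors (assumed similarity measure)."""
    return sum(a * b for a, b in zip(u, v))


def lader_scores(query_vec, doc_vecs, train_query_vecs, click_logs, top_k_queries=1):
    """Hypothetical sketch of log-augmented scoring.

    doc_vecs / train_query_vecs map ids to embeddings; click_logs maps a
    training query id to the list of document ids clicked for that query.
    """
    # (1) Document similarity scores from the dense retriever.
    doc_scores = {d: dot(query_vec, vec) for d, vec in doc_vecs.items()}

    # Find the training queries most similar to the input query.
    query_sims = {q: dot(query_vec, vec) for q, vec in train_query_vecs.items()}
    nearest = sorted(query_sims, key=query_sims.get, reverse=True)[:top_k_queries]

    # (2) Aggregate clicked documents of similar queries, weighted by the
    # similarity of each query to the input query (summation is an assumption).
    log_scores = defaultdict(float)
    for q in nearest:
        for d in click_logs.get(q, []):
            log_scores[d] += query_sims[q]

    # Final score: average of (1) and (2); a missing score counts as 0.
    candidates = set(doc_scores) | set(log_scores)
    return {d: 0.5 * (doc_scores.get(d, 0.0) + log_scores.get(d, 0.0)) for d in candidates}


# Toy data, invented for illustration only.
query = [1.0, 0.0]
docs = {"d1": [0.9, 0.1], "d2": [0.5, 0.5], "d3": [0.2, 0.8]}
train_queries = {"q1": [1.0, 0.0], "q2": [0.0, 1.0]}
clicks = {"q1": ["d2"], "q2": ["d3"]}

scores = lader_scores(query, docs, train_queries, clicks, top_k_queries=1)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy setup the dense retriever alone would rank `d1` first, but `d2` was clicked for the most similar training query `q1`, so the averaged score moves `d2` to the top. No retraining is involved: only the click logs and the existing embeddings are used, which matches the plug-in nature of the approach.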