The enormous growth of research publications has made it challenging for academic search engines to retrieve the papers most relevant to a given search query. Numerous solutions have been proposed over the years to improve the effectiveness of academic search, including query expansion and citation analysis. Query expansion techniques mitigate the mismatch between the language used in a query and that of the indexed documents. However, these techniques risk introducing non-relevant information while expanding the original query. Recently, the contextualized language model BERT has been applied quite successfully to document retrieval and query expansion. Motivated by these issues and inspired by the success of BERT, this paper proposes a novel approach called QeBERT. QeBERT exploits BERT-based embeddings and Citation Network Analysis (CNA) in query expansion to improve scholarly search. Specifically, we use context-aware BERT embeddings and CNA for query expansion in a Pseudo-Relevance Feedback (PRF) fashion. Initial experimental results on the ACL dataset show that BERT embeddings can provide valuable augmentation for query expansion and improve search relevance when combined with CNA.
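To make the PRF-style expansion concrete, the sketch below shows one plausible reading of the pipeline: retrieve top-k documents for the query, treat them as relevant, and pick the candidate terms whose embeddings are closest to the query embedding. This is not the authors' implementation; the character-trigram `embed` function is a self-contained stand-in for real BERT embeddings, and `prf_expand` is a hypothetical helper.

```python
import math
from collections import Counter

def embed(text):
    # Stand-in for a contextual BERT embedding: a character-trigram
    # count vector, used only to keep this sketch self-contained.
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def prf_expand(query, docs, top_k=2, n_terms=3):
    """Pseudo-Relevance Feedback: assume the top-k first-pass results are
    relevant, then add the candidate terms whose embeddings lie closest
    to the query embedding (hypothetical sketch, not the QeBERT method)."""
    q_vec = embed(query)
    # First-pass retrieval: rank documents against the query.
    ranked = sorted(docs, key=lambda d: cosine(q_vec, embed(d)), reverse=True)
    # Candidate expansion terms come from the pseudo-relevant documents.
    candidates = {t for d in ranked[:top_k] for t in d.lower().split()}
    candidates -= set(query.lower().split())
    # Keep the candidates most similar to the query in embedding space.
    scored = sorted(candidates, key=lambda t: cosine(q_vec, embed(t)),
                    reverse=True)
    return query.split() + scored[:n_terms]

docs = [
    "neural ranking models for document retrieval",
    "citation network analysis of scholarly papers",
    "cooking recipes for the weekend",
]
print(prf_expand("document ranking", docs))
```

A real system would replace `embed` with sentence-level BERT representations and could re-weight candidates using citation-network signals (the CNA component), which this toy sketch omits.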