Nowadays, search engine users commonly rely on query suggestions to improve their initial inputs. Current systems are very good at recommending lexical adaptations or spelling corrections for users' queries. However, they often struggle to suggest semantically related keywords for a given query. Constructing a detailed query is crucial in some tasks, such as legal retrieval or academic search. In these scenarios, keyword suggestion methods are critical for guiding the user during query formulation. This paper proposes two novel models for the keyword suggestion task, trained on scientific literature. Our techniques adapt the architectures of Word2Vec and FastText to generate keyword embeddings by leveraging keyword co-occurrence within documents. Along with these models, we also present a specially tailored negative sampling approach that exploits how keywords appear in academic publications. We devise a ranking-based evaluation methodology covering both known-item and ad-hoc search scenarios. Finally, we evaluate our proposals against state-of-the-art word and sentence embedding models, showing considerable improvements over the baselines on these tasks.
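To make the core idea concrete, the following is a minimal sketch of Word2Vec-style skip-gram training with negative sampling applied to keyword co-occurrence, under stated assumptions. It assumes that every ordered pair of distinct keywords from the same document is a positive (target, context) pair, so a document's keyword list plays the role of a context window. It does not reproduce the paper's tailored negative sampling or any FastText subword component: it uses the standard Word2Vec noise distribution (unigram counts raised to 0.75) instead. The function names `train_keyword_embeddings` and `suggest`, all hyperparameters, and the toy documents are hypothetical and chosen only for illustration.

```python
import math
import random
from itertools import permutations


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _sigmoid(x):
    x = max(-30.0, min(30.0, x))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-x))


def train_keyword_embeddings(documents, dim=16, epochs=200, lr=0.05,
                             negatives=3, seed=0):
    """Hypothetical sketch: skip-gram with negative sampling where each
    document's keyword set acts as the context window."""
    rng = random.Random(seed)
    vocab = sorted({kw for doc in documents for kw in doc})
    index = {kw: i for i, kw in enumerate(vocab)}

    # Assumption: standard Word2Vec noise distribution (unigram^0.75),
    # not the paper's tailored negative sampling.
    counts = [0] * len(vocab)
    for doc in documents:
        for kw in set(doc):
            counts[index[kw]] += 1
    weights = [c ** 0.75 for c in counts]

    # Target vectors start small and random; context vectors start at zero.
    w_in = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in vocab]
    w_out = [[0.0] * dim for _ in vocab]

    # Every ordered pair of distinct keywords in one document is a positive pair.
    pairs = [(index[a], index[b])
             for doc in documents
             for a, b in permutations(sorted(set(doc)), 2)]

    for _ in range(epochs):
        rng.shuffle(pairs)
        for t, c in pairs:
            noise = rng.choices(range(len(vocab)), weights=weights, k=negatives)
            samples = [(c, 1.0)] + [(n, 0.0) for n in noise if n != c]
            grad_in = [0.0] * dim
            for o, label in samples:
                g = lr * (label - _sigmoid(_dot(w_in[t], w_out[o])))
                for d in range(dim):
                    grad_in[d] += g * w_out[o][d]  # uses w_out before its update
                    w_out[o][d] += g * w_in[t][d]
            for d in range(dim):
                w_in[t][d] += grad_in[d]
    return vocab, w_in


def suggest(keyword, vocab, vectors, k=3):
    """Rank all other keywords by cosine similarity to `keyword`."""
    def cosine(u, v):
        nu, nv = math.sqrt(_dot(u, u)), math.sqrt(_dot(v, v))
        return _dot(u, v) / (nu * nv) if nu and nv else 0.0

    i = vocab.index(keyword)
    scored = [(cosine(vectors[i], vectors[j]), vocab[j])
              for j in range(len(vocab)) if j != i]
    scored.sort(reverse=True)
    return [kw for _, kw in scored[:k]]


# Toy corpus (invented): each inner list is one paper's keyword list.
docs = [
    ["query suggestion", "information retrieval", "search engines"],
    ["query suggestion", "search engines", "query reformulation"],
    ["information retrieval", "query reformulation", "search engines"],
    ["protein folding", "molecular dynamics", "structural biology"],
    ["protein folding", "structural biology", "drug discovery"],
    ["molecular dynamics", "drug discovery", "structural biology"],
]
vocab, vectors = train_keyword_embeddings(docs)
print(suggest("query suggestion", vocab, vectors))
```

Treating the whole keyword list as one window, rather than sliding a fixed-size window over it, reflects the assumption that the order of keywords within a paper carries little meaning; on this toy corpus the suggestions for a keyword should come from the keywords it shares papers with.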