The amount of publicly available biomedical literature has been growing rapidly in recent years, yet question answering systems still struggle to exploit the full potential of this source of data. In a preliminary processing step, many question answering systems rely on retrieval models for identifying relevant documents and passages. This paper proposes a weighted cosine distance retrieval scheme based on neural network word embeddings. Our experiments are based on publicly available data and tasks from the BioASQ biomedical question answering challenge and demonstrate significant performance gains over a wide range of state-of-the-art models.
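As a rough illustration of the kind of retrieval scheme described above, the following minimal sketch ranks documents by cosine distance between weighted averages of word embeddings. It is not the paper's actual formulation: the abstract does not specify the weighting, so the per-token weights (here imagined as IDF-like values), the toy two-dimensional embeddings, and the function names `weighted_centroid`, `cosine_distance`, and `rank_documents` are all assumptions made for illustration.

```python
import math


def weighted_centroid(tokens, embeddings, weights):
    """Weighted average of the embeddings of all known tokens.

    `weights` maps a token to a hypothetical importance score
    (for example an IDF-like value); unknown tokens default to 1.0.
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok not in embeddings:
            continue  # skip out-of-vocabulary tokens
        w = weights.get(tok, 1.0)
        for i, x in enumerate(embeddings[tok]):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        return total  # all-zero vector when no token is known
    return [x / weight_sum for x in total]


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def rank_documents(query_tokens, docs, embeddings, weights):
    """Return document ids sorted from closest to farthest from the query."""
    q = weighted_centroid(query_tokens, embeddings, weights)
    scored = [
        (cosine_distance(q, weighted_centroid(toks, embeddings, weights)), doc_id)
        for doc_id, toks in docs.items()
    ]
    return [doc_id for _, doc_id in sorted(scored)]


# Toy data, invented purely for this example.
EMBEDDINGS = {
    "gene": [1.0, 0.0],
    "protein": [0.9, 0.1],
    "weather": [0.0, 1.0],
    "the": [0.5, 0.5],
}
WEIGHTS = {"the": 0.1}  # down-weight a frequent, uninformative token
DOCS = {"d1": ["the", "protein"], "d2": ["the", "weather"]}

ranking = rank_documents(["gene"], DOCS, EMBEDDINGS, WEIGHTS)
print(ranking)
```

In this sketch the weights only affect how tokens are pooled into a single vector; a scheme that instead weights individual embedding dimensions inside the cosine formula would be an equally plausible reading of "weighted cosine distance".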