We study semantic search and retrieval in biomedical digital libraries. First, we introduce MedGraph, a knowledge graph embedding-based method that provides semantic relevance retrieval and ranking for the biomedical literature indexed in PubMed. Second, we evaluate our method using PubMed's Best Match algorithm. Moreover, we compare MedGraph to a traditional TF-IDF-based algorithm. We use a dataset extracted from PubMed that includes metadata for 30 million articles: abstracts, author information, citation information, and extracted biological entity mentions. From this dataset we draw a subset and evaluate MedGraph on predefined queries with ground-truth ranked results. To our knowledge, this technique has not previously been explored in biomedical information retrieval. Our results provide evidence that semantic approaches to search and relevance that rely on knowledge graph modeling yield better search relevance in biomedical digital libraries than traditional approaches, as measured by objective metrics.
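As an illustration of the kind of comparison described in the abstract, the following is a minimal sketch of a TF-IDF baseline ranker scored against a set of ground-truth relevant documents. It is not the authors' implementation. The toy documents, the function names `tfidf_rank` and `precision_at_k`, the smoothed IDF formula, and the choice of precision@k as the metric are all assumptions made for this example; the abstract does not name the metrics that were used.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would normalize far more.
    return text.lower().split()


def tfidf_rank(query, docs):
    """Rank document ids by the summed TF-IDF weight of the query terms."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(docs)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for toks in tokens.values():
        df.update(set(toks))

    scores = {}
    for doc_id, toks in tokens.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term in tf:
                # Smoothed IDF (an assumption; many variants exist).
                idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
                score += (tf[term] / len(toks)) * idf
        scores[doc_id] = score

    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked ids that are in the relevant set."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


# Hypothetical toy corpus and ground truth, for illustration only.
docs = {
    "d1": "brca1 mutation breast cancer risk",
    "d2": "influenza vaccine efficacy in adults",
    "d3": "breast cancer screening outcomes",
    "d4": "protein folding dynamics",
}
relevant = {"d1", "d3"}

ranking = tfidf_rank("breast cancer", docs)
score = precision_at_k(ranking, relevant, k=2)
print(ranking, score)
```

An embedding-based ranker could be scored with the same `precision_at_k` function by substituting its own ranked list, which is what makes a side-by-side comparison against a lexical baseline straightforward.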