Neural word embeddings have been widely used in biomedical Natural Language Processing (NLP) applications, since they provide vector representations of words that capture their semantic properties and the linguistic relationships between them. Many biomedical applications train word embeddings on different textual resources and apply them to downstream biomedical tasks. However, there has been little work on comprehensively evaluating the word embeddings trained from these resources. In this study, we provide a comprehensive empirical evaluation of word embeddings trained from four different resources, namely clinical notes, biomedical publications, Wikipedia, and news. We perform the evaluation qualitatively and quantitatively. In the qualitative evaluation, we manually inspect the five most similar medical words to each of a set of target medical words, and then analyze the word embeddings through visualization. The quantitative evaluation falls into two categories: extrinsic and intrinsic evaluation. Based on the evaluation results, we draw the following conclusions. First, the word embeddings trained on EHR and PubMed capture the semantics of medical terms better than those trained on GloVe and Google News, and retrieve more relevant similar medical terms. Second, the medical semantic similarity captured by the word embeddings trained on EHR and PubMed is closer to human experts' judgments than that captured by the embeddings trained on GloVe and Google News. Third, there is no consistent global ranking of word embedding quality across downstream biomedical NLP applications; however, adding word embeddings as extra features improves results on most downstream tasks. Finally, word embeddings trained on a corpus from a similar domain do not necessarily outperform other word embeddings on downstream biomedical tasks.
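As a rough illustration of the two evaluation modes summarized above (nearest-neighbor inspection of medical terms and correlation of embedding similarity with expert judgments), the following sketch uses gensim and scipy; the embedding file path, the target term, and the term pairs are hypothetical placeholders, not data or results from this study.

```python
# Minimal sketch of qualitative and intrinsic evaluation of word embeddings.
# "embeddings.bin" and the example terms are hypothetical placeholders.
from gensim.models import KeyedVectors
from scipy.stats import spearmanr

# Load pretrained word vectors (e.g., embeddings trained on clinical notes or PubMed).
kv = KeyedVectors.load_word2vec_format("embeddings.bin", binary=True)

# Qualitative check: the five nearest neighbors of a target medical term,
# ranked by cosine similarity in the embedding space.
for word, score in kv.most_similar("diabetes", topn=5):
    print(f"{word}\t{score:.3f}")

# Intrinsic check: Spearman correlation between embedding cosine similarity
# and expert-rated similarity scores for a set of medical term pairs.
pairs = [("diabetes", "hypertension", 3.2), ("aspirin", "ibuprofen", 4.1)]  # toy examples
model_scores = [kv.similarity(a, b) for a, b, _ in pairs]
human_scores = [h for _, _, h in pairs]
rho, _ = spearmanr(model_scores, human_scores)
print(f"Spearman correlation with human judgments: {rho:.3f}")
```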