We propose Medical Entity Definition-based Sentence Embedding (MED-SE), a novel unsupervised contrastive learning framework for clinical texts that exploits the definitions of medical entities. To this end, we conduct an extensive analysis of multiple sentence embedding techniques in clinical semantic textual similarity (STS) settings. In the entity-centric setting we designed, MED-SE achieves significantly better performance, while existing unsupervised methods, including SimCSE, show degraded performance. Our experiments elucidate the inherent discrepancies between general- and clinical-domain texts and suggest that entity-centric contrastive approaches may help bridge this gap and yield better representations of clinical sentences.
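Unsupervised contrastive frameworks in the SimCSE family train sentence embeddings by pulling each sentence toward a positive view while pushing it away from the other items in the batch. The sketch below shows a generic in-batch InfoNCE objective of that kind. It is not MED-SE's exact formulation. Pairing each sentence embedding with the embedding of a medical entity definition it mentions is our illustrative assumption, and the names `cosine` and `info_nce_loss` are hypothetical. The temperature of 0.05 is a value commonly used in SimCSE-style training.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(
    anchors: List[List[float]],
    positives: List[List[float]],
    temperature: float = 0.05,
) -> float:
    """Mean InfoNCE loss with in-batch negatives.

    anchors[i] and positives[i] form the positive pair. Every positives[j]
    with j != i acts as a negative for anchors[i]. In this hypothetical setup,
    anchors are sentence embeddings and positives are embeddings of the
    definitions of entities those sentences mention.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        # Temperature-scaled similarity of anchor i to every candidate in the batch.
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy term: -log softmax probability of the true pair.
        total += log_denom - logits[i]
    return total / n


if __name__ == "__main__":
    sentences = [[1.0, 0.0], [0.0, 1.0]]
    definitions = [[1.0, 0.0], [0.0, 1.0]]
    print(info_nce_loss(sentences, definitions))        # near 0: pairs aligned
    print(info_nce_loss(sentences, definitions[::-1]))  # near 20: pairs swapped
```

With aligned pairs the loss is close to zero; swapping the positives makes each anchor most similar to a negative, so the loss grows. Minimizing this objective is what drives matched items together in embedding space.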