This work introduces BioLORD, a new pre-training strategy for producing meaningful representations for clinical sentences and biomedical concepts. State-of-the-art methodologies operate by maximizing the similarity in representation of names referring to the same concept, and preventing collapse through contrastive learning. However, because biomedical names are not always self-explanatory, this approach sometimes results in non-semantic representations. BioLORD overcomes this issue by grounding its concept representations using definitions, as well as short descriptions derived from a multi-relational knowledge graph consisting of biomedical ontologies. Thanks to this grounding, our model produces more semantic concept representations that match the hierarchical structure of ontologies more closely. BioLORD establishes a new state of the art for text similarity on both clinical sentences (MedSTS) and biomedical concepts (MayoSRS).
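The idea of pairing concept names with definitions under a contrastive objective can be illustrated with a minimal sketch. It assumes an InfoNCE-style loss over cosine similarities, which is a common choice for contrastive learning but is not confirmed here as the exact objective used by BioLORD. The function names, the temperature value, and the toy two-dimensional embeddings are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(name_vecs, desc_vecs, temperature=0.1):
    """Mean InfoNCE loss over a batch (illustrative, not the paper's exact loss).

    name_vecs[i] and desc_vecs[i] are assumed to embed the same concept;
    every other description in the batch acts as a negative, which is
    what discourages all representations from collapsing to one point.
    """
    losses = []
    for i, name in enumerate(name_vecs):
        logits = [cosine(name, d) / temperature for d in desc_vecs]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Hypothetical toy embeddings: two concept names and their definitions.
names = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each definition close to its own name
swapped = [aligned[1], aligned[0]]   # definitions paired with the wrong name

loss_aligned = info_nce(names, aligned)
loss_swapped = info_nce(names, swapped)
print(loss_aligned < loss_swapped)
```

In this toy setting the loss is lower when each name sits close to its own definition than when the pairs are swapped, which is the behaviour that training on name and definition pairs is meant to encourage.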