Medical term normalization consists of mapping a piece of text to one of a large number of output classes. Given the small size of the annotated datasets and the extremely long-tailed distribution of the concepts, it is of utmost importance to develop models that are capable of generalizing to scarce or unseen concepts. An important attribute of most target ontologies is their hierarchical structure. In this paper, we introduce a simple and effective learning strategy that leverages such information to enhance the generalizability of both discriminative and generative models. The evaluation shows that the proposed strategy produces state-of-the-art performance on seen concepts and consistent improvements on unseen ones, while also allowing for efficient zero-shot knowledge transfer across text typologies and datasets.