The massive growth of digital biomedical data is making biomedical text indexing and classification increasingly important. Accordingly, previous research has devised numerous techniques, ranging from rule-based systems to deep neural networks, with most focusing on feedforward, convolutional, or recurrent neural architectures. More recently, fine-tuned transformer-based pretrained models (PTMs) have demonstrated superior performance on many natural language processing tasks. However, direct use of PTMs in the biomedical domain has been limited to the target documents, ignoring the rich semantic information in the label descriptions. In this paper, we develop an improved label attention-based architecture that injects semantic label descriptions into the fine-tuning process of PTMs. Results on two public medical datasets show that the proposed fine-tuning scheme outperforms conventionally fine-tuned PTMs and prior state-of-the-art models. Furthermore, an interpretability study shows that fine-tuning with the label attention mechanism yields interpretable predictions.
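The abstract does not spell out the architecture; purely as an illustrative sketch, the PyTorch snippet below shows one common form of label attention over PTM token states, in which encoded label descriptions act as attention queries and each label receives its own label-aware document representation. All names here (LabelAttention, token_states, label_states) are hypothetical placeholders, not identifiers from the paper.

```python
import torch
import torch.nn as nn

class LabelAttention(nn.Module):
    """Attend over PTM token representations using label-description
    embeddings as queries, producing one label-aware vector per label.
    A rough sketch of the general technique, not the paper's exact model."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # Maps each label-aware vector to a single logit for that label.
        self.classifier = nn.Linear(hidden_size, 1)

    def forward(self, token_states: torch.Tensor, label_states: torch.Tensor):
        # token_states: (batch, seq_len, hidden) from the document encoder
        # label_states: (num_labels, hidden) encoded label descriptions
        scores = token_states @ label_states.T             # (batch, seq_len, num_labels)
        attn = torch.softmax(scores, dim=1)                # per-label attention over tokens
        label_aware = attn.transpose(1, 2) @ token_states  # (batch, num_labels, hidden)
        logits = self.classifier(label_aware).squeeze(-1)  # (batch, num_labels)
        return logits, attn

# Toy usage with random tensors standing in for encoder outputs.
layer = LabelAttention(hidden_size=768)
tokens = torch.randn(2, 128, 768)   # batch of 2 documents, 128 tokens each
labels = torch.randn(50, 768)       # 50 encoded label descriptions
logits, attn = layer(tokens, labels)
```

The returned attention weights indicate which tokens each label attends to, which is the kind of signal an interpretability study of such a mechanism would examine.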