Processing information locked within clinical health records is a challenging task that remains an active area of research in biomedical NLP. In this work, we evaluate a broad range of machine learning techniques, from simple RNNs to specialised transformers such as BioBERT, on a dataset of clinical notes annotated to indicate whether each sample is cancer-related. Furthermore, we employ efficient fine-tuning methods from NLP, namely bottleneck adapters and prompt tuning, to adapt the models to our specialised task. Our evaluations suggest that fine-tuning a frozen BERT model pre-trained on natural language with bottleneck adapters outperforms all other strategies, including full fine-tuning of the specialised BioBERT model. Based on these findings, we suggest that bottleneck adapters are a viable strategy for biomedical text mining in low-resource settings with limited access to labelled data or processing capacity. The code used in the experiments will be made available at https://github.com/omidrohanian/bottleneck-adapters.
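To make the central technique concrete, below is a minimal NumPy sketch of the bottleneck-adapter idea: a small down-projection, a nonlinearity, an up-projection, and a residual connection, inserted while the backbone transformer's weights stay frozen. The dimensions, initialisation, and function names here are illustrative assumptions, not the configuration used in the paper's experiments.

```python
import numpy as np

def bottleneck_adapter(h, W_down, W_up):
    """Apply a bottleneck adapter to a hidden state h.

    Only W_down and W_up would be trained; the surrounding
    transformer layers remain frozen.
    """
    z = np.maximum(0.0, h @ W_down)  # down-project to the bottleneck, then ReLU
    return h + z @ W_up              # up-project and add the residual

# Illustrative sizes: a BERT-base hidden width and a small bottleneck.
hidden, bottleneck = 768, 64
rng = np.random.default_rng(0)
W_down = rng.normal(scale=0.02, size=(hidden, bottleneck))
W_up = np.zeros((bottleneck, hidden))  # zero init: adapter starts as the identity

h = rng.normal(size=(1, hidden))
out = bottleneck_adapter(h, W_down, W_up)
assert out.shape == h.shape  # the adapter preserves the hidden dimension
```

Because the adapter adds only two small matrices per insertion point, the number of trainable parameters is a small fraction of the full model, which is what makes the approach attractive in the low-resource settings discussed above.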