Background: Encouraged by the success of pretrained Transformer models in many natural language processing tasks, their use for International Classification of Diseases (ICD) coding tasks is now actively being explored. In this study, we investigate three types of Transformer-based models, aiming to address the extreme label set and long text classification challenges posed by automated ICD coding tasks.

Methods: The Transformer-based model PLM-ICD achieved the current state-of-the-art (SOTA) performance on the ICD coding benchmark dataset MIMIC-III. It was chosen as our baseline model to be further optimised. XR-Transformer, the new SOTA model in the general extreme multi-label text classification domain, and XR-LAT, a novel adaptation of the XR-Transformer model, were also trained on the MIMIC-III dataset. XR-LAT is a recursively trained model chain on a predefined hierarchical code tree with label-wise attention, knowledge transfer and dynamic negative sampling mechanisms.

Results: Our optimised PLM-ICD model, which was trained with longer total and chunk sequence lengths, significantly outperformed the current SOTA PLM-ICD model and achieved the highest micro-F1 score of 60.8%. The XR-Transformer model, although SOTA in the general domain, did not perform well across all metrics. The best XR-LAT based model obtained results that were competitive with the current SOTA PLM-ICD model, including improving the macro-AUC by 2.1%.

Conclusion: Our optimised PLM-ICD model is the new SOTA model for automated ICD coding on the MIMIC-III dataset, while our novel XR-LAT model performs competitively with the previous SOTA PLM-ICD model.