Human coders assign standardized medical codes to the clinical documents generated during a patient's hospitalization, a process that is error-prone and labor-intensive. Automated medical coding approaches have been developed using machine learning methods such as deep neural networks. Nevertheless, automated medical coding remains challenging because of class imbalance, complex associations between codes, and noise in lengthy documents. To address these issues, we propose a novel neural network called the Multitask Balanced and Recalibrated Neural Network. Specifically, a multitask learning scheme shares relational knowledge across different code branches to capture code associations. A recalibrated aggregation module, built by cascading convolutional blocks, extracts high-level semantic features that mitigate the impact of noise in documents; its cascaded structure also benefits learning from lengthy notes. To handle class imbalance, we adopt the focal loss to redistribute attention between low- and high-frequency medical codes. Experimental results show that our proposed model outperforms competitive baselines on MIMIC-III, a real-world clinical dataset.
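The abstract does not give the exact form of the focal loss used. As a minimal sketch, assuming the standard sigmoid focal loss (Lin et al., 2017) applied independently to each code in a multi-label setting, the example below shows how the modulating factor (1 - p_t)^gamma shrinks the loss of well-classified codes relative to plain binary cross-entropy. The function names and the defaults gamma = 2 and alpha = 0.25 are hypothetical illustration choices, not values taken from the paper.

```python
import math
from typing import Optional, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def focal_loss(
    logits: Sequence[float],
    labels: Sequence[int],
    gamma: float = 2.0,
    alpha: Optional[float] = 0.25,
    eps: float = 1e-12,
) -> float:
    """Mean sigmoid focal loss over the codes of one document.

    Each code is treated as an independent binary decision (multi-label).
    The gamma and alpha defaults are hypothetical, not taken from the paper.
    """
    if len(logits) != len(labels) or not logits:
        raise ValueError("logits and labels must be non-empty and of equal length")
    total = 0.0
    for z, y in zip(logits, labels):
        p = sigmoid(z)
        # p_t is the predicted probability of the true label for this code
        p_t = p if y == 1 else 1.0 - p
        # alpha_t optionally re-weights positive vs. negative labels
        if alpha is None:
            alpha_t = 1.0
        else:
            alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t) ** gamma shrinks the loss of well-classified codes
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))
    return total / len(logits)


def binary_cross_entropy(
    logits: Sequence[float], labels: Sequence[int], eps: float = 1e-12
) -> float:
    """Plain mean binary cross-entropy, for comparison."""
    total = 0.0
    for z, y in zip(logits, labels):
        p = sigmoid(z)
        total += -(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1.0 - p, eps)))
    return total / len(logits)


if __name__ == "__main__":
    # Hypothetical scores for one note and four codes: two confident correct
    # predictions (codes 1 and 2), one hard positive (code 3, e.g. a rare code
    # the model misses) and one mild false positive (code 4).
    logits = [4.0, -3.0, -1.0, 0.5]
    labels = [1, 0, 1, 0]
    print("BCE:  ", binary_cross_entropy(logits, labels))
    print("Focal:", focal_loss(logits, labels))
```

Setting gamma = 0 and alpha = None recovers plain binary cross-entropy. Raising gamma shifts a larger share of the total loss toward hard codes, which in an imbalanced label space tend to include the rarer ones.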