Clinical notes are unstructured text written by clinicians during patient encounters. They are usually accompanied by a set of metadata codes from the International Classification of Diseases (ICD). ICD codes are used in a range of operations, including insurance, reimbursement, and medical diagnosis, so it is important to assign them quickly and accurately. However, annotating these codes manually is costly and time-consuming. We therefore propose a model based on bidirectional encoder representations from transformers (BERT) that uses a sequence attention method for automatic ICD code assignment. We evaluate our approach on the Medical Information Mart for Intensive Care III (MIMIC-III) benchmark dataset. Our model achieves a macro-averaged F1 of 0.62898 and a micro-averaged F1 of 0.68555, outperforming the state-of-the-art model on the MIMIC-III dataset. The contributions of this study are a method for applying BERT to documents and a sequence attention method that captures important sequence information appearing in documents.
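The two reported metrics weight labels differently, which matters for ICD coding because code frequencies are highly skewed. The following is a minimal sketch of how micro- and macro-averaged F1 are commonly computed for multi-label predictions. It is not the authors' evaluation code: the function names, the toy ICD codes, and the choice of macro variant (the mean of per-label F1 scores) are assumptions, and the abstract does not state which macro variant was used.

```python
from typing import List, Set, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> Tuple[float, float]:
    """Return (micro_f1, macro_f1) for per-document sets of gold and predicted codes."""
    labels = sorted(set().union(*gold, *pred))
    # Per-label counts stored as [true positives, false positives, false negatives].
    counts = {label: [0, 0, 0] for label in labels}
    for g, p in zip(gold, pred):
        for label in p & g:
            counts[label][0] += 1
        for label in p - g:
            counts[label][1] += 1
        for label in g - p:
            counts[label][2] += 1

    # Macro: average the per-label F1 scores, so rare codes count as much as common ones.
    macro = sum(f1_from_counts(*counts[l]) for l in labels) / len(labels) if labels else 0.0

    # Micro: pool the counts over all labels first, so frequent codes dominate.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1_from_counts(tp, fp, fn)
    return micro, macro


if __name__ == "__main__":
    # Toy example with hypothetical code assignments for three notes.
    gold = [{"401.9", "428.0"}, {"401.9"}, {"250.00"}]
    pred = [{"401.9"}, {"401.9", "428.0"}, set()]
    micro, macro = micro_macro_f1(gold, pred)
    print(f"micro-F1 = {micro:.4f}, macro-F1 = {macro:.4f}")  # micro = 4/7, macro = 1/3
```

In this toy case the frequent code is always predicted correctly and the two rare codes never are, so the micro average (4/7) is well above the macro average (1/3). A micro score higher than the macro score, as in the reported results, is the usual pattern when a model handles common codes better than rare ones.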