Multi-label Few-shot ICD Coding as Autoregressive Generation with Prompt

Automatic International Classification of Diseases (ICD) coding aims to assign multiple ICD codes to a medical note that averages 3,000+ tokens. The task is challenging for two reasons: the high-dimensional space of multi-label assignment (155,000+ ICD code candidates) and the long-tail problem, in which many ICD codes are rarely assigned yet those rare codes are clinically important. This study addresses the long-tail challenge by transforming the multi-label classification task into an autoregressive generation task. Specifically, we first introduce a novel pretraining objective that generates free-text diagnoses and procedures using the SOAP structure, the medical logic physicians use for note documentation. Second, instead of directly predicting in the high-dimensional space of ICD codes, our model generates lower-dimensional text descriptions, from which the ICD codes are then inferred. Third, we design a novel prompt template for multi-label classification. We evaluate our Generation with Prompt model on the benchmark for all-code assignment (MIMIC-III-full) and on the few-shot ICD code assignment benchmark (MIMIC-III-few). Experiments on MIMIC-III-few show that our model achieves a macro F1 of 30.2, substantially outperforming both the previous MIMIC-III-full SOTA model (macro F1 4.3) and the model specifically designed for the few/zero-shot setting (macro F1 18.7). Finally, we design a novel ensemble learner, a cross-attention reranker with prompts, to integrate the previous SOTA predictions with our best few-shot coding predictions. Experiments on MIMIC-III-full show that our ensemble learner substantially improves both macro and micro F1, from 10.4 to 14.6 and from 58.2 to 59.1, respectively.
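
The abstract reports both macro and micro F1 and describes inferring codes from generated text descriptions. The sketch below illustrates those two ideas under stated assumptions and is not the authors' implementation: the lookup table `DESCRIPTION_TO_CODE`, the `;` separator, the exact-match rule in `codes_from_generation`, and the toy gold labels are all hypothetical choices made for illustration. It shows why a missed rare code barely moves micro F1 but pulls macro F1 down sharply.

```python
from collections import Counter

# Hypothetical lookup from description text to ICD-9 code (illustrative entries only).
DESCRIPTION_TO_CODE = {
    "unspecified essential hypertension": "401.9",
    "atrial fibrillation": "427.31",
    "acute kidney failure, unspecified": "584.9",
}


def codes_from_generation(text, table=DESCRIPTION_TO_CODE, sep=";"):
    """Map a separator-delimited string of generated descriptions to a set of codes.

    Assumption: exact (case-insensitive) match against the table; descriptions
    that are not found are silently dropped.
    """
    codes = set()
    for piece in text.split(sep):
        key = piece.strip().lower()
        if key in table:
            codes.add(table[key])
    return codes


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred):
    """Return (micro F1, macro F1) for parallel lists of gold/predicted code sets.

    Macro F1 here averages over codes seen in gold or pred; a benchmark would
    normally average over its full, fixed label set instead.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        for code in g & p:
            tp[code] += 1
        for code in p - g:
            fp[code] += 1
        for code in g - p:
            fn[code] += 1
    labels = set(tp) | set(fp) | set(fn)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels) if labels else 0.0
    return micro, macro


# Toy data: the frequent code is always found, one rare code is missed.
gold = [{"401.9", "427.31"}, {"401.9"}, {"401.9", "584.9"}]
generated = [
    "Unspecified essential hypertension; Atrial fibrillation",
    "Unspecified essential hypertension",
    "Unspecified essential hypertension",
]
pred = [codes_from_generation(text) for text in generated]
micro, macro = micro_macro_f1(gold, pred)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```

In this toy case one missed rare code out of five gold assignments leaves micro F1 high while macro F1 drops by a third, because macro F1 weights every code equally regardless of frequency. That is the reason macro F1 is the more informative metric for the long-tail setting discussed above.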
