Supervised Learning in the Presence of Noise: Application in ICD-10 Code Classification

ICD coding is the international standard for capturing and reporting health conditions and diagnoses for revenue cycle management in healthcare. Manually assigning ICD codes is prone to human error because the code vocabulary is large and many codes are similar to one another. Machine-learning-based approaches require ground-truth training data, so inconsistency among human coders shows up as label noise, which makes both training and evaluating ICD classifiers difficult. This paper investigates the characteristics of this noise in manually assigned ICD-10 codes and proposes a method for training robust ICD-10 classifiers in the presence of label noise. Our analysis shows that the noise is systematic. Most existing methods for handling label noise assume that the noise is completely random and independent of features or labels, which is not the case for ICD data. We therefore develop a new method for training robust classifiers in the presence of systematic noise. We first identify the ICD-10 codes that human coders tend to misuse or confuse, based on the codes' locations in the ICD-10 hierarchy, the types of the codes, and a baseline classifier's prediction behavior; we then develop a novel training strategy that accounts for this noise. We compare our method with a baseline that does not handle label noise and with baseline methods that assume random noise, and demonstrate that the proposed method outperforms all baselines when evaluated on expert-validated labels.
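To make the contrast between random and systematic label noise concrete, here is a minimal sketch. It is not the method proposed in the paper. The toy code list, the `rate` parameter, and the function names are illustrative assumptions, and treating codes that share a three-character ICD-10 category as "confusable" is only a rough stand-in for the hierarchy-based criteria the abstract mentions.

```python
import math

# Toy label set (illustrative only): two codes share category E11, two stand alone.
CODES = ["E11.9", "E11.65", "I10", "J45.909"]


def category(code):
    """Return the three-character ICD-10 category of a code, e.g. 'E11.9' -> 'E11'."""
    return code.split(".")[0][:3]


def uniform_noise(codes, rate):
    """Random-noise assumption: a flipped label lands on any other code equally."""
    n = len(codes)
    return {
        true: {
            seen: (1.0 - rate) if seen == true else rate / (n - 1)
            for seen in codes
        }
        for true in codes
    }


def sibling_noise(codes, rate):
    """Systematic-noise assumption: a label flips only to codes in the same category."""
    matrix = {}
    for true in codes:
        siblings = [c for c in codes if c != true and category(c) == category(true)]
        row = {seen: 0.0 for seen in codes}
        if siblings:
            row[true] = 1.0 - rate
            for sib in siblings:
                row[sib] = rate / len(siblings)
        else:
            # No confusable sibling in this toy set: the label is never flipped.
            row[true] = 1.0
        matrix[true] = row
    return matrix


if __name__ == "__main__":
    for name, build in [("uniform", uniform_noise), ("sibling", sibling_noise)]:
        print(name)
        for true, row in build(CODES, 0.3).items():
            print(" ", true, {seen: round(p, 3) for seen, p in row.items()})
```

Each row of either matrix gives the probability of observing a label given the true code. Under the uniform model the error mass is spread over all other codes, while under the sibling model it concentrates on structurally related codes, which is the kind of pattern a random-noise method would not capture.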
