In this paper, we introduce a comprehensive framework for developing a machine learning-based SOAP (Subjective, Objective, Assessment, and Plan) classification system with little or no manually SOAP-annotated training data. The system is composed of the following three parts: 1) data construction, 2) a neural network-based SOAP classifier, and 3) a transfer learning framework. For data construction, since manually building a large training dataset is expensive, we propose a rule-based weak labeling method that utilizes the structured information of an EHR note. Then, we present a SOAP classifier composed of a pre-trained language model and a bi-directional long short-term memory network with a conditional random field (Bi-LSTM-CRF). Finally, we propose a transfer learning framework that re-uses the parameters of the SOAP classifier trained on the weakly labeled dataset for datasets collected from another hospital. The model trained on the weakly labeled data successfully performed SOAP classification (89.99 F1-score) on the notes collected from the target hospital. However, on the notes collected from other hospitals and departments, the performance decreased dramatically. Meanwhile, we verified that the transfer learning framework is advantageous for inter-hospital adaptation of the model, increasing the model's performance in every case. In particular, the transfer learning approach was more effective when the manually annotated dataset was smaller. We showed that SOAP classification models trained with our weak labeling algorithm can perform SOAP classification on EHR notes from the same hospital without manually annotated data. The transfer learning framework helps migrate a SOAP classification model between hospitals with a minimal amount of manually annotated data.
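To make the idea of rule-based weak labeling concrete, the following is a minimal sketch, not the method used in the paper. It assumes that an EHR note contains section headers followed by a colon and that each header can be mapped to one SOAP category. The header keywords in `HEADER_RULES`, the regular expression, and the function names `header_to_label` and `weak_label` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical header keywords: the abstract does not list the actual rules,
# so this mapping is an assumption made purely for illustration.
HEADER_RULES = {
    "S": ("chief complaint", "history of present illness", "subjective"),
    "O": ("physical exam", "vital signs", "lab results", "objective"),
    "A": ("assessment", "impression", "diagnosis"),
    "P": ("plan", "recommendation"),
}

# Assumed header format: leading text made of letters, spaces, or slashes,
# followed by a colon and optional inline content.
HEADER_RE = re.compile(r"^\s*([A-Za-z /]+?)\s*:\s*(.*)$")


def header_to_label(header):
    """Return the SOAP label for a known section header, else None."""
    normalized = header.strip().lower()
    for label, keywords in HEADER_RULES.items():
        if normalized in keywords:
            return label
    return None


def weak_label(note):
    """Assign a weak SOAP label to each non-empty line of a note.

    A recognized header switches the current label; following lines inherit
    it. Lines that appear before any recognized header are labeled None.
    """
    labeled = []
    current = None
    for line in note.splitlines():
        if not line.strip():
            continue
        match = HEADER_RE.match(line)
        if match:
            label = header_to_label(match.group(1))
            if label is not None:
                current = label
                rest = match.group(2).strip()
                if rest:
                    labeled.append((rest, current))
                continue
        # Not a recognized header: keep the line under the current section.
        labeled.append((line.strip(), current))
    return labeled


if __name__ == "__main__":
    example = (
        "Chief Complaint: cough for 3 days\n"
        "Physical Exam:\n"
        "Lungs clear\n"
        "Assessment: viral URI\n"
        "Plan: rest and fluids"
    )
    for text, label in weak_label(example):
        print(label, text)
```

Labels produced this way are noisy by construction, which is one plausible reason a sequence model such as the Bi-LSTM-CRF described above would be trained on them rather than the rules being used directly at inference time.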