Monitoring the health status of ICU patients is crucial for providing better care and treatment. The massive volume of raw electronic health records (EHRs) gives machine learning models more clinical text and vital-sign data from which to make accurate predictions, and many advanced NLP models have recently emerged for clinical note analysis. However, because raw clinical data are noisy and have a complicated textual structure, coarse embedding approaches that lack domain-specific refinement limit how much accuracy can improve. To address this issue, we propose FINEEHR, a system that adopts two representation learning techniques, metric learning and fine-tuning, to refine clinical note embeddings by exploiting the inner correlations among different health statuses and note categories. We evaluate FINEEHR on the real-world MIMIC-III dataset using two metrics, AUC and AUC-PR. Our experimental results show that each refinement approach improves prediction accuracy and that their combination performs best. The combined approach outperforms previous work, improving AUC by over 10% and achieving an average AUC of 96.04% and an average AUC-PR of 96.48% across various classifiers.
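The abstract does not say which metric-learning objective FINEEHR uses. To illustrate the general idea of pulling together embeddings of notes that share a health status and pushing apart those that do not, here is a minimal sketch of a triplet margin loss, one common metric-learning objective, written in plain Python. The function names, the Euclidean distance, the default margin of 1.0, and the idea of forming triplets from notes with the same or a different health status are all illustrative assumptions, not details taken from the paper. A real system would compute this loss on encoder outputs inside a deep learning framework and backpropagate through the encoder; this sketch only evaluates the objective.

```python
import math
from typing import Sequence, Tuple

Embedding = Sequence[float]


def euclidean(a: Embedding, b: Embedding) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(
    anchor: Embedding,
    positive: Embedding,
    negative: Embedding,
    margin: float = 1.0,
) -> float:
    """Hinge loss max(0, d(anchor, positive) - d(anchor, negative) + margin).

    Assumed (hypothetical) labeling: anchor and positive are note embeddings
    from patients with the same health status; negative comes from a patient
    with a different status. The loss is zero once the negative is at least
    `margin` farther from the anchor than the positive is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def mean_triplet_loss(
    triplets: Sequence[Tuple[Embedding, Embedding, Embedding]],
    margin: float = 1.0,
) -> float:
    """Average triplet loss over a batch of (anchor, positive, negative)."""
    if not triplets:
        raise ValueError("need at least one triplet")
    losses = [triplet_margin_loss(a, p, n, margin) for a, p, n in triplets]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    anchor = [0.0, 0.0]
    positive = [0.0, 1.0]     # distance 1.0 from anchor
    easy_negative = [3.0, 4.0]  # distance 5.0: already far enough, loss 0
    hard_negative = [0.0, 1.5]  # distance 1.5: inside the margin, loss 0.5
    print(triplet_margin_loss(anchor, positive, easy_negative))
    print(triplet_margin_loss(anchor, positive, hard_negative))
    print(mean_triplet_loss([(anchor, positive, easy_negative),
                             (anchor, positive, hard_negative)]))
```

Only "hard" triplets, where a differently labeled note sits within the margin, contribute a nonzero loss. This is why metric learning can sharpen class structure in embeddings that a generic pretrained encoder leaves entangled.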