Clinical phenotyping enables the automatic extraction of clinical conditions from patient records, which can benefit doctors and clinics worldwide. However, current state-of-the-art models are mostly applicable only to clinical notes written in English. We therefore investigate cross-lingual knowledge transfer strategies for performing this task in clinics that do not use English and have only a small amount of in-domain data available. We evaluate these strategies for a Greek and a Spanish clinic, leveraging clinical notes from different clinical domains such as cardiology, oncology, and the ICU. Our results reveal two strategies that outperform the state of the art: (1) translation-based methods combined with domain-specific encoders, and (2) cross-lingual encoders combined with adapters. We find that these strategies perform especially well at classifying rare phenotypes, and we advise on which method to prefer in which situation. Our results show that using multilingual data improves clinical phenotyping models overall and can compensate for data sparseness.