Clinical Dialogue Transcription Error Correction using Seq2Seq Models

Good communication is critical to good healthcare. Clinical dialogue is a conversation between health practitioners and their patients, with the explicit goal of obtaining and sharing medical information. This information contributes to medical decision-making regarding the patient and plays a crucial role in their healthcare journey. Reliance on note taking and manual scribing is extremely inefficient and leads to transcription errors when notes are digitized. Automatic Speech Recognition (ASR) plays a significant role in speech-to-text applications and can be used directly as a text generator in conversational applications. However, recording clinical dialogue presents a number of general and domain-specific challenges. In this paper, we present a seq2seq learning approach for ASR transcription error correction of clinical dialogues. We introduce a new Gastrointestinal Clinical Dialogue (GCD) Dataset, which was gathered by healthcare professionals from an NHS Inflammatory Bowel Disease clinic, and use it in a comparative study with four commercial ASR systems. Using self-supervision strategies, we fine-tune a seq2seq model on a mask-filling task using a domain-specific PubMed dataset, which we have shared publicly for future research. The BART model fine-tuned for mask-filling was able to correct transcription errors and achieve lower word error rates for three out of four commercial ASR outputs.
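
The abstract reports results as word error rates (WER). As a minimal sketch of how that metric is commonly computed (not necessarily the exact evaluation script used in the paper), WER is the word-level edit distance between a reference transcript and an ASR hypothesis, divided by the number of reference words. The function name `word_error_rate`, the lowercase-and-split normalisation, and the example sentences are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumed normalisation: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words consumed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one substituted word in a five-word reference.
    reference = "the patient has crohn's disease"
    hypothesis = "the patient has crowns disease"
    print(word_error_rate(reference, hypothesis))
```

Under this definition, an error-correction model improves on an ASR system when the WER of its corrected output against the reference is lower than the WER of the raw ASR output against the same reference.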
