Automatic Speech Recognition (ASR) in medical contexts has the potential to save time, cut costs, increase report accuracy, and reduce physician burnout. However, the healthcare industry has been slower to adopt this technology, in part due to the importance of avoiding medically relevant transcription mistakes. In this work, we present the Clinical BERTScore (CBERTScore), an ASR metric that penalizes clinically relevant mistakes more heavily than others. We demonstrate that this metric aligns more closely with clinician preferences on medical sentences than other metrics (WER, BLEU, METEOR, etc.), sometimes by wide margins. We collect a benchmark of 18 clinician preferences on 149 realistic medical sentences, called the Clinician Transcript Preference benchmark (CTP), demonstrate that CBERTScore more closely matches what clinicians prefer, and release the benchmark for the community to further develop clinically-aware ASR metrics.
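To make the core idea concrete, the sketch below shows one simple way a metric can penalize clinically relevant mistakes more than others: a word-level edit distance in which errors on clinically salient terms (drug names, dosages) carry a higher weight. This is only an illustration of the weighting principle, not the paper's CBERTScore, which is embedding-based; the function name, penalty value, and example sentences are all hypothetical.

```python
# Illustrative sketch only (NOT the paper's CBERTScore): a weighted
# word-level edit distance where mistakes on clinically salient words
# incur a higher penalty than mistakes on other words.

def weighted_wer(ref, hyp, clinical_terms, penalty=3.0):
    """Weighted word error rate. Words in `clinical_terms` cost `penalty`
    per edit; all other words cost 1. Normalized by total reference weight."""
    ref_w, hyp_w = ref.split(), hyp.split()
    cost = lambda w: penalty if w in clinical_terms else 1.0

    # Standard Levenshtein DP with per-word edit weights.
    d = [[0.0] * (len(hyp_w) + 1) for _ in range(len(ref_w) + 1)]
    for i in range(1, len(ref_w) + 1):
        d[i][0] = d[i - 1][0] + cost(ref_w[i - 1])   # deletions
    for j in range(1, len(hyp_w) + 1):
        d[0][j] = d[0][j - 1] + cost(hyp_w[j - 1])   # insertions
    for i in range(1, len(ref_w) + 1):
        for j in range(1, len(hyp_w) + 1):
            if ref_w[i - 1] == hyp_w[j - 1]:
                d[i][j] = d[i - 1][j - 1]
            else:
                sub = d[i - 1][j - 1] + max(cost(ref_w[i - 1]), cost(hyp_w[j - 1]))
                dele = d[i - 1][j] + cost(ref_w[i - 1])
                ins = d[i][j - 1] + cost(hyp_w[j - 1])
                d[i][j] = min(sub, dele, ins)
    return d[-1][-1] / sum(cost(w) for w in ref_w)

# Hypothetical example: substituting a drug name scores far worse
# than inserting a filler word.
ref = "patient takes 40 mg atorvastatin daily"
clinical = {"40", "mg", "atorvastatin"}
print(weighted_wer(ref, "patient takes 40 mg simvastatin daily", clinical))
print(weighted_wer(ref, "the patient takes 40 mg atorvastatin daily", clinical))
```

Under this weighting, the drug-name substitution (one clinical-word error, weight 3 out of a total reference weight of 12) scores 0.25, while the harmless filler insertion scores about 0.08, even though plain WER would treat both as a single error. CBERTScore achieves a similar effect through contextual embeddings rather than a fixed term list.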