Automatic Speech Recognition (ASR) in medical contexts has the potential to save time, cut costs, increase report accuracy, and reduce physician burnout. However, the healthcare industry has been slower to adopt this technology, in part due to the importance of avoiding medically-relevant transcription mistakes. In this work, we present the Clinical BERTScore (CBERTScore), an ASR metric that penalizes clinically-relevant mistakes more than others. We demonstrate that this metric more closely aligns with clinician preferences on medical sentences as compared to other metrics (WER, BLUE, METEOR, etc), sometimes by wide margins. We collect a benchmark of 13 clinician preferences on 149 realistic medical sentences called the Clinician Transcript Preference benchmark (CTP), demonstrate that CBERTScore more closely matches what clinicians prefer, and release the benchmark for the community to further develop clinically-aware ASR metrics.
翻译:在医学方面,自动言语识别(ASR)具有节省时间、降低成本、提高报告准确度和减少医生消耗的潜力,然而,医疗行业采用这一技术的速度较慢,部分原因是避免医疗相关转录错误的重要性。在这项工作中,我们展示了临床BERTScore(CBERTScore)(CBERTScore)(CBERTScore)(CBERTScore)(CBERTScore))(CBERTScore)(CBERTScore)(CBER)(ASR)(ASR)(ASR)(ASR)(ASR)(ASR)(ASR)(ASR)(ASR)(ASR)(ASR)(ASR)(ASR)(ASR)(ASR(ASR)(ASR)(ASR)(ARS(A)(ARS)(ARS)(AD)(ARS(A)(ART)(A)(A)(AD)(ART)(AD)(ART)(ART)(ASR)(AD)(AD)(AD)(AD)(AD)(AD)(ASR(A(ASR)(ASR)(ASR(ASR)(ASR)(ASR)(AD)(ASR)(A)(ASR)(AD)(ASR(ASR(ASR)(AD)(AD)(ASR)(AD)(AD)(AD)(A(A)(A)(A)(AD)(A)(A(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A(A(A)(A)(A)(A)(A)(A)(A)(A(A(A)(A)(A(A(A)(ASR)))(A(A(AD))(AD)(AD)(AD)(A(A(A))(A)(A)(A)(A)(A</s>