Although personalized automatic speech recognition (ASR) models have recently been designed to recognize even severely impaired speech, their performance may degrade over time for people whose speech is deteriorating. The aims of this study were to (1) analyze how ASR performance changes over time in individuals with degrading speech, and (2) explore mitigation strategies to optimize recognition throughout disease progression. Speech was recorded by four individuals with degrading speech due to amyotrophic lateral sclerosis (ALS). Word error rates (WER) across recording sessions were computed for three ASR models: Unadapted Speaker Independent (U-SI), Adapted Speaker Independent (A-SI), and Adapted Speaker Dependent (A-SD, or personalized). The performance of all three models degraded significantly over time as speech became more impaired, but the performance of the A-SD model improved markedly when it was updated with recordings from the severe stages of speech progression. Recording additional utterances early in the disease, before speech had degraded significantly, did not improve the performance of A-SD models. Overall, our findings emphasize the importance of continuous recording (and model retraining) when providing personalized models for individuals with progressive speech impairments.
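For readers unfamiliar with the metric, WER is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the study's own scoring code. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration. Any text normalization applied in the study is not specified here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenization is plain
    whitespace splitting, which is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" counts one substitution and one deletion over six reference words, giving a WER of 2/6. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.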