Recent Progress in the CUHK Dysarthric Speech Recognition System

Despite the rapid progress of automatic speech recognition (ASR) technologies over the past few decades, recognition of disordered speech remains a highly challenging task. Disordered speech presents a wide spectrum of challenges to current data-intensive, deep neural network (DNN) based ASR technologies, which predominantly target normal speech. This paper presents recent research efforts at the Chinese University of Hong Kong (CUHK) to improve the performance of disordered speech recognition systems on UASpeech, the largest publicly available dysarthric speech corpus. A set of novel modelling techniques, including neural architecture search, data augmentation using spectra-temporal perturbation, model-based speaker adaptation, and cross-domain generation of visual features within an audio-visual speech recognition (AVSR) system framework, was employed to address these challenges. The combination of these techniques produced the lowest published word error rate (WER) of 25.21% on the UASpeech test set of 16 dysarthric speakers, an overall WER reduction of 5.4% absolute (17.6% relative) over the CUHK 2018 dysarthric speech recognition system, which featured a 6-way DNN system combination and cross adaptation of systems trained on out-of-domain normal speech data. Bayesian model adaptation further allows rapid adaptation to individual dysarthric speakers to be performed using as little as 3.06 seconds of speech. The efficacy of these techniques was further demonstrated on a CUDYS Cantonese dysarthric speech recognition task.
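
The absolute and relative WER reductions quoted above are related by simple arithmetic, which the following minimal sketch checks. It assumes the baseline WER of the CUHK 2018 system can be inferred by adding the rounded 5.4% absolute reduction to the reported 25.21%; that baseline value is not stated in the text, and the function name `relative_reduction` is purely illustrative.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the WER reduction as a percentage of the baseline WER."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


new_wer = 25.21            # reported WER (%) on the UASpeech test set
absolute_reduction = 5.4   # reported absolute reduction (%), rounded

# Assumption: the baseline is inferred from the two rounded figures above,
# so it may differ slightly from the true CUHK 2018 result.
implied_baseline = new_wer + absolute_reduction

rel = relative_reduction(implied_baseline, new_wer)
print(f"implied baseline WER: {implied_baseline:.2f}%")
print(f"relative reduction:   {rel:.1f}%")
```

Under this assumption the relative figure rounds to 17.6%, which agrees with the value quoted in the abstract.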
