With advances in deep learning (DL) and growing interest in data-driven speech processing methods, access to pathological speech data has become a major challenge. Publicly released challenge data offer a potential remedy but may expose patient health information through re-identification attacks. In this study, we therefore investigate whether pathological speech is more vulnerable to such re-identification than healthy speech. Our study is the first large-scale investigation of the effects of different speech pathologies on automatic speaker verification (ASV), using a real-world pathological speech corpus of more than 2,000 test subjects of different ages with various speech and voice disorders. Using a DL-based ASV method, we obtained a mean equal error rate (EER) of 0.89% with a standard deviation of 0.06%, which is a factor of three lower than on comparable healthy speech databases. We further analyze in detail how external factors such as age, pathology, recording environment, utterance length, and intelligibility influence ASV. Our experiments indicate that speakers with some types of speech pathology, in particular dysphonia, are more vulnerable to a breach of privacy than healthy speakers, regardless of speech intelligibility. We also observe that the effect of pathology is of the same order of magnitude as that of other factors, such as age, microphone, and recording environment.
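For readers unfamiliar with the metric, the EER is the operating point at which the false acceptance rate (impostor trials accepted) equals the false rejection rate (genuine trials rejected); a lower EER means speakers are easier to verify and therefore easier to re-identify. The following is a minimal sketch of how an EER can be estimated from verification scores. It is not the implementation used in this study: the function name `equal_error_rate`, the threshold sweep over observed scores, and the toy scores are all assumptions made for illustration.

```python
import numpy as np


def equal_error_rate(genuine, impostor):
    """Estimate the EER from similarity scores (higher = more likely same speaker).

    Hypothetical helper for illustration; `genuine` holds scores of
    same-speaker trials, `impostor` holds scores of different-speaker trials.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)

    # Candidate thresholds: every distinct observed score (np.unique sorts them).
    thresholds = np.unique(np.concatenate([genuine, impostor]))

    # A trial is accepted when its score is >= the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])    # false rejections
    far = np.array([(impostor >= t).mean() for t in thresholds])  # false acceptances

    # Take the threshold where the two error rates are closest and average them.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


if __name__ == "__main__":
    # Toy scores, invented for this example only.
    genuine_scores = [0.9, 0.8, 0.4]
    impostor_scores = [0.1, 0.2, 0.6]
    print(f"EER = {100 * equal_error_rate(genuine_scores, impostor_scores):.2f}%")
```

With a finite number of trials the two error curves rarely cross exactly, so this sketch averages them at the closest threshold; interpolating between neighbouring thresholds is a common alternative.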