Self-supervised speech models (S3Ms) have proven successful in many downstream speech tasks, such as automatic speech recognition (ASR). However, how pre-training data affects S3Ms' downstream behavior remains largely unexplored. In this paper, we study this question by pre-training models on datasets deliberately biased along different factors of speech, including gender, content, and prosody, and evaluating the resulting S3Ms on selected downstream tasks from the SUPERB benchmark. Our experiments show that S3Ms are tolerant of gender bias in the pre-training data. Moreover, we find that the content of the pre-training speech has little impact on S3Ms' performance across downstream tasks, whereas S3Ms do show a preference for slower speech rates.