Hearing voices at the National Library: a speech corpus and acoustic model for the Swedish language

This paper explains our work in developing new acoustic models for automated speech recognition (ASR) at KBLab, the infrastructure for data-driven research at the National Library of Sweden (KB). We evaluate different approaches for a viable speech-to-text pipeline for audiovisual resources in Swedish, using the wav2vec 2.0 architecture in combination with speech corpuses created from KB's collections. These approaches include pretraining an acoustic model for Swedish from the ground up, and fine-tuning existing monolingual and multilingual models. The collections-based corpuses we use have been sampled from millions of hours of speech, with a conscious attempt to balance regional dialects to produce a more representative, and thus more democratic, model. The acoustic model this enabled, "VoxRex", outperforms existing models for Swedish ASR. We also evaluate combining this model with various pretrained language models, which further enhanced performance. We conclude by highlighting the potential of such technology for cultural heritage institutions with vast collections of previously unlabelled audiovisual data. Our models are released for further exploration and research here: https://huggingface.co/KBLab.
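The dialect balancing mentioned in the abstract can be pictured as a capped, per-region draw. The following is only a minimal sketch of that idea, not the authors' actual sampling procedure: the function name `balanced_sample`, the region labels, the clip identifiers, and the cap of five clips per region are all hypothetical and chosen for illustration.

```python
import random
from collections import defaultdict


def balanced_sample(clips, per_region, seed=0):
    """Draw at most `per_region` clip ids from every region.

    `clips` is an iterable of (clip_id, region) pairs. Regions with fewer
    clips than the cap contribute everything they have, so large regions
    cannot dominate the resulting sample.
    """
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    by_region = defaultdict(list)
    for clip_id, region in clips:
        by_region[region].append(clip_id)

    sample = {}
    for region in sorted(by_region):
        ids = by_region[region]
        k = min(per_region, len(ids))
        sample[region] = rng.sample(ids, k)  # sampling without replacement
    return sample


# Hypothetical, deliberately skewed toy data: one region is heavily
# over-represented, as might happen with a national broadcast archive.
clips = (
    [(f"capital_{i}", "capital") for i in range(50)]
    + [(f"south_{i}", "south") for i in range(8)]
    + [(f"north_{i}", "north") for i in range(3)]
)

picked = balanced_sample(clips, per_region=5)
for region, ids in picked.items():
    print(region, len(ids))
```

Capping each region rather than sampling proportionally is one possible design choice: it trades total corpus size for a more even spread across dialects, and under-resourced regions still fall short of the cap when too little material exists.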

