This paper describes the ESPnet Unsupervised ASR Open-source Toolkit (EURO), an end-to-end open-source toolkit for unsupervised automatic speech recognition (UASR). EURO adopts the state-of-the-art UASR learning method introduced by Wav2vec-U and originally implemented in FAIRSEQ, which leverages self-supervised speech representations and adversarial training. Beyond wav2vec2, EURO extends the functionality and promotes reproducibility for UASR tasks by integrating S3PRL and k2, which provide flexible frontends from 27 self-supervised models and various graph-based decoding strategies. EURO is implemented in ESPnet and follows its unified pipeline to provide UASR recipes with a complete setup. This improves the pipeline's efficiency and allows EURO to be easily applied to existing datasets in ESPnet. Extensive experiments with three mainstream self-supervised models demonstrate the toolkit's effectiveness, achieving state-of-the-art UASR performance on the TIMIT and LibriSpeech datasets. EURO will be publicly available at https://github.com/espnet/espnet, with the aim of promoting this exciting and emerging research area through open-source activity.