Recent speech-to-text models often require large amounts of hardware resources and are mostly trained on English data. This paper presents speech-to-text models for German, as well as for Spanish and French, with the following features: (a) They are small and run in real time on single-board computers such as the Raspberry Pi. (b) Starting from a pretrained English model, they can be trained on consumer-grade hardware with a relatively small dataset. (c) They are competitive with other solutions and outperform them in German. The models thus combine the advantages of other approaches, each of which offers only a subset of these features. Furthermore, the paper provides a new library for handling datasets, designed to be easily extended with additional datasets, and shows an optimized way to transfer-learn new languages from a model pretrained on another language with a similar alphabet.