We introduce a new resource, AlloVera, which provides mappings from 218 allophones to phonemes for 14 languages. Phonemes are contrastive phonological units, and allophones are their various concrete realizations, which are predictable from phonological context. While phonemic representations are language specific, phonetic representations (stated in terms of (allo)phones) are much closer to a universal (language-independent) transcription. AlloVera allows the training of speech recognition models that output phonetic transcriptions in the International Phonetic Alphabet (IPA), regardless of the input language. We show that a "universal" allophone model, Allosaurus, built with AlloVera, outperforms "universal" phonemic models and language-specific models on a speech-transcription task. We explore the implications of this technology (and related technologies) for the documentation of endangered and minority languages. We further explore other applications for which AlloVera will be suitable as it grows, including phonological typology.
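As a rough illustration of the kind of mapping described above, the following sketch stores a phoneme-to-allophone table per language and inverts it to go from phones back to phonemes. The table contents are textbook-style examples chosen for illustration and are not taken from AlloVera; the names `ALLOPHONES`, `phone_to_phonemes`, and `universal_phone_inventory` are hypothetical and do not reflect the resource's actual format or API.

```python
from collections import defaultdict

# Illustrative entries only (assumption): NOT copied from AlloVera.
# Each language code maps a phoneme to the phones (allophones) that realize it.
ALLOPHONES = {
    "eng": {"t": ["t", "tʰ", "ɾ"], "p": ["p", "pʰ"]},
    "spa": {"d": ["d", "ð"], "t": ["t"]},
}


def phone_to_phonemes(lang):
    """Invert one language's phoneme -> allophones table into phone -> phonemes."""
    inverse = defaultdict(set)
    for phoneme, phones in ALLOPHONES[lang].items():
        for phone in phones:
            inverse[phone].add(phoneme)
    return dict(inverse)


def universal_phone_inventory():
    """Union of phones over all languages: a shared, language-independent label set."""
    return {
        phone
        for table in ALLOPHONES.values()
        for phones in table.values()
        for phone in phones
    }


if __name__ == "__main__":
    print(phone_to_phonemes("eng"))
    print(sorted(universal_phone_inventory()))
```

The phoneme labels stay language specific (the keys of each inner table), while the phone labels are shared across languages, which is what would let a single model emit one phonetic inventory regardless of input language.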