Finnish has multiple dialects that differ from one another not only in accent (pronunciation) but also in morphological forms and lexical choice. We present the first approach to automatically detecting a speaker's dialect, based either on a dialect transcript alone or on a transcript combined with an audio recording, using a dataset that covers 23 different dialects. Our results show that the best accuracy is achieved by combining both modalities: text alone reaches an overall accuracy of 57\%, whereas text and audio together reach 85\%. Our code, models and data have been released openly on GitHub and Zenodo.