Spoken dialog systems are slowly becoming an integral part of the human experience due to their various advantages over textual interfaces. Spoken language understanding (SLU) systems are fundamental building blocks of spoken dialog systems, but creating SLU systems for low-resourced languages is still a challenge. For a large number of low-resourced languages, we do not have access to enough data to build automatic speech recognition (ASR) technologies, which are fundamental to any SLU system. Moreover, ASR-based SLU systems do not generalize to unwritten languages. In this paper, we present a series of experiments exploring extremely low-resourced settings, in which we perform intent classification with systems trained on as few as one data point per intent and with only one speaker in the dataset. To simulate a true low-resourced setting, we also do not use language-specific ASR systems to transcribe the input speech, which compounds the challenge of building SLU systems. We test our system on Belgian Dutch (Flemish) and English and find that building intent classification systems on phonetic transcriptions performs significantly better in such low-resourced settings than building them on speech features. Specifically, when using a phonetic-transcription-based system instead of a feature-based system, we see average improvements of 12.37% and 13.08% for binary and four-class classification problems respectively, averaged over 49 different experimental settings.