We describe the system our team used in NIST's LoReHLT (Low Resource Human Language Technologies) 2017 evaluations, which assessed document topic classification. We present a language-agnostic approach that combines universal acoustic modeling, evaluation-language-to-English machine translation (MT), and an English-language topic classifier. This combination requires no transcribed speech in the evaluation language, nor even in a related language. We also examine the benefits of adapting the system using various collected resources. The two evaluation languages ("incident languages" in LORELEI terminology) were Tigrinya (IL5) and Oromo (IL6), and our system performed well on both.