Many archival recordings of speech from endangered languages remain unannotated and inaccessible to community members and language learning programs. One bottleneck is the time-intensive nature of annotation. An even narrower bottleneck occurs for recordings with access constraints, such as language that must be vetted or filtered by authorised community members before annotation can begin. We propose a privacy-preserving workflow to widen both bottlenecks for recordings where speech in the endangered language is intermixed with a more widely used language such as English for metalinguistic commentary and questions (e.g. "What is the word for 'tree'?"). We integrate voice activity detection (VAD), spoken language identification (SLI), and automatic speech recognition (ASR) to transcribe the metalinguistic content, which an authorised person can quickly scan to triage recordings that can be annotated by people with lower levels of access. We report work in progress on processing 136 hours of archival audio containing a mix of English and Muruwari. Our collaborative work with the Muruwari custodian of the archival materials shows that this workflow reduces metalanguage transcription time by 20% even given only minimal amounts of annotated training data: 10 utterances per language for SLI and 39 minutes of English for ASR.
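The VAD, SLI, and ASR stages described above can be sketched as a simple chain. This is a minimal illustration under stated assumptions, not the authors' implementation: the energy-threshold VAD is a toy stand-in for a trained detector, `identify_language` and `transcribe` are hypothetical callables standing in for trained SLI and ASR models, and the label `"eng"` for the metalanguage is an assumed convention. The key property it shows is that only regions identified as the metalanguage are ever passed to ASR, so speech in the endangered language is never transcribed.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Segment:
    start: float        # region start, in seconds
    end: float          # region end, in seconds
    language: str       # label returned by the SLI model
    text: str = ""      # transcript; left empty for non-metalanguage speech


def energy_vad(samples: Sequence[float], sample_rate: int,
               frame_ms: int = 30, threshold: float = 0.01) -> List[Tuple[int, int]]:
    """Toy VAD (assumption): return (start, end) sample indices of runs of
    frames whose mean squared amplitude exceeds `threshold`."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    regions: List[Tuple[int, int]] = []
    run_start = None
    for pos in range(0, len(samples), frame_len):
        frame = samples[pos:pos + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy > threshold and run_start is None:
            run_start = pos                      # speech run begins
        elif energy <= threshold and run_start is not None:
            regions.append((run_start, pos))     # speech run ends
            run_start = None
    if run_start is not None:                    # audio ended during speech
        regions.append((run_start, len(samples)))
    return regions


def triage_transcript(samples: Sequence[float], sample_rate: int,
                      identify_language: Callable[[Sequence[float]], str],
                      transcribe: Callable[[Sequence[float]], str],
                      metalanguage: str = "eng") -> List[Segment]:
    """Chain VAD -> SLI -> ASR, transcribing only metalanguage regions."""
    segments: List[Segment] = []
    for start, end in energy_vad(samples, sample_rate):
        clip = samples[start:end]
        language = identify_language(clip)
        # Speech in any other language is located and labelled but not transcribed.
        text = transcribe(clip) if language == metalanguage else ""
        segments.append(Segment(start / sample_rate, end / sample_rate, language, text))
    return segments
```

An authorised person could then scan the `text` fields of the returned segments to decide which recordings can be passed on for annotation, using the timestamps to locate the untranscribed regions if needed.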