Many archival recordings of speech from endangered languages remain unannotated and inaccessible to community members and language learning programs. One bottleneck is the time-intensive nature of annotation. An even narrower bottleneck occurs for recordings with access constraints, such as language that must be vetted or filtered by authorised community members before annotation can begin. We propose a privacy-preserving workflow to widen both bottlenecks for recordings where speech in the endangered language is intermixed with a more widely used language such as English, used for meta-linguistic commentary and questions (e.g. What is the word for 'tree'?). We integrate voice activity detection (VAD), spoken language identification (SLI), and automatic speech recognition (ASR) to transcribe the metalinguistic content, which an authorised person can quickly scan to triage recordings that can then be annotated by people with lower levels of access. We report on work in progress processing 136 hours of archival audio containing a mix of English and Muruwari. Our collaborative work with the Muruwari custodian of the archival materials shows that this workflow reduces metalanguage transcription time by 20%, even given only minimal amounts of annotated training data: 10 utterances per language for SLI, and at most 39 minutes (possibly as little as 39 seconds) of transcribed speech for ASR.
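A minimal sketch of the kind of VAD → SLI → ASR cascade the abstract describes is given below. The specific tools (Silero VAD, a SpeechBrain VoxLingua107 language-ID model, a robust wav2vec 2.0 ASR model) and the file name are illustrative assumptions, not the exact models or data used in this work.

```python
# Hypothetical sketch of the cascade: detect speech regions, identify the
# language of each region, and transcribe only the English (metalinguistic)
# regions so an authorised person can triage the recording quickly.
# Tool choices here are assumptions for illustration only.
import torch
from speechbrain.pretrained import EncoderClassifier
from transformers import pipeline

SAMPLE_RATE = 16_000

# 1. Voice activity detection: locate speech regions in the long recording.
vad_model, vad_utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
get_speech_timestamps, _, read_audio, _, _ = vad_utils
wav = read_audio("archival_recording.wav", sampling_rate=SAMPLE_RATE)
regions = get_speech_timestamps(wav, vad_model, sampling_rate=SAMPLE_RATE)

# 2. Spoken language identification: an off-the-shelf language-ID model
#    (in practice this would be fine-tuned on a handful of utterances
#    per language, as described in the abstract).
sli = EncoderClassifier.from_hparams(
    source="speechbrain/lang-id-voxlingua107-ecapa"
)

# 3. ASR: transcribe regions identified as English for triage.
asr = pipeline(
    "automatic-speech-recognition",
    model="facebook/wav2vec2-large-robust-ft-libri-960h",
)

for region in regions:
    segment = wav[region["start"]:region["end"]]
    _, _, _, labels = sli.classify_batch(segment.unsqueeze(0))
    if labels[0].startswith("en"):  # treat English as the metalanguage
        text = asr(segment.numpy(), chunk_length_s=30)["text"]
        print(f"{region['start'] / SAMPLE_RATE:.1f}s\t{text}")
```

In a restricted-access setting, only the printed English transcript would need to be reviewed by the authorised custodian; the Muruwari-language segments themselves are never transcribed or exposed at this stage.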