Speech Activity Detection (SAD), the task of locating speech segments within an audio recording, is a core component of most speech technology applications. Robust SAD is particularly difficult in noisy conditions with varying signal-to-noise ratios (SNR). The Fearless Steps challenge has recently provided such data from the NASA Apollo-11 mission for several speech processing tasks, including SAD. Most of these recordings are degraded by different kinds and levels of noise that vary both within and between channels. This paper describes the EML online algorithm for the most recent phase of this challenge. The proposed algorithm can be trained in either a supervised or an unsupervised manner, and it assigns speech and non-speech labels at runtime approximately every 0.1 s. The experimental results show competitive accuracy on both the development and evaluation datasets, with a real-time factor of about 0.002 on a single CPU machine.
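To make the two runtime figures concrete, the following is a minimal sketch and not the EML algorithm itself. It assumes the usual definition of the real-time factor (processing time divided by audio duration) and uses a hypothetical energy threshold as a stand-in decision rule, only to illustrate emitting one label per 0.1 s chunk. The function names, the 8 kHz sample rate, and the threshold value are all assumptions made for illustration.

```python
import math


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: time spent processing divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def label_chunks(samples, sample_rate, chunk_seconds=0.1, energy_threshold=0.01):
    """Emit one speech/non-speech label per fixed-length chunk.

    The mean-energy threshold is a hypothetical placeholder classifier,
    NOT the method described in the paper.
    """
    chunk_len = int(round(sample_rate * chunk_seconds))
    labels = []
    # Walk over complete chunks only; a trailing partial chunk is ignored.
    for start in range(0, len(samples) - chunk_len + 1, chunk_len):
        chunk = samples[start:start + chunk_len]
        energy = sum(x * x for x in chunk) / chunk_len
        labels.append("speech" if energy > energy_threshold else "non-speech")
    return labels


# Synthetic 8 kHz signal (assumed rate): 1600 silent samples (0.2 s)
# followed by 2400 samples (0.3 s) of a 200 Hz tone standing in for speech.
sample_rate = 8000
signal = [0.0] * 1600 + [
    0.5 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(2400)
]
labels = label_chunks(signal, sample_rate)

# At a real-time factor of 0.002, one hour of audio takes roughly 7.2 s.
seconds_for_one_hour = 0.002 * 3600
```

Under these assumptions, a real-time factor of 0.002 means the system runs about 500 times faster than real time, and the chunk loop yields ten labels per second of audio.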