Noise suppression models running in production environments are commonly trained on publicly available datasets. However, this approach leads to regressions because the models are neither trained nor tested on data representative of customer usage. Moreover, for privacy reasons, developers cannot listen to customer content. This `ears-off' situation motivates augmenting existing datasets in a privacy-preserving manner. In this paper, we present \aura, a sample-efficient solution for making existing noise suppression test sets more challenging and diverse. \aura is `ears-off' because it relies on a feature extractor and a speech quality metric, DNSMOS P.835, both pre-trained on data obtained from public sources. As an application of \aura, we augment the INTERSPEECH 2021 DNS challenge test set by sampling audio files from a new batch of 20K clean speech clips from Librivox mixed with noise clips obtained from Audio Set. \aura makes the existing benchmark test set harder by 0.27 in DNSMOS P.835 OVRL (7\%) and by 0.64 in DNSMOS P.835 SIG (16\%), increases diversity by 31\%, and achieves a 26\% improvement in Spearman's rank correlation coefficient (SRCC) compared to random sampling. Finally, we open-source \aura to stimulate research on test set development.