The selection of maskers and playback gain levels in a soundscape augmentation system is crucial to its effectiveness in improving the overall acoustic comfort of a given environment. Traditionally, the selection of appropriate maskers and gain levels has been informed by expert opinion, which may not be representative of the target population, or by listening tests, which can be time-consuming and labour-intensive. Furthermore, the resulting static choices of masker and gain often cannot adapt to the dynamic nature of real-world soundscapes. In this work, we used a deep learning model to jointly select the optimal masker and its gain level for a given soundscape. The proposed model was designed with highly modular building blocks, allowing for an optimized inference process that can quickly search through a large number of masker and gain combinations. In addition, we introduced feature-domain soundscape augmentation conditioned on the digital gain level, eliminating both the computationally expensive waveform-domain mixing process at inference time and the tedious pre-calibration process required for new maskers. The proposed system was validated on a large-scale dataset of subjective responses to augmented soundscapes from more than 440 participants, supporting the model's ability to predict the combined effect of the masker and its gain level on perceived pleasantness.
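The joint search over masker and gain combinations described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the gain-scaled additive mix, the linear-plus-tanh scorer, and every name here (`augment_features`, `select_masker_and_gain`, `weights`) are hypothetical stand-ins chosen only to show why working in the feature domain makes an exhaustive search cheap. The soundscape is embedded once, masker embeddings are precomputed, and all candidate pairs are scored in a single vectorized pass with no waveform mixing.

```python
import numpy as np


def augment_features(soundscape_emb, masker_embs, gains_db):
    """Build one augmented feature per (masker, gain) pair.

    Assumption: a gain-scaled additive mix stands in for the learned,
    gain-conditioned feature-domain augmentation of the actual model.
    """
    scale = 10.0 ** (np.asarray(gains_db, dtype=float) / 20.0)  # shape (G,)
    # Broadcast to shape (K, G, D): K maskers, G gains, D feature dims.
    return (soundscape_emb[None, None, :]
            + scale[None, :, None] * masker_embs[:, None, :])


def select_masker_and_gain(soundscape_emb, masker_embs, gains_db, weights):
    """Score every candidate pair and return the highest-scoring one.

    Assumption: tanh of a linear projection is a toy substitute for the
    trained pleasantness predictor.
    """
    gains_db = np.asarray(gains_db, dtype=float)
    augmented = augment_features(soundscape_emb, masker_embs, gains_db)
    scores = np.tanh(augmented @ weights)  # shape (K, G)
    k, g = np.unravel_index(np.argmax(scores), scores.shape)
    return int(k), float(gains_db[g]), scores


# Synthetic data only; no real embeddings or subjective ratings are used.
rng = np.random.default_rng(0)
D, K = 8, 5
soundscape = rng.normal(size=D)          # embedded once per soundscape
maskers = rng.normal(size=(K, D))        # precomputed masker embeddings
gains = np.arange(-20.0, 1.0, 5.0)       # candidate digital gains in dB
weights = rng.normal(size=D)             # toy predictor parameters

best_k, best_gain, scores = select_masker_and_gain(
    soundscape, maskers, gains, weights
)
print(f"selected masker index {best_k} at {best_gain} dB")
```

Because the per-pair cost is a broadcast and a matrix product, adding a new masker in this sketch only requires appending one row to `maskers`; how the actual system handles new maskers without pre-calibration is described in the paper itself rather than here.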