In-ear microphones in hearing protection devices can be used to capture the wearer's own voice in noisy environments. Since in-ear recordings of the own voice are typically band-limited, an own voice reconstruction system is required to recover clean broadband speech from the in-ear signals. However, speech data for this scenario is typically scarce because transfer characteristics are device-specific and data must be collected through in-situ measurements. In this paper, we apply a deep learning-based bandwidth extension system to the own voice reconstruction task and investigate different training strategies to overcome the limited availability of training data. Experimental results indicate that training on simulated data derived from recordings of several talkers, combined with fine-tuning on real data, is advantageous compared to training directly on a small real dataset.