In this paper, we present the Extreme Bandwidth Extension Network (EBEN), a Generative Adversarial Network (GAN) that enhances speech captured with body-conduction microphones. This type of capture equipment suppresses ambient noise at the expense of speech bandwidth, so signal enhancement techniques are required to recover the wideband speech signal. EBEN leverages a multiband decomposition of the raw captured speech to reduce the time-domain dimension of the data and to give better control over the full-band signal. This multiband representation is fed to a U-Net-like model, which is trained with a combination of feature and adversarial losses to recover an enhanced audio signal. The proposed discriminator architecture also benefits from this original representation. Our approach can achieve state-of-the-art results with a lightweight generator and real-time compatible operation.
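To illustrate how a multiband decomposition shortens the time axis, the following is a minimal sketch rather than the EBEN implementation. The abstract does not state which filter bank is used, so the sketch assumes a two-channel Haar filter bank applied recursively. The function names (`haar_analysis`, `haar_synthesis`, `multiband_decompose`, `multiband_reconstruct`) and the `levels` parameter are hypothetical and chosen only for this example.

```python
import numpy as np


def haar_analysis(x):
    """Split an even-length 1-D signal into two half-length bands (assumed Haar bank)."""
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)   # coarse (low-pass) band
    high = (even - odd) / np.sqrt(2.0)  # detail (high-pass) band
    return low, high


def haar_synthesis(low, high):
    """Invert haar_analysis: interleave the recovered even and odd samples."""
    even = (low + high) / np.sqrt(2.0)
    odd = (low - high) / np.sqrt(2.0)
    x = np.empty(2 * low.shape[-1], dtype=low.dtype)
    x[0::2] = even
    x[1::2] = odd
    return x


def multiband_decompose(x, levels=2):
    """Return an array of shape (2**levels, len(x) // 2**levels).

    len(x) must be divisible by 2**levels. The rows follow the order in
    which the splits are applied, not strictly increasing frequency.
    """
    bands = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        next_bands = []
        for band in bands:
            next_bands.extend(haar_analysis(band))
        bands = next_bands
    return np.stack(bands)


def multiband_reconstruct(bands):
    """Merge adjacent band pairs until the full-band signal is recovered."""
    bands = list(bands)
    while len(bands) > 1:
        bands = [haar_synthesis(bands[i], bands[i + 1])
                 for i in range(0, len(bands), 2)]
    return bands[0]
```

With `levels=2`, a 16000-sample signal becomes 4 bands of 4000 samples each, so a model operating on the band representation sees a time axis that is four times shorter, while the decomposition remains invertible. A practical system would likely use a filter bank with better frequency selectivity than Haar; that choice is outside what the abstract specifies.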