In this paper, we present the Extreme Bandwidth Extension Network (EBEN), a generative adversarial network (GAN) that enhances speech captured with noise-resilient microphones. Such capture equipment suppresses ambient noise at the expense of speech bandwidth, so signal enhancement techniques are required to recover the wideband speech signal. EBEN leverages a multiband decomposition of the raw captured speech to reduce the time-domain dimension of the data and to give better control over the full-band signal. This multiband representation is fed to a U-Net-like model, which is trained with a combination of feature and adversarial losses to recover an enhanced audio signal. The proposed discriminator architecture also benefits from this original representation. Our approach can achieve state-of-the-art results with a lightweight generator that is compatible with real-time operation.
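To illustrate how a multiband decomposition shortens the time axis, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the filter bank, so an orthonormal block DCT is used here as a simple stand-in for a critically sampled analysis/synthesis pair. The function names (`dct_matrix`, `analyze`, `synthesize`), the number of bands, and the 16 kHz example signal are all assumptions made for illustration.

```python
import numpy as np


def dct_matrix(num_bands):
    """Orthonormal DCT-II matrix of shape (num_bands, num_bands)."""
    n = np.arange(num_bands)
    k = n[:, None]
    mat = np.sqrt(2.0 / num_bands) * np.cos(
        np.pi * (2 * n[None, :] + 1) * k / (2 * num_bands)
    )
    mat[0, :] /= np.sqrt(2.0)  # scale the DC row so that the rows are orthonormal
    return mat


def analyze(signal, num_bands):
    """Split a 1-D signal of length T into num_bands sub-bands of length T / num_bands."""
    if signal.shape[0] % num_bands != 0:
        raise ValueError("signal length must be a multiple of num_bands")
    frames = signal.reshape(-1, num_bands)    # (T / M, M): non-overlapping blocks
    return dct_matrix(num_bands) @ frames.T   # (M, T / M): one row per sub-band


def synthesize(bands):
    """Invert analyze(): rebuild the full-band signal from its sub-bands."""
    num_bands = bands.shape[0]
    frames = dct_matrix(num_bands).T @ bands  # the transpose inverts an orthonormal matrix
    return frames.T.reshape(-1)


# Hypothetical example: one second of audio at an assumed 16 kHz sampling rate.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
bands = analyze(speech, num_bands=4)   # shape (4, 4000): time axis is 4x shorter
restored = synthesize(bands)           # matches `speech` up to floating-point error
```

With this layout, a generator can treat the sub-bands as input channels and operate on a time axis that is `num_bands` times shorter, and the low and high bands can be handled separately before synthesis. A practical system would likely use longer, overlapping filters for better frequency selectivity than this block transform offers.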