In this paper, we propose a residual echo suppression method based on a UNet neural network that directly maps the outputs of a linear acoustic echo canceler to the desired signal in the spectral domain. The system embeds a design parameter that allows a tunable tradeoff between desired-signal distortion and residual echo suppression in double-talk scenarios. It has 136 thousand parameters and requires 1.6 giga floating-point operations per second (GFLOPS) and 10 megabytes of memory. The implementation satisfies both the timing requirements of the AEC challenge and the computational and memory limitations of on-device applications. Experiments are conducted with 161 h of data from the AEC challenge database and from real independent recordings. We demonstrate the performance of the proposed system in real-life conditions and compare it with two competing methods in terms of echo suppression and desired-signal distortion, generalization to various environments, and robustness to high echo levels.
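The idea of a single design parameter that trades desired-signal distortion against residual echo suppression can be illustrated with a weighted loss. The sketch below is hypothetical and is not the paper's formulation. It assumes the network outputs a real-valued spectral mask applied to the linear echo canceler output. It also assumes that output's magnitude is approximately the sum of a near-end speech magnitude and a residual echo magnitude, which ignores phase. The names `tradeoff_loss` and `alpha` are illustrative only.

```python
from typing import Sequence


def tradeoff_loss(mask: Sequence[float],
                  speech_mag: Sequence[float],
                  echo_mag: Sequence[float],
                  alpha: float) -> float:
    """Hypothetical weighted loss for one spectral frame (illustration only).

    Assumes the echo canceler output magnitude is roughly speech_mag + echo_mag
    (phase ignored) and that a per-bin mask is applied to that output.
    alpha in [0, 1]: a larger alpha penalizes desired-signal distortion more,
    a smaller alpha penalizes residual echo more.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not (len(mask) == len(speech_mag) == len(echo_mag)):
        raise ValueError("all inputs must have the same number of bins")
    n = len(mask)
    # Distortion term: how much the mask alters the near-end speech component.
    distortion = sum((m * s - s) ** 2 for m, s in zip(mask, speech_mag)) / n
    # Residual echo term: how much echo energy survives the mask.
    residual = sum((m * r) ** 2 for m, r in zip(mask, echo_mag)) / n
    return alpha * distortion + (1.0 - alpha) * residual


# Usage: an all-pass mask keeps the speech intact but lets all echo through,
# while an all-zero mask removes the echo but also removes the speech.
speech = [1.0, 2.0]
echo = [0.5, 0.0]
print(tradeoff_loss([1.0, 1.0], speech, echo, alpha=0.5))
print(tradeoff_loss([0.0, 0.0], speech, echo, alpha=0.5))
```

Setting `alpha` close to 1 favors preserving the near-end talker during double-talk. Setting it close to 0 favors aggressive echo removal. This mirrors, in simplified form, the kind of tradeoff the abstract describes.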