During the COVID-19 pandemic, online meetings have become an indispensable part of our lives. This trend is likely to continue due to their convenience and broad reach. However, background noise from other family members, roommates, or office-mates not only degrades voice quality but also raises serious privacy issues. In this paper, we develop a novel system, called Spatial Aware Multi-task learning-based Separation (SAMS), to extract the target user's audio signal during teleconferencing. Our solution consists of three novel components: (i) generating fine-grained location embeddings from the user's voice and an inaudible tracking sound, which together capture the user's position and rich multipath information, (ii) developing a source separation neural network that uses multi-task learning to jointly optimize source separation and localization, and (iii) significantly speeding up inference to provide a real-time guarantee. Our testbed experiments demonstrate the effectiveness of our approach.
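To make the joint optimization in component (ii) concrete, the following is a minimal sketch of a multi-task objective that adds a localization term to a separation term. It is an illustration under stated assumptions, not the loss used by SAMS: the abstract does not specify the loss functions, so the choice of negative SI-SDR for separation, mean squared error for position, the weight `loc_weight`, and the function names `si_sdr` and `multitask_loss` are all hypothetical.

```python
import numpy as np


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio (dB) for 1-D signals."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to get the optimally scaled reference.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    noise = estimate - projection
    ratio = (np.dot(projection, projection) + eps) / (np.dot(noise, noise) + eps)
    return 10.0 * np.log10(ratio)


def multitask_loss(est_audio, ref_audio, est_pos, ref_pos, loc_weight=0.1):
    """Assumed objective: separation loss plus a weighted localization loss."""
    sep_loss = -si_sdr(est_audio, ref_audio)             # lower is better
    loc_loss = float(np.mean((est_pos - ref_pos) ** 2))  # MSE on position
    return sep_loss + loc_weight * loc_loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(16000)                 # stand-in for target speech
    est = ref + 0.3 * rng.standard_normal(16000)     # imperfect separation output
    loss = multitask_loss(est, ref, np.array([1.0, 2.1]), np.array([1.0, 2.0]))
    print(f"joint loss: {loss:.3f}")
```

In a sketch like this, a single scalar weight trades off the two tasks; a trained system would compute both terms from the outputs of a shared network and backpropagate through their sum.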