Dialog Enhancement (DE) is a feature that allows a user to increase the level of dialog in TV or movie content relative to non-dialog sounds. When only the original mix is available, DE is "unguided" and requires source separation. In this paper, we describe the DeepSpace system, which performs source separation using both dynamic spatial cues and source cues to support unguided DE. Its technologies include spatio-level filtering (SLF) and deep-learning-based dialog classification and denoising. Using subjective listening tests, we show that DeepSpace achieves significantly better overall performance than the state-of-the-art systems available for testing. We also explore the feasibility of using existing automated metrics to evaluate unguided DE systems.