HiSSNet: Sound Event Detection and Speaker Identification via Hierarchical Prototypical Networks for Low-Resource Headphones

Modern noise-cancelling headphones have significantly improved users' auditory experiences by removing unwanted background noise, but they can also block out sounds that matter to users. Machine learning (ML) models for sound event detection (SED) and speaker identification (SID) can enable headphones to selectively pass through important sounds; however, implementing these models for a user-centric experience presents several unique challenges. First, most people spend limited time customizing their headphones, so the sound detection should work reasonably well out of the box. Second, the models should be able to learn over time the specific sounds that are important to users based on their implicit and explicit interactions. Finally, such models should have a small memory footprint to run on low-power headphones with limited on-chip memory. In this paper, we propose addressing these challenges using HiSSNet (Hierarchical SED and SID Network). HiSSNet is an SEID (SED and SID) model that uses a hierarchical prototypical network to detect both general and specific sounds of interest and characterize both alarm-like and speech sounds. We show that HiSSNet outperforms an SEID model trained using non-hierarchical prototypical networks by 6.9 - 8.6 percent. When compared to state-of-the-art (SOTA) models trained specifically for SED or SID alone, HiSSNet achieves similar or better performance while reducing the memory footprint required to support multiple capabilities on-device.
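The abstract does not describe the architecture in detail, so the following is only a minimal sketch of the general idea of nearest-prototype classification at two levels of a label hierarchy. It is not the authors' implementation. The toy 2-D embeddings, the class names, the `parent_of` mapping, and the choice to build coarse prototypes by averaging all support embeddings under a parent are assumptions made for illustration; in a real system the embeddings would come from a trained encoder.

```python
from collections import defaultdict


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_prototypes(support):
    """Map each label to the mean of its support embeddings (its prototype)."""
    return {label: mean_vector(vecs) for label, vecs in support.items()}


def nearest(query, prototypes):
    """Return the label whose prototype is closest to the query embedding."""
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


def classify_hierarchical(query, support, parent_of):
    """Return (coarse_label, fine_label) for a query embedding.

    Assumption: coarse prototypes are the mean of all support embeddings
    whose fine label maps to that coarse label via parent_of.
    """
    fine_prototypes = build_prototypes(support)
    coarse_support = defaultdict(list)
    for label, vecs in support.items():
        coarse_support[parent_of[label]].extend(vecs)
    coarse_prototypes = build_prototypes(coarse_support)
    return nearest(query, coarse_prototypes), nearest(query, fine_prototypes)


# Hypothetical support set: a few labeled embeddings per specific sound.
support = {
    "smoke_alarm": [[0.9, 0.1], [1.0, 0.0]],
    "doorbell": [[0.7, 0.3], [0.8, 0.2]],
    "speaker_a": [[0.1, 0.9], [0.0, 1.0]],
    "speaker_b": [[0.3, 0.7], [0.2, 0.8]],
}
# Hypothetical two-level hierarchy: specific sound -> general category.
parent_of = {
    "smoke_alarm": "alarm",
    "doorbell": "alarm",
    "speaker_a": "speech",
    "speaker_b": "speech",
}

if __name__ == "__main__":
    print(classify_hierarchical([0.05, 0.95], support, parent_of))
    print(classify_hierarchical([0.8, 0.2], support, parent_of))
```

Because a new class only requires a handful of support embeddings and their mean, a scheme of this kind can add user-specific sounds without retraining the encoder, which is one plausible way to read the abstract's goal of learning the sounds that matter to a user over time.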

