Recent advances have enabled automatic sound recognition systems for deaf and hard of hearing (DHH) users on mobile devices. However, these tools use pre-trained, generic sound recognition models, which do not meet the diverse needs of DHH users. We introduce ProtoSound, an interactive system for customizing sound recognition models by recording a few examples, thereby enabling personalized and fine-grained categories. ProtoSound is motivated by prior work examining sound awareness needs of DHH people and by a survey we conducted with 472 DHH participants. To evaluate ProtoSound, we characterized performance on two real-world sound datasets, showing significant improvement over state-of-the-art (e.g., +9.7% accuracy on the first dataset). We then deployed ProtoSound's end-user training and real-time recognition through a mobile application and recruited 19 hearing participants who listened to the real-world sounds and rated the accuracy across 56 locations (e.g., homes, restaurants, parks). Results show that ProtoSound personalized the model on-device in real-time and accurately learned sounds across diverse acoustic contexts. We close by discussing open challenges in personalizable sound recognition, including the need for better recording interfaces and algorithmic improvements.
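The abstract does not spell out the recognition algorithm, but the core idea of "customizing sound recognition models by recording a few examples" can be illustrated with a prototype-based (nearest-class-mean) few-shot classifier over audio embeddings. The sketch below is a minimal, hypothetical illustration of that general technique, not ProtoSound's implementation; `embed_audio`, the 128-dimensional embedding size, and all other names are assumptions.

```python
# Minimal sketch of one way few-shot sound personalization could work:
# average the embeddings of a user's few recordings per class into a
# "prototype", then label new audio by its nearest prototype.
# NOTE: illustrative assumption only, not ProtoSound's actual code;
# `embed_audio` stands in for any pretrained audio embedding model.
import numpy as np


def embed_audio(waveform: np.ndarray) -> np.ndarray:
    """Placeholder for a pretrained audio embedding (hypothetical)."""
    # In practice this would be, e.g., a log-mel front end plus a frozen
    # CNN; here we return a deterministic pseudo-random vector instead.
    seed = abs(hash(waveform.tobytes())) % (2**32)
    return np.random.default_rng(seed).standard_normal(128)


def build_prototypes(support: dict[str, list[np.ndarray]]) -> dict[str, np.ndarray]:
    """Average each class's few example embeddings into one prototype."""
    return {label: np.mean([embed_audio(w) for w in waves], axis=0)
            for label, waves in support.items()}


def classify(query: np.ndarray, prototypes: dict[str, np.ndarray]) -> str:
    """Assign the query clip to the nearest prototype (Euclidean distance)."""
    q = embed_audio(query)
    return min(prototypes, key=lambda label: np.linalg.norm(q - prototypes[label]))


# Usage: a user records a handful of short clips per personalized category.
support_set = {
    "my_doorbell": [np.random.randn(16000) for _ in range(5)],
    "kettle_whistle": [np.random.randn(16000) for _ in range(5)],
}
protos = build_prototypes(support_set)
print(classify(np.random.randn(16000), protos))
```

Because the prototypes are just averaged embedding vectors, personalization of this kind can run on-device without retraining the underlying network, which is consistent with the real-time, on-device behavior the abstract reports.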