Machine Listening, as usually formalized, attempts to perform a task that is, from our perspective, fundamentally human-performable, and routinely performed by humans. Current automated approaches to Machine Listening range from purely data-driven methods to methods that imitate human auditory systems. In recent years, the most promising approaches have been hybrid, using data-driven methods informed by models of the perceptual, cognitive, and semantic processes of the human system. Not only does guidance from models of human perception and domain knowledge enable better and more generalizable Machine Listening; conversely, the lessons learned from these systems can be used to verify or improve our models of human perception themselves. This paper summarizes advances in the development of such hybrid approaches, ranging from Machine Listening models that are informed by models of peripheral (human) auditory processes, to those that employ or derive semantic information encoded in relations between sounds. The research described herein was presented in a special session on "Synergy between human and machine approaches to sound/scene recognition and processing" at the 2023 ICASSP meeting.