"Foley" refers to sound effects that are added to multimedia during post-production to enhance its perceived acoustic properties, e.g., by simulating footsteps, ambient environmental sounds, or the sounds of objects visible on screen. While foley is traditionally produced by foley artists, there is increasing interest in automatic or machine-assisted techniques that build upon recent advances in sound synthesis and generative models. To foster more participation in this growing research area, we propose a challenge for automatic foley synthesis. Drawing on case studies of successful previous challenges in audio and machine learning, we set the goals of the proposed challenge: rigorous, unified, and efficient evaluation of different foley synthesis systems, with an overarching goal of drawing active participation from the research community. We outline the details and design considerations of a foley sound synthesis challenge, including task definition, dataset requirements, and evaluation criteria.