Sounds are essential to how humans perceive and interact with the world. They are captured in recordings and shared on the Internet on a minute-by-minute basis. These recordings, which are predominantly videos, constitute the largest archive of sounds we know of. However, most of these recordings have undescribed content, which makes methods for automatic sound analysis, indexing, and retrieval necessary. These methods have to address multiple challenges, such as the relation between sounds and language, numerous and diverse sound classes, and large-scale evaluation. We propose a system that continuously learns relations between sounds and language from the web, improves sound recognition models over time, and evaluates its learning competency at large scale without references. We introduce the Never-Ending Learner of Sounds (NELS), a project for continuously learning sounds and their associated knowledge, available online at nels.cs.cmu.edu.