Access to informative databases is crucial to notable research developments. In the field of domestic audio classification, there have been significant advances in recent years. Although several audio databases exist, they can be limited in the amount of information they provide, such as the exact location of the sound sources and the associated noise levels. In this work, we detail our approach to generating an unbiased synthetic domestic audio database consisting of sound scenes and events, emulated in both quiet and noisy environments. The data are carefully curated to reflect issues commonly faced in a dementia patient's environment and to recreate scenarios that could occur in real-world settings. Similarly, the generated room impulse response is based on a typical one-bedroom apartment at Hebrew SeniorLife Facility. As a result, we present an 11-class database containing 5-second excerpts of clean and noisy signals, uniformly sampled at 16 kHz. Our baseline model, which uses Continuous Wavelet Transform scalograms with AlexNet, yielded a weighted F1-score of 86.24 percent.
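The abstract does not specify how the noisy excerpts are produced. One common recipe is to convolve a dry sound event with a room impulse response, add background noise scaled to a target signal-to-noise ratio (SNR), and cut the result to a fixed-length clip at 16 kHz. The sketch below illustrates that recipe under our own assumptions. The helper `emulate_excerpt`, the SNR-based noise scaling, and the toy signals are hypothetical and are not the authors' pipeline. Only the 16 kHz rate and the 5-second length come from the text.

```python
import numpy as np

FS = 16_000        # sampling rate stated in the abstract (16 kHz)
CLIP_SECONDS = 5   # excerpt length stated in the abstract (5 s)


def emulate_excerpt(dry, rir, noise=None, snr_db=None, fs=FS, seconds=CLIP_SECONDS):
    """Hypothetical helper (assumption, not the authors' code).

    Reverberates a dry event with a room impulse response and, optionally,
    adds noise scaled so the excerpt has the requested SNR in dB.
    """
    n = fs * seconds
    # Reverberate: full linear convolution with the RIR, truncated to n samples,
    # then zero-padded if the reverberant signal is shorter than the clip.
    wet = np.convolve(dry, rir)[:n]
    wet = np.pad(wet, (0, n - len(wet)))
    if noise is None:
        return wet  # "quiet" condition: clean reverberant excerpt
    # Repeat or trim the noise so it covers the whole clip.
    noise = np.resize(noise, n)
    p_sig = np.mean(wet ** 2)
    p_noise = np.mean(noise ** 2)
    # Choose a gain so that 10*log10(p_sig / p_scaled_noise) == snr_db.
    gain = np.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
    return wet + gain * noise


# Usage with synthetic placeholder signals (all assumed, for illustration only).
rng = np.random.default_rng(0)
dry = rng.standard_normal(FS * 2)                                      # 2 s toy "event"
rir = np.exp(-np.arange(4000) / 800.0) * rng.standard_normal(4000)     # toy decaying RIR
noise = rng.standard_normal(FS)                                        # 1 s toy background
clean = emulate_excerpt(dry, rir)
noisy = emulate_excerpt(dry, rir, noise, snr_db=10)
print(clean.shape, noisy.shape)
```

Defining the SNR over the whole 5-second excerpt, including any silent tail, is one design choice among several. An event-based SNR computed only over the active region would scale the noise differently.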