Choosing optimal maskers for existing soundscapes to effect a desired perceptual change via soundscape augmentation is non-trivial, owing to the wide variety of maskers and a dearth of benchmark datasets with which to compare and develop soundscape augmentation models. To address this problem, we make publicly available the ARAUS (Affective Responses to Augmented Urban Soundscapes) dataset, which comprises a five-fold cross-validation set and an independent test set totaling 25,440 unique subjective perceptual responses to augmented soundscapes presented as audio-visual stimuli. Each augmented soundscape is made by digitally adding "maskers" (bird, water, wind, traffic, construction, or silence) to urban soundscape recordings at fixed soundscape-to-masker ratios. Responses were then collected by asking participants to rate how pleasant, annoying, eventful, uneventful, vibrant, monotonous, chaotic, calm, and appropriate each augmented soundscape was, in accordance with ISO 12913-2:2018. Participants also provided relevant demographic information and completed standard psychological questionnaires. We perform exploratory and statistical analyses of the responses obtained to verify internal consistency and agreement with known results in the literature. Finally, we demonstrate the benchmarking capability of the dataset by training and comparing four baseline models for urban soundscape pleasantness: a low-parameter regression model, a high-parameter convolutional neural network, and two attention-based networks in the literature.
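To illustrate what adding a masker at a fixed soundscape-to-masker ratio involves, the following is a minimal Python sketch. It assumes the ratio is defined on plain digital RMS levels, as 20·log10 of soundscape RMS over masker RMS; the dataset itself may rely on calibrated or frequency-weighted levels instead. The names `rms`, `mix_at_smr`, and `smr_db` are hypothetical and do not come from the ARAUS code.

```python
import numpy as np


def rms(x):
    """Root-mean-square amplitude of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))


def mix_at_smr(soundscape, masker, smr_db):
    """Add `masker` to `soundscape` at a given soundscape-to-masker ratio.

    Assumption: the ratio is 20*log10(rms(soundscape) / rms(scaled masker)),
    computed on unweighted digital samples.
    Returns the augmented signal and the gain applied to the masker.
    """
    soundscape = np.asarray(soundscape, dtype=np.float64)
    masker = np.asarray(masker, dtype=np.float64)
    if soundscape.shape != masker.shape:
        raise ValueError("soundscape and masker must have the same shape")

    masker_rms = rms(masker)
    if masker_rms == 0.0:
        # A silent masker leaves the soundscape unchanged.
        return soundscape.copy(), 0.0

    # RMS the masker must have so that the ratio equals smr_db.
    target_masker_rms = rms(soundscape) / (10.0 ** (smr_db / 20.0))
    gain = target_masker_rms / masker_rms
    return soundscape + gain * masker, gain


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.standard_normal(48000)            # stand-in for a recording
    bird = 0.1 * rng.standard_normal(48000)      # stand-in for a masker
    augmented, gain = mix_at_smr(base, bird, smr_db=6.0)
    print(f"masker gain: {gain:.3f}")
```

A positive `smr_db` places the masker below the soundscape level, and a negative value places it above; the "silence" masker corresponds to the zero-signal branch.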