We investigate a method for quantifying city characteristics based on impressions of the sound environment. Such quantification can benefit government policy planning, tourism projects, and similar applications. In this study, we predict two soundscape impressions, pleasantness and eventfulness, from sound data collected by the cloud-sensing method. Each collected sound carries metadata on its recording location obtained from the Global Positioning System. Furthermore, soundscape impressions and sound-source features are separately assigned to the collected sounds through assessments defined by the Swedish Soundscape-Quality Protocol, which evaluates the quality of an acoustic environment. The prediction models are deep neural networks based on a multi-layer perceptron that take as input a 10-second sound and an aerial photograph of its recording location. The acoustic feature comprises the equivalent noise level and the outputs of octave-band filters computed every second, together with their statistics over the 10 s. The image feature is extracted from the aerial photograph using ResNet-50 and an autoencoder architecture. We perform comparison experiments to demonstrate the benefit of each feature. The comparison shows that aerial photographs and sound-source features are effective for predicting the soundscape impressions. Additionally, even when the sound-source features are themselves predicted from the acoustic and image features, they yield impression predictions close to those obtained with oracle sound-source features.
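The acoustic feature described above (per-second equivalent level and octave-band outputs, plus statistics over the 10-second clip) might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the band centre frequencies, the FFT-based band summation standing in for true octave-band filters, the choice of mean, standard deviation, minimum, and maximum as the statistics, and the function names `per_second_levels` and `clip_feature` are all hypothetical. Levels are in dB relative to full scale, since the source does not state a calibration.

```python
import numpy as np

# Assumed octave-band centre frequencies in Hz (not listed in the source).
OCTAVE_CENTRES = (63, 125, 250, 500, 1000, 2000, 4000)


def per_second_levels(signal, sample_rate, centres=OCTAVE_CENTRES):
    """Return levels in dB with shape (seconds, 1 + len(centres)).

    Column 0 is the equivalent level of each one-second frame. The other
    columns approximate octave-band levels by summing FFT power in each band
    (an assumption; the source uses octave-band filters).
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // sample_rate
    frames = signal[: n_frames * sample_rate].reshape(n_frames, sample_rate)
    eps = 1e-12  # avoids log10(0) for silent frames or empty bands

    # Equivalent level: 10*log10 of the mean-square amplitude per frame.
    leq = 10.0 * np.log10(np.mean(frames ** 2, axis=1) + eps)

    # One-sided power spectrum of each one-second frame (1 Hz resolution).
    spectrum = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(sample_rate, d=1.0 / sample_rate)

    bands = []
    for fc in centres:
        lo, hi = fc / np.sqrt(2.0), fc * np.sqrt(2.0)  # octave band edges
        mask = (freqs >= lo) & (freqs < hi)
        # Factor 2 / N^2 turns one-sided FFT power into mean-square power.
        power = 2.0 * spectrum[:, mask].sum(axis=1) / sample_rate ** 2
        bands.append(10.0 * np.log10(power + eps))
    return np.column_stack([leq] + bands)


def clip_feature(signal, sample_rate):
    """Concatenate per-second levels with their statistics over the clip."""
    levels = per_second_levels(signal, sample_rate)
    stats = np.concatenate([
        levels.mean(axis=0),
        levels.std(axis=0),
        levels.min(axis=0),
        levels.max(axis=0),
    ])
    return np.concatenate([levels.ravel(), stats])
```

With a 10-second clip and the seven assumed bands, `clip_feature` returns 10 × 8 per-second values followed by 4 × 8 statistics, a 112-dimensional vector that could serve as the acoustic input to a multi-layer perceptron.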