Measuring the acoustic characteristics of a space is often done by capturing its impulse response (IR), a representation of how a full-range stimulus sound excites it. Recording these IRs is both time-intensive and expensive, and often infeasible for inaccessible locations. We present Image2Reverb, the first work that generates an IR from a single image. The generated IR can then be applied to other signals using convolution, simulating the reverberant characteristics of the space shown in the image. We use an end-to-end neural network architecture to generate plausible audio impulse responses from single images of acoustic environments. We evaluate our method both by comparison to ground-truth data and by human expert evaluation. We demonstrate our approach by generating plausible impulse responses from diverse settings and formats, including well-known places, musical halls, rooms in paintings, images from animations and computer games, synthetic environments generated from text, panoramic images, and video conference backgrounds.
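As an illustration of the convolution step only (not of the image-to-IR network itself), the following minimal sketch applies an impulse response to a dry signal with NumPy. The function name `apply_reverb`, the sample rate, and the synthetic decaying-noise IR are assumptions made for this example; in practice the IR would be the one generated from the image.

```python
import numpy as np

def apply_reverb(dry, ir):
    """Convolve a mono dry signal with a mono impulse response via the FFT.

    Hypothetical helper for illustration; not part of the described system.
    """
    dry = np.asarray(dry, dtype=np.float64)
    ir = np.asarray(ir, dtype=np.float64)
    n = len(dry) + len(ir) - 1           # length of the full linear convolution
    n_fft = 1 << (n - 1).bit_length()    # zero-pad to a power of two to avoid wrap-around
    spectrum = np.fft.rfft(dry, n_fft) * np.fft.rfft(ir, n_fft)
    return np.fft.irfft(spectrum, n_fft)[:n]

# Assumed values for the example: sample rate and a synthetic stand-in IR
# (exponentially decaying noise), used here instead of a generated IR.
sr = 22050
rng = np.random.default_rng(0)
t = np.arange(int(0.5 * sr)) / sr
ir = rng.standard_normal(t.size) * np.exp(-t / 0.1)
ir /= np.max(np.abs(ir))

# A unit impulse as the dry signal: convolving it with the IR returns the IR.
dry = np.zeros(sr)
dry[0] = 1.0
wet = apply_reverb(dry, ir)
```

Replacing `dry` with a recorded speech or music signal at the same sample rate as the IR yields that signal as it would sound in the space the IR represents. The output is longer than the input by the length of the IR minus one sample, which is the reverberant tail.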