Image2Reverb: Cross-Modal Reverb Impulse Response Synthesis

Measuring the acoustic characteristics of a space is often done by capturing its impulse response (IR), a representation of how a full-range stimulus sound excites it. This is the first work that generates an IR from a single image, which we call Image2Reverb. This IR is then applied to other signals using convolution, simulating the reverberant characteristics of the space shown in the image. Recording these IRs is both time-intensive and expensive, and often infeasible for inaccessible locations. We use an end-to-end neural network architecture to generate plausible audio impulse responses from single images of acoustic environments. We evaluate our method both by comparison to ground-truth data and by human expert evaluation. We demonstrate our approach by generating plausible impulse responses from diverse settings and formats, including well-known places, musical halls, rooms in paintings, images from animations and computer games, synthetic environments generated from text, panoramic images, and video conference backgrounds.
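The convolution step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `apply_reverb` and `synthetic_ir` are hypothetical, and the toy IR (exponentially decaying noise) is only a stand-in for an IR that a model such as Image2Reverb would generate. It assumes mono signals stored as NumPy arrays.

```python
import numpy as np


def apply_reverb(dry, ir):
    """Convolve a dry mono signal with an impulse response, then peak-normalize.

    Hypothetical helper for illustration; not part of Image2Reverb.
    """
    dry = np.asarray(dry, dtype=np.float64)
    ir = np.asarray(ir, dtype=np.float64)
    # Full linear convolution: output length is len(dry) + len(ir) - 1.
    wet = np.convolve(dry, ir, mode="full")
    peak = np.max(np.abs(wet))
    # Avoid dividing by zero for an all-silent result.
    return wet / peak if peak > 0 else wet


def synthetic_ir(sample_rate=16000, rt60=0.5, seed=0):
    """Toy IR: white noise whose envelope decays by 60 dB over rt60 seconds.

    Assumption: a stand-in for a measured or generated IR.
    """
    rng = np.random.default_rng(seed)
    n = int(sample_rate * rt60)
    t = np.arange(n) / sample_rate
    envelope = 10.0 ** (-3.0 * t / rt60)  # amplitude falls to 1e-3 at t = rt60
    return rng.standard_normal(n) * envelope


if __name__ == "__main__":
    sr = 16000
    # Dry test signal: a single click followed by silence.
    dry = np.zeros(sr)
    dry[0] = 1.0
    wet = apply_reverb(dry, synthetic_ir(sample_rate=sr))
    print(len(dry), len(wet))
```

For long IRs, an FFT-based convolution is usually much faster than direct convolution; the direct form is used here only to keep the sketch dependency-light.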
