Sound event localization aims to estimate the positions of sound sources in the environment relative to an acoustic receiver (e.g., a microphone array). Recent advances in this domain have predominantly focused on deep recurrent neural networks. Inspired by the success of transformer architectures as an alternative to classical recurrent neural networks, this paper introduces a novel transformer-based sound event localization framework in which temporal dependencies in the received multi-channel audio signals are captured via self-attention mechanisms. Additionally, the estimated sound event positions are represented as multivariate Gaussian variables, yielding an additional notion of uncertainty that many previously proposed deep learning-based systems designed for this application do not provide. The framework is evaluated on three publicly available multi-source sound event localization datasets and compared against state-of-the-art methods in terms of localization error and event detection accuracy. It outperforms all competing systems on all datasets, with statistically significant differences in performance.
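The two core ideas summarized above — self-attention over frame-level audio features and a Gaussian parameterization of the estimated source positions — can be sketched in minimal form. The sketch below is purely illustrative: all names, dimensions, the single-head attention, and the diagonal-covariance negative log-likelihood are assumptions for exposition, not the paper's actual architecture or loss.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # X: (T, d) sequence of frame-level features extracted from
    # the multi-channel audio; single-head scaled dot-product attention.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)             # rows sum to 1
    return A @ V                                  # (T, d) context-aware features

def gaussian_nll(mu, log_var, target):
    # Negative log-likelihood of a diagonal Gaussian (constants dropped):
    # training against this loss makes log_var a learned uncertainty estimate.
    return 0.5 * np.mean(log_var + (target - mu) ** 2 / np.exp(log_var))

# Toy forward pass with random weights (hypothetical dimensions).
rng = np.random.default_rng(0)
T, d = 8, 16                                      # frames, feature size
X = rng.standard_normal((T, d))
Wq, Wk, Wv = (0.1 * rng.standard_normal((d, d)) for _ in range(3))
H = self_attention(X, Wq, Wk, Wv)

# Output head: per-frame 3-D position mean and log-variance.
W_mu = 0.1 * rng.standard_normal((d, 3))
W_lv = 0.1 * rng.standard_normal((d, 3))
mu, log_var = H @ W_mu, H @ W_lv
target = rng.standard_normal((T, 3))              # dummy ground-truth positions
loss = gaussian_nll(mu, log_var, target)
```

In a full system, the attention block would be stacked and trained end to end; the point of the Gaussian head is that `np.exp(log_var)` gives a per-coordinate variance, i.e., the uncertainty estimate that plain point-regression systems lack.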