Audio segmentation and sound event detection are crucial topics in machine listening that aim to detect acoustic classes and their respective boundaries. They are useful for audio-content analysis, speech recognition, audio indexing, and music information retrieval. In recent years, most research articles have adopted segmentation-by-classification, a technique that divides audio into small frames and classifies each frame individually. In this paper, we present a novel approach called You Only Hear Once (YOHO), inspired by the YOLO algorithm widely adopted in computer vision. We convert the detection of acoustic boundaries into a regression problem instead of frame-based classification: separate output neurons detect the presence of an audio class and predict its start and end points. Compared with the state-of-the-art Convolutional Recurrent Neural Network, the relative improvement in F-measure for YOHO ranged from 1% to 6% across multiple datasets for audio segmentation and sound event detection. Because the output of YOHO is more end-to-end and has fewer neurons to predict, inference is at least 6 times faster than segmentation-by-classification. In addition, because this approach predicts acoustic boundaries directly, post-processing and smoothing are about 7 times faster.
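To make the regression formulation concrete, the sketch below decodes a hypothetical YOHO-style output grid into (class, onset, offset) events. The grid shape, the per-bin [presence, start, end] encoding, the bin duration, and the threshold are illustrative assumptions for this example, not the paper's exact configuration.

```python
import numpy as np

# Minimal sketch, assuming the network emits, for each time bin and each
# acoustic class, three numbers: [presence, start, end], where start/end
# are offsets within the bin expressed as fractions of the bin length.
# All shapes and values below are illustrative, not the paper's settings.

def decode_yoho_output(grid, bin_duration=0.5, threshold=0.5):
    """Convert a (num_bins, num_classes, 3) grid into (class, onset, offset) events in seconds."""
    events = []
    num_bins, num_classes, _ = grid.shape
    for b in range(num_bins):
        for c in range(num_classes):
            presence, rel_start, rel_end = grid[b, c]
            if presence >= threshold:
                onset = (b + rel_start) * bin_duration
                offset = (b + rel_end) * bin_duration
                events.append((c, onset, offset))
    return events

# Toy example: one event of class 0 predicted in bin 2,
# starting 10% and ending 80% of the way into that bin.
grid = np.zeros((4, 2, 3))
grid[2, 0] = [0.9, 0.1, 0.8]
print(decode_yoho_output(grid))  # [(0, 1.05, 1.4)]
```

Because each positive prediction already carries its start and end points, decoding reduces to a threshold and a scale, which is why post-processing is lighter than the frame-wise smoothing required by segmentation-by-classification.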