MM-ALT: A Multimodal Automatic Lyric Transcription System

October 5, 2022

Xiangming Gu,Longshen Ou,Danielle Ong,Ye Wang

From arXiv. Accepted by ACM Multimedia 2022. Camera-ready version with some typos corrected.

Automatic lyric transcription (ALT) is a nascent field of study attracting increasing interest from both the speech and music information retrieval communities, given its significant application potential. However, ALT with audio data alone is a notoriously difficult task due to instrumental accompaniment and musical constraints resulting in degradation of both the phonetic cues and the intelligibility of sung lyrics. To tackle this challenge, we propose the MultiModal Automatic Lyric Transcription system (MM-ALT), together with a new dataset, N20EM, which consists of audio recordings, videos of lip movements, and inertial measurement unit (IMU) data of an earbud worn by the performing singer. We first adapt the wav2vec 2.0 framework from automatic speech recognition (ASR) to the ALT task. We then propose a video-based ALT method and an IMU-based voice activity detection (VAD) method. In addition, we put forward the Residual Cross Attention (RCA) mechanism to fuse data from the three modalities (i.e., audio, video, and IMU). Experiments show the effectiveness of our proposed MM-ALT system, especially in terms of noise robustness. Project page is at https://n20em.github.io.
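The abstract names Residual Cross Attention (RCA) as the fusion mechanism but does not describe its internals. As a rough illustration of the general idea only, the sketch below lets one modality's frames attend over another's and adds the result back to the original features. It is not the authors' implementation: the function names, the toy feature values, and the absence of learned projection matrices are all assumptions made to keep the example short.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query frame attends over all key frames."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def residual_cross_attention(target, source):
    """Hypothetical fusion step: target features plus what they attend to in source."""
    attended = cross_attention(target, source, source)
    return [[t + a for t, a in zip(t_row, a_row)]
            for t_row, a_row in zip(target, attended)]

# Toy inputs (assumed values): 3 audio frames and 2 video frames, 2-dim features each.
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
video = [[0.5, 0.5], [1.0, -1.0]]
fused = residual_cross_attention(audio, video)  # same shape as `audio`
```

Because of the residual term, the fused output keeps the target modality's own features even when the other modality contributes nothing, which is one plausible reason a residual design could help when a modality is noisy or missing.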
