Automatic lyric transcription (ALT) is a nascent field of study attracting increasing interest from both the speech and music information retrieval communities, given its significant application potential. However, ALT from audio data alone is a notoriously difficult task because instrumental accompaniment and musical constraints degrade both the phonetic cues and the intelligibility of sung lyrics. To tackle this challenge, we propose the MultiModal Automatic Lyric Transcription system (MM-ALT), together with a new dataset, N20EM, which consists of audio recordings, videos of lip movements, and inertial measurement unit (IMU) data captured by an earbud worn by the performing singer. We first adapt the wav2vec 2.0 framework from automatic speech recognition (ASR) to the ALT task. We then propose a video-based ALT method and an IMU-based voice activity detection (VAD) method. In addition, we put forward the Residual Cross Attention (RCA) mechanism to fuse data from the three modalities (i.e., audio, video, and IMU). Experiments show the effectiveness of our proposed MM-ALT system, especially in terms of noise robustness. The project page is at https://n20em.github.io.
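To make the fusion idea concrete, the sketch below shows how a residual cross-attention block might combine frame-level features from two modalities, e.g., audio features from the adapted wav2vec 2.0 encoder attending to lip-video features. This is a minimal illustrative sketch in PyTorch under assumed dimensions and structure; the class name, feature sizes, and layer layout are assumptions for illustration, not the exact RCA design described in the paper.

```python
import torch
import torch.nn as nn

class ResidualCrossAttention(nn.Module):
    """Hypothetical sketch of a residual cross-attention fusion block.

    Features from a primary modality (e.g., audio) attend to features from
    an auxiliary modality (e.g., lip video or IMU); the attended context is
    added back to the primary stream through a residual connection.
    All dimensions are illustrative assumptions.
    """

    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, primary: torch.Tensor, auxiliary: torch.Tensor) -> torch.Tensor:
        # primary:   (batch, frames, dim) features, e.g., from wav2vec 2.0
        # auxiliary: (batch, frames, dim) features, e.g., from a video encoder
        context, _ = self.attn(query=primary, key=auxiliary, value=auxiliary)
        # Residual path: the fused stream falls back to the primary modality
        # when the auxiliary stream is uninformative (e.g., occluded lips).
        return self.norm(primary + context)


if __name__ == "__main__":
    audio_feats = torch.randn(2, 100, 512)  # placeholder audio features
    video_feats = torch.randn(2, 100, 512)  # placeholder lip-movement features
    fused = ResidualCrossAttention()(audio_feats, video_feats)
    print(fused.shape)  # torch.Size([2, 100, 512])
```

A residual formulation of this kind is a common way to let the model ignore a noisy auxiliary modality, which is consistent with the noise-robustness motivation stated in the abstract.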