Rethinking Audio-visual Synchronization for Active Speaker Detection


June 21, 2022


Abudukelimu Wuerkaixi, You Zhang, Zhiyao Duan, Changshui Zhang

From arXiv. Accepted by the IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2022).

Active speaker detection (ASD) systems are important modules for analyzing multi-talker conversations. They aim to detect which speakers, if any, are talking in a visual scene at any given time. Existing research on ASD does not agree on the definition of an active speaker. We clarify the definition in this work and require synchronization between the audio and visual speaking activities. This clarification is motivated by our extensive experiments, through which we discover that existing ASD methods fail to model audio-visual synchronization and often classify unsynchronized videos as active speaking. To address this problem, we propose a cross-modal contrastive learning strategy and apply positional encoding in attention modules for supervised ASD models to leverage the synchronization cue. Experimental results suggest that our model can successfully detect unsynchronized speaking as not speaking, addressing the limitation of current models.
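The abstract names two ingredients, a cross-modal contrastive objective and positional encoding, without giving their exact form. The sketch below is only an illustration of the general idea under stated assumptions, not the authors' implementation: it uses an InfoNCE-style loss in which the audio embedding at frame t should match the visual embedding at the same frame and not those at other frames, and the standard sinusoidal positional encoding. The function names, the temperature value, and the toy one-hot embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sync_contrastive_loss(audio, visual, temperature=0.1):
    """InfoNCE-style loss (an assumed form, not taken from the paper).

    audio[t] and visual[t] are treated as the positive pair; every
    visual[s] with s != t acts as a negative for audio[t].
    """
    total = 0.0
    for t, a in enumerate(audio):
        logits = [cosine(a, v) / temperature for v in visual]
        # Numerically stable log-sum-exp over all candidate frames.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[t]
    return total / len(audio)


def sinusoidal_position(t, dim):
    """Standard sinusoidal positional encoding for time step t."""
    return [
        math.sin(t / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(t / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


# Toy data: three frames with one-hot embeddings (hypothetical values).
frames = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shifted = frames[1:] + frames[:1]  # visual stream delayed by one frame

synced_loss = sync_contrastive_loss(frames, frames)
unsynced_loss = sync_contrastive_loss(frames, shifted)
print(synced_loss < unsynced_loss)
```

In this toy setting the loss is lower for the aligned streams than for the shifted ones, which is the kind of signal a synchronization-aware model could exploit. Positional encodings such as `sinusoidal_position` give attention layers access to frame order, which they otherwise lack.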


