MAAS: 积极监听器探测的多模式分配 (MAAS: Multi-modal Assignation for Active Speaker Detection) - 专知论文

会员服务 ·

0

Extensibility · 近似 · 图 · state-of-the-art · MoDELS ·

2021 年 1 月 11 日

MAAS: Multi-modal Assignation for Active Speaker Detection

翻译：MAAS: 积极监听器探测的多模式分配

Juan León-Alcázar,Fabian Caba Heilbron,Ali Thabet,Bernard Ghanem

Active speaker detection requires a solid integration of multi-modal cues. While individual modalities can approximate a solution, accurate predictions can only be achieved by explicitly fusing the audio and visual features and modeling their temporal progression. Despite its inherent muti-modal nature, current methods still focus on modeling and fusing short-term audiovisual features for individual speakers, often at frame level. In this paper we present a novel approach to active speaker detection that directly addresses the multi-modal nature of the problem, and provides a straightforward strategy where independent visual features from potential speakers in the scene are assigned to a previously detected speech event. Our experiments show that, an small graph data structure built from a single frame, allows to approximate an instantaneous audio-visual assignment problem. Moreover, the temporal extension of this initial graph achieves a new state-of-the-art on the AVA-ActiveSpeaker dataset with a mAP of 88.8\%.

翻译：主动语音信号的探测需要多式提示的坚实整合。虽然单个模式可以近似一种解决方案, 准确的预测只能通过明确折叠音频和视觉特征和模拟其时间进展来实现。尽管其固有的超模式性质, 目前的方法仍然侧重于为个别发言者建模和引信短期视听特征, 通常在框架一级。在本文中, 我们展示了一种新颖的主动语音信号探测方法, 直接解决了问题的多式性质, 并提供直接的战略, 将现场潜在发言者的独立视觉特征指定给先前发现的语音事件。我们的实验显示, 从一个框架建起的小图形数据结构可以近似瞬时的视听分配问题。此外, 初始图形的暂时扩展可以实现AVA- ApioSpeaker数据集的一个新状态, 其MAP为88.8 ⁇ 。

0

相关内容

Extensibility

iOS 8 提供的应用间和应用跟系统的功能交互特性。

Today (iOS and OS X): widgets for the Today view of Notification Center
Share (iOS and OS X): post content to web services or share content with others
Actions (iOS and OS X): app extensions to view or manipulate inside another app
Photo Editing (iOS): edit a photo or video in Apple's Photos app with extensions from a third-party apps
Finder Sync (OS X): remote file storage in the Finder with support for Finder content annotation
Storage Provider (iOS): an interface between files inside an app and other apps on a user's device
Custom Keyboard (iOS): system-wide alternative keyboards

Source: iOS 8 Extensions: Apple’s Plan for a Powerful App Ecosystem

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

专知会员服务

15+阅读 · 2020年5月5日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

专知会员服务

33+阅读 · 2020年1月5日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

PaperWeekly

7+阅读 · 2019年5月5日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Single-Shot Object Detection with Enriched Semantics

Single-Shot Object Detection with Enriched Semantics

统计学习与视觉计算组

14+阅读 · 2018年8月29日

已删除

雪球

6+阅读 · 2018年8月19日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions

VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions

Arxiv

7+阅读 · 2020年3月11日

Augmentation for small object detection

Augmentation for small object detection

Arxiv

13+阅读 · 2019年2月19日

Domain Specific Approximation for Object Detection

Arxiv

5+阅读 · 2018年10月4日

Speeding-up Object Detection Training for Robotics with FALKON

Speeding-up Object Detection Training for Robotics with FALKON

Arxiv

6+阅读 · 2018年8月27日

AffordanceNet: An End-to-End Deep Learning Approach for Object Affordance Detection

Arxiv

3+阅读 · 2018年3月4日

VIP会员

文章信息

相关主题

state-of-the-art

相关VIP内容

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

专知会员服务

15+阅读 · 2020年5月5日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

专知会员服务

33+阅读 · 2020年1月5日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

前沿人工智能趋势报告（Frontier AI Trends Report）

【AAAI2026】善始则事半功倍：基于前缀优化的大语言模型推理强化学习

Andrej Karpathy：2025 年 LLM 年度回顾（2025 LLM Year in Review）

音退化问题：基于输入操控的鲁棒语音转换综述

相关资讯

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

PaperWeekly

7+阅读 · 2019年5月5日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Single-Shot Object Detection with Enriched Semantics

Single-Shot Object Detection with Enriched Semantics

统计学习与视觉计算组

14+阅读 · 2018年8月29日

已删除

雪球

6+阅读 · 2018年8月19日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

相关论文

VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions

VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions

Arxiv

7+阅读 · 2020年3月11日

Augmentation for small object detection

Augmentation for small object detection

Arxiv

13+阅读 · 2019年2月19日

Domain Specific Approximation for Object Detection

Arxiv

5+阅读 · 2018年10月4日

Speeding-up Object Detection Training for Robotics with FALKON

Speeding-up Object Detection Training for Robotics with FALKON

Arxiv

6+阅读 · 2018年8月27日

AffordanceNet: An End-to-End Deep Learning Approach for Object Affordance Detection

Arxiv

3+阅读 · 2018年3月4日

微信扫码咨询专知VIP会员