Intel Labs at Ego4D Challenge 2022: A Better Baseline for Audio-Visual Diarization - 专知论文

会员服务 ·

0

Better · 基准 · 假阳性 · MoDELS · 英特尔 (Intel) ·

2023 年 8 月 10 日

Intel Labs at Ego4D Challenge 2022: A Better Baseline for Audio-Visual Diarization

翻译：暂无翻译

from arxiv, Validation report for the Ego4D challenge at ECCV 2022

This report describes our approach for the Audio-Visual Diarization (AVD) task of the Ego4D Challenge 2022. Specifically, we present multiple technical improvements over the official baselines. First, we improve the detection performance of the camera wearer's voice activity by modifying the training scheme of its model. Second, we discover that an off-the-shelf voice activity detection model can effectively remove false positives when it is applied solely to the camera wearer's voice activities. Lastly, we show that better active speaker detection leads to a better AVD outcome. Our final method obtains 65.9% DER on the test set of Ego4D, which significantly outperforms all the baselines. Our submission achieved 1st place in the Ego4D Challenge 2022.

翻译：暂无翻译

0

相关内容

Better

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

专知会员服务

13+阅读 · 2022年3月12日

【CVPR 2022】一种无需使用负样本的自监督学习方法，Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes

【CVPR 2022】一种无需使用负样本的自监督学习方法，Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes

专知会员服务

15+阅读 · 2022年3月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

33+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

STRCF for Visual Object Tracking

STRCF for Visual Object Tracking

统计学习与视觉计算组

15+阅读 · 2018年5月29日

Focal Loss for Dense Object Detection

Focal Loss for Dense Object Detection

统计学习与视觉计算组

12+阅读 · 2018年3月15日

论文浅尝 | Question Answering over Freebase

论文浅尝 | Question Answering over Freebase

开放知识图谱

19+阅读 · 2018年1月9日

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

KingsGarden

13+阅读 · 2017年7月16日

From Softmax to Sparsemax-ICML16（1）

From Softmax to Sparsemax-ICML16（1）

KingsGarden

74+阅读 · 2016年11月26日

城市“建成环境——空间行为”的多尺度影响关系与机理研究

国家自然科学基金

12+阅读 · 2017年12月31日

“Fishes-in-net” 酵母孢子微胶囊式近平滑假丝酵母SCRII酶有机相高效手性合成机制研究

国家自然科学基金

3+阅读 · 2016年12月31日

DMB信号水汽探测方法若干问题研究

国家自然科学基金

3+阅读 · 2015年12月31日

高频ZnO/IDT/SiO2/金刚石SAW乳腺癌抗原免疫传感器研究

国家自然科学基金

1+阅读 · 2015年12月31日

氢键(O:H-O)受激弛豫势能路径的键弛豫理论与声子谱学实验标定

国家自然科学基金

0+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

动态Gr？bner 基与GVW算法

国家自然科学基金

0+阅读 · 2014年12月31日

“杰文斯”悖论、能效政策改进与“双控目标”分解

国家自然科学基金

0+阅读 · 2014年12月31日

海量Web用户生成内容物化关键技术

国家自然科学基金

2+阅读 · 2014年12月31日

水窗及水窗附近软X射线激光研究

国家自然科学基金

0+阅读 · 2014年12月31日

PDIWS: Thermal Imaging Dataset for Person Detection in Intrusion Warning Systems

Arxiv

0+阅读 · 2023年10月2日

A Model-Agnostic Graph Neural Network for Integrating Local and Global Information

Arxiv

0+阅读 · 2023年10月2日

CaveSeg: Deep Semantic Segmentation and Scene Parsing for Autonomous Underwater Cave Exploration

Arxiv

0+阅读 · 2023年9月28日

NOMAD: Unsupervised Learning of Perceptual Embeddings for Speech Enhancement and Non-matching Reference Audio Quality Assessment

Arxiv

0+阅读 · 2023年9月28日

FireFly v2: Advancing Hardware Support for High-Performance Spiking Neural Network with a Spatiotemporal FPGA Accelerator

Arxiv

0+阅读 · 2023年9月28日

From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression

Arxiv

10+阅读 · 2021年12月14日

Co-mining: Self-Supervised Learning for Sparsely Annotated Object Detection

Arxiv

13+阅读 · 2020年12月3日

Adaptive Graph Convolutional Network with Attention Graph Clustering for Co-saliency Detection

Adaptive Graph Convolutional Network with Attention Graph Clustering for Co-saliency Detection

Arxiv

10+阅读 · 2020年3月13日

Unsupervised Cross-Modality Domain Adaptation of ConvNets for Biomedical Image Segmentations with Adversarial Loss

Arxiv

10+阅读 · 2018年4月29日

CommanderSong: A Systematic Approach for Practical Adversarial Voice Recognition

Arxiv

14+阅读 · 2018年1月24日

VIP会员

文章信息

相关主题

英特尔 (Intel)

相关VIP内容

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

专知会员服务

13+阅读 · 2022年3月12日

【CVPR 2022】一种无需使用负样本的自监督学习方法，Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes

【CVPR 2022】一种无需使用负样本的自监督学习方法，Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes

专知会员服务

15+阅读 · 2022年3月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

33+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

热门VIP内容

开通专知VIP会员享更多权益服务

新书册《几何深度学习的数学基础》

中程单向攻击无人机的战略意义：俄乌战争启示

在无标注条件下适配视觉—语言模型：全面综述

面向视觉语言模型的持续学习：遗忘之外的综述与分类体系

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

STRCF for Visual Object Tracking

STRCF for Visual Object Tracking

统计学习与视觉计算组

15+阅读 · 2018年5月29日

Focal Loss for Dense Object Detection

Focal Loss for Dense Object Detection

统计学习与视觉计算组

12+阅读 · 2018年3月15日

论文浅尝 | Question Answering over Freebase

论文浅尝 | Question Answering over Freebase

开放知识图谱

19+阅读 · 2018年1月9日

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

KingsGarden

13+阅读 · 2017年7月16日

From Softmax to Sparsemax-ICML16（1）

From Softmax to Sparsemax-ICML16（1）

KingsGarden

74+阅读 · 2016年11月26日

相关论文

PDIWS: Thermal Imaging Dataset for Person Detection in Intrusion Warning Systems

Arxiv

0+阅读 · 2023年10月2日

A Model-Agnostic Graph Neural Network for Integrating Local and Global Information

Arxiv

0+阅读 · 2023年10月2日

CaveSeg: Deep Semantic Segmentation and Scene Parsing for Autonomous Underwater Cave Exploration

Arxiv

0+阅读 · 2023年9月28日

NOMAD: Unsupervised Learning of Perceptual Embeddings for Speech Enhancement and Non-matching Reference Audio Quality Assessment

Arxiv

0+阅读 · 2023年9月28日

FireFly v2: Advancing Hardware Support for High-Performance Spiking Neural Network with a Spatiotemporal FPGA Accelerator

Arxiv

0+阅读 · 2023年9月28日

From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression

Arxiv

10+阅读 · 2021年12月14日

Co-mining: Self-Supervised Learning for Sparsely Annotated Object Detection

Arxiv

13+阅读 · 2020年12月3日

Adaptive Graph Convolutional Network with Attention Graph Clustering for Co-saliency Detection

Adaptive Graph Convolutional Network with Attention Graph Clustering for Co-saliency Detection

Arxiv

10+阅读 · 2020年3月13日

Unsupervised Cross-Modality Domain Adaptation of ConvNets for Biomedical Image Segmentations with Adversarial Loss

Arxiv

10+阅读 · 2018年4月29日

CommanderSong: A Systematic Approach for Practical Adversarial Voice Recognition

Arxiv

14+阅读 · 2018年1月24日

相关基金

城市“建成环境——空间行为”的多尺度影响关系与机理研究

国家自然科学基金

12+阅读 · 2017年12月31日

“Fishes-in-net” 酵母孢子微胶囊式近平滑假丝酵母SCRII酶有机相高效手性合成机制研究

国家自然科学基金

3+阅读 · 2016年12月31日

DMB信号水汽探测方法若干问题研究

国家自然科学基金

3+阅读 · 2015年12月31日

高频ZnO/IDT/SiO2/金刚石SAW乳腺癌抗原免疫传感器研究

国家自然科学基金

1+阅读 · 2015年12月31日

氢键(O:H-O)受激弛豫势能路径的键弛豫理论与声子谱学实验标定

国家自然科学基金

0+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

动态Gr？bner 基与GVW算法

国家自然科学基金

0+阅读 · 2014年12月31日

“杰文斯”悖论、能效政策改进与“双控目标”分解

国家自然科学基金

0+阅读 · 2014年12月31日

海量Web用户生成内容物化关键技术

国家自然科学基金

2+阅读 · 2014年12月31日

水窗及水窗附近软X射线激光研究

国家自然科学基金

0+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员