Intel Labs at Ego4D Challenge 2022: A Better Baseline for Audio-Visual Diarization


October 14, 2022


From arXiv. Validation report for the Ego4D challenge at ECCV 2022.

This report describes our approach for the Audio-Visual Diarization (AVD) task of the Ego4D Challenge 2022. Specifically, we present multiple technical improvements over the official baselines. First, we improve the detection performance of the camera wearer's voice activity by modifying the training scheme of its model. Second, we discover that an off-the-shelf voice activity detection model can effectively remove false positives when it is applied solely to the camera wearer's voice activities. Lastly, we show that better active speaker detection leads to a better AVD outcome. Our final method obtains 65.9% DER on the test set of Ego4D, which significantly outperforms all the baselines. Our submission achieved 1st place in the Ego4D Challenge 2022.
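The second improvement, using an off-the-shelf voice activity detection (VAD) model to remove false positives from the camera wearer's predicted voice activity, can be illustrated with a minimal sketch. The abstract does not say how the two outputs are combined, so the interval intersection below is an assumption, not the authors' implementation. The function name `intersect_segments`, the `(start, end)` segment format in seconds, and the sample values are all hypothetical.

```python
from typing import List, Tuple

# A segment is a (start_sec, end_sec) pair. This format is an assumption
# made for illustration; the report does not specify one.
Segment = Tuple[float, float]


def intersect_segments(a: List[Segment], b: List[Segment]) -> List[Segment]:
    """Return the time ranges covered by both segment lists.

    Each list is assumed to be sorted by start time and to contain
    no overlapping segments within itself.
    """
    out: List[Segment] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            # The two segments overlap; keep only the shared part.
            out.append((start, end))
        # Advance whichever segment ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


# Hypothetical camera-wearer voice predictions and VAD speech regions.
wearer_pred = [(0.0, 2.0), (3.0, 5.0), (7.0, 8.0)]
vad_speech = [(0.5, 1.5), (4.0, 6.0)]

# Wearer predictions that the VAD does not confirm as speech are dropped.
filtered = intersect_segments(wearer_pred, vad_speech)
print(filtered)  # [(0.5, 1.5), (4.0, 5.0)]
```

In this sketch the prediction at 7.0 to 8.0 seconds is discarded because the VAD found no speech there, which is the kind of false positive the abstract says the VAD step removes.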

