The goal of this paper is speaker diarisation of videos collected 'in the wild'. We make three key contributions. First, we propose an automatic audio-visual diarisation method for YouTube videos. Our method consists of audio-visual active speaker detection followed by speaker verification using self-enrolled speaker models. Second, we integrate our method into a semi-automatic dataset creation pipeline that significantly reduces the number of hours required to annotate videos with diarisation labels. Finally, we use this pipeline to create a large-scale diarisation dataset called VoxConverse, collected from 'in the wild' videos, which we will release publicly to the research community. Our dataset contains overlapping speech, a large and diverse speaker pool, and challenging background conditions.
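To make the two-stage structure of the method concrete, the sketch below shows one way the stages could compose: active speaker detection proposes speech segments, and each segment is then matched against speaker models enrolled on the fly from earlier segments. This is purely an illustrative outline under assumed interfaces; the names `asd_model`, `verif_model`, and `sim_threshold` are hypothetical placeholders, not the authors' implementation.

```python
# Illustrative sketch of the two-stage pipeline (hypothetical helper
# objects; not the released VoxConverse code).
import numpy as np

def diarise(video_frames, audio, asd_model, verif_model, sim_threshold=0.6):
    """Assign a speaker label to each detected speech segment of a video."""
    # Stage 1: audio-visual active speaker detection yields segments
    # where a visible face is actively speaking.
    segments = asd_model.detect(video_frames, audio)   # [(start, end), ...]

    enrolled = {}   # speaker_id -> running sum of embeddings ("self-enrolled" model)
    labels = []
    for start, end in segments:
        emb = verif_model.embed(audio[start:end])      # d-dim speaker embedding
        emb = emb / np.linalg.norm(emb)

        # Stage 2: match the segment against already-enrolled speakers
        # by cosine similarity.
        best_id, best_sim = None, -1.0
        for spk, model in enrolled.items():
            sim = float(emb @ (model / np.linalg.norm(model)))
            if sim > best_sim:
                best_id, best_sim = spk, sim

        if best_id is not None and best_sim >= sim_threshold:
            enrolled[best_id] = enrolled[best_id] + emb  # refine enrolment
            labels.append((start, end, best_id))
        else:
            new_id = f"spk{len(enrolled)}"               # unseen speaker
            enrolled[new_id] = emb
            labels.append((start, end, new_id))
    return labels
```

The self-enrolment step is the key design choice sketched here: rather than relying on pre-registered speaker models, each new speaker's model is bootstrapped from segments the system itself has already labelled within the video.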