野外发言人的认可 (Speaker Recognition in the Wild) - 专知论文

会员服务 ·

0

簇 · 可辨认的 · 声纹识别 · 语音识别 · 自动语音识别 ·

2022 年 5 月 5 日

Speaker Recognition in the Wild

翻译：野外发言人的认可

Neeraj Chhimwal,Anirudh Gupta,Rishabh Gaur,Harveen Singh Chadha,Priyanshi Shah,Ankur Dhuriya,Vivek Raghavan

from arxiv, This paper was submitted to Interspeech 2022

In this paper, we propose a pipeline to find the number of speakers, as well as audios belonging to each of these now identified speakers in a source of audio data where number of speakers or speaker labels are not known a priori. We used this approach as a part of our Data Preparation pipeline for Speech Recognition in Indic Languages (https://github.com/Open-Speech-EkStep/vakyansh-wav2vec2-experimentation). To understand and evaluate the accuracy of our proposed pipeline, we introduce two metrics: Cluster Purity, and Cluster Uniqueness. Cluster Purity quantifies how "pure" a cluster is. Cluster Uniqueness, on the other hand, quantifies what percentage of clusters belong only to a single dominant speaker. We discuss more on these metrics in section \ref{sec:metrics}. Since we develop this utility to aid us in identifying data based on speaker IDs before training an Automatic Speech Recognition (ASR) model, and since most of this data takes considerable effort to scrape, we also conclude that 98\% of data gets mapped to the top 80\% of clusters (computed by removing any clusters with less than a fixed number of utterances -- we do this to get rid of some very small clusters and use this threshold as 30), in the test set chosen.

翻译：在本文中,我们建议建立一个管道,以便在一个音频数据源中查找发言者的人数,以及属于其中每个现在确定的发言者的音频,因为在那里,发言者或发言者标签的数目是事先不为人知的。我们使用这一方法作为我们“承认英译语言语音”的数据准备管道的一部分(https://github.com/ Open-Speech-EkStep/vakyansh-vakyansh-wav2vec2-exeriation)。为了理解和评估我们提议的管道的准确性,我们引入了两个衡量标准:集群纯度和集群独特性。集纯度量度量化了一个组的“纯度”。另一方面,集群独特性量化了组中有多少组只属于一个主要发言者。我们在\ref{sec:量性}一节中更多地讨论这些计量标准,以便在培训自动语音识别模型之前,帮助我们根据发言者的识别数据,我们开发了两个衡量标准,并且由于大多数数据都花费了相当大的努力去“纯粹”一个组的“纯度。在另一组中,我们还得出结论,98* 将这一组的标数固定到这个组的标数在80个组中,比标数的组中的总数要降到80个组中。

0

相关内容

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium7

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium7

中国图象图形学学会CSIG

0+阅读 · 2021年11月15日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

拓扑绝缘体与超导体耦合体系中交叉Andreev反射研究

国家自然科学基金

1+阅读 · 2014年12月31日

SIRT1介导组蛋白乙酰化在同型半胱氨酸致动脉粥样硬化中的作用及特异性miRNAs调控机制

国家自然科学基金

0+阅读 · 2014年12月31日

Anderson型多酸的不对称修饰及可控组装研究

国家自然科学基金

1+阅读 · 2014年12月31日

pVHL磷酸化修饰及对其抑癌功能的影响

国家自然科学基金

0+阅读 · 2012年12月31日

Degasperis-Procesi方程若干控制问题的研究

国家自然科学基金

0+阅读 · 2012年12月31日

两个E3连接酶在FERONIA信号通路中的作用研究

国家自然科学基金

0+阅读 · 2012年12月31日

PANDER-FOXO1信号通路在非酒精性脂肪肝发生过程中的作用

国家自然科学基金

0+阅读 · 2011年12月31日

调节Kupffer细胞介导炎症通路改善追赶生长所致胰岛素抵抗

国家自然科学基金

0+阅读 · 2011年12月31日

LDA+Guztwiller方法研究铁基超导体

国家自然科学基金

0+阅读 · 2009年12月31日

TR3相互作用新蛋白机理研究

国家自然科学基金

1+阅读 · 2008年12月31日

Towards Representative Subset Selection for Self-Supervised Speech Recognition

Towards Representative Subset Selection for Self-Supervised Speech Recognition

Arxiv

0+阅读 · 2022年6月24日

Confidence Score Based Conformer Speaker Adaptation for Speech Recognition

Arxiv

0+阅读 · 2022年6月24日

Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition

Arxiv

0+阅读 · 2022年6月24日

Towards End-to-End Private Automatic Speaker Recognition

Towards End-to-End Private Automatic Speaker Recognition

Arxiv

0+阅读 · 2022年6月23日

Template-based Approach to Zero-shot Intent Recognition

Arxiv

0+阅读 · 2022年6月22日

COVYT: Introducing the Coronavirus YouTube and TikTok speech dataset featuring the same speakers with and without infection

Arxiv

0+阅读 · 2022年6月20日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

Scene Text Detection and Recognition: The Deep Learning Era

Scene Text Detection and Recognition: The Deep Learning Era

Arxiv

27+阅读 · 2019年9月5日

Domain Adaptive Faster R-CNN for Object Detection in the Wild

Arxiv

10+阅读 · 2018年3月8日

CommanderSong: A Systematic Approach for Practical Adversarial Voice Recognition

Arxiv

14+阅读 · 2018年1月24日

VIP会员

文章信息

相关主题

自动语音识别

相关VIP内容

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【NeurIPS2025】开源权重模型的知识蒸馏检测

多模态大语言模型遇见多模态情绪识别与推理：综述

【NTU博士论文】面向高效感知与可扩展生成的三维物理世界

CMU《生成式人工智能》最新课程

相关资讯

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium7

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium7

中国图象图形学学会CSIG

0+阅读 · 2021年11月15日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

Towards Representative Subset Selection for Self-Supervised Speech Recognition

Towards Representative Subset Selection for Self-Supervised Speech Recognition

Arxiv

0+阅读 · 2022年6月24日

Confidence Score Based Conformer Speaker Adaptation for Speech Recognition

Arxiv

0+阅读 · 2022年6月24日

Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition

Arxiv

0+阅读 · 2022年6月24日

Towards End-to-End Private Automatic Speaker Recognition

Towards End-to-End Private Automatic Speaker Recognition

Arxiv

0+阅读 · 2022年6月23日

Template-based Approach to Zero-shot Intent Recognition

Arxiv

0+阅读 · 2022年6月22日

COVYT: Introducing the Coronavirus YouTube and TikTok speech dataset featuring the same speakers with and without infection

Arxiv

0+阅读 · 2022年6月20日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

Scene Text Detection and Recognition: The Deep Learning Era

Scene Text Detection and Recognition: The Deep Learning Era

Arxiv

27+阅读 · 2019年9月5日

Domain Adaptive Faster R-CNN for Object Detection in the Wild

Arxiv

10+阅读 · 2018年3月8日

CommanderSong: A Systematic Approach for Practical Adversarial Voice Recognition

Arxiv

14+阅读 · 2018年1月24日

相关基金

拓扑绝缘体与超导体耦合体系中交叉Andreev反射研究

国家自然科学基金

1+阅读 · 2014年12月31日

SIRT1介导组蛋白乙酰化在同型半胱氨酸致动脉粥样硬化中的作用及特异性miRNAs调控机制

国家自然科学基金

0+阅读 · 2014年12月31日

Anderson型多酸的不对称修饰及可控组装研究

国家自然科学基金

1+阅读 · 2014年12月31日

pVHL磷酸化修饰及对其抑癌功能的影响

国家自然科学基金

0+阅读 · 2012年12月31日

Degasperis-Procesi方程若干控制问题的研究

国家自然科学基金

0+阅读 · 2012年12月31日

两个E3连接酶在FERONIA信号通路中的作用研究

国家自然科学基金

0+阅读 · 2012年12月31日

PANDER-FOXO1信号通路在非酒精性脂肪肝发生过程中的作用

国家自然科学基金

0+阅读 · 2011年12月31日

调节Kupffer细胞介导炎症通路改善追赶生长所致胰岛素抵抗

国家自然科学基金

0+阅读 · 2011年12月31日

LDA+Guztwiller方法研究铁基超导体

国家自然科学基金

0+阅读 · 2009年12月31日

TR3相互作用新蛋白机理研究

国家自然科学基金

1+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员