CN-Celeb: multi-genre speaker recognition (专知论文)

Speaker recognition · Performer · Dataset · Noise · Example

24 November 2021

CN-Celeb: multi-genre speaker recognition

Lantian Li, Ruiqi Liu, Jiawen Kang, Yue Fan, Hao Cui, Yunqi Cai, Ravichander Vipperla, Thomas Fang Zheng, Dong Wang

From arXiv, submitted to Speech Communication

Research on speaker recognition is extending to address vulnerability under in-the-wild conditions, among which genre mismatch is perhaps the most challenging: for instance, enrollment with reading speech while testing with conversational or singing audio. This mismatch leads to complex, composite inter-session variations, both intrinsic (e.g., speaking style, physiological status) and extrinsic (e.g., recording device, background noise). Unfortunately, the few existing multi-genre corpora are not only limited in size but also recorded under controlled conditions, so they cannot support conclusive research on the multi-genre problem. In this work, we first publish CN-Celeb, a large-scale multi-genre corpus that includes in-the-wild speech utterances from 3,000 speakers in 11 different genres. Second, using this dataset, we conduct a comprehensive study of the multi-genre phenomenon, in particular the impact of the multi-genre challenge on speaker recognition and the performance gain obtained when the new dataset is used for multi-genre training.

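Related content

Speaker recognition

Speaker recognition, also called voiceprint recognition (VPR), is a biometric technique that automatically determines a speaker's identity from the speaker-specific information carried in speech, using computers and modern information recognition technology. The goal of speaker recognition research is to extract speaker-discriminative features from speech and to build effective models and systems that identify speakers automatically and accurately.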
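The pipeline described above (extract speaker features, build a speaker model, decide on identity) can be sketched as a toy verification trial. This is a minimal illustration under stated assumptions, not the method of the CN-Celeb paper: the embeddings are hand-written toy vectors standing in for the output of a real speaker-embedding model, and the names `enroll` and `verify` and the threshold of 0.5 are hypothetical choices made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def enroll(embeddings):
    """Build a speaker model by averaging the enrollment embeddings."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def verify(model, test_embedding, threshold=0.5):
    """Score a test embedding against a speaker model.

    Returns (score, accepted). The threshold is an assumed value;
    a real system would tune it on held-out trials.
    """
    score = cosine(model, test_embedding)
    return score, score >= threshold


# Toy embeddings (assumed values, not real data).
enrollment = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]  # e.g. two enrollment utterances
model = enroll(enrollment)

same_speaker = [0.9, 0.1, 0.1]   # test utterance close to the model
other_speaker = [0.0, 1.0, 0.2]  # test utterance far from the model

for name, emb in [("same", same_speaker), ("other", other_speaker)]:
    score, accepted = verify(model, emb)
    print(f"{name}: score={score:.3f} accepted={accepted}")
```

In a genre-mismatched trial, the enrollment and test utterances would come from different genres (for example reading speech versus singing), which tends to lower the score of same-speaker pairs and makes a single fixed threshold harder to choose.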
