不要说得太快:数据偏差对自我监督的演讲模式的影响 (Don't speak too fast: The impact of data bias on self-supervised speech models) - 专知论文

会员服务 ·

0

有偏 · 有偏数据集 · MoDELS · FAST · Performer ·

2022 年 4 月 26 日

Don't speak too fast: The impact of data bias on self-supervised speech models

翻译：不要说得太快:数据偏差对自我监督的演讲模式的影响

Yen Meng,Yi-Hui Chou,Andy T. Liu,Hung-yi Lee

from arxiv, Accepted by ICASSP 2022

Self-supervised Speech Models (S3Ms) have been proven successful in many speech downstream tasks, like ASR. However, how pre-training data affects S3Ms' downstream behavior remains an unexplored issue. In this paper, we study how pre-training data affects S3Ms by pre-training models on biased datasets targeting different factors of speech, including gender, content, and prosody, and evaluate these pre-trained S3Ms on selected downstream tasks in SUPERB Benchmark. Our experiments show that S3Ms have tolerance toward gender bias. Moreover, we find that the content of speech has little impact on the performance of S3Ms across downstream tasks, but S3Ms do show a preference toward a slower speech rate.

翻译：自我监督的演讲模式(S3Ms)在许多演讲的下游任务(如ASR)中被证明是成功的。然而,培训前数据如何影响S3Ms的下游行为仍然是一个尚未探讨的问题。在本文中,我们研究了培训前数据如何通过针对性别、内容和手工业等不同言论因素的有偏见的数据集的培训前模型影响S3Ms。我们实验显示,S3Ms对性别偏见持容忍态度。此外,我们发现,演讲的内容对S3Ms在下游任务中的性能影响不大,但S3Ms确实表现出倾向于更慢的言语率。

0

相关内容

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

专知会员服务

51+阅读 · 2020年3月7日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

开放知识图谱

1+阅读 · 2022年4月4日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Industry Talk2

【ICIG2021】Latest News & Announcements of the Industry Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年7月29日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

食源性致病菌的高灵敏SERS光谱分析方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

Tau蛋白异常导致学习记忆损害的分子机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

膜蛋白介导受IRES调控的cyclin B1促进食管癌转移的作用机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

基于压电本征横向驱动机制的大位移MEMS驱动器关键问题研究

国家自然科学基金

0+阅读 · 2013年12月31日

数据云存储中的安全审计方法研究

国家自然科学基金

2+阅读 · 2012年12月31日

多时相InSAR电离层探测和延迟改正的理论与方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

IRES调控EV71神经毒性的分子机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

细胞周期蛋白Nlp在维持基因组稳定和肿瘤发生中的作用和分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

de novo预测蛋白质结构的并行元启发方法研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于QAR记录数据和飞行员测试数据的民航飞行员飞行综合素质与飞行操作特征的相关性研究

国家自然科学基金

0+阅读 · 2009年12月31日

EfficientFormer: Vision Transformers at MobileNet Speed

Arxiv

0+阅读 · 2022年6月14日

Self-critiquing models for assisting human evaluators

Arxiv

1+阅读 · 2022年6月14日

Feature Selection for Discovering Distributional Treatment Effect Modifiers

Arxiv

0+阅读 · 2022年6月13日

Investigation of Ensemble features of Self-Supervised Pretrained Models for Automatic Speech Recognition

Arxiv

0+阅读 · 2022年6月11日

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Arxiv

0+阅读 · 2022年6月10日

AxFormer: Accuracy-driven Approximation of Transformers for Faster, Smaller and more Accurate NLP Models

Arxiv

0+阅读 · 2022年6月10日

Faster Meta Update Strategy for Noise-Robust Deep Learning

Arxiv

11+阅读 · 2021年4月30日

SiT: Self-supervised vIsion Transformer

Arxiv

19+阅读 · 2021年4月8日

Generative Adversarial Networks and Probabilistic Graph Models for Hyperspectral Image Classification

Arxiv

11+阅读 · 2018年2月10日

Weakly Supervised One-Shot Detection with Attention Siamese Networks

Arxiv

14+阅读 · 2018年1月12日

VIP会员

文章信息

相关主题

有偏数据集

相关VIP内容

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

专知会员服务

51+阅读 · 2020年3月7日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《生成式人工智能与大/小语言模型在供应链管理决策优化与可持续性提升中的作用评估》最新51页

白宫发布《赢得AI竞赛：美国人工智能行动计划》最新28页

地下战：地下空间的战略博弈

《美地下作战条令手册》228页

相关资讯

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

开放知识图谱

1+阅读 · 2022年4月4日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Industry Talk2

【ICIG2021】Latest News & Announcements of the Industry Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年7月29日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

EfficientFormer: Vision Transformers at MobileNet Speed

Arxiv

0+阅读 · 2022年6月14日

Self-critiquing models for assisting human evaluators

Arxiv

1+阅读 · 2022年6月14日

Feature Selection for Discovering Distributional Treatment Effect Modifiers

Arxiv

0+阅读 · 2022年6月13日

Investigation of Ensemble features of Self-Supervised Pretrained Models for Automatic Speech Recognition

Arxiv

0+阅读 · 2022年6月11日

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Arxiv

0+阅读 · 2022年6月10日

AxFormer: Accuracy-driven Approximation of Transformers for Faster, Smaller and more Accurate NLP Models

Arxiv

0+阅读 · 2022年6月10日

Faster Meta Update Strategy for Noise-Robust Deep Learning

Arxiv

11+阅读 · 2021年4月30日

SiT: Self-supervised vIsion Transformer

Arxiv

19+阅读 · 2021年4月8日

Generative Adversarial Networks and Probabilistic Graph Models for Hyperspectral Image Classification

Arxiv

11+阅读 · 2018年2月10日

Weakly Supervised One-Shot Detection with Attention Siamese Networks

Arxiv

14+阅读 · 2018年1月12日

相关基金

食源性致病菌的高灵敏SERS光谱分析方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

Tau蛋白异常导致学习记忆损害的分子机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

膜蛋白介导受IRES调控的cyclin B1促进食管癌转移的作用机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

基于压电本征横向驱动机制的大位移MEMS驱动器关键问题研究

国家自然科学基金

0+阅读 · 2013年12月31日

数据云存储中的安全审计方法研究

国家自然科学基金

2+阅读 · 2012年12月31日

多时相InSAR电离层探测和延迟改正的理论与方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

IRES调控EV71神经毒性的分子机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

细胞周期蛋白Nlp在维持基因组稳定和肿瘤发生中的作用和分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

de novo预测蛋白质结构的并行元启发方法研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于QAR记录数据和飞行员测试数据的民航飞行员飞行综合素质与飞行操作特征的相关性研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员