MauBERT: Universal Phonetic Inductive Biases for Few-Shot Acoustic Units Discovery - 专知 Papers

Supervision · Bias · Articulatory features · Inductive bias · Samples

MauBERT: Universal Phonetic Inductive Biases for Few-Shot Acoustic Units Discovery

Angelo Ortiz Tandazo, Manel Khentout, Youssef Benchekroun, Thomas Hueber, Emmanuel Dupoux

This paper introduces MauBERT, a multilingual extension of HuBERT that leverages articulatory features for robust cross-lingual phonetic representation learning. We continue HuBERT pre-training with supervision based on a phonetic-to-articulatory feature mapping in 55 languages. Our models learn from multilingual data to predict articulatory features or phones, resulting in language-independent representations that capture multilingual phonetic properties. Through comprehensive ABX discriminability testing, we show MauBERT models produce more context-invariant representations than state-of-the-art multilingual self-supervised learning models. Additionally, the models effectively adapt to unseen languages and casual speech with minimal self-supervised fine-tuning (10 hours of speech). This establishes an effective approach for instilling linguistic inductive biases in self-supervised speech models.
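As a rough illustration of the ABX discriminability test mentioned in the abstract, here is a minimal sketch, not the paper's evaluation code. It assumes each speech token has already been reduced to a single fixed-size vector and compares tokens by cosine distance; actual ABX evaluations on speech typically compare frame sequences with dynamic time warping and average over both directions of a phone contrast. The function names `cosine_distance` and `abx_error_rate` and the toy vectors are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def abx_error_rate(items_a, items_b, distance=cosine_distance):
    """ABX error rate for one direction of a contrast (simplified sketch).

    X and A are distinct tokens drawn from category A, and B is drawn from
    category B. A triplet counts as an error when X is farther from A than
    from B; ties count as half an error.
    """
    errors = 0.0
    total = 0
    for i, x in enumerate(items_a):
        for j, a in enumerate(items_a):
            if i == j:
                continue  # X and A must be different tokens
            for b in items_b:
                d_ax = distance(a, x)
                d_bx = distance(b, x)
                if d_ax > d_bx:
                    errors += 1.0
                elif d_ax == d_bx:
                    errors += 0.5
                total += 1
    if total == 0:
        raise ValueError("need at least two A tokens and one B token")
    return errors / total


# Toy, made-up embeddings for two well-separated categories.
category_a = [[1.0, 0.0], [0.9, 0.1]]
category_b = [[0.0, 1.0], [0.1, 0.9]]
separated_error = abx_error_rate(category_a, category_b)

# Toy case where category A is not compact, so the test fails more often.
overlapping_error = abx_error_rate([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0]])
```

A lower error rate means the representation separates the two categories more reliably. Measuring it with A, B, and X taken from different phonetic contexts is one way to probe the context invariance the abstract refers to.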

