学习识别方言特征 (Learning to Recognize Dialect Features) - 专知论文

会员服务 ·

0

学成 · MoDELS · 模型评估 · 情景 · 标注 ·

2021 年 5 月 6 日

Learning to Recognize Dialect Features

翻译：学习识别方言特征

Dorottya Demszky,Devyani Sharma,Jonathan H. Clark,Vinodkumar Prabhakaran,Jacob Eisenstein

from arxiv, NAACL camera-ready

Building NLP systems that serve everyone requires accounting for dialect differences. But dialects are not monolithic entities: rather, distinctions between and within dialects are captured by the presence, absence, and frequency of dozens of dialect features in speech and text, such as the deletion of the copula in "He {} running". In this paper, we introduce the task of dialect feature detection, and present two multitask learning approaches, both based on pretrained transformers. For most dialects, large-scale annotated corpora for these features are unavailable, making it difficult to train recognizers. We train our models on a small number of minimal pairs, building on how linguists typically define dialect features. Evaluation on a test set of 22 dialect features of Indian English demonstrates that these models learn to recognize many features with high accuracy, and that a few minimal pairs can be as effective for training as thousands of labeled examples. We also demonstrate the downstream applicability of dialect feature detection both as a measure of dialect density and as a dialect classifier.

翻译：为每个人服务的建设NLP系统需要考虑方言差异。但方言并不是单一实体:相反,言语和文字中数十种方言特征的存在、缺乏和频率,如删除“He ⁇ running ” 中的 Copula。在本文中,我们引入了方言特征探测任务,并提出了两种多任务学习方法,这两种方法都以预先培训的变压器为基础。对于大多数方言来说,没有对这些特征的大型附加说明的体,因此难以培训识别者。我们用少量的最小对子来培训我们的模型,其基础是语言学家通常如何定义方言特征。对一套由印度英语的22个方言特征组成的测试评价显示,这些模型学会了高度精准地识别许多特征,并且有几对最起码的对子可以有效地培训,如同数千个标注的例子。我们还展示了方言特征检测作为方言密度和方言分类方法的下游适用性。

0

相关内容

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

TensorFlow官方开源的神经结构学习（Neural Structured Learning）库

TensorFlow官方开源的神经结构学习（Neural Structured Learning）库

专知会员服务

18+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【强化学习研讨会|Microsoft Research】利用批量强化学习确定医疗保健中的治疗方案（Towards Using Batch Reinforcement Learning to Identify Treatment Options in Healthcare）,哈佛大学助理教授| Finale Doshi-Velez

【强化学习研讨会|Microsoft Research】利用批量强化学习确定医疗保健中的治疗方案（Towards Using Batch Reinforcement Learning to Identify Treatment Options in Healthcare）,哈佛大学助理教授| Finale Doshi-Velez

专知会员服务

7+阅读 · 2019年10月3日

Successor representations 强化学习表示的生物学启发

Successor representations 强化学习表示的生物学启发

CreateAMind

6+阅读 · 2019年9月5日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Facebook PyText 在 Github 上开源了

Facebook PyText 在 Github 上开源了

AINLP

7+阅读 · 2018年12月14日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning

Arxiv

11+阅读 · 2021年4月29日

Zero-Resource Cross-Lingual Named Entity Recognition

Arxiv

5+阅读 · 2019年11月22日

Meta Learning for End-to-End Low-Resource Speech Recognition

Meta Learning for End-to-End Low-Resource Speech Recognition

Arxiv

20+阅读 · 2019年10月26日

Learning to Weight for Text Classification

Learning to Weight for Text Classification

Arxiv

8+阅读 · 2019年3月28日

Few-shot classification in Named Entity Recognition Task

Arxiv

6+阅读 · 2018年12月14日

Task-Free Continual Learning

Arxiv

6+阅读 · 2018年12月10日

Neural Machine Translation for Bilingually Scarce Scenarios: A Deep Multi-task Learning Approach

Arxiv

9+阅读 · 2018年5月11日

Cross-type Biomedical Named Entity Recognition with Deep Multi-Task Learning

Arxiv

3+阅读 · 2018年4月5日

Deep Active Learning for Named Entity Recognition

Arxiv

15+阅读 · 2018年2月4日

State-of-the-art Speech Recognition With Sequence-to-Sequence Models

Arxiv

7+阅读 · 2018年1月18日

VIP会员

文章信息

相关主题

相关VIP内容

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

TensorFlow官方开源的神经结构学习（Neural Structured Learning）库

TensorFlow官方开源的神经结构学习（Neural Structured Learning）库

专知会员服务

18+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【强化学习研讨会|Microsoft Research】利用批量强化学习确定医疗保健中的治疗方案（Towards Using Batch Reinforcement Learning to Identify Treatment Options in Healthcare）,哈佛大学助理教授| Finale Doshi-Velez

【强化学习研讨会|Microsoft Research】利用批量强化学习确定医疗保健中的治疗方案（Towards Using Batch Reinforcement Learning to Identify Treatment Options in Healthcare）,哈佛大学助理教授| Finale Doshi-Velez

专知会员服务

7+阅读 · 2019年10月3日

热门VIP内容

开通专知VIP会员享更多权益服务

面向具身智能的多模态数据存储与检索：综述

《算法战争研究计划全景评估》35页

【CMU博士论文】水下三维视觉感知与生成

智能体战争：自主人工智能军备竞赛全景透视

相关资讯

Successor representations 强化学习表示的生物学启发

Successor representations 强化学习表示的生物学启发

CreateAMind

6+阅读 · 2019年9月5日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Facebook PyText 在 Github 上开源了

Facebook PyText 在 Github 上开源了

AINLP

7+阅读 · 2018年12月14日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

相关论文

A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning

Arxiv

11+阅读 · 2021年4月29日

Zero-Resource Cross-Lingual Named Entity Recognition

Arxiv

5+阅读 · 2019年11月22日

Meta Learning for End-to-End Low-Resource Speech Recognition

Meta Learning for End-to-End Low-Resource Speech Recognition

Arxiv

20+阅读 · 2019年10月26日

Learning to Weight for Text Classification

Learning to Weight for Text Classification

Arxiv

8+阅读 · 2019年3月28日

Few-shot classification in Named Entity Recognition Task

Arxiv

6+阅读 · 2018年12月14日

Task-Free Continual Learning

Arxiv

6+阅读 · 2018年12月10日

Neural Machine Translation for Bilingually Scarce Scenarios: A Deep Multi-task Learning Approach

Arxiv

9+阅读 · 2018年5月11日

Cross-type Biomedical Named Entity Recognition with Deep Multi-Task Learning

Arxiv

3+阅读 · 2018年4月5日

Deep Active Learning for Named Entity Recognition

Arxiv

15+阅读 · 2018年2月4日

State-of-the-art Speech Recognition With Sequence-to-Sequence Models

Arxiv

7+阅读 · 2018年1月18日

微信扫码咨询专知VIP会员