将变换器应用到字符级别转换 (Applying the Transformer to Character-level Transduction) - 专知论文

会员服务 ·

0

变换 · Performer · Batch Size · MoDELS · 规范化 ·

2021 年 1 月 28 日

Applying the Transformer to Character-level Transduction

翻译：将变换器应用到字符级别转换

Shijie Wu,Ryan Cotterell,Mans Hulden

from arxiv, EACL 2021

The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks. Yet for character-level transduction tasks, e.g. morphological inflection generation and historical text normalization, there are few works that outperform recurrent models using the transformer. In an empirical study, we uncover that, in contrast to recurrent sequence-to-sequence models, the batch size plays a crucial role in the performance of the transformer on character-level tasks, and we show that with a large enough batch size, the transformer does indeed outperform recurrent models. We also introduce a simple technique to handle feature-guided character-level transduction that further improves performance. With these insights, we achieve state-of-the-art performance on morphological inflection and historical text normalization. We also show that the transformer outperforms a strong baseline on two other character-level transduction tasks: grapheme-to-phoneme conversion and transliteration.

翻译：变压器在各种单词级 NLP 任务中表现优于经常性神经网络序列到序列模型。然而,在字符级转换任务中,例如形态渗透生成和历史文本正常化方面,几乎没有什么作品能够比变压器的经常模型更优异。在一项经验研究中,我们发现,与经常序列到序列模型相比,批量大小在字符级任务变压器的运行中发挥着关键作用,而且我们显示,由于批量大小足够大,变压器确实优于经常模型。我们还引入了一种简单技术,处理地貌引导字符级转换,从而进一步提高性能。有了这些洞察,我们就能在形态变化和历史文本正常化方面实现最先进的性能。我们还表明,变压器在另外两个特性级转换任务上,即图形式对电话转换和转换,比重基线要强。

0

相关内容

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

320+阅读 · 2020年11月26日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

111+阅读 · 2020年5月15日

【ACL2020-复旦大学】FLAT：采用扁平化Transformer的中文NER，FLAT: Chinese NER Using Flat-Lattice Transformer

【ACL2020-复旦大学】FLAT：采用扁平化Transformer的中文NER，FLAT: Chinese NER Using Flat-Lattice Transformer

专知会员服务

64+阅读 · 2020年4月28日

【Google大脑】进化正则激活层，Evolving Normalization-Activation Layers

【Google大脑】进化正则激活层，Evolving Normalization-Activation Layers

专知会员服务

19+阅读 · 2020年4月9日

【伯克利】再思考 Transformer中的Batch Normalization

【伯克利】再思考 Transformer中的Batch Normalization

专知会员服务

41+阅读 · 2020年3月21日

【综述】超参数优化:算法和应用综述，Hyper-Parameter Optimization: A Review of Algorithms and Applications

【综述】超参数优化:算法和应用综述，Hyper-Parameter Optimization: A Review of Algorithms and Applications

专知会员服务

57+阅读 · 2020年3月13日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

知识图谱本体结构构建论文合集

知识图谱本体结构构建论文合集

专知会员服务

108+阅读 · 2019年10月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Github项目推荐 | AutoML与轻量模型列表

Github项目推荐 | AutoML与轻量模型列表

AI研习社

9+阅读 · 2019年5月4日

AutoML与轻量模型大列表

AutoML与轻量模型大列表

专知

8+阅读 · 2019年4月29日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Facebook PyText 在 Github 上开源了

Facebook PyText 在 Github 上开源了

AINLP

7+阅读 · 2018年12月14日

《pyramid Attention Network for Semantic Segmentation》

《pyramid Attention Network for Semantic Segmentation》

统计学习与视觉计算组

44+阅读 · 2018年8月30日

【论文推荐】最新六篇视频分类相关论文—教师学生网络、表观-关系、Charades-Ego、视觉数据合成、图蒸馏、细粒度视频分类

【论文推荐】最新六篇视频分类相关论文—教师学生网络、表观-关系、Charades-Ego、视觉数据合成、图蒸馏、细粒度视频分类

专知

8+阅读 · 2018年6月6日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

Heterogeneous Graph Transformer

Heterogeneous Graph Transformer

Arxiv

27+阅读 · 2020年3月3日

Language Modeling with Deep Transformers

Arxiv

6+阅读 · 2019年7月11日

Pay Less Attention with Lightweight and Dynamic Convolutions

Pay Less Attention with Lightweight and Dynamic Convolutions

Arxiv

4+阅读 · 2019年1月29日

Glyce: Glyph-vectors for Chinese Character Representations

Glyce: Glyph-vectors for Chinese Character Representations

Arxiv

6+阅读 · 2019年1月29日

Close to Human Quality TTS with Transformer

Arxiv

3+阅读 · 2018年11月13日

End-to-End Text Classification via Image-based Embedding using Character-level Networks

End-to-End Text Classification via Image-based Embedding using Character-level Networks

Arxiv

5+阅读 · 2018年10月10日

Improving the Transformer Translation Model with Document-Level Context

Arxiv

4+阅读 · 2018年10月8日

Open Set Chinese Character Recognition using Multi-typed Attributes

Open Set Chinese Character Recognition using Multi-typed Attributes

Arxiv

4+阅读 · 2018年8月27日

Classical Structured Prediction Losses for Sequence to Sequence Learning

Arxiv

6+阅读 · 2018年5月24日

Improving Sentiment Analysis in Arabic Using Word Representation

Arxiv

4+阅读 · 2018年2月28日

VIP会员

文章信息

相关主题

相关VIP内容

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

320+阅读 · 2020年11月26日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

111+阅读 · 2020年5月15日

【ACL2020-复旦大学】FLAT：采用扁平化Transformer的中文NER，FLAT: Chinese NER Using Flat-Lattice Transformer

【ACL2020-复旦大学】FLAT：采用扁平化Transformer的中文NER，FLAT: Chinese NER Using Flat-Lattice Transformer

专知会员服务

64+阅读 · 2020年4月28日

【Google大脑】进化正则激活层，Evolving Normalization-Activation Layers

【Google大脑】进化正则激活层，Evolving Normalization-Activation Layers

专知会员服务

19+阅读 · 2020年4月9日

【伯克利】再思考 Transformer中的Batch Normalization

【伯克利】再思考 Transformer中的Batch Normalization

专知会员服务

41+阅读 · 2020年3月21日

【综述】超参数优化:算法和应用综述，Hyper-Parameter Optimization: A Review of Algorithms and Applications

【综述】超参数优化:算法和应用综述，Hyper-Parameter Optimization: A Review of Algorithms and Applications

专知会员服务

57+阅读 · 2020年3月13日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

知识图谱本体结构构建论文合集

知识图谱本体结构构建论文合集

专知会员服务

108+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

新质生成式AI赋能产业变革的实践与路径

用于多模态大模型的离散标记化：全面综述

Nature综述：金融网络中的物理学

【CMU博士论文】通信高效且差分隐私的优化方法

相关资讯

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Github项目推荐 | AutoML与轻量模型列表

Github项目推荐 | AutoML与轻量模型列表

AI研习社

9+阅读 · 2019年5月4日

AutoML与轻量模型大列表

AutoML与轻量模型大列表

专知

8+阅读 · 2019年4月29日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Facebook PyText 在 Github 上开源了

Facebook PyText 在 Github 上开源了

AINLP

7+阅读 · 2018年12月14日

《pyramid Attention Network for Semantic Segmentation》

《pyramid Attention Network for Semantic Segmentation》

统计学习与视觉计算组

44+阅读 · 2018年8月30日

【论文推荐】最新六篇视频分类相关论文—教师学生网络、表观-关系、Charades-Ego、视觉数据合成、图蒸馏、细粒度视频分类

【论文推荐】最新六篇视频分类相关论文—教师学生网络、表观-关系、Charades-Ego、视觉数据合成、图蒸馏、细粒度视频分类

专知

8+阅读 · 2018年6月6日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

相关论文

Heterogeneous Graph Transformer

Heterogeneous Graph Transformer

Arxiv

27+阅读 · 2020年3月3日

Language Modeling with Deep Transformers

Arxiv

6+阅读 · 2019年7月11日

Pay Less Attention with Lightweight and Dynamic Convolutions

Pay Less Attention with Lightweight and Dynamic Convolutions

Arxiv

4+阅读 · 2019年1月29日

Glyce: Glyph-vectors for Chinese Character Representations

Glyce: Glyph-vectors for Chinese Character Representations

Arxiv

6+阅读 · 2019年1月29日

Close to Human Quality TTS with Transformer

Arxiv

3+阅读 · 2018年11月13日

End-to-End Text Classification via Image-based Embedding using Character-level Networks

End-to-End Text Classification via Image-based Embedding using Character-level Networks

Arxiv

5+阅读 · 2018年10月10日

Improving the Transformer Translation Model with Document-Level Context

Arxiv

4+阅读 · 2018年10月8日

Open Set Chinese Character Recognition using Multi-typed Attributes

Open Set Chinese Character Recognition using Multi-typed Attributes

Arxiv

4+阅读 · 2018年8月27日

Classical Structured Prediction Losses for Sequence to Sequence Learning

Arxiv

6+阅读 · 2018年5月24日

Improving Sentiment Analysis in Arabic Using Word Representation

Arxiv

4+阅读 · 2018年2月28日

微信扫码咨询专知VIP会员