Multilingual pretrained representations generally rely on subword segmentation algorithms to create a shared multilingual vocabulary. However, standard heuristic algorithms often lead to sub-optimal segmentation, especially for languages with limited amounts of data. In this paper, we take two major steps towards alleviating this problem. First, we demonstrate empirically that applying existing subword regularization methods (Kudo, 2018; Provilkov et al., 2020) during fine-tuning of pretrained multilingual representations improves the effectiveness of cross-lingual transfer. Second, to take full advantage of different possible input segmentations, we propose Multi-view Subword Regularization (MVR), a method that enforces consistency between predictions on inputs tokenized by the standard and by probabilistically sampled segmentations. Results on the XTREME multilingual benchmark (Hu et al., 2020) show that MVR brings consistent improvements of up to 2.5 points over using standard segmentation algorithms.
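To make the two ingredients concrete, the following is a minimal sketch, not the authors' released implementation. It assumes a trained SentencePiece model at a hypothetical path `spm.model`, a classifier `model` mapping token-id batches to logits, and an illustrative weight `lam`; the symmetric-KL form of the consistency term and the equal weighting of the two supervised losses are modeling assumptions for exposition.

```python
# Sketch of (1) probabilistic subword sampling and (2) an MVR-style
# consistency objective. Assumptions: "spm.model" is a hypothetical
# SentencePiece model path; `model`, `lam` are illustrative.
import torch
import torch.nn.functional as F
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="spm.model")

text = "Multilingual pretrained representations"

# Standard (deterministic) segmentation: the single best tokenization.
det_ids = sp.encode(text)

# Subword regularization (Kudo, 2018): sample a tokenization from the
# model's n-best candidates with smoothing temperature alpha.
sampled_ids = sp.encode(text, enable_sampling=True, nbest_size=-1, alpha=0.1)


def mvr_loss(model, det_batch, sampled_batch, labels, lam=0.5):
    """Multi-view objective (sketch): supervised loss on both views plus
    a symmetric KL term tying the two predictive distributions together."""
    logits_det = model(det_batch)      # view 1: deterministic segmentation
    logits_smp = model(sampled_batch)  # view 2: sampled segmentation

    ce = 0.5 * (F.cross_entropy(logits_det, labels)
                + F.cross_entropy(logits_smp, labels))

    p = F.log_softmax(logits_det, dim=-1)
    q = F.log_softmax(logits_smp, dim=-1)
    consistency = 0.5 * (
        F.kl_div(p, q, log_target=True, reduction="batchmean")
        + F.kl_div(q, p, log_target=True, reduction="batchmean"))

    return ce + lam * consistency
```

In this sketch the consistency term encourages the model to make the same prediction regardless of how the input was segmented, which is the intuition behind exploiting multiple views of one sentence.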