分析语言特征对跨语言转让的影响 (Analysing The Impact Of Linguistic Features On Cross-Lingual Transfer) - 专知论文

会员服务 ·

0

Performer · NLP · 预测器/决策函数 · CASES · state-of-the-art ·

2021 年 5 月 12 日

Analysing The Impact Of Linguistic Features On Cross-Lingual Transfer

翻译：分析语言特征对跨语言转让的影响

Błażej Dolicki,Gerasimos Spanakis

There is an increasing amount of evidence that in cases with little or no data in a target language, training on a different language can yield surprisingly good results. However, currently there are no established guidelines for choosing the training (source) language. In attempt to solve this issue we thoroughly analyze a state-of-the-art multilingual model and try to determine what impacts good transfer between languages. As opposed to the majority of multilingual NLP literature, we don't only train on English, but on a group of almost 30 languages. We show that looking at particular syntactic features is 2-4 times more helpful in predicting the performance than an aggregated syntactic similarity. We find out that the importance of syntactic features strongly differs depending on the downstream task - no single feature is a good performance predictor for all NLP tasks. As a result, one should not expect that for a target language $L_1$ there is a single language $L_2$ that is the best choice for any NLP task (for instance, for Bulgarian, the best source language is French on POS tagging, Russian on NER and Thai on NLI). We discuss the most important linguistic features affecting the transfer quality using statistical and machine learning methods.

翻译：越来越多的证据表明,在缺乏或没有目标语言数据的情况下,关于不同语言的培训可以产生出令人惊讶的良好结果。然而,目前没有为选择培训(源)语言制定既定的指导方针。为了解决这一问题,我们彻底分析一个最先进的多语言模式,并试图确定什么影响语言之间的良好转让。与大多数多语言国家语言方案文献相比,我们不只用英语培训,而是用近30种语言培训。我们显示,对特定合成特征的审视比综合类似性能预测性能的2-4倍更有助于预测性能。我们发现,根据下游任务,合成特征的重要性差异很大――没有一个单一特征是所有国家语言方案任务的良好性能预测。因此,对于一个目标语言来说,我们不应该指望只有1美元,我们只用1美元来培训英语,而是用近30种语言进行。我们发现,对于任何国家语言方案任务的最佳选择(例如,保加利亚语系,最好的源语言在POS标签上是法语,俄罗斯语在NER和泰语系上,我们用影响统计质量传输的最重要语言特征)。

1

相关内容

Performer

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

所有跨语言嵌入式都应该讲英语吗? | Should All Cross-Lingual Embeddings Speak English?

所有跨语言嵌入式都应该讲英语吗? | Should All Cross-Lingual Embeddings Speak English?

专知会员服务

7+阅读 · 2020年4月16日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【综述】文献级机器翻译研究:方法与评价（A Survey on Document-level Machine Translation: Methods and Evaluation）

【综述】文献级机器翻译研究:方法与评价（A Survey on Document-level Machine Translation: Methods and Evaluation）

专知会员服务

7+阅读 · 2019年12月19日

【Google可解释人工智能白皮书】27页pdf，AI Explainability Whitepaper ，Introduction to AI Explanations for AI Platform

【Google可解释人工智能白皮书】27页pdf，AI Explainability Whitepaper ，Introduction to AI Explanations for AI Platform

专知会员服务

127+阅读 · 2019年12月13日

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

专知会员服务

43+阅读 · 2019年11月25日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

学习自然语言处理路线图

学习自然语言处理路线图

专知会员服务

139+阅读 · 2019年9月24日

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

深度学习自然语言处理

7+阅读 · 2020年4月8日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Linguistically Regularized LSTMs for Sentiment Classification

Linguistically Regularized LSTMs for Sentiment Classification

黑龙江大学自然语言处理实验室

8+阅读 · 2018年5月4日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

Empirically Measuring Transfer Distance for System Design and Operation

Arxiv

0+阅读 · 2021年7月2日

Fairness for Image Generation with Uncertain Sensitive Attributes

Arxiv

0+阅读 · 2021年7月2日

Combining Feature and Instance Attribution to Detect Artifacts

Combining Feature and Instance Attribution to Detect Artifacts

Arxiv

0+阅读 · 2021年7月1日

Unsupervised Cross-lingual Representation Learning at Scale

Arxiv

5+阅读 · 2019年11月5日

Improving Few-shot Text Classification via Pretrained Language Representations

Arxiv

3+阅读 · 2019年8月22日

Deep Metric Transfer for Label Propagation with Limited Annotated Data

Arxiv

3+阅读 · 2018年12月20日

Linguistically-Informed Self-Attention for Semantic Role Labeling

Arxiv

17+阅读 · 2018年8月28日

Projecting Embeddings for Domain Adaptation: Joint Modeling of Sentiment Analysis in Diverse Domains

Arxiv

8+阅读 · 2018年6月13日

A Resource-Light Method for Cross-Lingual Semantic Textual Similarity

Arxiv

3+阅读 · 2018年1月19日

Fluency-Guided Cross-Lingual Image Captioning

Arxiv

3+阅读 · 2017年8月15日

VIP会员

文章信息

相关主题

预测器/决策函数

state-of-the-art

相关VIP内容

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

所有跨语言嵌入式都应该讲英语吗? | Should All Cross-Lingual Embeddings Speak English?

所有跨语言嵌入式都应该讲英语吗? | Should All Cross-Lingual Embeddings Speak English?

专知会员服务

7+阅读 · 2020年4月16日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【综述】文献级机器翻译研究:方法与评价（A Survey on Document-level Machine Translation: Methods and Evaluation）

【综述】文献级机器翻译研究:方法与评价（A Survey on Document-level Machine Translation: Methods and Evaluation）

专知会员服务

7+阅读 · 2019年12月19日

【Google可解释人工智能白皮书】27页pdf，AI Explainability Whitepaper ，Introduction to AI Explanations for AI Platform

【Google可解释人工智能白皮书】27页pdf，AI Explainability Whitepaper ，Introduction to AI Explanations for AI Platform

专知会员服务

127+阅读 · 2019年12月13日

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

专知会员服务

43+阅读 · 2019年11月25日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

学习自然语言处理路线图

学习自然语言处理路线图

专知会员服务

139+阅读 · 2019年9月24日

热门VIP内容

开通专知VIP会员享更多权益服务

新书册《几何深度学习的数学基础》

中程单向攻击无人机的战略意义：俄乌战争启示

在无标注条件下适配视觉—语言模型：全面综述

面向视觉语言模型的持续学习：遗忘之外的综述与分类体系

相关资讯

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

深度学习自然语言处理

7+阅读 · 2020年4月8日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Linguistically Regularized LSTMs for Sentiment Classification

Linguistically Regularized LSTMs for Sentiment Classification

黑龙江大学自然语言处理实验室

8+阅读 · 2018年5月4日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

相关论文

Empirically Measuring Transfer Distance for System Design and Operation

Arxiv

0+阅读 · 2021年7月2日

Fairness for Image Generation with Uncertain Sensitive Attributes

Arxiv

0+阅读 · 2021年7月2日

Combining Feature and Instance Attribution to Detect Artifacts

Combining Feature and Instance Attribution to Detect Artifacts

Arxiv

0+阅读 · 2021年7月1日

Unsupervised Cross-lingual Representation Learning at Scale

Arxiv

5+阅读 · 2019年11月5日

Improving Few-shot Text Classification via Pretrained Language Representations

Arxiv

3+阅读 · 2019年8月22日

Deep Metric Transfer for Label Propagation with Limited Annotated Data

Arxiv

3+阅读 · 2018年12月20日

Linguistically-Informed Self-Attention for Semantic Role Labeling

Arxiv

17+阅读 · 2018年8月28日

Projecting Embeddings for Domain Adaptation: Joint Modeling of Sentiment Analysis in Diverse Domains

Arxiv

8+阅读 · 2018年6月13日

A Resource-Light Method for Cross-Lingual Semantic Textual Similarity

Arxiv

3+阅读 · 2018年1月19日

Fluency-Guided Cross-Lingual Image Captioning

Arxiv

3+阅读 · 2017年8月15日

微信扫码咨询专知VIP会员