Multilingual Multimodal Learning with Machine Translated Text - 专知 (Zhuanzhi) Papers

Multimodal · Learning · Machine Translation · Multimodal Learning · MoDELS

October 24, 2022

Multilingual Multimodal Learning with Machine Translated Text

Chen Qiu, Dan Oneata, Emanuele Bugliarello, Stella Frank, Desmond Elliott

From arXiv, EMNLP 2022

Most vision-and-language pretraining research focuses on English tasks. However, the creation of multilingual multimodal evaluation datasets (e.g. Multi30K, xGQA, XVNLI, and MaRVL) poses a new challenge in finding high-quality training data that is both multilingual and multimodal. In this paper, we investigate whether machine translating English multimodal data can be an effective proxy for the lack of readily available multilingual data. We call this framework TD-MML: Translated Data for Multilingual Multimodal Learning, and it can be applied to any multimodal dataset and model. We apply it to both pretraining and fine-tuning data with a state-of-the-art model. In order to prevent models from learning from low-quality translated text, we propose two metrics for automatically removing such translations from the resulting datasets. In experiments on five tasks across 20 languages in the IGLUE benchmark, we show that translated data can provide a useful signal for multilingual multimodal learning, both at pretraining and fine-tuning.
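The filtering step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not name the two metrics, so the two heuristics below (a type-token ratio to catch degenerate repetition, and a source-copy ratio to catch output left untranslated) are assumptions chosen for illustration, as are the names `Example`, `type_token_ratio`, `copy_ratio`, `keep`, and both thresholds. Whitespace tokenization is also a simplification that would not suit languages written without spaces.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    image_id: str      # identifier of the paired image (unchanged by translation)
    source: str        # original English text
    translation: str   # machine-translated text


def type_token_ratio(text: str) -> float:
    """Unique tokens divided by total tokens; low values signal repetition."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def copy_ratio(source: str, translation: str) -> float:
    """Fraction of translation tokens that also occur in the source text."""
    src = set(source.lower().split())
    tgt = translation.lower().split()
    # An empty translation is treated as fully "copied" so that it is dropped.
    return sum(tok in src for tok in tgt) / len(tgt) if tgt else 1.0


def keep(ex: Example, min_ttr: float = 0.5, max_copy: float = 0.5) -> bool:
    """Hypothetical thresholds; both would need tuning on real data."""
    return (type_token_ratio(ex.translation) >= min_ttr
            and copy_ratio(ex.source, ex.translation) <= max_copy)


def filter_translations(examples: List[Example]) -> List[Example]:
    """Return only the examples whose translation passes both checks."""
    return [ex for ex in examples if keep(ex)]


examples = [
    Example("img1", "a dog runs on the beach", "ein Hund läuft am Strand"),
    Example("img2", "a dog runs on the beach", "ein ein ein ein Hund"),
    Example("img3", "a dog runs on the beach", "a dog runs on the beach"),
]
kept = filter_translations(examples)
print([ex.image_id for ex in kept])
```

In this toy input, the second translation fails the repetition check and the third is an unchanged copy of the English source, so only the first pair survives. The image side of each pair is untouched, which is why the same filter could in principle be applied to both pretraining and fine-tuning data.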
