探索多种语言条款水平的多语语言道德学最先进的语言建模方法和数据增强技术 (Exploring the State-of-the-Art Language Modeling Methods and Data Augmentation Techniques for Multilingual Clause-Level Morphology) - 专知论文

会员服务 ·

0

Analysis · state-of-the-art · 数据增强 · 语言模型化 · MoDELS ·

2022 年 11 月 3 日

Exploring the State-of-the-Art Language Modeling Methods and Data Augmentation Techniques for Multilingual Clause-Level Morphology

翻译：探索多种语言条款水平的多语语言道德学最先进的语言建模方法和数据增强技术

Emre Can Acikgoz,Tilek Chubakov,Müge Kural,Gözde Gül Şahin,Deniz Yuret

This paper describes the KUIS-AI NLP team's submission for the 1$^{st}$ Shared Task on Multilingual Clause-level Morphology (MRL2022). We present our work on all three parts of the shared task: inflection, reinflection, and analysis. We mainly explore two approaches: Transformer models in combination with data augmentation, and exploiting the state-of-the-art language modeling techniques for morphological analysis. Data augmentation leads a remarkable performance improvement for most of the languages in the inflection task. Prefix-tuning on pretrained mGPT model helps us to adapt reinflection and analysis tasks in a low-data setting. Additionally, we used pipeline architectures using publicly available open source lemmatization tools and monolingual BERT-based morphological feature classifiers for reinflection and analysis tasks, respectively. While Transformer architectures with data augmentation and pipeline architectures achieved the best results for inflection and reinflection tasks, pipelines and prefix-tuning on mGPT received the highest results for the analysis task. Our methods achieved first place in each of the three tasks and outperforms mT5-baseline with ~89\% for inflection, ~80\% for reinflection and ~12\% for analysis. Our code https://github.com/emrecanacikgoz/mrl2022 is publicly available.

翻译：本文描述了 KUIS-AI NLP 团队为多语言条款水平道德学共同任务提交的 1 美元的共享任务。我们介绍了我们关于共享任务所有三个部分的工作: 透视、重新透视和分析。我们主要探讨两种方法: 与数据扩增相结合的变换模型, 以及利用最新语言模型技术进行形态分析。数据扩增使大多数语言在渗透任务中的业绩显著改善。预先培训的 mGPT 模型的预先调整帮助我们在低数据设置中调整重新公布和分析任务。此外, 我们使用公开的开放源 Lemmatization 工具和基于单一语言的变形特征解析器, 分别用于重现和分析任务。具有数据扩增和管道结构的变换结构在渗透和再现任务、输油管道和 mGPT 80 前的调整工作获得了最高的分析结果。我们的第三种变形方法, 用于我们第三种变形的 Rencom 。

0

相关内容

Analysis

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

70+阅读 · 2022年6月28日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

72+阅读 · 2022年3月15日

2020数据工程师成长路线图

专知会员服务

17+阅读 · 2020年9月6日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

161+阅读 · 2020年3月18日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

86+阅读 · 2020年2月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

52+阅读 · 2020年1月30日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

18+阅读 · 2019年10月22日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

53+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

90+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

99+阅读 · 2019年10月9日

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

1+阅读 · 2022年11月2日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

23+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

41+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

16+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【论文推荐】最新八篇情感分析相关论文—注意力网络、多模态情感分析、情感分析局限性、跨语言情感分类、多语言情感分析

【论文推荐】最新八篇情感分析相关论文—注意力网络、多模态情感分析、情感分析局限性、跨语言情感分类、多语言情感分析

专知

51+阅读 · 2018年6月28日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

12+阅读 · 2017年9月24日

抗MRSA活性rhodomyrtosone B类似物的合成和构效关系研究

国家自然科学基金

0+阅读 · 2015年12月31日

气固系统的底层建模-约简及介尺度结构机理分析

国家自然科学基金

0+阅读 · 2014年12月31日

套子代数的Hochschild上同调及套的分类

国家自然科学基金

3+阅读 · 2014年12月31日

可压缩湍流粒子输运的拉格朗日（Lagrangian）研究

国家自然科学基金

0+阅读 · 2013年12月31日

Riemann-Hilbert 方法和随机矩阵谱分析中的 Painleve 渐近

国家自然科学基金

0+阅读 · 2012年12月31日

函数域中的Vinogradov中值定理

国家自然科学基金

0+阅读 · 2012年12月31日

关系的分解与Domain的表示

国家自然科学基金

1+阅读 · 2011年12月31日

句法制导的统计汉语句义分析方法研究

国家自然科学基金

0+阅读 · 2009年12月31日

量子霍尔物理中的自旋现象研究

国家自然科学基金

0+阅读 · 2009年12月31日

汉语文语转换中语义与表现力联合建模

国家自然科学基金

0+阅读 · 2008年12月31日

UnICLAM:Contrastive Representation Learning with Adversarial Masking for Unified and Interpretable Medical Vision Question Answering

Arxiv

1+阅读 · 2022年12月23日

Metadata-guided Consistency Learning for High Content Images

Arxiv

0+阅读 · 2022年12月22日

Localising In-Domain Adaptation of Transformer-Based Biomedical Language Models

Arxiv

0+阅读 · 2022年12月22日

Deep Riemannian Networks for EEG Decoding

Arxiv

0+阅读 · 2022年12月21日

Critic-Guided Decoding for Controlled Text Generation

Arxiv

0+阅读 · 2022年12月21日

Task Ambiguity in Humans and Language Models

Arxiv

0+阅读 · 2022年12月20日

The Impact of Symbolic Representations on In-context Learning for Few-shot Reasoning

Arxiv

0+阅读 · 2022年12月16日

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

Arxiv

29+阅读 · 2021年7月28日

Data Augmentation for Graph Neural Networks

Arxiv

38+阅读 · 2020年12月2日

Data Augmentation using Pre-trained Transformer Models

Arxiv

15+阅读 · 2020年3月4日

VIP会员

文章信息

相关主题

state-of-the-art

语言模型化

相关VIP内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

70+阅读 · 2022年6月28日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

72+阅读 · 2022年3月15日

2020数据工程师成长路线图

专知会员服务

17+阅读 · 2020年9月6日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

161+阅读 · 2020年3月18日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

86+阅读 · 2020年2月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

52+阅读 · 2020年1月30日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

18+阅读 · 2019年10月22日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

53+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

90+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

99+阅读 · 2019年10月9日

热门VIP内容

相关资讯

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

1+阅读 · 2022年11月2日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

23+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

41+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

16+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【论文推荐】最新八篇情感分析相关论文—注意力网络、多模态情感分析、情感分析局限性、跨语言情感分类、多语言情感分析

【论文推荐】最新八篇情感分析相关论文—注意力网络、多模态情感分析、情感分析局限性、跨语言情感分类、多语言情感分析

专知

51+阅读 · 2018年6月28日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

12+阅读 · 2017年9月24日

相关论文

UnICLAM:Contrastive Representation Learning with Adversarial Masking for Unified and Interpretable Medical Vision Question Answering

Arxiv

1+阅读 · 2022年12月23日

Metadata-guided Consistency Learning for High Content Images

Arxiv

0+阅读 · 2022年12月22日

Localising In-Domain Adaptation of Transformer-Based Biomedical Language Models

Arxiv

0+阅读 · 2022年12月22日

Deep Riemannian Networks for EEG Decoding

Arxiv

0+阅读 · 2022年12月21日

Critic-Guided Decoding for Controlled Text Generation

Arxiv

0+阅读 · 2022年12月21日

Task Ambiguity in Humans and Language Models

Arxiv

0+阅读 · 2022年12月20日

The Impact of Symbolic Representations on In-context Learning for Few-shot Reasoning

Arxiv

0+阅读 · 2022年12月16日

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

Arxiv

29+阅读 · 2021年7月28日

Data Augmentation for Graph Neural Networks

Arxiv

38+阅读 · 2020年12月2日

Data Augmentation using Pre-trained Transformer Models

Arxiv

15+阅读 · 2020年3月4日

相关基金

抗MRSA活性rhodomyrtosone B类似物的合成和构效关系研究

国家自然科学基金

0+阅读 · 2015年12月31日

气固系统的底层建模-约简及介尺度结构机理分析

国家自然科学基金

0+阅读 · 2014年12月31日

套子代数的Hochschild上同调及套的分类

国家自然科学基金

3+阅读 · 2014年12月31日

可压缩湍流粒子输运的拉格朗日（Lagrangian）研究

国家自然科学基金

0+阅读 · 2013年12月31日

Riemann-Hilbert 方法和随机矩阵谱分析中的 Painleve 渐近

国家自然科学基金

0+阅读 · 2012年12月31日

函数域中的Vinogradov中值定理

国家自然科学基金

0+阅读 · 2012年12月31日

关系的分解与Domain的表示

国家自然科学基金

1+阅读 · 2011年12月31日

句法制导的统计汉语句义分析方法研究

国家自然科学基金

0+阅读 · 2009年12月31日

量子霍尔物理中的自旋现象研究

国家自然科学基金

0+阅读 · 2009年12月31日

汉语文语转换中语义与表现力联合建模

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员