Graph-Based Multilingual Label Propagation for Low-Resource Part-of-Speech Tagging - Zhuanzhi Papers

Part-of-speech tagging · Label propagation · Annotation · Scenario · Graph

October 18, 2022

Graph-Based Multilingual Label Propagation for Low-Resource Part-of-Speech Tagging

Ayyoob Imani, Silvia Severini, Masoud Jalili Sabet, François Yvon, Hinrich Schütze

From arXiv (EMNLP 2022)

Part-of-Speech (POS) tagging is an important component of the NLP pipeline, but many low-resource languages lack labeled data for training. An established method for training a POS tagger in such a scenario is to create a labeled training set by transferring from high-resource languages. In this paper, we propose a novel method for transferring labels from multiple high-resource source to low-resource target languages. We formalize POS tag projection as graph-based label propagation. Given translations of a sentence in multiple languages, we create a graph with words as nodes and alignment links as edges by aligning words for all language pairs. We then propagate node labels from source to target using a Graph Neural Network augmented with transformer layers. We show that our propagation creates training sets that allow us to train POS taggers for a diverse set of languages. When combined with enhanced contextualized embeddings, our method achieves a new state-of-the-art for unsupervised POS tagging of low-resource languages.
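The abstract describes a graph whose nodes are words and whose edges are word-alignment links across all language pairs, with tags then propagated from source-language nodes to target-language nodes. The sketch below illustrates only that data flow, under simplifying assumptions: alignments are assumed to be given already as word-index pairs, the language codes and toy sentences are hypothetical, and the paper's GNN with transformer layers is replaced by a plain one-hop majority vote over aligned source words.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical node representation: (language code, word index in that sentence).
Node = Tuple[str, int]


def build_alignment_graph(
    alignments: Dict[Tuple[str, str], List[Tuple[int, int]]]
) -> Dict[Node, Set[Node]]:
    """Words are nodes; each alignment link (i, j) between languages l1 and l2
    becomes an undirected edge between (l1, i) and (l2, j)."""
    graph: Dict[Node, Set[Node]] = defaultdict(set)
    for (lang1, lang2), links in alignments.items():
        for i, j in links:
            graph[(lang1, i)].add((lang2, j))
            graph[(lang2, j)].add((lang1, i))
    return graph


def project_tags(
    graph: Dict[Node, Set[Node]],
    source_tags: Dict[str, List[str]],
    target_lang: str,
    target_len: int,
) -> List[Optional[str]]:
    """One-hop majority vote: each target word takes the most common tag among
    its aligned source-language neighbours (a simple stand-in, not the paper's GNN).
    Target words with no labeled neighbour stay unlabeled (None)."""
    projected: List[Optional[str]] = []
    for position in range(target_len):
        votes: Counter = Counter()
        # .get() on a defaultdict does not create missing keys.
        for lang, index in graph.get((target_lang, position), set()):
            if lang in source_tags:
                votes[source_tags[lang][index]] += 1
        projected.append(votes.most_common(1)[0][0] if votes else None)
    return projected


# Toy parallel data (hypothetical): sources "the dog barks loudly",
# "der Hund bellt laut", "el perro ladra"; target "le chien aboie fort".
SOURCE_TAGS = {
    "eng": ["DET", "NOUN", "VERB", "ADV"],
    "deu": ["DET", "NOUN", "VERB", "ADV"],
    "spa": ["DET", "NOUN", "NOUN"],  # 'ladra' deliberately mis-tagged as NOUN
}
# Target word 3 ('fort') is left unaligned in every pair.
ALIGNMENTS = {
    ("eng", "deu"): [(0, 0), (1, 1), (2, 2), (3, 3)],  # source-source edges are built too
    ("eng", "fra"): [(0, 0), (1, 1), (2, 2)],
    ("deu", "fra"): [(0, 0), (1, 1), (2, 2)],
    ("spa", "fra"): [(0, 0), (1, 1), (2, 2)],
}

if __name__ == "__main__":
    graph = build_alignment_graph(ALIGNMENTS)
    print(project_tags(graph, SOURCE_TAGS, "fra", 4))
```

Here the mis-tagged Spanish verb is outvoted by the English and German tags, which is the basic intuition for drawing on several source languages at once; the unaligned target word stays unlabeled. The paper instead learns the propagation step with a GNN augmented with transformer layers, whereas this vote has no learned parameters.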

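Related content

Part-of-speech tagging

A part of speech is a basic grammatical property of a word, also commonly called its word class. Part-of-speech tagging is the process of determining the grammatical category of each word in a given sentence, identifying its part of speech, and labeling it accordingly; it is an important foundational problem in Chinese information processing. In corpus linguistics, part-of-speech tagging (POS tagging, PoS tagging, or POST), also called grammatical tagging, is the process of marking up each word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context.[1]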
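To make the input and output of POS tagging concrete, the sketch below implements the classic most-frequent-tag baseline: it learns a word-to-tag lexicon from labeled sentences and then tags each word in isolation. The toy corpus and the NOUN fallback for unseen words are assumptions for illustration only; because this baseline ignores context, it cannot resolve the context-dependent cases the definition above mentions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def train_most_frequent_tag(corpus: List[List[Tuple[str, str]]]) -> Dict[str, str]:
    """Map each (lowercased) word to the tag it carries most often in the corpus."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tag_counts.most_common(1)[0][0] for word, tag_counts in counts.items()}


def tag_sentence(
    words: List[str], lexicon: Dict[str, str], unknown_tag: str = "NOUN"
) -> List[Tuple[str, str]]:
    """Tag each word independently; unseen words get a fixed fallback tag."""
    return [(word, lexicon.get(word.lower(), unknown_tag)) for word in words]


# Hypothetical toy training data using Universal POS tag names.
TOY_CORPUS = [
    [("The", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("A", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("The", "DET"), ("bark", "NOUN"), ("is", "AUX"), ("rough", "ADJ")],
]

if __name__ == "__main__":
    lexicon = train_most_frequent_tag(TOY_CORPUS)
    print(tag_sentence(["The", "dog", "runs", "fast"], lexicon))
```

The unseen adverb "fast" falls back to NOUN, a small example of why labeled training data, which low-resource languages typically lack, matters for building a usable tagger.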

