This paper describes our system developed for the SemEval-2023 Task 12 "Sentiment Analysis for Low-resource African Languages using Twitter Dataset". Sentiment analysis is one of the most widely studied applications in natural language processing. However, most prior work still focuses on a small number of high-resource languages. Building reliable sentiment analysis systems for low-resource languages remains challenging due to the limited training data available for this task. In this work, we propose to leverage language-adaptive and task-adaptive pretraining on African texts, and we study transfer learning with source language selection on top of an African language-centric pretrained language model. Our key findings are: (1) Adapting the pretrained model to the target language and task using a small yet relevant corpus improves performance considerably, by more than 10 F1 points. (2) Selecting source languages with positive transfer gains during training avoids harmful interference from dissimilar languages, leading to better results in both the multilingual and cross-lingual settings. In the shared task, our system wins 8 out of 15 tracks and, in particular, performs best in the multilingual evaluation.