潜在多语表达(PIE)-英文:中语类Corpus (Potential Idiomatic Expression (PIE)-English: Corpus for Classes of Idioms) - 专知论文

会员服务 ·

0

类别 · 词义消歧 · 数据集 · NLP · 知识 (knowledge) ·

2022 年 4 月 23 日

Potential Idiomatic Expression (PIE)-English: Corpus for Classes of Idioms

翻译：潜在多语表达(PIE)-英文:中语类Corpus

Tosin P. Adewumi,Roshanak Vadoodi,Aparajita Tripathy,Konstantina Nikolaidou,Foteini Liwicki,Marcus Liwicki

from arxiv, Accepted at the International Conference on Language Resources and Evaluation (LREC) 2022

We present a fairly large, Potential Idiomatic Expression (PIE) dataset for Natural Language Processing (NLP) in English. The challenges with NLP systems with regards to tasks such as Machine Translation (MT), word sense disambiguation (WSD) and information retrieval make it imperative to have a labelled idioms dataset with classes such as it is in this work. To the best of the authors' knowledge, this is the first idioms corpus with classes of idioms beyond the literal and the general idioms classification. In particular, the following classes are labelled in the dataset: metaphor, simile, euphemism, parallelism, personification, oxymoron, paradox, hyperbole, irony and literal. We obtain an overall inter-annotator agreement (IAA) score, between two independent annotators, of 88.89%. Many past efforts have been limited in the corpus size and classes of samples but this dataset contains over 20,100 samples with almost 1,200 cases of idioms (with their meanings) from 10 classes (or senses). The corpus may also be extended by researchers to meet specific needs. The corpus has part of speech (PoS) tagging from the NLTK library. Classification experiments performed on the corpus to obtain a baseline and comparison among three common models, including the BERT model, give good results. We also make publicly available the corpus and the relevant codes for working with it for NLP tasks.

翻译：我们用英文为自然语言处理(NLP)提供了相当大的潜在单词表达(PIE)数据集。由于NLP系统在机器翻译(MT)、词感分辨(WSD)和信息检索等任务方面面临的挑战,我们必须有一个标有标签的单词表达(Idomm)数据集,该数据集的类别如在这项工作中的类别。据作者所知,这是第一组单词表达(PIE)数据集,其类别在字形和一般语分类之外。特别是,以下类别在数据集中标有标签:隐喻、硅、电子化、平行主义、个性化、氧素moron、自相矛盾、超音调、讽刺和字形。我们获得一个总体的双独立说明(IAA)协议(IAAA)分,其中88.89%是作者所知的。过去许多努力在物质模型和样本类别方面是有限的,但这一数据集包含近20 100个样本,其中含有近1 200个类型(含其含义的)样本,从10个类(或感官意义)类中标、平行、个个个人个个个个个类(或感官标)平行的比。该数据库还可以将一个用于三个普通数据库的样本的、一个具体数据库的、一个标定、一个标、一个标、一个标化的标、一个标定(我们的标定的标)的标的标的比。该数据库可以进行到一个具体的样本,用于用于用于用于用于三个的图书馆的图书馆的图书馆的图书馆的样本,用于一个特定的图书馆的样本,用于一个特定的检索。

0

相关内容

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

中国图象图形学学会CSIG

0+阅读 · 2021年12月17日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

中国图象图形学学会CSIG

1+阅读 · 2021年11月11日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

中国图象图形学学会CSIG

0+阅读 · 2021年11月10日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

elavl1a在斑马鱼心脏发育中的调控机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

β-catenin/Ets1复合体在胶质母细胞瘤中对hTERT表达调控机制的研究

国家自然科学基金

0+阅读 · 2013年12月31日

IRF家族乙酰化修饰在心脏重构中的作用及机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

Crif1调控Nrf2-ARE信号通路促进BMSCs抗辐射损伤机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

TRAF1在心肌梗死后心室重构中的作用及分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

炎症微环境在高血压心脏纤维化发病机制中的研究

国家自然科学基金

0+阅读 · 2011年12月31日

Folbp1在胚胎期多氯联苯暴露致子代心脏发育缺陷中的作用及机制

国家自然科学基金

0+阅读 · 2009年12月31日

斑马鱼心脏发育

国家自然科学基金

0+阅读 · 2009年12月31日

Ezrin和TRIP-1蛋白的相互作用机制、信号转导通路及其对肿瘤转移作用的研究

国家自然科学基金

0+阅读 · 2009年12月31日

sRAGE对缺血/再灌注的心脏保护作用及其机制

国家自然科学基金

0+阅读 · 2008年12月31日

NAGphormer: Neighborhood Aggregation Graph Transformer for Node Classification in Large Graphs

Arxiv

0+阅读 · 2022年6月10日

Comparison Study of Inertial Sensor Signal Combination for Human Activity Recognition based on Convolutional Neural Networks

Comparison Study of Inertial Sensor Signal Combination for Human Activity Recognition based on Convolutional Neural Networks

Arxiv

0+阅读 · 2022年6月9日

Privacy-Aware Compression for Federated Data Analysis

Arxiv

0+阅读 · 2022年6月9日

The Open corpus of the Veps and Karelian languages: overview and applications

Arxiv

0+阅读 · 2022年6月8日

Quantifying the Effects of Working in VR for One Week

Arxiv

0+阅读 · 2022年6月8日

CitySpec: An Intelligent Assistant System for Requirement Specification in Smart Cities

Arxiv

0+阅读 · 2022年6月7日

K-AID: Enhancing Pre-trained Language Models with Domain Knowledge for Question Answering

Arxiv

15+阅读 · 2021年9月22日

A Review on Generative Adversarial Networks: Algorithms, Theory, and Applications

A Review on Generative Adversarial Networks: Algorithms, Theory, and Applications

Arxiv

59+阅读 · 2020年1月20日

A Survey of Model Compression and Acceleration for Deep Neural Networks

Arxiv

66+阅读 · 2019年9月8日

Multilingual Sentiment Analysis: An RNN-Based Framework for Limited Data

Arxiv

12+阅读 · 2018年6月8日

VIP会员

文章信息

相关主题

知识 (knowledge)

相关VIP内容

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《科研智能：人工智能赋能工业仿真研究报告（2025年）》

具身智能中的世界模型：全面综述

【NeurIPS2025】迈向开放世界的三维“物体性”学习

【博士论文】用于排序与扩散模型的安全、高效与鲁棒强化学习

相关资讯

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

中国图象图形学学会CSIG

0+阅读 · 2021年12月17日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

中国图象图形学学会CSIG

1+阅读 · 2021年11月11日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

中国图象图形学学会CSIG

0+阅读 · 2021年11月10日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

NAGphormer: Neighborhood Aggregation Graph Transformer for Node Classification in Large Graphs

Arxiv

0+阅读 · 2022年6月10日

Comparison Study of Inertial Sensor Signal Combination for Human Activity Recognition based on Convolutional Neural Networks

Comparison Study of Inertial Sensor Signal Combination for Human Activity Recognition based on Convolutional Neural Networks

Arxiv

0+阅读 · 2022年6月9日

Privacy-Aware Compression for Federated Data Analysis

Arxiv

0+阅读 · 2022年6月9日

The Open corpus of the Veps and Karelian languages: overview and applications

Arxiv

0+阅读 · 2022年6月8日

Quantifying the Effects of Working in VR for One Week

Arxiv

0+阅读 · 2022年6月8日

CitySpec: An Intelligent Assistant System for Requirement Specification in Smart Cities

Arxiv

0+阅读 · 2022年6月7日

K-AID: Enhancing Pre-trained Language Models with Domain Knowledge for Question Answering

Arxiv

15+阅读 · 2021年9月22日

A Review on Generative Adversarial Networks: Algorithms, Theory, and Applications

A Review on Generative Adversarial Networks: Algorithms, Theory, and Applications

Arxiv

59+阅读 · 2020年1月20日

A Survey of Model Compression and Acceleration for Deep Neural Networks

Arxiv

66+阅读 · 2019年9月8日

Multilingual Sentiment Analysis: An RNN-Based Framework for Limited Data

Arxiv

12+阅读 · 2018年6月8日

相关基金

elavl1a在斑马鱼心脏发育中的调控机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

β-catenin/Ets1复合体在胶质母细胞瘤中对hTERT表达调控机制的研究

国家自然科学基金

0+阅读 · 2013年12月31日

IRF家族乙酰化修饰在心脏重构中的作用及机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

Crif1调控Nrf2-ARE信号通路促进BMSCs抗辐射损伤机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

TRAF1在心肌梗死后心室重构中的作用及分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

炎症微环境在高血压心脏纤维化发病机制中的研究

国家自然科学基金

0+阅读 · 2011年12月31日

Folbp1在胚胎期多氯联苯暴露致子代心脏发育缺陷中的作用及机制

国家自然科学基金

0+阅读 · 2009年12月31日

斑马鱼心脏发育

国家自然科学基金

0+阅读 · 2009年12月31日

Ezrin和TRIP-1蛋白的相互作用机制、信号转导通路及其对肿瘤转移作用的研究

国家自然科学基金

0+阅读 · 2009年12月31日

sRAGE对缺血/再灌注的心脏保护作用及其机制

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员