Memotion 3: Dataset on Sentiment and Emotion Analysis of Codemixed Hindi-English Memes

Shreyash Mishra,S Suryavardan,Parth Patwa,Megha Chakraborty,Anku Rani,Aishwarya Reganti,Aman Chadha,Amitava Das,Amit Sheth,Manoj Chinnakotla,Asif Ekbal,Srijan Kumar

From arXiv, Defactify2 @AAAI

Memes are the new-age conveyance mechanism for humor on social media sites. Memes often include an image and some text. Memes can be used to promote disinformation or hatred, so it is crucial to investigate them in detail. We introduce Memotion 3, a new dataset with 10,000 annotated memes. Unlike other prevalent datasets in the domain, including prior iterations of Memotion, Memotion 3 introduces Hindi-English codemixed memes, whereas prior work in the area was limited to English memes. We describe the Memotion task, the data collection, and the dataset creation methodologies. We also provide a baseline for the task. The baseline code and dataset will be made available at https://github.com/Shreyashm16/Memotion-3.0

