We present MINT, a new Multilingual INTimacy analysis dataset covering 13,372 tweets in 10 languages: English, French, Spanish, Italian, Portuguese, Korean, Dutch, Chinese, Hindi, and Arabic. We benchmark a set of popular multilingual pre-trained language models on the dataset. MINT is released alongside SemEval 2023 Task 9: Multilingual Tweet Intimacy Analysis (https://sites.google.com/umich.edu/semeval-2023-tweet-intimacy).