Since their inception, transformer-based language models have delivered impressive performance gains across many natural language processing tasks. For Arabic, the current state-of-the-art results on most datasets are achieved by the AraBERT language model. Despite these advances, sarcasm and sentiment detection remain challenging tasks in Arabic, given the language's rich morphology, linguistic disparity and dialectal variations. This paper presents team SPPU-AASM's submission to the WANLP ArSarcasm shared task 2021, which centers on sarcasm and sentiment polarity detection in Arabic tweets. The study proposes a hybrid model that combines sentence representations from AraBERT with static word vectors trained on Arabic social media corpora. The proposed system achieves an F1-sarcastic score of 0.62 and an F-PN score of 0.715 on the sarcasm and sentiment detection tasks, respectively. Simulation results show that the proposed system outperforms multiple existing approaches on both tasks, suggesting that combining context-free and context-dependent text representations can help capture complementary facets of word meaning in Arabic. The system ranked second and tenth in the sarcasm detection and sentiment identification sub-tasks, respectively.
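The hybrid idea described above, pairing a contextual sentence representation with static word vectors, can be illustrated with a minimal sketch. The abstract does not say how the two representations are merged, so mean-pooling the static vectors and concatenating them with the sentence vector are assumptions made here for illustration only. The function names, the toy vocabulary, and the placeholder `contextual_vec` (standing in for an AraBERT sentence embedding) are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def average_static_vectors(
    tokens: Sequence[str],
    static_vectors: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Mean-pool the static vectors of in-vocabulary tokens (assumed pooling)."""
    known = [static_vectors[t] for t in tokens if t in static_vectors]
    if not known:
        # No token is in the vocabulary: fall back to a zero vector.
        return [0.0] * dim
    # zip(*known) iterates over the vectors dimension by dimension.
    return [sum(column) / len(known) for column in zip(*known)]


def hybrid_features(
    contextual_vec: Sequence[float],
    tokens: Sequence[str],
    static_vectors: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Concatenate a contextual sentence vector with pooled static vectors.

    Concatenation is an assumed fusion strategy, not the paper's stated method.
    """
    return list(contextual_vec) + average_static_vectors(tokens, static_vectors, dim)


# Toy data: a 3-dimensional placeholder for a contextual sentence embedding
# and a 2-dimensional hypothetical static vocabulary.
static_vectors = {"w1": [1.0, 0.0], "w2": [0.0, 1.0]}
contextual_vec = [0.1, 0.2, 0.3]

features = hybrid_features(contextual_vec, ["w1", "w2", "oov"], static_vectors, 2)
print(features)  # [0.1, 0.2, 0.3, 0.5, 0.5]
```

In a setup like this, the combined feature vector would then be passed to a classifier, so that the context-free and context-dependent views of the tweet both contribute to the prediction.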