The presence of sarcasm in conversational systems such as chatbots and on social media platforms such as Facebook and Twitter poses several challenges for downstream NLP tasks, because the intended meaning of a sarcastic text is contrary to what it literally expresses. Further, the use of code-mixed language to express sarcasm is steadily increasing. Current NLP techniques for code-mixed data have had limited success owing to differences in lexicon and syntax and the scarcity of labeled corpora. To address the joint problem of code-mixing and sarcasm detection, we propose capturing incongruity through sub-word level embeddings learned via fastText. Empirical results show that our proposed model achieves an F1-score on a code-mixed Hinglish dataset comparable to that of pretrained multilingual models, while training 10x faster and using a lower memory footprint.
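As a rough illustration of why sub-word embeddings suit code-mixed text, the Python sketch below splits each word into fastText-style character n-grams (with `<` and `>` as boundary markers, n from 3 to 6, the defaults of fastText's unsupervised models) and represents the word as the average of hashed n-gram vectors. It is not the authors' model. The class `SubwordEmbedder`, the plain FNV-1a bucket hashing, the bucket count, and the random, untrained vectors are all assumptions for illustration, and the incongruity-capturing component is not shown. The point is that spelling variants common in romanized Hindi, such as "bahut" and "bahuuut", share n-grams and therefore share embedding rows, even when one variant never appeared in training.

```python
import math
import random
from typing import Dict, List


def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> List[str]:
    """Return character n-grams of `word`, padded with '<' and '>' boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(padded) - n + 1)]


def fnv1a_32(text: str) -> int:
    """32-bit FNV-1a hash over UTF-8 bytes (assumed bucketing scheme, not byte-exact fastText)."""
    h = 2166136261
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * 16777619) & 0xFFFFFFFF
    return h


class SubwordEmbedder:
    """Hypothetical embedder: a word vector is the mean of its n-gram bucket vectors."""

    def __init__(self, dim: int = 16, buckets: int = 10_000, seed: int = 0):
        self.dim = dim
        self.buckets = buckets
        self.seed = seed
        self._table: Dict[int, List[float]] = {}  # lazily created bucket rows

    def _bucket_vector(self, bucket: int) -> List[float]:
        # Random stand-in for a trained row; seeded per bucket so results are reproducible.
        if bucket not in self._table:
            rng = random.Random(self.seed * 1_000_003 + bucket)
            self._table[bucket] = [rng.uniform(-1.0, 1.0) for _ in range(self.dim)]
        return self._table[bucket]

    def word_vector(self, word: str) -> List[float]:
        grams = char_ngrams(word)
        vec = [0.0] * self.dim
        for gram in grams:
            row = self._bucket_vector(fnv1a_32(gram) % self.buckets)
            for k in range(self.dim):
                vec[k] += row[k]
        # Average over n-grams; a word with no n-grams maps to the zero vector.
        return [x / len(grams) for x in vec] if grams else vec


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    emb = SubwordEmbedder()
    shared = set(char_ngrams("bahut")) & set(char_ngrams("bahuuut"))
    print(sorted(shared))
    print(cosine(emb.word_vector("bahut"), emb.word_vector("bahuuut")))
```

With trained rather than random rows, the shared n-grams are what let an out-of-vocabulary spelling inherit a meaningful vector. In practice such vectors would be learned from a corpus, for example with the official `fasttext` Python package's `train_unsupervised`, whose `minn` and `maxn` arguments control the n-gram range.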