Finetuning for Sarcasm Detection with a Pruned Dataset

Sarcasm is a form of irony in which a speaker or writer says the opposite of what they really mean, often to mock someone or something or to be humorous. It is usually conveyed through tone of voice, facial expressions, or other nonverbal cues, but it can also be signaled by words or phrases typically associated with irony or humor. Sarcasm detection is difficult because it relies on context and nonverbal cues, and because sarcasm can be culturally specific, subjective, and ambiguous. In this work, we fine-tune the RoBERTa-based sarcasm detection model presented in Abaskohi et al. [2022] to get within 0.02 F1 of the state of the art (Hercog et al. [2022]) on the iSarcasm dataset (Oprea and Magdy [2019]). This performance is achieved by augmenting iSarcasm with a pruned version of the Self Annotated Reddit Corpus (SARC) (Khodak et al. [2017]). Our pruned version is 100 times smaller than the subset of SARC used to train the state-of-the-art model.
