Fallacies are seemingly valid arguments used to support a position and persuade an audience of its validity. Recognizing fallacies is an intrinsically difficult task, both for humans and machines. Moreover, a major challenge for computational models lies in the fact that fallacies are formulated differently across datasets, with differences in input format (e.g., question-answer pair, sentence with a fallacy fragment), genre (e.g., social media, dialogue, news), and the types and number of fallacies (from 5 to 18 types per dataset). To move towards solving the fallacy recognition task, we treat these differences across datasets as multiple tasks and show how instruction-based prompting in a multitask setup based on the T5 model improves results over approaches built for a specific dataset, such as T5, BERT, or GPT-3. We show the ability of this multitask prompting approach to recognize 28 unique fallacies across domains and genres, and study the effect of model size and prompt choice by analyzing the per-class (i.e., fallacy type) results. Finally, we analyze the effect of annotation quality on model performance, and the feasibility of complementing this approach with external knowledge.
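The multitask instruction-prompting setup described above can be sketched as follows. This is a minimal illustration, not the paper's actual templates: the task names, instruction wordings, and label sets below are hypothetical, and only the prompt-construction step is shown (the prompt would then be fed to a text-to-text model such as T5).

```python
# Minimal sketch of casting fallacy recognition as instruction-based
# multitask prompting: each dataset becomes its own "task" with an
# instruction, an input template, and a label set, so one text-to-text
# model can be trained on all datasets jointly.

def build_prompt(instruction: str, text: str, labels: list[str]) -> str:
    """Format one example as an instruction prompt for a text-to-text model."""
    options = ", ".join(labels)
    return f"{instruction}\nOptions: {options}\nInput: {text}\nAnswer:"

# Hypothetical per-dataset task definitions (names and labels illustrative).
TASKS = {
    "news": {
        "instruction": "Identify the logical fallacy in the following news sentence.",
        "labels": ["ad hominem", "slippery slope", "red herring"],
    },
    "qa": {
        "instruction": "Which fallacy does the answer to this question commit?",
        "labels": ["appeal to authority", "false dilemma"],
    },
}

task = TASKS["news"]
prompt = build_prompt(
    task["instruction"],
    "If we allow this, society will surely collapse.",
    task["labels"],
)
print(prompt)
```

Because every dataset is mapped to the same prompt shape, differences in input format and label inventory are absorbed by the per-task instruction and options list rather than by dataset-specific model architectures.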