Argument Mining is the task of automatically identifying and extracting argumentative components (e.g., premises and claims) and detecting the relations among them (i.e., support, attack, rephrase, or no relation). One of the main obstacles in approaching this problem is the scarcity of data and the small size of the publicly available corpora. In this work, we use the recently annotated US2016 debate corpus. US2016 is the largest existing argument-annotated corpus, which makes it possible to explore the benefits of the most recent advances in Natural Language Processing in a complex domain such as Argument (relation) Mining. We present an exhaustive analysis of the behavior of transformer-based models (i.e., BERT, XLNet, RoBERTa, DistilBERT, and ALBERT) when predicting argument relations. Finally, we evaluate the models in five different domains, with the objective of finding the least domain-dependent model. We obtain a macro F1-score of 0.70 on the US2016 evaluation corpus and a macro F1-score of 0.61 on the Moral Maze cross-domain corpus.
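The macro F1-score reported above is the unweighted mean of the per-class F1 scores, so each of the four relation classes counts equally regardless of how often it occurs. The following is a minimal sketch of that computation, assuming single-label predictions; the function name `macro_f1` and the toy label sequences are illustrative only and are not taken from the US2016 corpus or from the authors' evaluation code.

```python
from collections import Counter

# The four relation classes named in the abstract.
LABELS = ["support", "attack", "rephrase", "no relation"]


def macro_f1(gold, pred, labels=LABELS):
    """Return the unweighted mean of per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Toy data (hypothetical, for illustration only).
gold = ["support", "attack", "rephrase", "no relation", "support", "no relation"]
pred = ["support", "no relation", "rephrase", "no relation", "attack", "no relation"]
score = macro_f1(gold, pred)
print(f"macro F1 = {score:.3f}")
```

Because every class contributes equally, a model that ignores a rare relation such as attack is penalized as heavily as one that ignores a frequent relation, which is why macro averaging suits imbalanced relation data.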